TikTok Watch Time and Completion Rate via Public API

By TikLiveAPI Team · Published on May 29, 2026

Watch time and completion rate are the two metrics every TikTok strategist names when asked what the algorithm actually rewards. They are also the two metrics you cannot read from the public API. Not from post-detail, not from user-posts, not from any third-party scraper. Average watch duration and completion percentage live on TikTok's internal analytics surface and inside the Creator Studio CSV export. Everything outside that wall is a proxy.

This post is for data engineers and performance marketers who have already accepted that limitation and want to know what to do anyway. We will look at exactly which fields the public API exposes, why play count alone is misleading, how to derive an engagement velocity proxy that correlates with watch behavior, why collect_count is the most underrated field in the response, and how to bridge the gap by joining scraped competitor data with your own Creator Studio CSV exports.

What the public API actually returns

Before we model anything, set expectations on the raw material. The /post-detail/ endpoint returns a flat object with these counters at the top level:

play_count - total views
digg_count - likes
comment_count - comments
share_count - shares to other platforms
download_count - saves to device
collect_count - bookmarks inside TikTok
duration - video length in seconds
create_time - Unix timestamp

You also get the playback URLs (play, wmplay, hdplay), music_info, and a minimal author block. The /user-posts/ endpoint wraps the same counters per video inside a videos[] array along with cursor and hasMore. That is the entire surface you have to work with. There is no avg_watch_time, no completion_rate, no fyp_impressions, no retention_curve. If a vendor claims otherwise, they are either reselling Creator Studio scrapes that require a logged-in cookie, or they are making numbers up.
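To make that surface concrete, here is a minimal sketch that pulls the eight counters out of a flat post-detail style dict and reports which ones are absent. The helper name extract_counters and the sample payload are ours, and the numbers are invented for illustration. The nesting of a live response may differ.

```python
# Counter fields named in the list above.
PUBLIC_COUNTERS = (
    "play_count", "digg_count", "comment_count", "share_count",
    "download_count", "collect_count", "duration", "create_time",
)

def extract_counters(payload):
    """Return (counters, missing) for a flat post-detail style dict."""
    counters = {k: payload[k] for k in PUBLIC_COUNTERS if k in payload}
    missing = [k for k in PUBLIC_COUNTERS if k not in payload]
    return counters, missing

# Hypothetical payload with made-up numbers, not a real API response.
sample = {
    "play_count": 120000, "digg_count": 9100, "comment_count": 310,
    "share_count": 540, "collect_count": 1250,
    "duration": 23, "create_time": 1748476800,
}
counters, missing = extract_counters(sample)
print(missing)  # ['download_count']
```

Reporting missing fields explicitly, rather than silently defaulting them to zero, keeps a dropped counter from looking like a genuinely low one further down the pipeline.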

Why play count alone lies to you

A 15-second video with 1M views and a 60-second video with 1M views are not comparable. TikTok counts a view at roughly the first frame render, which means a 60-second post collecting 1M views could have a median watch time of 6 seconds and still report the same headline number as a 15-second post that was watched to completion. Play count without duration normalization is vanity. The first job is to bucket every video by duration before comparing anything.

The second job is to stop comparing across creators of different sizes without normalization. A 50k-view post on a 200k-follower account is a different beast than a 50k-view post on a 5M-follower account, even at identical duration. Bucket first, then normalize within the creator's recent baseline. We will cover the baseline trick a few sections down, but if you skip both steps you will keep ranking the wrong videos at the top of your dashboard.
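One way to sketch the within-creator baseline is to divide each post's play count by the median play count of that creator's preceding posts. The function name add_baseline_ratio, the window of 20 posts, the five-post minimum, and the baseline_ratio column are assumptions for illustration, not values taken from TikTok or the API.

```python
import pandas as pd

def add_baseline_ratio(df, window=20, min_posts=5):
    """Plays relative to the creator's own trailing median play count."""
    df = df.sort_values("create_time").reset_index(drop=True)
    # shift(1) keeps the current post out of its own baseline.
    baseline = (
        df["play_count"].shift(1)
        .rolling(window, min_periods=min_posts)
        .median()
    )
    df["baseline_ratio"] = df["play_count"] / baseline.clip(lower=1)
    return df

# Made-up history for a single creator, oldest post first.
demo = pd.DataFrame({
    "create_time": range(1, 8),
    "play_count": [100, 120, 80, 110, 90, 500, 100],
})
out = add_baseline_ratio(demo, window=5, min_posts=5)
print(out[["play_count", "baseline_ratio"]])
```

A ratio near 1 means the post performed like the creator's recent norm, and the first few posts stay NaN until there is enough history. Run it per creator, so the ratio is comparable between a 200k-follower account and a 5M-follower one.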

Creator Marketing API vs scraping: a candid comparison

TikTok runs an official Creator Marketing API and a Research API. Both can return watch-related metrics for specific surfaces, and both come with limits that matter:

Creator Marketing API - returns average watch time and full video views, but only for creators who have explicitly authorized your app through OAuth. You cannot point it at a competitor's profile.
Research API - academic access only, US/EU institutions, application review, no commercial use, queries capped per day.
Public scraping APIs like TikLiveAPI - work on any public profile or video, no OAuth, no application process, but limited to fields TikTok renders to anonymous viewers: counts, durations, metadata. No private analytics.

The honest framing is that these are complementary, not substitutes. You use OAuth for your own owned accounts, and you use the public API for the rest of the universe - competitor benchmarking, trend discovery, influencer screening, hashtag research. Trying to use one tool for both problems is where teams burn budget.

One more nuance for performance marketers: the Spark Ads dashboard shows retention curves for posts you have boosted, but only for the boosted period and only for the impressions paid through the ad platform. Organic completion data is still walled off. If you are running an influencer campaign and the creator will not share Studio screenshots, your only options are the proxy approach below or asking them to authorize the Creator Marketing API for the duration of the deal.

Pulling a clean post-level dataset

Here is the minimum viable extraction. We pull the last N posts for a username, bucket by duration, and compute the proxy metrics we will rely on. The endpoint uses the X-Api-Key header and the user-posts route from the users section of the documentation.

import os, time, requests, pandas as pd

API_KEY = os.environ["TIKLIVEAPI_KEY"]
BASE = "https://api.tikliveapi.com"
HEAD = {"X-Api-Key": API_KEY}

def get_uid(username):
    # Resolve a handle to the numeric user id that /user-posts/ expects.
    r = requests.get(f"{BASE}/userid/",
                     params={"username": username},
                     headers=HEAD, timeout=20)
    r.raise_for_status()
    return r.json()["id"]

def fetch_posts(username, max_posts=200):
    uid = get_uid(username)
    rows, cursor = [], "0"
    while len(rows) < max_posts:
        r = requests.get(f"{BASE}/user-posts/",
            params={"userid": uid, "count": 35, "cursor": cursor},
            headers=HEAD, timeout=30)
        r.raise_for_status()
        data = r.json()
        videos = data.get("videos", [])
        if not videos:
            break  # empty page: stop rather than re-request the same cursor
        for v in videos:
            rows.append({
                "aweme_id": v["aweme_id"],
                "create_time": v["create_time"],
                "duration": v["duration"],
                "play_count": v["play_count"],
                "digg_count": v["digg_count"],
                "comment_count": v["comment_count"],
                "share_count": v["share_count"],
                "download_count": v.get("download_count", 0),
                "collect_count": v.get("collect_count", 0),
                "title": v.get("title", ""),
                # Flags needed for the sanity checks later in this post.
                "is_ad": v.get("is_ad", False),
                "is_top": v.get("is_top", False),
            })
        if not data.get("hasMore"):
            break
        cursor = str(data.get("cursor"))
        time.sleep(0.35)  # pause between pages to stay under the rate limit
    # The last page can overshoot max_posts, so trim before returning.
    return pd.DataFrame(rows[:max_posts])

Two field-level details are worth double-checking against the API documentation. The user-posts response mixes casing: hasMore is camelCase at the top level, while per-video fields are snake_case. And pagination here uses cursor, not time. The followers and following endpoints behave differently: they paginate with a time parameter (a Unix timestamp), and the following endpoint keys its array as followings (plural). Mixing those up is the most common bug we see in customer support tickets.

Engagement velocity by duration bucket

Once you have a few hundred rows, the first useful proxy is engagement velocity, normalized by duration bucket. The idea is simple: the algorithm rewards videos where the active engagement events (likes, comments, shares) happen at a rate that suggests the audience watched long enough to react. A 7-second loop and a 55-second monologue do not generate engagement at the same rate even when both are doing well, so we bucket first, then rank within bucket.

def bucket_duration(sec):
    # Upper bounds are inclusive: a 60-second video lands in "36-60s".
    if sec <= 10:   return "0-10s"
    if sec <= 20:   return "11-20s"
    if sec <= 35:   return "21-35s"
    if sec <= 60:   return "36-60s"
    return ">60s"

def add_proxies(df):
    df = df.copy()
    df["bucket"] = df["duration"].apply(bucket_duration)
    plays = df["play_count"].clip(lower=1)  # avoid division by zero
    df["engagement_velocity"] = (
        df["digg_count"] + df["comment_count"] + df["share_count"]
    ) / plays
    df["save_rate"] = df["collect_count"] / plays
    df["share_rate"] = df["share_count"] / plays
    df["comment_rate"] = df["comment_count"] / plays
    return df

def bucket_benchmarks(df):
    return df.groupby("bucket").agg(
        n=("aweme_id", "count"),
        median_velocity=("engagement_velocity", "median"),
        median_save=("save_rate", "median"),
        median_share=("share_rate", "median"),
        p90_velocity=("engagement_velocity",
                      lambda x: x.quantile(0.9)),
    ).round(5)

What you are looking for in bucket_benchmarks is the p90_velocity column for each bucket. Any post above the 90th percentile inside its own bucket is, in our experience working with creator analytics customers, a strong candidate for an above-average watch-through rate. It is not proof, but it is a signal worth investigating before you spend Spark Ads budget boosting it. For the deeper normalization math, see our guide to engagement rate beyond likes over followers.
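A minimal sketch of that flagging step, assuming a frame that already has bucket and engagement_velocity columns. The helper name flag_p90 and the above_p90 column are ours, and the demo values are synthetic.

```python
import pandas as pd

def flag_p90(df, col="engagement_velocity"):
    """Mark posts whose metric exceeds the 90th percentile of their bucket."""
    df = df.copy()
    # transform broadcasts each bucket's p90 back onto its own rows.
    p90 = df.groupby("bucket")[col].transform(lambda s: s.quantile(0.9))
    df["above_p90"] = df[col] > p90
    return df

# Synthetic data: ten posts in each of two duration buckets.
demo = pd.DataFrame({
    "bucket": ["0-10s"] * 10 + ["36-60s"] * 10,
    "engagement_velocity": [i / 100 for i in range(10)]
                           + [i / 1000 for i in range(10)],
})
flagged = flag_p90(demo)
print(flagged[flagged["above_p90"]])
```

The long-form bucket has engagement rates ten times lower here, yet its best post is still flagged. That is the point of ranking within a bucket rather than across the whole frame.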

Why collect_count is the field nobody talks about

Likes are cheap. Shares are public. Comments are loud. Saves are quiet, and that is exactly why they are the closest available proxy for "felt watchable". A user only taps the bookmark when they want to come back, which strongly implies they watched enough to know they want to. collect_count in the API maps to that bookmark action. In our customer datasets, save rate correlates with self-reported completion rate from Creator Studio more reliably than any other public field, including share rate. If you build only one ranking signal beyond engagement velocity, build collect_count / play_count by bucket. We unpack that asymmetry further in what shares and saves actually predict.
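As a rough sketch of that ranking signal, the snippet below computes collect_count / play_count and converts it to a percentile rank among posts in the same duration bucket. The helper name rank_saves_in_bucket and the save_pct_rank column are hypothetical, and the demo rows are made up.

```python
import pandas as pd

def rank_saves_in_bucket(df):
    """Add save_rate and its percentile rank among posts of similar length."""
    df = df.copy()
    df["save_rate"] = df["collect_count"] / df["play_count"].clip(lower=1)
    # rank(pct=True) gives 1.0 to the highest save rate in each bucket.
    df["save_pct_rank"] = df.groupby("bucket")["save_rate"].rank(pct=True)
    return df.sort_values("save_pct_rank", ascending=False)

# Made-up rows: two short posts and two long posts.
demo = pd.DataFrame({
    "aweme_id": ["a", "b", "c", "d"],
    "bucket": ["0-10s", "0-10s", "36-60s", "36-60s"],
    "play_count": [1000, 1000, 1000, 1000],
    "collect_count": [5, 20, 40, 10],
})
ranked = rank_saves_in_bucket(demo)
print(ranked[["aweme_id", "bucket", "save_rate", "save_pct_rank"]])
```

A percentile rank is easier to compare across buckets than the raw rate, because typical save rates differ with video length.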

Bridging the gap with Creator Studio CSV

For your own owned accounts, Creator Studio (now TikTok Studio) lets you export a CSV with average watch time per video. This is the missing variable. The bridge pattern is to join that CSV against the scraped public dataset and fit a per-bucket model so you can score competitor posts using the same coefficients.

import numpy as np
from sklearn.linear_model import LinearRegression

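# studio.csv columns assumed:
#   aweme_id, avg_watch_time_s, full_video_views, total_views
# Force ids to strings on both sides so the join keys share a dtype.
studio = pd.read_csv("studio.csv", dtype={"aweme_id": str})
own = add_proxies(fetch_posts("your_handle", 500))
own["aweme_id"] = own["aweme_id"].astype(str)
joined = own.merge(studio, on="aweme_id", how="inner")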
joined["completion_rate"] = (
    joined["avg_watch_time_s"] / joined["duration"].clip(lower=1)
).clip(0, 1)

models = {}
for bucket, g in joined.groupby("bucket"):
    if len(g) < 20:  # too few joined rows to trust a fit
        continue
    X = g[["engagement_velocity", "save_rate",
           "share_rate", "comment_rate"]].values
    y = g["completion_rate"].values
    m = LinearRegression().fit(X, y)
    models[bucket] = m
    # m.score on the training rows is in-sample R^2, so it reads optimistic.
    print(bucket, "R^2=", round(m.score(X, y), 3),
          "coef=", np.round(m.coef_, 3))

Two warnings. First, the model only generalizes to creators in roughly the same niche, format, and audience size as you. A fashion creator's coefficients will not score a finance creator's videos correctly. Second, keep expectations modest: a per-bucket R-squared in the region of 0.45 or slightly above is realistic, and if you see 0.85 you have probably overfit a small sample. The R-squared printed above is computed on the same rows the model was fitted on, so it flatters small buckets. Re-fit monthly and never trust a bucket with fewer than 20 joined rows.
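One way to check for the overfitting described here is to compare in-sample R-squared with a cross-validated one. The sketch below uses scikit-learn's KFold and cross_val_score on synthetic noise standing in for a single bucket. The helper name cv_r2, the fold count, and the seed are arbitrary choices of ours.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

def cv_r2(X, y, folds=5, seed=0):
    """Mean out-of-fold R^2 for a plain linear fit."""
    cv = KFold(n_splits=folds, shuffle=True, random_state=seed)
    scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")
    return scores.mean()

# Synthetic stand-in for one bucket: 24 rows, 4 features, pure noise,
# so any apparent fit is overfitting by construction.
rng = np.random.default_rng(0)
X = rng.normal(size=(24, 4))
y = rng.normal(size=24)

in_sample = LinearRegression().fit(X, y).score(X, y)
held_out = cv_r2(X, y)
print("in-sample R^2:", round(in_sample, 3))
print("held-out R^2:", round(held_out, 3))
```

If the held-out figure sits far below the in-sample one on your real joined rows, the bucket is too small or too noisy to score competitors with.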

Benchmarking against competitors

With the per-bucket model in hand, scoring a competitor is a matter of pulling their posts through the same pipeline. Use the user-posts endpoint with their numeric user id (resolve via /userid/ first), feed the resulting frame through add_proxies, then predict per bucket:

def score_competitor(handle, models):
    df = add_proxies(fetch_posts(handle, 300))
    df["predicted_completion"] = np.nan
    for bucket, m in models.items():
        mask = df["bucket"] == bucket
        if mask.sum() == 0:
            continue
        X = df.loc[mask, ["engagement_velocity", "save_rate",
                          "share_rate", "comment_rate"]].values
        df.loc[mask, "predicted_completion"] = m.predict(X).clip(0, 1)
    return df.sort_values("predicted_completion", ascending=False)

The top of that sorted frame is your prioritized list of competitor videos worth dissecting on hook, pacing, and edit. That is the entire reason to build the pipeline: you cannot read their analytics, but you can predict them well enough to focus your manual review.

Operational notes. The TikLiveAPI rate limit is 200 requests per minute by default, and a single user-posts page returns up to 35 videos, so 500 posts takes 15 page requests (500 / 35, rounded up). A daily competitor sweep across 30 handles therefore needs about 450 page requests, which costs 450 credits on the pay-as-you-go plan listed on the pricing page. Add one /userid/ lookup per handle unless you cache the numeric ids. Build your scheduler around midnight UTC so cursor pagination stays consistent with the create_time timestamps the API returns.
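The request arithmetic is easy to get wrong when the sweep size changes, so here is a small sketch of it. The function name daily_requests and its uid_lookups option are ours. Whether a /userid/ lookup is billed like a page request is an assumption to check against the pricing page.

```python
import math

PAGE_SIZE = 35  # videos per user-posts page, per the text above

def daily_requests(handles, posts_per_handle, page_size=PAGE_SIZE,
                   uid_lookups=False):
    """Requests needed to pull posts_per_handle posts for each handle."""
    pages = math.ceil(posts_per_handle / page_size)
    # Optionally count one /userid/ lookup per handle (skip if ids are cached).
    per_handle = pages + (1 if uid_lookups else 0)
    return handles * per_handle

print(daily_requests(30, 500))                    # 450
print(daily_requests(30, 500, uid_lookups=True))  # 480
```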

Layering hashtag and music context

The proxy model gets sharper when you join two more data sources from the same API. The music_info object on every post tells you which audio track the video uses, and the challenge endpoints under the challenge documentation let you score how saturated a hashtag is by day. A video that uses an audio track on the way up and a hashtag that is not yet saturated tends to over-perform its own proxy score, and a video that piles onto an already-peaked sound under-performs. Add two booleans to the feature matrix (sound_rising and tag_saturated), re-fit, and the R-squared per bucket usually picks up two to five points (0.02 to 0.05). The cost is one extra /music-posts/ sample per unique track and one /challenge-posts/ sample per unique hashtag per day. Cache aggressively, because audio and hashtag context does not change minute by minute. To keep saturation scoring honest, our guide to reading hashtag view counts without the noise covers which tag metrics are signal.

Sanity checks before you trust the dashboard

Before any of this output goes in front of a marketing lead, run three sanity checks. First, drop any post younger than 72 hours from the benchmark calculation, because TikTok distributes impressions on a slow curve and very new posts will look artificially weak. Second, drop posts flagged with is_ad: true in the response, because paid amplification breaks the organic-engagement assumption every proxy in this post depends on. Third, watch for the is_top flag in user-posts - pinned videos accumulate engagement over months and will dominate any per-creator median if you forget to exclude them. None of these filters are expensive; all three are easy to forget.
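The three filters can be sketched as a single function. It assumes the frame carries create_time as Unix seconds, plus is_ad and is_top columns if you kept those flags when flattening the response. The helper name apply_sanity_filters is hypothetical, and a missing flag column is treated as "not flagged".

```python
import time
import pandas as pd

def apply_sanity_filters(df, now=None, min_age_hours=72):
    """Drop very new, paid, and pinned posts before benchmarking."""
    now = time.time() if now is None else now
    keep = (now - df["create_time"]) >= min_age_hours * 3600
    # Flag columns are optional: a missing column means "not flagged".
    for flag in ("is_ad", "is_top"):
        if flag in df.columns:
            keep &= ~df[flag].fillna(False).astype(bool)
    return df[keep].reset_index(drop=True)

# Made-up rows: one ordinary post and one that trips each filter.
now = 1_750_000_000
demo = pd.DataFrame({
    "aweme_id": ["old", "fresh", "paid", "pinned"],
    "create_time": [now - 10 * 86400, now - 3600,
                    now - 10 * 86400, now - 10 * 86400],
    "is_ad": [False, False, True, False],
    "is_top": [0, 0, 0, 1],
})
clean = apply_sanity_filters(demo, now=now)
print(clean["aweme_id"].tolist())  # ['old']
```

Passing now explicitly keeps a backfill reproducible: the same snapshot filtered tomorrow gives the same rows.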

Frequently asked questions

Can any third-party API return real watch time or completion rate?

Not for accounts you do not own. Any vendor claiming to return avg_watch_time for arbitrary public videos is either making it up, reselling stolen sessions, or returning a derived proxy under a misleading name. Audit any field you cannot explain the source of.

Is save rate really better than like rate as a proxy?

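In our datasets, yes, but verify it on yours. Run the linear regression block above with and without save_rate (collect_count / play_count) in the feature set and compare the fit per bucket. In-sample R-squared never goes down when you add a feature, so compare adjusted or held-out R-squared instead. If saves do not improve the fit for your niche, drop them.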
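A minimal sketch of that with-and-without comparison, using adjusted R-squared so the extra feature is penalized rather than rewarded automatically. The data is synthetic and built so that completion depends on save rate. The adjusted_r2 helper is ours, written with NumPy least squares rather than taken from any library.

```python
import numpy as np

def adjusted_r2(X, y):
    """Adjusted R^2 of an ordinary least-squares fit with an intercept."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Synthetic bucket: completion is driven mostly by save_rate by construction.
rng = np.random.default_rng(1)
n = 40
velocity = rng.uniform(0.02, 0.15, n)
save_rate = rng.uniform(0.001, 0.03, n)
completion = 0.2 + 1.0 * velocity + 12.0 * save_rate + rng.normal(0, 0.02, n)

without = adjusted_r2(velocity.reshape(-1, 1), completion)
with_saves = adjusted_r2(np.column_stack([velocity, save_rate]), completion)
print("without save_rate:", round(without, 3))
print("with save_rate:", round(with_saves, 3))
```

On your own joined rows, a gap that disappears once you switch to adjusted R-squared suggests save rate is not adding real signal for your niche.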

Why bucket by duration at all? Can't I just regress on duration as a feature?

You can, and for some niches it works. We prefer buckets because the relationship between duration and the proxy fields is not linear. A 9-second loop and a 12-second loop behave nearly identically; a 12-second loop and a 45-second monologue do not. Buckets capture that shape cleanly with one extra column.

How fresh is the data?

The API fetches from TikTok on demand, so counters reflect the live state at request time. There is no cache layer adding lag, but TikTok itself sometimes batches counter updates server-side, so do not be surprised by step changes between two consecutive pulls a few seconds apart.

Where can I try this without writing code first?

The interactive playground on the dashboard lets you fire user-posts and post-detail requests against your own key and inspect the JSON shape. After that the snippets above drop in unchanged. For account questions, the contact form reaches the team within one business day, and your key lives on your profile page. More algorithm-adjacent posts are indexed on the blog.

Watch time and completion rate are not in the public API, and pretending otherwise wastes your team's quarter. Bucketed engagement velocity, save rate as a watchability proxy, and a per-bucket regression fitted on your own Creator Studio CSV make up the honest stack. It is also enough to ship.