Watch time and completion rate are the two metrics every TikTok strategist names when asked what the algorithm actually rewards. They are also the two metrics you cannot read from the public API. Not from post-detail, not from user-posts, not from any third-party scraper. Average watch duration and completion percentage live on TikTok's internal analytics surface and inside the Creator Studio CSV export. Everything outside that wall is a proxy.
This post is for data engineers and performance marketers who already accepted that limitation and want to know what to do anyway. We will look at exactly which fields the public API exposes, why play count alone is misleading, how to derive an engagement velocity proxy that correlates with watch behavior, why collect_count is the most underrated field in the response, and how to bridge the gap by joining scraped competitor data with your own Creator Studio CSV exports.
Before we model anything, set expectations on the raw material. The /post-detail/ endpoint returns a flat object with these counters at the top level:
play_count - total views
digg_count - likes
comment_count - comments
share_count - shares to other platforms
download_count - saves to device
collect_count - bookmarks inside TikTok
duration - video length in seconds
create_time - Unix timestamp
You also get the playback URLs (play, wmplay, hdplay), music_info, and a minimal author block. The /user-posts/ endpoint wraps the same counters per video inside a videos[] array along with cursor and hasMore. That is the entire surface you have to work with. There is no avg_watch_time, no completion_rate, no fyp_impressions, no retention_curve. If a vendor claims otherwise, they are either reselling Creator Studio scrapes that require a logged-in cookie, or they are making numbers up.
A 15-second video with 1M views and a 60-second video with 1M views are not comparable. TikTok counts a view at roughly the first frame render, which means a 60-second post collecting 1M views could have a median watch time of 6 seconds and still report the same headline number as a 15-second post that was watched to completion. Play count without duration normalization is vanity. The first job is to bucket every video by duration before comparing anything.
The second job is to stop comparing across creators of different sizes without normalization. A 50k-view post on a 200k-follower account is a different beast than a 50k-view post on a 5M-follower account, even at identical duration. Bucket first, then normalize within the creator's recent baseline. We will cover the baseline trick a few sections down, but if you skip both steps you will keep ranking the wrong videos at the top of your dashboard.
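One way to sketch the baseline normalization is a rolling median over the creator's previous posts. This is an illustration under stated assumptions, not a formula from TikTok or the API: the window of 10 posts, the minimum of 3 prior posts, and the column names baseline_plays and plays_vs_baseline are all hypothetical choices.

```python
import pandas as pd


def add_baseline_ratio(df: pd.DataFrame, window: int = 10) -> pd.DataFrame:
    """Compare each post's plays to the median of the creator's earlier posts."""
    out = df.sort_values("create_time").copy()
    # shift(1) keeps the current post out of its own baseline.
    baseline = out["play_count"].shift(1).rolling(window, min_periods=3).median()
    out["baseline_plays"] = baseline
    # A ratio of 2.0 means the post drew twice the creator's recent median.
    out["plays_vs_baseline"] = out["play_count"] / baseline
    return out


# Toy data for a single creator (assumed values, oldest post first).
posts = pd.DataFrame({
    "create_time": [1, 2, 3, 4, 5],
    "play_count": [100, 200, 300, 400, 1000],
})
scored = add_baseline_ratio(posts)
```

The first three posts have no baseline because fewer than three earlier posts exist. Ranking on plays_vs_baseline inside a duration bucket then compares a 200k-follower account and a 5M-follower account on the same scale.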
TikTok runs an official Creator Marketing API and a Research API. Both can return watch-related metrics for specific surfaces, and both come with limits that matter:
The honest framing is that these are complementary, not substitutes. You use OAuth for the accounts you own, and you use the public API for the rest of the universe - competitor benchmarking, trend discovery, influencer screening, hashtag research. Trying to use one tool for both problems is where teams burn budget.
One more nuance for performance marketers: the Spark Ads dashboard shows retention curves for posts you have boosted, but only for the boosted period and only for the impressions paid through the ad platform. Organic completion data is still walled off. If you are running an influencer campaign and the creator will not share Studio screenshots, your only options are the proxy approach below or asking them to authorize the Creator Marketing API for the duration of the deal.
Here is the minimum viable extraction. We pull the last N posts for a username, bucket by duration, and compute the proxy metrics we will rely on. The endpoint uses the X-Api-Key header and the user-posts route from /documentation/users/.
import os
import time

import pandas as pd
import requests

API_KEY = os.environ["TIKLIVEAPI_KEY"]
BASE = "https://api.tikliveapi.com"
HEAD = {"X-Api-Key": API_KEY}


def get_uid(username):
    """Resolve a handle to the numeric user id that user-posts expects."""
    r = requests.get(f"{BASE}/userid/",
                     params={"username": username},
                     headers=HEAD, timeout=20)
    r.raise_for_status()
    return r.json()["id"]


def fetch_posts(username, max_posts=200):
    """Page through user-posts and return one row of counters per video."""
    uid = get_uid(username)
    rows, cursor, fetched = [], "0", 0
    while fetched < max_posts:
        r = requests.get(f"{BASE}/user-posts/",
                         params={"userid": uid, "count": 35, "cursor": cursor},
                         headers=HEAD, timeout=30)
        r.raise_for_status()
        data = r.json()
        videos = data.get("videos", [])
        for v in videos:
            rows.append({
                "aweme_id": v["aweme_id"],
                "create_time": v["create_time"],
                "duration": v["duration"],
                "play_count": v["play_count"],
                "digg_count": v["digg_count"],
                "comment_count": v["comment_count"],
                "share_count": v["share_count"],
                "download_count": v.get("download_count", 0),
                "collect_count": v.get("collect_count", 0),
                "title": v.get("title", ""),
            })
        fetched += len(videos)
        # Stop on the last page; an empty page also prevents an endless loop.
        if not videos or not data.get("hasMore"):
            break
        cursor = str(data.get("cursor"))
        time.sleep(0.35)  # pause between pages
    # The final page can overshoot max_posts, so trim to the requested size.
    return pd.DataFrame(rows).head(max_posts)
Two field-level details are worth double-checking against the documentation. The user-posts response mixes casing: hasMore is camelCase at the top level while per-video fields are snake_case. And pagination here uses cursor, not time. The followers and following endpoints behave differently - they paginate with a time Unix timestamp and the following endpoint keys the array as followings (plural). Mixing those up is the most common bug we see in customer support tickets.
Once you have a few hundred rows, the first useful proxy is engagement velocity, normalized by duration bucket. The idea is simple: the algorithm rewards videos where the active engagement events (likes, comments, shares) happen at a rate that suggests the audience watched long enough to react. A 7-second loop and a 55-second monologue do not generate engagement at the same rate even when both are doing well, so we bucket first, then rank within bucket.
def bucket_duration(sec):
    if sec <= 10:
        return "0-10s"
    if sec <= 20:
        return "11-20s"
    if sec <= 35:
        return "21-35s"
    if sec <= 60:
        return "36-60s"
    return "60s+"


def add_proxies(df):
    df = df.copy()
    df["bucket"] = df["duration"].apply(bucket_duration)
    plays = df["play_count"].clip(lower=1)
    df["engagement_velocity"] = (
        df["digg_count"] + df["comment_count"] + df["share_count"]
    ) / plays
    df["save_rate"] = df["collect_count"] / plays
    df["share_rate"] = df["share_count"] / plays
    df["comment_rate"] = df["comment_count"] / plays
    return df


def bucket_benchmarks(df):
    return df.groupby("bucket").agg(
        n=("aweme_id", "count"),
        median_velocity=("engagement_velocity", "median"),
        median_save=("save_rate", "median"),
        median_share=("share_rate", "median"),
        p90_velocity=("engagement_velocity",
                      lambda x: x.quantile(0.9)),
    ).round(5)
What you are looking for in bucket_benchmarks is the p90 line per bucket. Anything above the 90th percentile inside its own bucket is, in our experience working with creator analytics customers, a strong candidate for an above-average watch-through rate. It is not proof, but it is a signal worth investigating before you spend Spark Ads budget boosting it.
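To turn the p90 line into a per-video flag, one option is to broadcast each bucket's 90th percentile back onto its rows and compare. The helper below is a minimal sketch; flag_above_p90 and the above_p90 column are hypothetical names, and the toy frame stands in for the output of add_proxies.

```python
import pandas as pd


def flag_above_p90(df: pd.DataFrame, col: str = "engagement_velocity") -> pd.DataFrame:
    """Mark rows that exceed the 90th percentile of their own duration bucket."""
    out = df.copy()
    # transform() repeats each bucket's p90 on every row of that bucket.
    p90 = out.groupby("bucket")[col].transform(lambda s: s.quantile(0.9))
    out["above_p90"] = out[col] > p90
    return out


# Toy data: two buckets whose engagement levels differ by an order of magnitude.
demo = pd.DataFrame({
    "bucket": ["0-10s"] * 10 + ["36-60s"] * 10,
    "engagement_velocity": [i / 100 for i in range(1, 11)]
                           + [i / 1000 for i in range(1, 11)],
})
flagged = flag_above_p90(demo)
```

In the toy data the best 36-60s video is flagged even though its raw velocity is lower than every 0-10s video, which is the point of ranking within a bucket rather than across the whole frame.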
Likes are cheap. Shares are public. Comments are loud. Saves are quiet, and that is exactly why they are the closest available proxy for "felt watchable". A user only taps the bookmark when they want to come back, which strongly implies they watched enough to know they want to. collect_count in the API maps to that bookmark action. In our customer datasets, save rate correlates with self-reported completion rate from Creator Studio more reliably than any other public field, including share rate. If you build only one ranking signal beyond engagement velocity, build collect_count / play_count by bucket.
For the accounts you own, Creator Studio (now TikTok Studio) lets you export a CSV with average watch time per video. This is the missing variable. The bridge pattern is to join that CSV against the scraped public dataset and fit a per-bucket model so you can score competitor posts using the same coefficients.
import numpy as np
from sklearn.linear_model import LinearRegression

# studio.csv columns assumed:
# aweme_id, avg_watch_time_s, full_video_views, total_views
# Read and cast ids as strings on both sides so the join does not
# fail on an int/str dtype mismatch.
studio = pd.read_csv("studio.csv", dtype={"aweme_id": str})
own = add_proxies(fetch_posts("your_handle", 500))
own["aweme_id"] = own["aweme_id"].astype(str)
joined = own.merge(studio, on="aweme_id", how="inner")

# Completion rate = average watch time as a share of video length, capped at 1.
joined["completion_rate"] = (
    joined["avg_watch_time_s"] / joined["duration"].clip(lower=1)
).clip(0, 1)

models = {}
for bucket, g in joined.groupby("bucket"):
    if len(g) < 12:
        continue  # too few rows to fit four coefficients
    X = g[["engagement_velocity", "save_rate",
           "share_rate", "comment_rate"]].values
    y = g["completion_rate"].values
    m = LinearRegression().fit(X, y)
    models[bucket] = m
    # Note: this R^2 is in-sample (scored on the rows used for fitting).
    print(bucket, "R^2=", round(m.score(X, y), 3),
          "coef=", np.round(m.coef_, 3))
Two warnings. First, the model only generalizes to creators in roughly the same niche, format, and audience size as you. A fashion creator's coefficients will not score a finance creator's videos correctly. Second, the R-squared printed above is computed on the same rows the model was fitted on, so it flatters small buckets. An R-squared above 0.45 per bucket is realistic; if you see 0.85 you have probably overfit a small sample. Re-fit monthly, and although the loop fits any bucket with at least 12 joined rows, never trust one with fewer than 20.
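A quick way to check for that overfitting is to score each bucket out of sample. The sketch below swaps in k-fold cross-validation, which the pipeline above does not use, and does it with plain NumPy least squares, which fits the same ordinary-least-squares model as LinearRegression with default settings. The helper names and the toy data are assumptions for illustration.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares coefficients with an intercept in position 0."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def ols_predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def r2(y, preds):
    return 1 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2)


def kfold_r2(X, y, k=5, seed=0):
    """Out-of-sample R^2: each row is predicted by a model that never saw it."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    idx = np.random.default_rng(seed).permutation(len(y))
    preds = np.empty(len(y))
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        preds[fold] = ols_predict(ols_fit(X[train], y[train]), X[fold])
    return r2(y, preds)


rng = np.random.default_rng(42)

# A small bucket with a target that is pure noise (no real relationship).
X_small = rng.uniform(0, 0.1, size=(12, 4))
y_small = rng.uniform(0.2, 0.8, size=12)
in_sample = r2(y_small, ols_predict(ols_fit(X_small, y_small), X_small))
out_of_sample = kfold_r2(X_small, y_small, k=4)

# A larger bucket with an exactly linear target, as a sanity check.
X_clean = rng.uniform(0, 0.1, size=(40, 4))
y_clean = 0.2 + X_clean @ np.array([1.0, 4.0, 2.0, 0.5])
clean_cv = kfold_r2(X_clean, y_clean)
```

If a bucket shows a high in-sample R-squared but a cross-validated value near or below zero, the coefficients are fitting noise and should not be used to score competitors.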
With the per-bucket model in hand, scoring a competitor is a matter of pulling their posts through the same pipeline. Use the user-posts endpoint with their numeric user id (resolve via /userid/ first), feed the resulting frame through add_proxies, then predict per bucket:
def score_competitor(handle, models):
    df = add_proxies(fetch_posts(handle, 300))
    df["predicted_completion"] = np.nan
    for bucket, m in models.items():
        mask = df["bucket"] == bucket
        if mask.sum() == 0:
            continue
        X = df.loc[mask, ["engagement_velocity", "save_rate",
                          "share_rate", "comment_rate"]].values
        df.loc[mask, "predicted_completion"] = m.predict(X).clip(0, 1)
    return df.sort_values("predicted_completion", ascending=False)
The top of that sorted frame is your prioritized list of competitor videos worth dissecting on hook, pacing, and edit. That is the entire reason to build the pipeline: you cannot read their analytics, but you can predict them well enough to focus your manual review.
Operational notes. The TikLiveAPI rate limit is 200 requests per minute by default and a single user-posts page returns up to 35 videos, so 500 posts is roughly 15 requests. A daily competitor sweep across 30 handles at that depth therefore needs about 450 requests, which costs 450 credits on the pay-as-you-go pricing on the pricing page. Build your scheduler around midnight UTC so cursor pagination stays consistent with the create_time timestamps the API returns.
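The request arithmetic is easy to script so the budget updates when the handle list or post depth changes. This sketch uses only the figures quoted above (35 videos per page, 200 requests per minute); the function names are hypothetical, and the optional extra request per handle accounts for the /userid/ lookup if you do not cache ids.

```python
import math

PAGE_SIZE = 35             # videos per user-posts page, as stated above
RATE_LIMIT_PER_MIN = 200   # default rate limit, as stated above


def pages_needed(posts: int, page_size: int = PAGE_SIZE) -> int:
    """Number of user-posts requests needed to cover `posts` videos."""
    return math.ceil(posts / page_size)


def sweep_budget(handles: int, posts_per_handle: int,
                 include_id_lookup: bool = False):
    """Return (total requests, minutes needed when running at the rate limit)."""
    per_handle = pages_needed(posts_per_handle) + (1 if include_id_lookup else 0)
    total = handles * per_handle
    return total, total / RATE_LIMIT_PER_MIN


requests_total, minutes = sweep_budget(handles=30, posts_per_handle=500)
with_lookup, _ = sweep_budget(30, 500, include_id_lookup=True)
```

For 30 handles at 500 posts each this gives 450 page requests, or 480 if every handle also needs an id lookup, and a little over two minutes of wall time at the default limit.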
The proxy model gets sharper when you join two more data sources from the same API. The music_info object on every post tells you which audio track the video uses, and the challenge endpoints under the challenge documentation let you score how saturated a hashtag is by day. A video that uses an audio track on the way up and a hashtag that is not yet saturated tends to over-perform its own proxy score, and a video that piles onto an already-peaked sound under-performs. Add two booleans to the feature matrix - sound_rising and tag_saturated - re-fit, and the R-squared per bucket usually picks up two to five points. The cost is one extra /music-posts/ sample per unique track and one /challenge-posts/ sample per unique hashtag per day. Cache aggressively, because audio and hashtag context does not change minute by minute.
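Caching here can be as simple as memoizing one sample per track or hashtag per calendar day. The class below is a minimal sketch: DailyCache and the fetch callable are hypothetical names, not part of any API client, and the in-memory dict would need to be replaced by a persistent store for a scheduler that restarts between runs.

```python
from datetime import date
from typing import Any, Callable, Dict, Tuple


class DailyCache:
    """Memoize one fetched sample per (kind, key) per calendar day."""

    def __init__(self, fetch: Callable[[str, str], Any]):
        self._fetch = fetch
        self._store: Dict[Tuple[str, str, date], Any] = {}

    def get(self, kind: str, key: str, day: date) -> Any:
        cache_key = (kind, key, day)
        if cache_key not in self._store:
            # Only the first lookup of the day costs an API request.
            self._store[cache_key] = self._fetch(kind, key)
        return self._store[cache_key]


# Stand-in for a real request function; it just records each call.
calls = []


def fake_fetch(kind: str, key: str) -> dict:
    calls.append((kind, key))
    return {"kind": kind, "key": key}


cache = DailyCache(fake_fetch)
cache.get("music", "track-123", date(2024, 1, 1))
cache.get("music", "track-123", date(2024, 1, 1))  # served from the cache
cache.get("music", "track-123", date(2024, 1, 2))  # new day, fetched again
```

Keying on the date keeps the cost at one sample per unique track or hashtag per day, however many videos share it.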
Before any of this output goes in front of a marketing lead, run three sanity checks. First, drop any post younger than 72 hours from the benchmark calculation, because TikTok distributes impressions on a slow curve and very new posts will look artificially weak. Second, drop posts flagged with is_ad: true in the response, because paid amplification breaks the organic-engagement assumption every proxy in this post depends on. Third, watch for the is_top flag in user-posts - pinned videos accumulate engagement over months and will dominate any per-creator median if you forget to exclude them. None of these filters are expensive; all three are easy to forget.
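The three checks fit in one filter function. This is a sketch under assumptions: it expects is_ad and is_top to have been copied from the response into the frame (the fetch_posts function above does not store them), treats a missing column as "not flagged", and takes the current time as a Unix timestamp so the result is reproducible.

```python
import pandas as pd


def _flag(df: pd.DataFrame, name: str) -> pd.Series:
    """Boolean view of an optional flag column; absent or null means False."""
    if name not in df.columns:
        return pd.Series(False, index=df.index)
    return df[name].fillna(0).astype(bool)


def apply_sanity_filters(df: pd.DataFrame, now_ts: int,
                         min_age_hours: int = 72) -> pd.DataFrame:
    """Drop posts that are too new, paid, or pinned before benchmarking."""
    age_hours = (now_ts - df["create_time"]) / 3600
    keep = (age_hours >= min_age_hours) & ~_flag(df, "is_ad") & ~_flag(df, "is_top")
    return df[keep].copy()


now_ts = 1_700_000_000  # fixed "now" for the example
raw = pd.DataFrame({
    "aweme_id": ["a", "b", "c", "d"],
    "create_time": [now_ts - 100 * 3600,   # old enough, organic, not pinned
                    now_ts - 10 * 3600,    # younger than 72 hours
                    now_ts - 200 * 3600,   # paid
                    now_ts - 500 * 3600],  # pinned
    "is_ad": [False, False, True, False],
    "is_top": [False, False, False, True],
})
clean = apply_sanity_filters(raw, now_ts)
```

Run the filter before bucket_benchmarks so the medians and p90 lines are computed only on mature, organic, unpinned posts.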
Real watch time is not available for accounts you do not own. Any vendor claiming to return avg_watch_time for arbitrary public videos is either making it up, reselling stolen sessions, or returning a derived proxy under a misleading name. Audit any field you cannot explain the source of.
In our datasets save rate is the strongest public proxy for completion, but verify it on yours. Run the linear regression block above with and without collect_count / play_count in the feature set and compare R-squared per bucket. If saves do not improve the fit for your niche, drop them.
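One caveat when running that comparison: in-sample R-squared never goes down when a feature is added, so a raw with-and-without comparison always favors keeping saves. The sketch below uses adjusted R-squared instead, which penalizes the extra coefficient. That is a swapped-in choice rather than something the pipeline above does, and the helper names and synthetic bucket are assumptions for illustration.

```python
import numpy as np
import pandas as pd

FEATURES = ["engagement_velocity", "save_rate", "share_rate", "comment_rate"]


def adjusted_r2(X, y) -> float:
    """Adjusted R^2 of an ordinary-least-squares fit with an intercept."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def save_rate_lift(g: pd.DataFrame, features=FEATURES,
                   target: str = "completion_rate",
                   drop: str = "save_rate") -> float:
    """Adjusted R^2 with the feature minus adjusted R^2 without it."""
    reduced = [f for f in features if f != drop]
    full_fit = adjusted_r2(g[features].to_numpy(), g[target].to_numpy())
    reduced_fit = adjusted_r2(g[reduced].to_numpy(), g[target].to_numpy())
    return full_fit - reduced_fit


# Synthetic bucket in which completion is driven by save rate (assumed data).
rng = np.random.default_rng(0)
n = 40
g = pd.DataFrame({
    "engagement_velocity": rng.uniform(0, 0.2, n),
    "save_rate": rng.uniform(0, 0.1, n),
    "share_rate": rng.uniform(0, 0.05, n),
    "comment_rate": rng.uniform(0, 0.02, n),
})
g["completion_rate"] = 0.2 + 5 * g["save_rate"] + rng.normal(0, 0.005, n)
lift = save_rate_lift(g)
```

Apply save_rate_lift to each bucket of your joined frame. A lift near zero or negative in a bucket is the signal to drop saves from the feature set for that bucket.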
You can feed duration in as a continuous feature instead of bucketing, and for some niches it works. We prefer buckets because the relationship between duration and the proxy fields is not linear. A 9-second loop and a 12-second loop behave nearly identically; a 12-second loop and a 45-second monologue do not. Buckets capture that shape cleanly with one extra column.
The API fetches from TikTok on demand, so counters reflect the live state at request time. There is no cache layer adding lag, but TikTok itself sometimes batches counter updates server-side, so do not be surprised by step changes between two consecutive pulls a few seconds apart.
The interactive playground on the dashboard lets you fire user-posts and post-detail requests against your own key and inspect the JSON shape. After that the snippets above drop in unchanged. For account questions, the contact form reaches the team within one business day, and your key lives on your profile page. More algorithm-adjacent posts are indexed on the blog.
Watch time and completion rate are not in the public API, and pretending otherwise wastes your team's quarter. Bucketed engagement velocity, save rate as a watchability proxy, and a per-bucket regression fitted on your own Creator Studio CSV make up the honest stack. It is also enough to ship.