TikTok comment sections are one of the most underused datasets on the public web. Every viral clip carries thousands of unfiltered reactions: brand mentions, product questions, jokes that become memes, complaints that hint at the next feature competitors will ship. For social listening teams, UX researchers, and growth marketers, that stream is closer to a focus group than to a feed.
The hard part has always been getting the data out. TikTok's web layer is hostile to scraping, comments are paginated behind cursors that change without notice, and reply threads sit one level deeper than the top-level feed. This guide walks through a production-grade pipeline using TikLiveAPI's /post-comments/ and /post-comment-replies/ endpoints: paginate through a full video, fan out into reply threads, throttle concurrency safely, score sentiment, and store everything in a normalized table you can query later.
If you want to follow along interactively, every endpoint below is also available in the playground and documented in full at /documentation/.
Three things make comments uniquely valuable compared to view counts or likes:
To capture all of that, you need both endpoints working together.
TikLiveAPI exposes a flat REST surface. Every request is authenticated with the X-Api-Key header and costs one credit. The two endpoints relevant here are:
/post-comments/ - top-level comments on a video. Params: url (required), count (max 50), cursor (pagination).
/post-comment-replies/ - replies under a single top-level comment. Params: video_id (required), comment_id (required), count (max 50), cursor.
Both responses use the same top-level key: comments. That is a small but important detail - the replies endpoint does not return a replies key, it returns comments. The schema is otherwise identical to the top-level call, with one difference: top-level comments include a reply_total field telling you how many replies exist; replies themselves do not carry that field.
Before paginating, get one page working end to end. The request is a simple GET against https://api.tikliveapi.com/post-comments/ with three query parameters and one header.
import os
import requests

API_KEY = os.environ["TIKLIVEAPI_KEY"]
BASE_URL = "https://api.tikliveapi.com"

def fetch_comments_page(video_url, cursor=0, count=50):
    headers = {"X-Api-Key": API_KEY}
    params = {"url": video_url, "count": count, "cursor": cursor}
    r = requests.get(f"{BASE_URL}/post-comments/", headers=headers, params=params, timeout=30)
    r.raise_for_status()
    return r.json()

data = fetch_comments_page("https://www.tiktok.com/@username/video/7300000000000000000")
for c in data["comments"]:
    print(c["create_time"], c["digg_count"], c["reply_total"], c["user"]["unique_id"], c["text"][:80])
Every comment object uses snake_case keys: id, video_id, create_time, digg_count, reply_total, images, status, plus a nested user object containing sec_uid, unique_id, follower_count and the rest. The status field is worth knowing: 1 is the normal state, while 11 shows up on pinned or otherwise flagged comments.
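As a small illustration of the fields described above, the sketch below turns create_time into a timezone-aware datetime and flags status 11. It assumes create_time is a Unix timestamp in seconds, and it runs on a made-up sample comment; the helper name summarize_comment is hypothetical and not part of the API.

```python
from datetime import datetime, timezone

PINNED_STATUS = 11  # per the text: 11 shows up on pinned or otherwise flagged comments

def summarize_comment(c):
    """Return a small, analysis-friendly view of one comment object."""
    user = c.get("user") or {}
    return {
        "id": c.get("id"),
        "author": user.get("unique_id"),
        # Assumption: create_time is seconds since the Unix epoch.
        "created_at": datetime.fromtimestamp(int(c["create_time"]), tz=timezone.utc),
        "likes": c.get("digg_count", 0),
        "is_flagged": c.get("status") == PINNED_STATUS,
        # reply_total is absent on replies, so its presence marks a top-level comment.
        "is_top_level": "reply_total" in c,
    }

# Hypothetical sample, shaped like the objects described above.
sample = {
    "id": "7300000000000000001",
    "video_id": "7300000000000000000",
    "create_time": 1700000000,
    "digg_count": 42,
    "reply_total": 7,
    "status": 11,
    "text": "love this",
    "user": {"unique_id": "someuser"},
}
summary = summarize_comment(sample)
print(summary["created_at"].isoformat(), summary["is_flagged"])
```

Converting timestamps once, at the edge, keeps later time-window analysis free of unit confusion.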
TikTok comment endpoints use a cursor-based pagination scheme. Each response includes the cursor for the next page; when you reach the end, the cursor either stops advancing or the comments array comes back empty. The loop pattern is identical whether you are pulling 200 comments or 200,000:
import time

def fetch_all_comments(video_url, max_pages=None, throttle=0.25):
    """Yield every top-level comment for a video, page by page."""
    cursor = 0
    page = 0
    seen = set()
    while True:
        data = fetch_comments_page(video_url, cursor=cursor)
        batch = data.get("comments") or []
        if not batch:
            break
        for c in batch:
            cid = c.get("id")
            if cid in seen:
                continue
            seen.add(cid)
            yield c
        # Advance cursor. APIs sometimes echo it back; treat missing/zero as terminal.
        next_cursor = data.get("cursor")
        if not next_cursor or next_cursor == cursor:
            break
        cursor = next_cursor
        page += 1
        if max_pages and page >= max_pages:
            break
        time.sleep(throttle)
A few notes on this loop:
Tracking each comment's id in a set guarantees you never persist a duplicate.
Set max_pages for exploratory runs so a typo does not burn through your credit balance.
Replies live one level deeper, and you only want them where they matter. A heuristic that works well in practice: fetch replies for any top-level comment where reply_total >= 5 or where the comment is pinned (status == 11). That keeps your credit spend proportional to the signal.
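To make the cost of that heuristic concrete, here is a rough estimator of how many reply requests (and therefore credits, at one credit per request) a set of top-level comments would trigger. It is a sketch under two assumptions: every page comes back full at 50 items, and reply_total is accurate. The function names are hypothetical, and the real loop may spend one extra request per thread to detect the end, so treat the result as a lower bound.

```python
import math

PAGE_SIZE = 50        # max count per request, per the endpoint description
REPLY_THRESHOLD = 5   # heuristic from the text
PINNED_STATUS = 11

def wants_replies(comment):
    """Apply the heuristic: enough replies, or a pinned/flagged comment."""
    busy = (comment.get("reply_total") or 0) >= REPLY_THRESHOLD
    return busy or comment.get("status") == PINNED_STATUS

def estimate_reply_credits(comments):
    """Estimate requests (= credits) needed to pull replies for selected comments."""
    total = 0
    for c in comments:
        if wants_replies(c):
            pages = math.ceil((c.get("reply_total") or 0) / PAGE_SIZE)
            # A selected comment costs at least one request, even with no replies.
            total += max(1, pages)
    return total

comments = [
    {"id": "1", "reply_total": 3, "status": 1},    # below threshold: skipped
    {"id": "2", "reply_total": 120, "status": 1},  # 3 pages of 50
    {"id": "3", "reply_total": 0, "status": 11},   # pinned: 1 request
]
print(estimate_reply_credits(comments))  # 4
```

Running an estimate like this before the fan-out is a cheap way to check a video against your credit budget.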
The reply endpoint requires both video_id and comment_id as snake_case query params. The video_id is on every comment object you already fetched, and the comment_id is the comment's own id.
def fetch_reply_page(video_id, comment_id, cursor=0, count=50):
    headers = {"X-Api-Key": API_KEY}
    params = {
        "video_id": video_id,
        "comment_id": comment_id,
        "count": count,
        "cursor": cursor,
    }
    r = requests.get(f"{BASE_URL}/post-comment-replies/", headers=headers, params=params, timeout=30)
    r.raise_for_status()
    return r.json()

def fetch_all_replies(video_id, comment_id, throttle=0.25):
    cursor = 0
    while True:
        data = fetch_reply_page(video_id, comment_id, cursor=cursor)
        batch = data.get("comments") or []  # Same key as top-level: 'comments'
        if not batch:
            break
        for reply in batch:
            yield reply
        next_cursor = data.get("cursor")
        if not next_cursor or next_cursor == cursor:
            break
        cursor = next_cursor
        time.sleep(throttle)
Remember: the response key here is still comments, not replies, and the reply objects do not carry the reply_total field. Treat that absence as your signal that you have reached a leaf in the tree.
You will be tempted to fan out as wide as possible. Resist that impulse for a single video - cursor pagination is inherently sequential, and TikTok's backend is friendlier when one video's comments are pulled in order. Where you can safely parallelize is across videos.
A clean pattern uses a small worker pool: each worker owns one video and paginates it sequentially, while multiple videos run in parallel.
from concurrent.futures import ThreadPoolExecutor, as_completed

def harvest_video(video_url, max_pages=20):
    """Sequential within a video: comments first, then targeted reply fetches."""
    comments = list(fetch_all_comments(video_url, max_pages=max_pages))
    replies = []
    for c in comments:
        if (c.get("reply_total") or 0) >= 5 or c.get("status") == 11:
            replies.extend(fetch_all_replies(c["video_id"], c["id"]))
    return {"video_url": video_url, "comments": comments, "replies": replies}

def harvest_many(video_urls, workers=4):
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(harvest_video, u): u for u in video_urls}
        for fut in as_completed(futures):
            results.append(fut.result())
    return results
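Four workers with a 250 ms pause between requests stay under the 200 rpm rate limit only when each request takes about a second or longer to return: a worker then completes one request every 1.2 s or so, which is roughly 50 per minute per worker. With faster responses the ceiling climbs toward 4 requests per second per worker (960 rpm across four workers), so raise throttle or reduce workers accordingly. If you need higher throughput, request a lift on /contact/ rather than racing the limiter.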
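Because per-worker sleeps do not bound the combined request rate, one option is a single limiter shared by every thread. The class below is a minimal sketch, not something TikLiveAPI provides: RateLimiter and its parameters are hypothetical names, and the 150 per minute figure is an arbitrary choice that leaves headroom under the 200 rpm limit.

```python
import threading
import time

class RateLimiter:
    """Space calls at least `min_interval` seconds apart across all threads."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock      # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = 0.0

    def wait(self):
        """Block until this caller's reserved slot; return the delay that was applied."""
        # Reserve the next free slot under the lock, then sleep outside it
        # so other threads can queue up their own slots in the meantime.
        with self._lock:
            now = self._clock()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self.min_interval
        delay = slot - now
        if delay > 0:
            self._sleep(delay)
        return delay

# Assumed budget: 150 requests per minute, shared by all workers.
limiter = RateLimiter(max_per_minute=150)
# Call limiter.wait() immediately before each requests.get(...) in the fetch functions.
```

With a shared limiter in place, the worker count controls how many videos are in flight while the limiter alone controls the request rate.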
Once the comments are in memory, the analysis layer is yours. For a quick exploratory pass, a small rule-based scorer is enough to surface positivity and negativity gradients. For production you will want a real model - a fine-tuned distilbert or a multilingual transformer such as xlm-roberta.
POSITIVE = {"love", "amazing", "best", "perfect", "fire", "iconic", "obsessed", "queen", "goat"}
NEGATIVE = {"hate", "worst", "trash", "boring", "cringe", "scam", "fake", "annoying"}

def rule_score(text):
    if not text:
        return 0.0
    tokens = text.lower().split()
    pos = sum(1 for t in tokens if t.strip(".,!?") in POSITIVE)
    neg = sum(1 for t in tokens if t.strip(".,!?") in NEGATIVE)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)

# Production path - swap in a transformer:
# from transformers import pipeline
# clf = pipeline("sentiment-analysis", model="cardiffnlp/twitter-xlm-roberta-base-sentiment")
# score = clf(comment_text)[0]
Two practical points. First, TikTok comments are often non-English, so any production model should be multilingual or paired with a language detector such as fasttext-langdetect. Second, emoji carry signal: a thread that scores neutral on words alone may be overwhelmingly positive once you count the fire and heart emojis. Keep them in the input rather than stripping them.
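The emoji point can be folded into the rule-based scorer with a few extra lines. The variant below is a sketch: the emoji sets are illustrative picks rather than a validated lexicon, and emoji_aware_score is a hypothetical name. Emoji are counted per character so that a run such as three fire emojis registers three times.

```python
POSITIVE_WORDS = {"love", "amazing", "best", "perfect", "fire", "iconic", "obsessed", "queen", "goat"}
NEGATIVE_WORDS = {"hate", "worst", "trash", "boring", "cringe", "scam", "fake", "annoying"}

# Illustrative emoji lists (assumptions); extend them from your own data.
# "\u2764" is the bare heart code point, so it also matches hearts that are
# followed by a variation selector.
POSITIVE_EMOJI = {"🔥", "\u2764", "😍", "👏", "💯"}
NEGATIVE_EMOJI = {"👎", "🤮", "😡"}

def emoji_aware_score(text):
    """Score in [-1, 1] from word and emoji hits; 0.0 when nothing matches."""
    if not text:
        return 0.0
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    pos = sum(t in POSITIVE_WORDS for t in tokens)
    neg = sum(t in NEGATIVE_WORDS for t in tokens)
    # Per-character pass: emoji are often glued together or attached to words.
    pos += sum(ch in POSITIVE_EMOJI for ch in text)
    neg += sum(ch in NEGATIVE_EMOJI for ch in text)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)

print(emoji_aware_score("ok 🔥🔥🔥"))   # 1.0
print(emoji_aware_score("boring 🔥"))   # 0.0
```

The first example would score 0.0 on words alone, which is the case the paragraph above warns about.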
Flat, normalized storage beats nested JSON for everything you will want to do later (joins, aggregates, longitudinal sentiment). A single comments table covers both top-level entries and replies if you reserve a nullable parent_id column.
CREATE TABLE tiktok_comments (
    id          BIGINT PRIMARY KEY,
    video_id    BIGINT NOT NULL,
    parent_id   BIGINT NULL,      -- NULL = top-level, set = reply
    user_id     VARCHAR(64),
    username    VARCHAR(64),
    text        TEXT,
    digg_count  INT,
    reply_total INT,              -- NULL on replies
    status      SMALLINT,         -- 1 normal, 11 pinned, etc.
    create_time BIGINT,           -- TikTok unix timestamp
    sentiment   FLOAT,
    language    VARCHAR(8),
    fetched_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    INDEX (video_id),
    INDEX (parent_id),
    INDEX (create_time)
);
The Python writer maps each API object to a row. Keep the original payload around in a JSON column or object store if you can afford it - schemas drift, and being able to re-derive fields without re-fetching is worth the disk.
def to_row(c, parent_id=None):
    user = c.get("user") or {}
    return {
        "id": int(c["id"]),
        "video_id": int(c["video_id"]),
        "parent_id": int(parent_id) if parent_id else None,
        "user_id": user.get("sec_uid"),
        "username": user.get("unique_id"),
        "text": c.get("text"),
        "digg_count": c.get("digg_count", 0),
        "reply_total": c.get("reply_total"),  # None for replies
        "status": c.get("status"),
        "create_time": c.get("create_time"),
        "sentiment": rule_score(c.get("text")),
    }
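To show the write path end to end, the sketch below upserts rows of that shape and then buckets sentiment by hour. It uses Python's built-in sqlite3 with an in-memory database purely as a stand-in for whatever engine you run, so the table definition is a trimmed SQLite translation of the schema above, not the original DDL. The save_rows helper and the sample rows are hypothetical, and create_time is assumed to be in seconds.

```python
import sqlite3

def save_rows(conn, rows):
    """Upsert rows keyed by comment id, so re-fetching a video does not duplicate data."""
    conn.executemany(
        """
        INSERT OR REPLACE INTO tiktok_comments
            (id, video_id, parent_id, username, text, digg_count,
             reply_total, status, create_time, sentiment)
        VALUES
            (:id, :video_id, :parent_id, :username, :text, :digg_count,
             :reply_total, :status, :create_time, :sentiment)
        """,
        rows,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tiktok_comments (
        id INTEGER PRIMARY KEY, video_id INTEGER NOT NULL, parent_id INTEGER,
        username TEXT, text TEXT, digg_count INTEGER, reply_total INTEGER,
        status INTEGER, create_time INTEGER, sentiment REAL
    )
    """
)

# Hypothetical rows in the shape a to_row-style mapper produces.
rows = [
    {"id": 1, "video_id": 10, "parent_id": None, "username": "a", "text": "love it",
     "digg_count": 5, "reply_total": 1, "status": 1, "create_time": 1700000000, "sentiment": 1.0},
    {"id": 2, "video_id": 10, "parent_id": 1, "username": "b", "text": "boring",
     "digg_count": 0, "reply_total": None, "status": 1, "create_time": 1700003700, "sentiment": -1.0},
]
save_rows(conn, rows)
save_rows(conn, rows)  # a second run replaces rows instead of duplicating them

# Average sentiment per hour bucket (integer division of the epoch seconds).
hourly = conn.execute(
    "SELECT create_time / 3600 AS hour, AVG(sentiment), COUNT(*) "
    "FROM tiktok_comments WHERE video_id = ? GROUP BY hour ORDER BY hour",
    (10,),
).fetchall()
print(hourly)
```

Note that a reply only gets its parent_id if the caller still knows which top-level comment it was fetched for, so keep that association when collecting replies.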
Bucket comments by create_time to see how a video's reception evolved during its viral window.
If the API echoes the same cursor back at the end of a thread, the if next_cursor == cursor: break check handles it.
Each request returns up to 50 comments on both /post-comments/ and /post-comment-replies/. Always pass count=50 to minimize the number of requests per video.
Does the replies endpoint return a replies key? No. It returns the same comments array as the top-level endpoint. The only schema difference is that replies do not carry the reply_total field.
Either the comments array comes back empty or the cursor stops advancing between pages. Guard for both in your loop.
No. The endpoint only returns publicly available data; private accounts and removed videos return an empty payload.
Use the playground to validate a single page response without writing code, then run the loop with max_pages=2 on one video before scaling out. Credits never expire, so a small test budget goes a long way.
Comments are the part of TikTok that machines have historically had the hardest time reading and humans have always known to be the most interesting. With paginated /post-comments/, targeted /post-comment-replies/, a sensible concurrency model, and a flat storage schema, you can turn that section into a queryable dataset in an afternoon.