How to Fetch TikTok Comments at Scale with Pagination

Published on May 29, 2026

TikTok comment sections are one of the most underused datasets on the public web. Every viral clip carries thousands of unfiltered reactions: brand mentions, product questions, jokes that become memes, complaints that hint at the next feature competitors will ship. For social listening teams, UX researchers, and growth marketers, that stream is closer to a focus group than to a feed.

The hard part has always been getting the data out. TikTok's web layer is hostile to scraping, comments are paginated behind cursors that change without notice, and reply threads sit one level deeper than the top-level feed. This guide walks through a production-grade pipeline using TikLiveAPI's /post-comments/ and /post-comment-replies/ endpoints: paginate through a full video, fan out into reply threads, throttle concurrency safely, score sentiment, and store everything in a normalized table you can query later.

If you want to follow along interactively, every endpoint below is also available in the playground and documented in full at /documentation/.

Why TikTok comments are an analytical goldmine

Three things make comments uniquely valuable compared to view counts or likes:

  • Sentiment density. A comment is a sentence; a like is a bit. One thousand comments contain more usable signal than one million likes.
  • FAQ mining. When a product video goes viral, the comment section becomes a real-time FAQ. "Where can I buy this?", "Does it work on curly hair?", "Is the link in bio?" - the questions people actually ask, in the language they actually use. That is gold for landing-page copy, paid-ad creative, and support macros.
  • Community texture. Reply threads expose how a community talks to itself. Top-level comments are often performative; the replies underneath are where the real debate happens.

To capture all of that, you need both endpoints working together.

The two endpoints you will use

TikLiveAPI exposes a flat REST surface. Every request is authenticated with the X-Api-Key header and costs one credit. The two endpoints relevant here are:

  • /post-comments/ - top-level comments on a video. Params: url (required), count (max 50), cursor (pagination).
  • /post-comment-replies/ - replies under a single top-level comment. Params: video_id (required), comment_id (required), count (max 50), cursor.

Both responses use the same top-level key: comments. That is a small but important detail - the replies endpoint does not return a replies key; it returns its results under comments as well. The schema is otherwise identical to the top-level call, with one difference: top-level comments include a reply_total field telling you how many replies exist; replies themselves do not carry that field.

Step 1: Fetch a single page

Before paginating, get one page working end to end. The request is a simple GET against https://api.tikliveapi.com/post-comments/ with three query parameters and one header.

import os
import requests

API_KEY = os.environ["TIKLIVEAPI_KEY"]
BASE_URL = "https://api.tikliveapi.com"

def fetch_comments_page(video_url, cursor=0, count=50):
    headers = {"X-Api-Key": API_KEY}
    params = {"url": video_url, "count": count, "cursor": cursor}
    r = requests.get(f"{BASE_URL}/post-comments/", headers=headers, params=params, timeout=30)
    r.raise_for_status()
    return r.json()

data = fetch_comments_page("https://www.tiktok.com/@username/video/7300000000000000000")
for c in data["comments"]:
    print(c["create_time"], c["digg_count"], c["reply_total"], c["user"]["unique_id"], c["text"][:80])

Every comment object uses snake_case keys: id, video_id, text, create_time, digg_count, reply_total, images, status, plus a nested user object containing sec_uid, unique_id, follower_count and the rest. The status field is worth knowing: 1 is the normal state, while 11 shows up on pinned or otherwise flagged comments.
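
As a small illustration of working with these fields, the sketch below flattens one comment object into a summary dict. It assumes create_time is a Unix timestamp in seconds and that status 11 marks a pinned or flagged comment, as described above. summarize_comment is a hypothetical helper, not something the API provides.

```python
from datetime import datetime, timezone

PINNED_STATUS = 11  # per the description above: pinned or otherwise flagged


def summarize_comment(c):
    """Flatten one comment object into the fields most analyses need."""
    user = c.get("user") or {}
    return {
        "id": c["id"],
        "author": user.get("unique_id"),
        # Assumption: create_time is seconds since the Unix epoch.
        "created_at": datetime.fromtimestamp(int(c["create_time"]), tz=timezone.utc),
        "likes": c.get("digg_count", 0),
        # reply_total is absent on replies, so fall back to 0.
        "replies": c.get("reply_total") or 0,
        "pinned_or_flagged": c.get("status") == PINNED_STATUS,
    }
```

Converting the timestamp once, at the edge, keeps later time-based grouping free of repeated parsing.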

Step 2: Paginate through every comment

TikTok comment endpoints use a cursor-based pagination scheme. Each response includes the cursor for the next page; when you reach the end, the cursor either stops advancing or the comments array comes back empty. The loop pattern is identical whether you are pulling 200 comments or 200,000:

import time

def fetch_all_comments(video_url, max_pages=None, throttle=0.25):
    """Yield every top-level comment for a video, page by page."""
    cursor = 0
    page = 0
    seen = set()
    while True:
        data = fetch_comments_page(video_url, cursor=cursor)
        batch = data.get("comments") or []
        if not batch:
            break
        for c in batch:
            cid = c.get("id")
            if cid in seen:
                continue
            seen.add(cid)
            yield c
        # Advance cursor. APIs sometimes echo it back; treat missing/zero as terminal.
        next_cursor = data.get("cursor")
        if not next_cursor or next_cursor == cursor:
            break
        cursor = next_cursor
        page += 1
        if max_pages and page >= max_pages:
            break
        time.sleep(throttle)

A few notes on this loop:

  • Deduplicate. Comment endpoints occasionally repeat an entry on adjacent pages. Tracking each comment's id in a set guarantees you never persist a duplicate.
  • Bound the run. Top videos can have a million comments. Always pass max_pages for exploratory runs so a typo does not burn through your credit balance.
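  • Throttle gently. The published rate limit is 200 requests per minute. A 250 ms sleep alone caps a single worker at 240 requests per minute, so it stays under the limit only when each request takes at least 50 ms on top of the sleep. Raise throttle to 0.3 or more (60 / 0.3 = 200) if you want the bound to hold regardless of latency. You can monitor your usage on /profile/.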
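
To check the termination logic without spending credits, the same loop can be written against an injected page-fetching function and run on canned pages. This is only a sketch: paginate, FAKE_PAGES and fake_fetch are illustrative names, and the response shape (a comments list plus a cursor) is assumed from the description above.

```python
def paginate(fetch_page, max_pages=None):
    """Yield unique comments from fetch_page(cursor) until a stop condition hits."""
    cursor, page, seen = 0, 0, set()
    while True:
        data = fetch_page(cursor)
        batch = data.get("comments") or []
        if not batch:
            break  # empty page: end of the feed
        for c in batch:
            if c["id"] not in seen:  # drop repeats across adjacent pages
                seen.add(c["id"])
                yield c
        next_cursor = data.get("cursor")
        if not next_cursor or next_cursor == cursor:
            break  # missing or stalled cursor: end of the feed
        cursor = next_cursor
        page += 1
        if max_pages and page >= max_pages:
            break  # safety bound on credit spend


# Canned pages keyed by cursor; the second page repeats id 2 and stalls the cursor.
FAKE_PAGES = {
    0: {"comments": [{"id": 1}, {"id": 2}], "cursor": 2},
    2: {"comments": [{"id": 2}, {"id": 3}], "cursor": 2},
}


def fake_fetch(cursor):
    return FAKE_PAGES.get(cursor, {"comments": []})


ids = [c["id"] for c in paginate(fake_fetch)]  # [1, 2, 3]
```

Injecting the fetch function also makes it easy to reuse one loop for both endpoints, since they share the same response key.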

Step 3: Pull replies for high-engagement comments

Replies live one level deeper, and you only want them where they matter. A heuristic that works well in practice: fetch replies for any top-level comment where reply_total >= 5 or where the comment is pinned (status == 11). That keeps your credit spend proportional to the signal.

The reply endpoint requires both video_id and comment_id as snake_case query params. The video_id is on every comment object you already fetched, and the comment_id is the comment's own id.

def fetch_reply_page(video_id, comment_id, cursor=0, count=50):
    headers = {"X-Api-Key": API_KEY}
    params = {
        "video_id": video_id,
        "comment_id": comment_id,
        "count": count,
        "cursor": cursor,
    }
    r = requests.get(f"{BASE_URL}/post-comment-replies/", headers=headers, params=params, timeout=30)
    r.raise_for_status()
    return r.json()

def fetch_all_replies(video_id, comment_id, throttle=0.25):
    cursor = 0
    while True:
        data = fetch_reply_page(video_id, comment_id, cursor=cursor)
        batch = data.get("comments") or []   # Same key as top-level: 'comments'
        if not batch:
            break
        for reply in batch:
            yield reply
        next_cursor = data.get("cursor")
        if not next_cursor or next_cursor == cursor:
            break
        cursor = next_cursor
        time.sleep(throttle)

Remember: the response key here is still comments, not replies, and the reply objects do not carry the reply_total field. Treat that absence as your signal that you have reached a leaf in the tree.
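
One thing the generator above does not do is keep a reply linked to its parent once it has been yielded. Whether reply objects carry their own parent identifier is not stated here, so a cautious approach is to pair each reply with the parent id at fetch time. The sketch below assumes the selection heuristic from this step. wants_replies, iter_replies_with_parent and the stub fetcher are hypothetical names, and the fetcher is injected so the logic can be exercised offline.

```python
def wants_replies(comment, min_replies=5, pinned_status=11):
    """Heuristic from the text: busy threads, or pinned/flagged comments."""
    busy = (comment.get("reply_total") or 0) >= min_replies
    return busy or comment.get("status") == pinned_status


def iter_replies_with_parent(comments, fetch_replies):
    """Yield (parent_id, reply) pairs so each reply keeps its link to the thread."""
    for c in comments:
        if wants_replies(c):
            for reply in fetch_replies(c["video_id"], c["id"]):
                yield c["id"], reply


# Offline stub standing in for a real reply fetcher (illustrative only).
def stub_fetch_replies(video_id, comment_id):
    return [{"id": f"{comment_id}-r1", "text": "same here"}]


sample = [
    {"id": "10", "video_id": "7", "reply_total": 8, "status": 1},
    {"id": "11", "video_id": "7", "reply_total": 0, "status": 1},
    {"id": "12", "video_id": "7", "reply_total": 0, "status": 11},
]
pairs = list(iter_replies_with_parent(sample, stub_fetch_replies))
```

In a real run, fetch_all_replies would be passed in place of the stub, and the first element of each pair would become the parent_id of the stored row.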

Step 4: Concurrency, carefully

You will be tempted to fan out as wide as possible. Resist that impulse for a single video - cursor pagination is inherently sequential, and TikTok's backend is friendlier when one video's comments are pulled in order. Where you can safely parallelize is across videos.

A clean pattern uses a small worker pool: each worker owns one video and paginates it sequentially, while multiple videos run in parallel.

from concurrent.futures import ThreadPoolExecutor, as_completed

def harvest_video(video_url, max_pages=20):
    """Sequential within a video: comments first, then targeted reply fetches."""
    comments = list(fetch_all_comments(video_url, max_pages=max_pages))
    replies = []
    for c in comments:
        if (c.get("reply_total") or 0) >= 5 or c.get("status") == 11:
            replies.extend(fetch_all_replies(c["video_id"], c["id"]))
    return {"video_url": video_url, "comments": comments, "replies": replies}

def harvest_many(video_urls, workers=4):
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(harvest_video, u): u for u in video_urls}
        for fut in as_completed(futures):
            results.append(fut.result())
    return results

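Note that the sleep applies per worker, not globally: four workers at 250 ms each can issue up to 960 requests per minute in the worst case, well over the 200 rpm rate limit. To stay under it with four workers, pass throttle=1.2 or more to fetch_all_comments and fetch_all_replies (4 × 50 = 200 rpm), or reduce the worker count. If you need higher throughput, request a lift on /contact/ rather than racing the limiter.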
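
If you would rather enforce the limit in one place than reason about per-worker sleeps, one option is a limiter shared by every thread. The class below is a minimal sketch of that idea, not part of any TikLiveAPI client: MinIntervalLimiter is a hypothetical name, and the clock and sleep parameters exist only so the behaviour can be tested without real waiting.

```python
import threading
import time


class MinIntervalLimiter:
    """Space calls at least 60/rpm seconds apart across all threads."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / rpm
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = None

    def wait(self):
        # Reserve the next free time slot under the lock...
        with self._lock:
            now = self._clock()
            slot = now if self._next_slot is None else max(now, self._next_slot)
            self._next_slot = slot + self.interval
        # ...then sleep outside the lock so other threads can reserve theirs.
        delay = slot - now
        if delay > 0:
            self._sleep(delay)
```

A single instance, for example MinIntervalLimiter(rpm=180), would be created once and its wait() method called immediately before each requests.get call in every worker. Choosing a value somewhat below 200 leaves headroom for clock jitter.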

Step 5: Sentiment analysis

Once the comments are in memory, the analysis layer is yours. For a quick exploratory pass, a small rule-based scorer is enough to surface positivity and negativity gradients. For production you will want a real model - a fine-tuned DistilBERT or a multilingual transformer such as XLM-RoBERTa.

POSITIVE = {"love", "amazing", "best", "perfect", "fire", "iconic", "obsessed", "queen", "goat"}
NEGATIVE = {"hate", "worst", "trash", "boring", "cringe", "scam", "fake", "annoying"}

def rule_score(text):
    if not text:
        return 0.0
    tokens = text.lower().split()
    pos = sum(1 for t in tokens if t.strip(".,!?") in POSITIVE)
    neg = sum(1 for t in tokens if t.strip(".,!?") in NEGATIVE)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)

# Production path - swap in a transformer:
# from transformers import pipeline
# clf = pipeline("sentiment-analysis", model="cardiffnlp/twitter-xlm-roberta-base-sentiment")
# score = clf(comment_text)[0]

Two practical points. First, TikTok comments are often non-English, so any production model should be multilingual or paired with a language detector such as fasttext-langdetect. Second, emoji carry signal: a thread that scores neutral on words alone may be overwhelmingly positive once you count the fire and heart emojis. Keep them in the input rather than stripping them.
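
To make the emoji point concrete, here is a hedged variant of the rule-based scorer that also counts a few emoji. The word and emoji sets are tiny illustrative samples, not a validated lexicon, and score_with_emoji is a hypothetical name. Emoji are matched per character, so a variation selector following a heart does not prevent the match.

```python
POSITIVE_WORDS = {"love", "amazing", "best", "perfect", "fire"}
NEGATIVE_WORDS = {"hate", "worst", "trash", "boring", "scam"}

# Written as escapes to avoid encoding surprises: fire, heart, heart-eyes, clap.
POSITIVE_EMOJI = {"\U0001F525", "\u2764", "\U0001F60D", "\U0001F44F"}
# Thumbs down, vomiting face, pouting face.
NEGATIVE_EMOJI = {"\U0001F44E", "\U0001F92E", "\U0001F621"}


def score_with_emoji(text):
    """Return a score in [-1, 1] from word and emoji hits; 0.0 if nothing matches."""
    if not text:
        return 0.0
    words = [t.strip(".,!?") for t in text.lower().split()]
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    # Count emoji character by character so a run of three fire emoji counts three times.
    pos += sum(ch in POSITIVE_EMOJI for ch in text)
    neg += sum(ch in NEGATIVE_EMOJI for ch in text)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)
```

With this version, a comment made only of fire emoji scores as positive instead of neutral, which is the behaviour the paragraph above argues for.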

Step 6: Persistence

Flat, normalized storage beats nested JSON for everything you will want to do later (joins, aggregates, longitudinal sentiment). A single comments table covers both top-level entries and replies if you reserve a nullable parent_id column.

CREATE TABLE tiktok_comments (
    id            BIGINT PRIMARY KEY,
    video_id      BIGINT NOT NULL,
    parent_id     BIGINT NULL,          -- NULL = top-level, set = reply
    user_id       VARCHAR(64),
    username      VARCHAR(64),
    text          TEXT,
    digg_count    INT,
    reply_total   INT,                  -- NULL on replies
    status        SMALLINT,             -- 1 normal, 11 pinned, etc.
    create_time   BIGINT,               -- TikTok unix timestamp
    sentiment     FLOAT,
    language      VARCHAR(8),
    fetched_at    TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    INDEX (video_id),
    INDEX (parent_id),
    INDEX (create_time)
);

The Python writer maps each API object to a row. Keep the original payload around in a JSON column or object store if you can afford it - schemas drift, and being able to re-derive fields without re-fetching is worth the disk.

def to_row(c, parent_id=None):
    user = c.get("user") or {}
    return {
        "id": int(c["id"]),
        "video_id": int(c["video_id"]),
        "parent_id": int(parent_id) if parent_id else None,
        "user_id": user.get("sec_uid"),
        "username": user.get("unique_id"),
        "text": c.get("text"),
        "digg_count": c.get("digg_count", 0),
        "reply_total": c.get("reply_total"),   # None for replies
        "status": c.get("status"),
        "create_time": c.get("create_time"),
        "sentiment": rule_score(c.get("text")),
    }
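
The article does not show the write itself, so the following is one possible sketch. It uses SQLite from the standard library purely so the example is self-contained; the schema above uses MySQL-style inline indexes, and the reduced column list here is an assumption made for brevity. save_rows is a hypothetical helper. INSERT OR REPLACE makes a re-fetch of the same comment overwrite the existing row rather than fail on the primary key.

```python
import sqlite3

COLUMNS = ("id", "video_id", "parent_id", "username", "text", "digg_count", "sentiment")


def save_rows(conn, rows):
    """Insert row dicts; a repeated id overwrites the earlier row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tiktok_comments ("
        "id INTEGER PRIMARY KEY, video_id INTEGER NOT NULL, parent_id INTEGER, "
        "username TEXT, text TEXT, digg_count INTEGER, sentiment REAL)"
    )
    cols = ", ".join(COLUMNS)
    placeholders = ", ".join(f":{c}" for c in COLUMNS)  # named parameters
    conn.executemany(
        f"INSERT OR REPLACE INTO tiktok_comments ({cols}) VALUES ({placeholders})",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
row = {
    "id": 7300000000000000001,
    "video_id": 7300000000000000000,
    "parent_id": None,  # top-level comment
    "username": "demo_user",
    "text": "love this",
    "digg_count": 3,
    "sentiment": 1.0,
}
save_rows(conn, [row])
save_rows(conn, [dict(row, digg_count=9)])  # simulated re-fetch with a new like count
```

Parameterized statements matter here because comment text is untrusted input.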

Use cases

  • UGC mining. Filter for top-engagement comments per video, deduplicate by user, and surface the best lines for testimonial-style ad creative.
  • Customer feedback. Pipe comments from your brand's own videos and competitor videos through sentiment scoring, then route negative spikes to support.
  • Content idea generation. Cluster question-shaped comments (those ending in "?") to find the topics your audience wants you to make next.
  • Trend detection. Aggregate sentiment over create_time to see how a video's reception evolved during its viral window.
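
As one example of the trend-detection idea, the sketch below averages sentiment per hour. It assumes rows shaped like the output of to_row, with create_time as a Unix timestamp in seconds and sentiment as a float. sentiment_by_hour is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import datetime, timezone


def sentiment_by_hour(rows):
    """Return {hour_start_utc: mean sentiment}, ordered by time."""
    buckets = defaultdict(list)
    for r in rows:
        if r.get("create_time") is None or r.get("sentiment") is None:
            continue  # skip rows that cannot be placed or scored
        ts = datetime.fromtimestamp(int(r["create_time"]), tz=timezone.utc)
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(r["sentiment"])
    return {hour: sum(v) / len(v) for hour, v in sorted(buckets.items())}
```

Bucketing in UTC avoids daylight-saving artifacts; convert to a local zone only at display time.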

Common pitfalls

  • Rate limits. The default ceiling is 200 requests per minute. Build for that envelope from day one; do not assume you can burst.
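  • Deleted comments. Comments can be deleted between two page requests, so a page may return fewer entries than count, and totals may not match reply_total. Treat short pages and small count drops as normal rather than as errors.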
  • Mixed casing. Most fields are snake_case, but other endpoints in the catalogue mix camelCase inside the same response (see the notes in the user-info docs). Always inspect a real payload before writing parsers.
  • Cursor stalling. Some videos return the same cursor twice at the end. The guard if next_cursor == cursor: break handles it.
  • Language detection. Multilingual videos are common. Detect language per comment, not per video.
  • Cost math. One request equals one credit; a million-comment video pulled 50 at a time is 20,000 credits. Plan your sampling on the pricing page before you start.
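
The cost arithmetic can be turned into a quick estimator to run before a crawl. This sketch assumes one credit per request and 50 items per page, as stated above, and applies the reply_total >= 5 heuristic from Step 3. estimate_credits is a hypothetical helper, and the result is a lower bound because it ignores pinned comments and any final request that only confirms the end of a feed.

```python
import math


def estimate_credits(top_level_count, reply_totals=(), page_size=50, min_replies=5):
    """Estimate requests (= credits) for one video and its selected reply threads."""
    top_pages = math.ceil(top_level_count / page_size)
    reply_pages = sum(
        math.ceil(n / page_size) for n in reply_totals if n >= min_replies
    )
    return top_pages + reply_pages


estimate_credits(1_000_000)  # 20000, matching the figure above
```

For example, a video with 120 top-level comments and reply threads of 3, 5 and 51 replies comes to 3 + 0 + 1 + 2 = 6 credits.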

FAQ

How many comments can I fetch per request?

Up to 50 on both /post-comments/ and /post-comment-replies/. Always pass count=50 to minimize the number of requests, and therefore credits, per video.

Does the reply endpoint return a replies key?

No. It returns the same comments array as the top-level endpoint. The only schema difference is that replies do not carry the reply_total field.

How do I know when pagination is finished?

Either the comments array comes back empty or the cursor stops advancing between pages. Guard for both in your loop.

Can I fetch comments from a private or deleted video?

No. The endpoint only returns publicly available data; private accounts and removed videos return an empty payload.

What is the cheapest way to test the pipeline?

Use the playground to validate a single page response without writing code, then run the loop with max_pages=2 on one video before scaling out. Credits never expire, so a small test budget goes a long way.

Comments are the part of TikTok that machines have historically had the hardest time reading and humans have always known to be the most interesting. With paginated /post-comments/, targeted /post-comment-replies/, a sensible concurrency model, and a flat storage schema, you can turn that section into a queryable dataset in an afternoon.
