TikTok Analytics Dashboard: Architecture and Data Model

Q: Q: How do I detect "viral" posts automatically?

Compute the ratio of play_count on day N over play_count on day 1, partitioned by account. A 5x jump in the first 48 hours combined with an above-account-average like ratio is a strong viral signal. SQL window functions do this in one query.

Q: Q: What about the TikTok Live data and stories?

Stories come from /user-stories/ and have a hasMore/cursor shape similar to posts. They expire after 24 hours, so snapshot them in a separate hourly job rather than the daily one.

By TikLiveAPI Team · Published on May 29, 2026

TikTok Analytics Dashboard: Architecture and Data Model

A TikTok analytics dashboard turns the raw firehose of a creator's public footprint into something a marketing lead, a creator, or a brand-deals manager can actually act on. The typical screen shows three things at once: a live snapshot of the tracked account (followers, hearts, video count), a time series of how those numbers moved over the last 30 or 90 days, and a sortable list of recent posts with engagement metrics. Layered on top you usually find a hashtag cloud, a top-music widget, and an audience tab that samples followers to flag bot ratios.

Off-the-shelf tools such as Sprout Social, Hootsuite Insights, or Iconosquare give you most of this in a managed UI, but they share three drawbacks for niche use cases. They charge per-seat regardless of how many accounts you track, they hide the raw data behind their dashboards (no SQL access), and they will not let you pivot on custom dimensions such as "videos that use a sound from artist X" or "comments that mention competitor Y". Building your own dashboard against a flat-rate API such as TikLiveAPI flips the economics: you own the warehouse, you can join TikTok data with your CRM or your ad spend, and you only pay for the requests you actually fire.

This guide walks through the data model, ingestion pipeline, widget-to-endpoint mapping, snapshot strategy, frontend choices, multi-tenant auth, cost modelling, and the pitfalls that bite every team building one of these for the first time. Code examples are Python; DDL is Postgres flavored.

The data model

The dashboard is mostly a read model over five core tables: accounts (the TikTok handles you track), account_snapshots (a daily row of follower and like counts per account), posts (one row per video), post_snapshots (daily play and digg counts per video), comments, and two many-to-many tables for hashtags and music. Everything else (top-hashtag widgets, growth charts, leaderboards) is a query over these tables.

CREATE TABLE accounts (
  id              BIGSERIAL PRIMARY KEY,
  tenant_id       BIGINT NOT NULL,
  username        TEXT NOT NULL,
  tiktok_user_id  TEXT NOT NULL,
  sec_uid         TEXT,
  nickname        TEXT,
  avatar_url      TEXT,
  verified        BOOLEAN DEFAULT FALSE,
  first_tracked   TIMESTAMPTZ DEFAULT NOW(),
  UNIQUE (tenant_id, tiktok_user_id)
);

CREATE TABLE account_snapshots (
  account_id      BIGINT REFERENCES accounts(id),
  captured_on     DATE NOT NULL,
  follower_count  BIGINT,
  following_count BIGINT,
  heart_count     BIGINT,
  video_count     INT,
  digg_count      BIGINT,
  PRIMARY KEY (account_id, captured_on)
);

CREATE TABLE posts (
  id              BIGSERIAL PRIMARY KEY,
  account_id      BIGINT REFERENCES accounts(id),
  aweme_id        TEXT NOT NULL,
  title           TEXT,
  cover_url       TEXT,
  duration_sec    INT,
  created_at      TIMESTAMPTZ,
  music_id        TEXT,
  first_seen      TIMESTAMPTZ DEFAULT NOW(),
  UNIQUE (account_id, aweme_id)
);

CREATE TABLE post_snapshots (
  post_id         BIGINT REFERENCES posts(id),
  captured_on     DATE NOT NULL,
  play_count      BIGINT,
  digg_count      BIGINT,
  comment_count   BIGINT,
  share_count     BIGINT,
  collect_count   BIGINT,
  PRIMARY KEY (post_id, captured_on)
);

CREATE TABLE comments (
  id              TEXT PRIMARY KEY,
  post_id         BIGINT REFERENCES posts(id),
  text            TEXT,
  digg_count      INT,
  reply_total     INT,
  author_username TEXT,
  created_at      TIMESTAMPTZ
);

CREATE TABLE hashtags (
  cha_name        TEXT PRIMARY KEY,
  challenge_id    TEXT,
  view_count      BIGINT,
  user_count      BIGINT,
  last_refreshed  TIMESTAMPTZ
);

CREATE TABLE post_hashtags (
  post_id    BIGINT REFERENCES posts(id),
  cha_name   TEXT REFERENCES hashtags(cha_name),
  PRIMARY KEY (post_id, cha_name)
);

Two non-obvious points. First, account_snapshots and post_snapshots use a composite primary key on (id, date) rather than a synthetic id. That lets you upsert idempotently when a cron job retries. Second, the comments primary key is TEXT because TikLiveAPI's /post-comments/ endpoint returns the comment id in a field literally called id (not cid), and the values are long alphanumeric strings.

Ingestion pipeline architecture

The pipeline has five moving parts. A scheduler kicks off jobs on a cron cadence. A queue holds the work units. A pool of workers pulls jobs, calls TikLiveAPI, normalises the JSON, and writes to Postgres. Redis caches per-account responses for short read bursts. The frontend reads only from Postgres and Redis - never from the upstream API directly.

+-------------+     +---------+     +-----------+     +----------+
| Airflow/    | --> | Redis   | --> | Python    | --> | Postgres |
| cron tick   |     | queue   |     | workers   |     |          |
+-------------+     +---------+     +-----+-----+     +----+-----+
                                          |                |
                                          v                v
                                   +-------------+   +-----------+
                                   | TikLiveAPI  |   | Frontend  |
                                   | (X-Api-Key) |   | (React)   |
                                   +-------------+   +-----------+

Airflow is overkill if you only run two or three DAGs - a plain crontab firing python ingest.py --task=daily_snapshot at 02:00 UTC is fine until you grow past about 1,000 tracked accounts. Above that, a scheduler with retries, backfill, and lineage pays for itself.

A bare-bones worker for the daily snapshot looks like this. Note the auth header is X-Api-Key, never Bearer.

import os, time, requests, psycopg2
from datetime import date

BASE = "https://api.tikliveapi.com"
HEADERS = {"X-Api-Key": os.environ["TIKLIVE_KEY"]}

def fetch_user_info(username: str) -> dict:
    r = requests.get(f"{BASE}/userinfo-by-username/",
                     params={"username": username},
                     headers=HEADERS, timeout=15)
    r.raise_for_status()
    return r.json()

def snapshot_account(cur, account_id: int, username: str):
    data = fetch_user_info(username)
    stats = data["stats"]   # camelCase counters
    user  = data["user"]
    cur.execute("""
      INSERT INTO account_snapshots
        (account_id, captured_on, follower_count, following_count,
         heart_count, video_count, digg_count)
      VALUES (%s, %s, %s, %s, %s, %s, %s)
      ON CONFLICT (account_id, captured_on) DO UPDATE
        SET follower_count = EXCLUDED.follower_count,
            heart_count    = EXCLUDED.heart_count,
            video_count    = EXCLUDED.video_count
    """, (account_id, date.today(),
          stats["followerCount"], stats["followingCount"],
          stats["heartCount"],    stats["videoCount"],
          stats["diggCount"]))
    cur.execute("""UPDATE accounts SET nickname=%s, avatar_url=%s,
                  verified=%s, sec_uid=%s WHERE id=%s""",
                (user["nickname"], user["avatarMedium"],
                 user.get("verified", False), user["secUid"], account_id))

Worker concurrency wants to stay below the rate limit. TikLiveAPI's documented default is 200 requests per minute, raisable on request - see the pricing page. A safe starting point is 4 worker processes, each making 1 request every 200ms, which keeps you at 1,200 per minute headroom unused and avoids the spike-throttle that punishes "everyone fires at 02:00:00 UTC" patterns.

Mapping TikLiveAPI endpoints to dashboard widgets

Each widget on the dashboard maps to one or two endpoints. Cache the response in Redis under tla:{tenant}:{endpoint}:{params_hash} with a TTL that matches the widget's expected freshness.

Account header tile. Hits /userinfo-by-username/ and reads stats.followerCount, stats.heartCount, stats.videoCount, plus user.nickname and user.avatarMedium. Cache 10 minutes. See the users docs.

Recent posts table. Hits /user-posts/ with the resolved userid. The response shape is {videos, cursor, hasMore} - paginate by passing the returned cursor back. Each video row gives you play_count, digg_count, comment_count, share_count, the cover image, and the music block.

Top hashtags widget. Aggregate hashtags from the title field of each post (regex #\w+), then call /challenge-info-name/ once per unique tag to fetch view_count and user_count for sizing the bubbles. Note the field is cha_name, not name.

Music trend widget. Extract music_info objects from the posts array, group by music id, then call /music-info/ only for the top 10 by frequency. The response is a flat object with title, author, play (the direct mp3 URL), and video_count for sizing.

Audience tab. Hits /user-followers/ with a sample size of 100-500. Pagination here is unusual: it uses a time parameter (a Unix timestamp), not cursor. The top key is followers and hasMore remains camelCase. A trivial bot-ratio heuristic flags follower rows with zero posts, a default avatar, and a username matching /^user\d{6,}$/. Pair with /user-following/ if you want a follower-to-following ratio (top key is followings, plural).

def page_followers(userid: str, pages: int = 5):
    cursor_time = 0
    out = []
    for _ in range(pages):
        r = requests.get(f"{BASE}/user-followers/",
                         params={"userid": userid, "count": 50,
                                 "time": cursor_time},
                         headers=HEADERS, timeout=15).json()
        out.extend(r["followers"])
        if not r.get("hasMore"):
            break
        cursor_time = r["time"]
    return out

Daily snapshot pattern

The growth chart is the single most-requested widget in any analytics tool, and it lives or dies on snapshot discipline. The pattern is:

At 02:00 UTC daily, write one row per tracked account into account_snapshots.
At 02:30 UTC, walk recent posts (last 30 days of each tracked account) and write one row per video into post_snapshots.
The frontend reads account_snapshots WHERE captured_on >= NOW() - INTERVAL '90 days' and renders the line chart directly. No materialised views needed below 100k accounts.

Deltas (today's followers minus yesterday's followers) are a one-line window function:

SELECT
  captured_on,
  follower_count,
  follower_count - LAG(follower_count) OVER (
    PARTITION BY account_id ORDER BY captured_on
  ) AS follower_delta
FROM account_snapshots
WHERE account_id = $1
ORDER BY captured_on DESC
LIMIT 30;

Compute deltas at query time rather than storing them - storage is cheap, but rewriting a stored delta when a backfill arrives is annoying.

Real-time vs scheduled refreshes

Every metric has a natural freshness budget. Treat freshness as a knob you tune per widget, not a global setting.

Hourly or finer: nothing, by default. Real-time on a creator dashboard burns credits with little payoff because TikTok counters jitter.
Daily (02:00 UTC): account snapshots and recent-post snapshots. This drives every chart.
On-demand (user clicks Refresh): the account header tile and the recent posts table. Cap to one refresh per account per 5 minutes via Redis to defend the credit pool.
Weekly: hashtag and music metadata. View counts on a hashtag move slowly enough that refreshing more often is wasteful.

You can demo any of the above against your own key on the playground before wiring it into a worker. That cuts the iteration loop on getting the response shape right.

Frontend layer

React is the obvious choice because the ecosystem around dashboards is huge. Three libraries cover 90% of the surface area:

Recharts or Chart.js for the time-series line, bar, and area charts. Recharts has the smaller learning curve; Chart.js draws faster on data sets above 10k points.
TanStack Table (formerly react-table) with row virtualisation via react-virtual for the recent-posts list. A high-volume creator can have 5,000 posts and a non-virtualised table will jank the scroll bar.
SWR or TanStack Query for cache-and-revalidate semantics against your backend. Both handle the "show stale data instantly, refresh in background" pattern without ceremony.

Server-side, expose Postgres queries via a thin REST or GraphQL layer. The frontend should never call TikLiveAPI directly - that would leak your key into the browser. Keep the API key server-side and proxy where needed.

Auth and multi-tenancy

Two patterns work; pick based on whether your customers know they are using TikLiveAPI.

Pattern A: shared key, tenant column. Your application has one TikLiveAPI key, every table has a tenant_id, and your billing logic decides who can track how many accounts. This is right for an internal tool or a white-labelled product. Credit cost is your COGS line. Easier to operate; you control the credit budget.

Pattern B: per-tenant key. Each customer brings their own key (BYOK). You store it encrypted in a tenants.tiktok_api_key column, decrypt at worker pickup, and pass it as the X-Api-Key header. This is right when customers want their own usage isolation and refund eligibility. It also means rate limits are per-tenant, which sidesteps the noisy-neighbour problem.

# Pattern B worker prelude
from cryptography.fernet import Fernet
fernet = Fernet(os.environ["TENANT_KEY_FERNET"])

def headers_for(tenant_id: int, cur) -> dict:
    cur.execute("SELECT encrypted_key FROM tenants WHERE id=%s", (tenant_id,))
    enc, = cur.fetchone()
    return {"X-Api-Key": fernet.decrypt(enc).decode()}

Either way, never log the key. Strip it from request logs at the worker level and from any error reports sent to Sentry. Both patterns, along with schema-level tenant isolation and per-tenant usage metering, get a fuller treatment in our multi-tenant SaaS architecture guide.

Cost modelling

TikLiveAPI's pricing is flat: 1 request = 1 credit, no tiered overage. That makes per-account-per-day cost trivial to estimate. A baseline tracked-account costs:

1 credit/day for /userinfo-by-username/ (account snapshot)
1 credit/day for the first page of /user-posts/ (gives you the 20-30 most recent videos)
1-2 credits/day if you also pull /post-comments/ on the latest video for sentiment

So a single creator costs around 3-4 credits per day, or 90-120 credits per month. At the published volume pricing on the pricing page, that's pennies per account per month. Add the audience-sample tab (5 pages of 50 followers = 5 credits) on first load only, cached for a week, and you stay well under 150 credits/account/month for a full-featured dashboard.

The cost spikes happen on backfills. Pulling 12 months of history for a new account means walking /user-posts/ until hasMore is false (often 20-50 page calls) plus a /post-detail/ per video for full engagement counts. Budget 100-300 credits as a one-time enrolment cost per tracked account. If that budget feels tight, these credit-saving patterns can trim it substantially.

Common pitfalls

Stale follower counts. TikTok's own counters can lag by hours. If yesterday's snapshot shows fewer followers than today's, that's fine; if today's shows fewer than yesterday's, it usually is not a real drop - it's the upstream cache. Hide deltas smaller than 0.1% in the UI.

Deleted videos. A video that appeared in /user-posts/ last week can vanish this week. Don't hard-delete from the posts table - flag it with deleted_at instead so historical reports still resolve.

Rate-limit spikes during cron. If 800 accounts all snapshot at 02:00:00 the first request will succeed and the 201st will 429. Jitter the schedule: spread the work over the full hour by hashing account_id into a 0-3599 second offset. For retry, backoff, and queue patterns beyond jitter, see our guide to building resilient pipelines around rate limits.

Pagination shape gotchas. /user-posts/ uses cursor. /user-followers/ and /user-following/ use a time timestamp. /user-following/'s top key is the plural followings. Wrong-key bugs are a top source of "why is my chart empty" tickets.

Comment id field. /post-comments/ returns comment id under id, not cid. Hard-code the wrong field and your idempotency check silently inserts duplicates every run.

Avatar URL rot. The avatarMedium URLs expire. Either re-fetch on each snapshot, or mirror to your own object storage on first ingest.

FAQ

Q: Do I need a graph database to track follower relationships?
No. Followers are heavy and largely uninteresting in aggregate. Postgres handles the relationship table fine up to tens of millions of edges; a graph database only pays off if you intend to run multi-hop influence queries.

Q: How do I detect "viral" posts automatically?
Compute the ratio of play_count on day N over play_count on day 1, partitioned by account. A 5x jump in the first 48 hours combined with an above-account-average like ratio is a strong viral signal. SQL window functions do this in one query.

Q: Can I use this same architecture for monitoring competitor accounts?
Yes, that is exactly the most common B2B use case. Just add a tracked_reason column to accounts with values like 'own', 'competitor', 'partner' and filter the dashboard accordingly. The ingestion code is identical. We unpack that variant in our walkthrough on monitoring competitor TikTok accounts at scale.

Q: What about the TikTok Live data and stories?
Stories come from /user-stories/ and have a hasMore/cursor shape similar to posts. They expire after 24 hours, so snapshot them in a separate hourly job rather than the daily one.

Q: Do I have to host the workers in the same region as Postgres?
Co-locating workers with Postgres saves a round-trip on every batch insert. The upstream call to TikLiveAPI dominates latency anyway (the documented average is 750ms), so worker-to-API geography matters more than worker-to-DB. Pick a region close to your API provider's edge.

For deeper questions on endpoint shapes, head to the documentation index - in particular the posts and challenge pages, which carry the field-by-field response notes you will need when normalising into the schema above. For sales or volume questions on credit packages, use the contact form.