Build an Influencer Discovery Tool with the TikTok API

Published on May 29, 2026

Influencer marketing has graduated from a nice-to-have line item into a measurable acquisition channel. The problem: finding the right creators is still painful. Scrolling TikTok for hours, copy-pasting handles into a spreadsheet, eyeballing engagement rates - this does not scale, and it certainly does not survive a quarterly campaign cadence where you need 50 net-new candidates every week. Agency founders feel it as a margin squeeze; in-house marketing engineers feel it as a Jira ticket that never closes.

This guide walks through building a working influencer discovery tool on top of the TikLiveAPI endpoints. By the end you will have a system that ingests seed hashtags, fans out into candidate creators, enriches each profile, scores niche fit, estimates audience quality, and lets a non-technical user filter and export the result as CSV. Stack is Python for the workers, Postgres for storage, Redis for the queue, and a thin React UI on top. Every code sample is production-shaped, not pseudocode.

Product overview

The end product is a single-page app where a campaign manager picks a niche (skincare, gym, finance, gaming), a follower tier (10K-100K, 100K-1M, 1M+), a region (ISO-2 country code), an optional language, and an engagement-rate threshold. They click Search. Within seconds they get a ranked list of creators with avatar, handle, follower count, ER%, last-30-day post cadence, niche-fit score, and an estimated audience-quality score.

From the result table they can shortlist, push to outreach status (Pending, Contacted, Replied, Booked, Declined), or export to CSV. Saved searches re-run on a schedule and surface only new candidates, so the team treats discovery as an inbox rather than a chore. Done right, this replaces the "intern with a spreadsheet" pattern that many agencies still run.

Architecture

Five components, all boring on purpose:

  • Postgres - tables: creators, creator_posts, niches, searches, search_results, outreach.
  • Redis - work queue (RQ or Celery) for the enrichment jobs; also short-lived cache for hot profiles.
  • Worker fleet - 4 to 16 Python workers consuming jobs. Each worker holds an X-Api-Key from your TikLiveAPI dashboard.
  • FastAPI gateway - the React UI talks to this; it never calls TikLiveAPI directly so your key never ships to the browser.
  • React UI - filter panel, results table, export button, saved-searches list.

The reason the workers fan out instead of running synchronously inside the request is rate-limit headroom. Discovery jobs typically chew through hundreds of API calls per search; a request-scoped flow would time out long before completion. Push jobs into Redis, return a job ID, poll from the UI. The standard worker pattern is well-trodden and your future self will thank you.

Step 1: Seed sourcing

You need a way to discover creators that did not already exist in your database. Two endpoints carry the load: /challenge-posts/ for hashtag fan-out and /search-video/ for keyword fan-out.

Workflow: a niche has 10-20 seed terms ("skincare routine", "drugstore makeup", "retinol", "korean beauty"). For each term, hit /search-video/ with sort_by=2 (date posted) and publish_time=30 (last month), then collect every author.uniqueId from the returned videos. Same for hashtag-based seeds via /challenge-posts/, which accepts a region filter using ISO-2 codes from /region-list/.

import os, requests, time

BASE = "https://api.tikliveapi.com"
HEADERS = {"X-Api-Key": os.environ["TIKLIVEAPI_KEY"]}

def search_seed_creators(keyword, region="US", pages=3):
    seen = set()
    cursor = 0
    for _ in range(pages):
        r = requests.get(
            f"{BASE}/search-video/",
            params={
                "keyword": keyword,
                "count": 30,
                "cursor": cursor,
                "publish_time": 30,
                "sort_by": 2,
                "region": region,
            },
            headers=HEADERS,
            timeout=30,
        )
        r.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
        data = r.json()
        for v in data.get("videos", []):
            author = v.get("author") or {}
            handle = author.get("uniqueId") or author.get("unique_id")
            if handle:
                seen.add(handle)
        if not data.get("hasMore"):
            break
        cursor = data.get("cursor", 0)
        time.sleep(0.2)
    return seen

Note the camelCase hasMore on paginated responses - this trips up engineers expecting snake_case across the board. Enqueue each new handle as a Redis job tagged enrich:{handle}. Deduplicate aggressively: a popular hashtag will surface the same 20 creators on every run, so a Bloom filter on seen handles saves real money.
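The Bloom-filter dedup mentioned above can be sketched in plain Python. This is a minimal illustration, not a tuned implementation: the class, the new_handles helper, and the sizing defaults are all hypothetical names and values chosen for this example. In production you would more likely reach for a Redis-backed filter so every worker shares the same state.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, expected_items: int, fp_rate: float = 0.01):
        # Standard sizing formulas: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.size = max(8, int(-expected_items * math.log(fp_rate) / (math.log(2) ** 2)))
        self.hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def new_handles(candidates, seen: BloomFilter):
    """Return handles not seen before, marking them as seen along the way."""
    fresh = []
    for handle in candidates:
        if handle not in seen:
            seen.add(handle)
            fresh.append(handle)
    return fresh
```

Only the handles returned by new_handles need an enrich job. The trade-off to keep in mind: a false positive means a genuinely new creator is skipped, so size the filter generously relative to the number of handles you expect to see.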

Step 2: Candidate enrichment

For every seeded handle, the enrichment worker runs two calls, plus an optional third. First, /userinfo-by-username/ to get follower count, video count, signature (bio), verified flag, and the numeric user.id. Second, /user-posts/ with count=30 to grab the last 30 posts for engagement-rate math. Third, and only if you have a handle and need the numeric ID without the full profile, /userid/.

def enrich(handle):
    info = requests.get(
        f"{BASE}/userinfo-by-username/",
        params={"username": handle},
        headers=HEADERS, timeout=30,
    ).json()
    user = info.get("user", {})
    stats = info.get("stats", {})

    userid = user.get("id")
    if not userid:
        # No numeric ID in the response (for example, an unknown handle): nothing to enrich.
        return None
    posts = requests.get(
        f"{BASE}/user-posts/",
        params={"userid": userid, "count": 30, "cursor": 0},
        headers=HEADERS, timeout=30,
    ).json()
    videos = posts.get("videos", [])

    # Totals across the sampled posts; ER here is interactions per play.
    plays = sum(v.get("play_count", 0) for v in videos)
    likes = sum(v.get("digg_count", 0) for v in videos)
    comments = sum(v.get("comment_count", 0) for v in videos)
    shares = sum(v.get("share_count", 0) for v in videos)

    er = (likes + comments + shares) / max(plays, 1)
    return {
        "handle": user.get("uniqueId"),
        "userid": userid,
        "nickname": user.get("nickname"),
        "signature": user.get("signature", ""),
        "verified": user.get("verified", False),
        "followers": stats.get("followerCount", 0),
        "videos": stats.get("videoCount", 0),
        "hearts": stats.get("heartCount", 0),
        "er_30": round(er, 4),
        "avg_views": plays // max(len(videos), 1),
        "last_posts": videos,
    }

Persist to creators. Note the camelCase counters - followerCount, heartCount, videoCount - and that /user-posts/ paginates via a numeric cursor plus a hasMore flag. If you also fetch comments on the top-performing post for sentiment analysis, the /post-comments/ response uses an id field for each comment (not cid) - small detail, two-hour debugging session if you miss it.

Step 3: Niche fit scoring

Follower count alone is not enough; a 500K travel influencer is the wrong fit for a B2B SaaS. Score every creator on textual similarity to your target niche so the table sorts by relevance, not vanity.

Build a corpus per creator by concatenating signature (bio) and the title/desc field of their last 30 posts. Build a niche corpus by concatenating 20-50 seed phrases that describe the niche. Run TF-IDF on the union, cosine-similarity each creator vector against the niche vector, store as niche_fit in [0, 1].

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def niche_score(creator_text, niche_text):
    v = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
    m = v.fit_transform([niche_text, creator_text])
    return float(cosine_similarity(m[0:1], m[1:2])[0][0])

For multi-language niches, swap to a language-aware tokenizer or run a tiny embedding model (MiniLM is enough) instead of TF-IDF. The architecture stays identical - only the scorer changes. Weighting matters: bio text usually signals niche better than per-post titles, which can be clickbait. A 3:1 bio-to-posts weight is a sensible default.
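One way to apply the 3:1 weighting is to score the bio and the post text separately and blend the two similarities. The sketch below is an assumption about how you might wire that up, not the only way: weighted_niche_fit and bow_cosine are hypothetical helpers, and bow_cosine is a deliberately simple bag-of-words cosine used only so the example runs with the standard library. It is not TF-IDF.

```python
import math
import re
from collections import Counter


def bow_cosine(text_a: str, text_b: str) -> float:
    """Cosine similarity on raw term counts (a simple stand-in, not TF-IDF)."""
    a = Counter(re.findall(r"[a-z0-9]+", text_a.lower()))
    b = Counter(re.findall(r"[a-z0-9]+", text_b.lower()))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def weighted_niche_fit(bio, posts_text, niche_text, scorer=bow_cosine,
                       bio_weight=3.0, posts_weight=1.0):
    """Blend bio and post similarity; the 3:1 default follows the weighting above."""
    bio_sim = scorer(bio, niche_text)
    posts_sim = scorer(posts_text, niche_text)
    return (bio_weight * bio_sim + posts_weight * posts_sim) / (bio_weight + posts_weight)
```

Because scorer takes (creator_text, niche_text) in that order, you can pass niche_score in place of bow_cosine without other changes. The result stays in [0, 1] as long as the scorer does.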

Step 4: Audience quality estimate

Real engagement is partly a follower-quality problem. A creator with 800K followers and an inflated count from a 2023 bot wave will look great on the spreadsheet and disappoint on the campaign. To sanity-check, sample the follower list with /user-followers/ and compute an alive-rate.

Two non-obvious details: pagination on /user-followers/ uses a time parameter (a timestamp), not the cursor param the other list endpoints use. And /user-following/ returns the top key followings (plural with the trailing s) rather than following. Get this wrong and your worker silently drops rows.

def sample_followers(userid, target=200):
    out, t = [], 0
    while len(out) < target:
        r = requests.get(
            f"{BASE}/user-followers/",
            params={"userid": userid, "count": 50, "time": t},
            headers=HEADERS, timeout=30,
        ).json()
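        batch = r.get("followers", [])
        out.extend(batch)
        # Stop on the last page, or on an empty page, so the loop cannot spin forever.
        if not batch or not r.get("hasMore"):
            break
        t = r.get("time", 0)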
    return out[:target]

def alive_rate(followers):
    if not followers:
        return 0.0
    alive = 0
    for f in followers:
        # heuristics: has an avatar, has posted at least one video, follows more than 5 accounts
        stats = f.get("stats", {})
        if (f.get("avatarThumb") and stats.get("videoCount", 0) > 0
                and stats.get("followingCount", 0) > 5):
            alive += 1
    return alive / len(followers)

A 200-follower sample is a noisy estimate for any single creator, but it is good enough to flag the worst offenders: an alive_rate under 0.3 is a strong fake-follower signal. For a deeper treatment of why this matters, see our blog post on detecting fake follower waves; the same heuristics carry over.
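To put a number on how noisy a 200-follower sample is, you can attach a confidence interval to the alive-rate. The sketch below uses the Wilson score interval, a standard interval for proportions. It is an addition for illustration, not part of the TikLiveAPI response, and wilson_interval is a hypothetical helper name. It also assumes the sampled followers behave like a random draw from the audience, which a paginated follower list may not be.

```python
import math


def wilson_interval(alive: int, sampled: int, z: float = 1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if sampled == 0:
        return (0.0, 1.0)  # no data: the rate could be anything
    p = alive / sampled
    denom = 1 + z * z / sampled
    centre = (p + z * z / (2 * sampled)) / denom
    half = z * math.sqrt(p * (1 - p) / sampled + z * z / (4 * sampled * sampled)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))
```

With 60 alive accounts out of 200 sampled, the interval comes out at roughly 0.24 to 0.37. A cautious rule is to flag a creator only when the upper bound falls under your threshold, and to pull a larger sample for borderline cases.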

Step 5: Filter UI

The React side is intentionally thin. A filter panel, a sortable table, and a saved-searches sidebar. Resist the urge to put scoring logic in the browser - the API key has to stay server-side and the user does not want a 5-second client-side TF-IDF.

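function FilterPanel({ value, onChange }) {
  // The inputs are uncontrolled, so read their current values from the form on submit.
  const handleSubmit = (e) => {
    e.preventDefault();
    const form = new FormData(e.currentTarget);
    onChange({ ...value, tier: form.get("tier"), erMin: Number(form.get("erMin")) });
  };
  return (
    <form onSubmit={handleSubmit}>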
      <select name="tier" defaultValue={value.tier}>
        <option value="nano">10K - 100K</option>
        <option value="mid">100K - 1M</option>
        <option value="macro">1M+</option>
      </select>
      <NicheMultiSelect value={value.niches} />
      <RegionSelect value={value.region} />
      <input type="number" name="erMin" step="0.001"
             defaultValue={value.erMin} placeholder="ER >=" />
      <button type="submit">Search</button>
    </form>
  );
}

On submit, the React app hits your FastAPI /search endpoint which queries Postgres - not TikLiveAPI - because all the heavy work happened during enrichment. Live API traffic only happens when a saved search refreshes or a user requests a single-handle deep-dive from the row context menu. That separation keeps the user experience snappy even when the worker fleet is busy.
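On the gateway side, the filter payload has to become a parameterized Postgres query. Here is a minimal sketch of that translation. The tier keys match the option values in the form above; the region, niche_fit, and alive_rate column names and the limit key are assumptions about your schema, and the %s placeholders assume a DB-API driver such as psycopg.

```python
# Follower ranges per tier; keys match the <option> values in the filter panel.
TIERS = {
    "nano": (10_000, 100_000),
    "mid": (100_000, 1_000_000),
    "macro": (1_000_000, None),
}


def build_search_query(filters: dict):
    """Turn a filter dict into (sql, params); values are never interpolated into the SQL."""
    low, high = TIERS[filters["tier"]]
    clauses = ["followers >= %s"]
    params = [low]
    if high is not None:
        clauses.append("followers < %s")
        params.append(high)
    if filters.get("region"):
        clauses.append("region = %s")  # hypothetical column
        params.append(filters["region"])
    if filters.get("erMin") is not None:
        clauses.append("er_30 >= %s")
        params.append(filters["erMin"])
    sql = (
        "SELECT handle, followers, er_30, niche_fit, alive_rate FROM creators WHERE "
        + " AND ".join(clauses)
        + " ORDER BY niche_fit DESC LIMIT %s"
    )
    params.append(int(filters.get("limit", 100)))
    return sql, params
```

Pass the pair straight to cursor.execute(sql, params). Keeping user input in params rather than in the SQL string is what protects the endpoint from injection.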

Step 6: Output, CSV and outreach

Add a CSV export button that streams from the same query, plus an outreach panel that updates a status column per shortlisted creator. Saved searches are stored as JSON filter blobs; a nightly cron re-runs them, diffs against last run, and pings Slack with "12 new creators match Skincare-EU-Mid". Track outreach status with a simple state machine: Pending to Contacted to Replied to Booked, with a Declined terminal state.
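The outreach state machine fits in a few lines. The sketch below makes two assumptions the description above leaves open: Declined can be reached from any non-terminal state, and Booked is terminal. Outreach, ALLOWED, and advance are hypothetical names for this example.

```python
from enum import Enum


class Outreach(str, Enum):
    PENDING = "Pending"
    CONTACTED = "Contacted"
    REPLIED = "Replied"
    BOOKED = "Booked"
    DECLINED = "Declined"


# Allowed next states. Assumption: Declined is reachable from every non-terminal state.
ALLOWED = {
    Outreach.PENDING: {Outreach.CONTACTED, Outreach.DECLINED},
    Outreach.CONTACTED: {Outreach.REPLIED, Outreach.DECLINED},
    Outreach.REPLIED: {Outreach.BOOKED, Outreach.DECLINED},
    Outreach.BOOKED: set(),
    Outreach.DECLINED: set(),
}


def advance(current: Outreach, target: Outreach) -> Outreach:
    """Return the new status, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal outreach transition: {current.value} -> {target.value}")
    return target
```

Call advance in the gateway before writing the status column, so a stray UI click cannot move a Booked creator back to Pending.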

For the team, this turns a 4-hour scrolling session into a 30-second filter change. For the agency owner, it turns a vague "we have a roster" into a queryable asset. Quarterly board decks pull from the same Postgres - no more screenshots from a SaaS dashboard you do not control.

Step 7: Caching strategy

You will burn credits fast if you re-enrich the same creators. Three layers of cache:

  • Profile cache, 24h. Keyed on handle, store the /userinfo-by-username/ result. Refresh on demand from the UI if a user clicks "Re-fetch".
  • Posts cache, 1h. For the ER recomputation; post counts change fast on viral hits.
  • Hashtag seeds, 6h. A trending hashtag will surface the same top creators within a six-hour window; longer than that and you miss new entrants.

Implement with Redis SETEX and a thin wrapper around your request function. Cache misses hit the API; cache hits do not. Add a single metrics counter for hit-rate and you will see it climb from 0% on day one to north of 70% within a week as your candidate database matures.
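A thin wrapper along those lines might look like the sketch below. cached_call, TTL_SECONDS, and the cache: key prefix are hypothetical; the only thing assumed of the client is a redis-py style get(name) and setex(name, time, value). FakeRedis is an in-memory stand-in for local testing only.

```python
import json

# TTLs for the three cache layers described above, in seconds.
TTL_SECONDS = {"profile": 24 * 3600, "posts": 3600, "seed": 6 * 3600}


def cached_call(client, kind: str, key: str, fetch):
    """Return (value, was_hit). On a miss, call fetch() and store the result as JSON."""
    cache_key = f"cache:{kind}:{key}"
    hit = client.get(cache_key)
    if hit is not None:
        return json.loads(hit), True
    value = fetch()
    client.setex(cache_key, TTL_SECONDS[kind], json.dumps(value))
    return value, False


class FakeRedis:
    """In-memory stand-in for tests; it ignores expiry."""

    def __init__(self):
        self.store = {}

    def get(self, name):
        return self.store.get(name)

    def setex(self, name, time, value):
        self.store[name] = value
```

The was_hit flag is there so the caller can increment a hit or miss counter, which gives you the hit-rate metric without touching the wrapper again.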

Step 8: Cost projection

Every TikLiveAPI request costs one credit. For 1,000 audited creators:

  • 1 /userinfo-by-username/ per creator = 1,000 credits
  • 1 /user-posts/ per creator = 1,000 credits
  • 4 paginated /user-followers/ for audience sampling = 4,000 credits
  • Seed sourcing overhead, amortized = ~500 credits

Total ~6,500 credits per 1,000 audited creators, before caching. With the 24h profile cache hitting on warm runs, the marginal cost of a re-search drops to roughly the new-creator delta - typically 10-15% of the cold-run cost. Check current pricing on the pricing page; new accounts get 100 free credits to prototype before committing.
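The arithmetic above is easy to turn into a helper you can re-run with your own numbers. This is a back-of-the-envelope sketch: projected_credits and rerun_credits are hypothetical names, the defaults mirror the figures in this section, and rerun_credits assumes a warm re-search only pays for the net-new creators.

```python
def projected_credits(creators: int, follower_pages: int = 4,
                      seed_overhead_per_1000: int = 500) -> int:
    """Cold-run credits: one profile call, one posts call, and N follower pages per creator."""
    per_creator = 1 + 1 + follower_pages
    seed = round(creators * seed_overhead_per_1000 / 1000)
    return creators * per_creator + seed


def rerun_credits(creators: int, new_fraction: float) -> int:
    """Warm re-search, assuming only the net-new share of creators is enriched."""
    return projected_credits(round(creators * new_fraction))
```

projected_credits(1000) returns 6500, matching the total above, and rerun_credits(1000, 0.1) returns 650, which is the 10% end of the re-search range.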

Compliance

Three rules you cannot skip:

  • FTC #ad disclosure. Surface a column on each candidate flagging whether their recent posts include #ad, #sponsored, or #partner. Brands need creators who already disclose correctly - it is a campaign-killer when a sponsored post gets pulled because the creator forgot disclosure.
  • GDPR for EU creators. Store only what you need (handle, follower count, ER, public bio). Add a hard delete endpoint. Do not persist follower-list raw data beyond the alive-rate computation - the aggregate is fine; the list is risky.
  • ToS compliance. Only consume public data. Do not attempt to bypass private accounts. The endpoints in this guide return only publicly available information, but your downstream use still has to respect platform rules.
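The disclosure column from the first rule can be filled with a small hashtag check over recent captions. A minimal sketch, assuming you already have the caption text of each post as a list of strings; disclosure_summary is a hypothetical helper. It only detects the three hashtags named above, so treat the result as a prompt for manual review rather than proof of correct disclosure.

```python
import re

# Match #ad, #sponsored, or #partner as whole hashtags, so #advice does not count.
DISCLOSURE_RE = re.compile(r"(?<!\w)#(ad|sponsored|partner)(?!\w)", re.IGNORECASE)


def disclosure_summary(captions):
    """Count captions that carry one of the disclosure hashtags."""
    flagged = [c for c in captions if c and DISCLOSURE_RE.search(c)]
    return {
        "posts_checked": len(captions),
        "disclosed_posts": len(flagged),
        "has_disclosure": bool(flagged),
    }
```

Store the summary alongside the creator row and the UI can render it as a simple yes or no column, with the counts available on hover.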

How this differs from Modash and Upfluence

Off-the-shelf influencer platforms are excellent if your needs match their schema. They fall short on three axes:

  • Speed of iteration. You want a custom niche-fit scorer that weights bio over post text 3:1? On a SaaS platform that is a feature request. On your own stack it is one line in niche_score.
  • Cost at scale. Per-seat pricing on Modash starts mid four figures monthly. A self-hosted system on pay-as-you-go credits scales with usage, not seats.
  • Data ownership. Your shortlists, outreach status, and historical ER trends live in your Postgres, not a vendor's. Migration risk drops to zero.

The trade-off: you maintain the system. For an agency running more than 20 campaigns a year, the math favors building. For an agency running three, buy the SaaS. There is no shame in either path - just pick the one your team can actually staff.

Try it

The minimum viable version of this tool is a Python script plus a 200-line FastAPI service. You can have seed sourcing and enrichment working in an afternoon. Spin up an account, grab your API key, and start with the interactive playground to feel out the response shapes before wiring them into code. Questions? Contact us - real humans answer.

FAQ

How many credits to audit a single creator end-to-end?

Roughly 6-7 credits with audience sampling: 1 for the profile, 1 for recent posts, 4 for follower pagination, and about half a credit of amortized seed sourcing. Skip the follower sample and the per-creator calls drop to 2.

Can I run niche fit without TF-IDF?

Yes. Swap in a sentence-embedding model (MiniLM, E5-small) and store a vector per creator. Cosine-similarity at query time. Same architecture, different math.

How fresh is the data?

Every TikLiveAPI response is fetched live from TikTok at request time with sub-second latency on most endpoints. Cache TTL is yours to choose; the upstream is real-time.

Do I need to handle authentication on the React side?

No. Keep the X-Api-Key header server-side in your FastAPI gateway. The browser only ever talks to your gateway, never directly to api.tikliveapi.com.

Can I support Instagram and YouTube creators too?

The architecture is portable - swap the seed-sourcing and enrichment workers per platform. TikLiveAPI covers the TikTok side; other platforms need their own data sources but the Postgres schema, scoring, and UI stay the same.
