Influencer marketing has graduated from a nice-to-have line item into a measurable acquisition channel. The problem: finding the right creators is still painful. Scrolling TikTok for hours, copy-pasting handles into a spreadsheet, eyeballing engagement rates - this does not scale, and it certainly does not survive a quarterly campaign cadence where you need 50 net-new candidates every week. Agency founders feel it as a margin squeeze; in-house marketing engineers feel it as a Jira ticket that never closes.
This guide walks through building a working influencer discovery tool on top of the TikLiveAPI endpoints. By the end you will have a system that ingests seed hashtags, fans out into candidate creators, enriches each profile, scores niche fit, estimates audience quality, and lets a non-technical user filter and export the result as CSV. Stack is Python for the workers, Postgres for storage, Redis for the queue, and a thin React UI on top. Every code sample is production-shaped, not pseudocode.
The end product is a single-page app where a campaign manager picks a niche (skincare, gym, finance, gaming), a follower tier (10K-100K, 100K-1M, 1M+), a region (ISO-2 country code), an optional language, and an engagement-rate threshold. They click Search. Within seconds they get a ranked list of creators with avatar, handle, follower count, ER%, last-30-day post cadence, niche-fit score, and an estimated audience-quality score.
From the result table they can shortlist, push to outreach status (Pending, Contacted, Replied, Booked, Declined), or export to CSV. Saved searches re-run on a schedule and surface only new candidates, so the team treats discovery as an inbox rather than a chore. Done right, this replaces the "intern with a spreadsheet" pattern that many agencies still run.
Five components, all boring on purpose:
creators, creator_posts, niches, searches, search_results, outreach.
X-Api-Key from your TikLiveAPI dashboard.
The reason the workers fan out instead of running synchronously inside the request is rate-limit headroom. Discovery jobs typically chew through hundreds of API calls per search; a request-scoped flow would time out long before completion. Push jobs into Redis, return a job ID, poll from the UI. The standard worker pattern is well-trodden and your future self will thank you.
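The enqueue-and-poll flow described above can be sketched roughly as follows. This is a minimal illustration rather than the tool's actual implementation: an in-process `queue.Queue` and a dict stand in for Redis so the snippet runs anywhere, and the names `submit_search`, `run_worker_once`, and `job_status` are hypothetical.

```python
import queue
import uuid

# In-memory stand-ins for Redis (assumption: a Redis list and hash in production).
JOBS = queue.Queue()
STATUS = {}


def submit_search(filters):
    """Called from the request handler: enqueue the job and return immediately."""
    job_id = uuid.uuid4().hex
    STATUS[job_id] = {"state": "queued", "result": None}
    JOBS.put((job_id, filters))
    return job_id


def run_worker_once(handler):
    """Worker side: pop one job, run the slow discovery work, record the outcome."""
    job_id, filters = JOBS.get()
    STATUS[job_id]["state"] = "running"
    try:
        STATUS[job_id] = {"state": "done", "result": handler(filters)}
    except Exception as exc:  # keep the worker alive; surface the error to the poller
        STATUS[job_id] = {"state": "failed", "result": str(exc)}
    return job_id


def job_status(job_id):
    """Polled by the UI until the state is 'done' or 'failed'."""
    return STATUS.get(job_id, {"state": "unknown", "result": None})
```

The request handler only ever calls `submit_search` and `job_status`, so its latency stays flat no matter how many API calls a single discovery job makes.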
You need a way to discover creators that did not already exist in your database. Two endpoints carry the load: /challenge-posts/ for hashtag fan-out and /search-video/ for keyword fan-out.
Workflow: a niche has 10-20 seed terms ("skincare routine", "drugstore makeup", "retinol", "korean beauty"). For each term, hit /search-video/ with sort_by=2 (date posted) and publish_time=30 (last month), then collect every author.uniqueId from the returned videos. Same for hashtag-based seeds via /challenge-posts/, which accepts a region filter using ISO-2 codes from /region-list/.
import os
import time

import requests

BASE = "https://api.tikliveapi.com"
HEADERS = {"X-Api-Key": os.environ["TIKLIVEAPI_KEY"]}


def search_seed_creators(keyword, region="US", pages=3):
    """Collect unique author handles from recent videos matching a keyword."""
    seen = set()
    cursor = 0
    for _ in range(pages):
        r = requests.get(
            f"{BASE}/search-video/",
            params={
                "keyword": keyword,
                "count": 30,
                "cursor": cursor,
                "publish_time": 30,  # last month
                "sort_by": 2,        # date posted
                "region": region,
            },
            headers=HEADERS,
            timeout=30,
        )
        r.raise_for_status()  # fail loudly instead of parsing an error body
        data = r.json()
        for v in data.get("videos", []):
            author = v.get("author") or {}
            handle = author.get("uniqueId") or author.get("unique_id")
            if handle:
                seen.add(handle)
        if not data.get("hasMore"):
            break
        cursor = data.get("cursor", 0)
        time.sleep(0.2)  # small pause between pages
    return seen
Note the camelCase hasMore on paginated responses - this trips up engineers expecting snake_case across the board. Enqueue each new handle as a Redis job tagged enrich:{handle}. Deduplicate aggressively: a popular hashtag will surface the same 20 creators on every run, so a Bloom filter on seen handles saves real money.
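As a rough illustration of the Bloom-filter dedupe mentioned above, here is a small pure-Python sketch. The class, the `new_handles` helper, and the sizing defaults are all hypothetical; in production you would more likely use a Redis-backed filter or a plain Redis set, and tune the bit count to your expected handle volume.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1 << 20, num_hashes=5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive k bit positions by salting a SHA-256 digest with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def new_handles(candidates, seen_filter):
    """Return handles not seen before (in order) and mark them as seen."""
    fresh = []
    for handle in candidates:
        if handle not in seen_filter:
            seen_filter.add(handle)
            fresh.append(handle)
    return fresh
```

Only the handles returned by `new_handles` need an `enrich:{handle}` job. A false positive means a genuinely new creator is occasionally skipped, which is usually an acceptable trade for not paying to re-enrich the same accounts.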
For every seeded handle, the enrichment worker runs up to three calls. First, /userinfo-by-username/ to get follower count, video count, signature (bio), verified flag, and the numeric user.id. Second, /user-posts/ with count=30 to grab the last 30 posts for engagement-rate math. Third, and only when needed, /userid/ if you have a handle and want the numeric ID without fetching the full profile.
def enrich(handle):
    """Fetch profile and recent posts for a handle and compute engagement stats."""
    info = requests.get(
        f"{BASE}/userinfo-by-username/",
        params={"username": handle},
        headers=HEADERS, timeout=30,
    ).json()
    user = info.get("user", {})
    stats = info.get("stats", {})
    userid = user.get("id")

    posts = requests.get(
        f"{BASE}/user-posts/",
        params={"userid": userid, "count": 30, "cursor": 0},
        headers=HEADERS, timeout=30,
    ).json()
    videos = posts.get("videos", [])

    plays = sum(v.get("play_count", 0) for v in videos)
    likes = sum(v.get("digg_count", 0) for v in videos)
    comments = sum(v.get("comment_count", 0) for v in videos)
    shares = sum(v.get("share_count", 0) for v in videos)
    # Guard the division only, so avg_views stays 0 when there are no plays.
    er = (likes + comments + shares) / max(plays, 1)

    return {
        "handle": user.get("uniqueId"),
        "userid": userid,
        "nickname": user.get("nickname"),
        "signature": user.get("signature", ""),
        "verified": user.get("verified", False),
        "followers": stats.get("followerCount", 0),
        "videos": stats.get("videoCount", 0),
        "hearts": stats.get("heartCount", 0),
        "er_30": round(er, 4),
        "avg_views": plays // max(len(videos), 1),
        "last_posts": videos,
    }
Persist to creators. Note the camelCase counters - followerCount, heartCount, videoCount - and that /user-posts/ paginates via a numeric cursor plus a hasMore flag. If you also fetch comments on the top-performing post for sentiment analysis, the /post-comments/ response uses an id field for each comment (not cid) - small detail, two-hour debugging session if you miss it.
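Persisting the enriched row is typically an upsert keyed on the handle. The sketch below uses an in-memory SQLite database purely so it runs without a server; the column set is a hypothetical subset of the creators table, and with Postgres via psycopg the `ON CONFLICT` clause is the same but the placeholders would be written `%(handle)s` rather than `:handle`.

```python
import sqlite3

# Hypothetical minimal schema; the real creators table will carry more columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS creators (
    handle     TEXT PRIMARY KEY,
    userid     TEXT,
    followers  INTEGER,
    er_30      REAL,
    avg_views  INTEGER
)
"""

UPSERT = """
INSERT INTO creators (handle, userid, followers, er_30, avg_views)
VALUES (:handle, :userid, :followers, :er_30, :avg_views)
ON CONFLICT (handle) DO UPDATE SET
    userid = excluded.userid,
    followers = excluded.followers,
    er_30 = excluded.er_30,
    avg_views = excluded.avg_views
"""

COLUMNS = ("handle", "userid", "followers", "er_30", "avg_views")


def save_creator(conn, row):
    """Insert a creator, or refresh the stats if the handle already exists."""
    conn.execute(UPSERT, {k: row[k] for k in COLUMNS})
    conn.commit()
```

Because the handle is the conflict key, re-enriching a creator overwrites the stale stats instead of creating a duplicate row.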
Follower count alone is not enough; a 500K travel influencer is the wrong fit for a B2B SaaS. Score every creator on textual similarity to your target niche so the table sorts by relevance, not vanity.
Build a corpus per creator by concatenating signature (bio) and the title/desc field of their last 30 posts. Build a niche corpus by concatenating 20-50 seed phrases that describe the niche. Run TF-IDF on the union, cosine-similarity each creator vector against the niche vector, store as niche_fit in [0, 1].
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def niche_score(creator_text, niche_text):
    # Fit TF-IDF on both documents, then compare the two resulting vectors.
    v = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
    m = v.fit_transform([niche_text, creator_text])
    return float(cosine_similarity(m[0:1], m[1:2])[0][0])
For multi-language niches, swap to a language-aware tokenizer or run a tiny embedding model (MiniLM is enough) instead of TF-IDF. The architecture stays identical - only the scorer changes. Weighting matters: bio text usually signals niche better than per-post titles, which can be clickbait. A 3:1 bio-to-posts weight is a sensible default.
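One crude way to apply a bio-to-posts weighting before TF-IDF is to repeat the bio when building the creator corpus, which multiplies its term counts. The helper name `build_creator_text` and the `bio_weight` parameter are hypothetical, and repetition is only one possible reading of a 3:1 weight; scoring the bio and the posts separately and blending the two scores is an equally reasonable alternative.

```python
def build_creator_text(signature, post_titles, bio_weight=3):
    """Concatenate bio and post titles, repeating the bio to up-weight it.

    Repeating the bio multiplies its raw term counts, so each bio term counts
    bio_weight times as much as a single occurrence in a post title.
    """
    bio = (signature or "").strip()
    titles = [t.strip() for t in post_titles if t and t.strip()]
    parts = [bio] * bio_weight if bio else []
    parts.extend(titles)
    return " ".join(parts)
```

The returned string can be passed as `creator_text` to a scorer such as `niche_score`.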
Real engagement is partly a follower-quality problem. A creator with 800K followers and an inflated count from a 2023 bot wave will look great on the spreadsheet and disappoint on the campaign. To sanity-check, sample the follower list with /user-followers/ and compute an alive-rate.
Two non-obvious details: pagination on /user-followers/ uses a time parameter (a timestamp), not the cursor param the other list endpoints use. And /user-following/ returns its list under the top-level key followings (plural, with the trailing s) rather than following. Get either one wrong and your worker silently drops rows.
def sample_followers(userid, target=200):
    """Page through /user-followers/ until `target` followers are collected."""
    out, t = [], 0
    while len(out) < target:
        r = requests.get(
            f"{BASE}/user-followers/",
            params={"userid": userid, "count": 50, "time": t},
            headers=HEADERS, timeout=30,
        ).json()
        batch = r.get("followers", [])
        out.extend(batch)
        # Stop on an empty page too, so a stuck hasMore flag cannot loop forever.
        if not batch or not r.get("hasMore"):
            break
        t = r.get("time", 0)
    return out[:target]


def alive_rate(followers):
    """Share of sampled followers that look like real, active accounts."""
    if not followers:
        return 0.0
    alive = 0
    for f in followers:
        # heuristics: has avatar, has posted at least one video, follows more than 5 accounts
        stats = f.get("stats", {})
        if (f.get("avatarThumb") and stats.get("videoCount", 0) > 0
                and stats.get("followingCount", 0) > 5):
            alive += 1
    return alive / len(followers)
A 200-follower sample is a noisy estimate but stable enough at scale; rolling it up across a creator's audience flags the worst offenders (alive_rate under 0.3 is a strong fake-follower signal). For a deeper treatment of why this matters, see our blog post on detecting fake follower waves; the same heuristics carry over.
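To put a number on how noisy a 200-follower sample is, a normal-approximation confidence interval for a proportion is a reasonable first pass. This is a sketch under one strong assumption: that the sampled followers behave like a random draw from the creator's audience, which a paginated follower list may not guarantee. The function name `alive_rate_interval` is hypothetical.

```python
import math


def alive_rate_interval(alive, sampled, z=1.96):
    """Normal-approximation (Wald) confidence interval for a sampled proportion."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    p = alive / sampled
    margin = z * math.sqrt(p * (1 - p) / sampled)
    return max(0.0, p - margin), min(1.0, p + margin)
```

With 60 alive accounts out of 200 sampled, the 95% interval is roughly 0.24 to 0.36, so a measured 0.30 carries about six percentage points of uncertainty either way. That is precise enough to separate a 0.3 audience from a 0.7 one, but not to rank creators whose rates differ by a few points.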
The React side is intentionally thin. A filter panel, a sortable table, and a saved-searches sidebar. Resist the urge to put scoring logic in the browser - the API key has to stay server-side and the user does not want a 5-second client-side TF-IDF.
function FilterPanel({ value, onChange }) {
  // Inputs are uncontrolled, so read the current values from the form on submit.
  const handleSubmit = (e) => {
    e.preventDefault();
    const form = new FormData(e.currentTarget);
    onChange({
      ...value,
      tier: form.get("tier"),
      erMin: Number(form.get("erMin")),
    });
  };
  return (
    <form onSubmit={handleSubmit}>
      <select name="tier" defaultValue={value.tier}>
        <option value="nano">10K - 100K</option>
        <option value="mid">100K - 1M</option>
        <option value="macro">1M+</option>
      </select>
      <NicheMultiSelect value={value.niches} />
      <RegionSelect value={value.region} />
      <input type="number" name="erMin" step="0.001"
             defaultValue={value.erMin} placeholder="ER >=" />
      <button type="submit">Search</button>
    </form>
  );
}
On submit, the React app hits your FastAPI /search endpoint which queries Postgres - not TikLiveAPI - because all the heavy work happened during enrichment. Live API traffic only happens when a saved search refreshes or a user requests a single-handle deep-dive from the row context menu. That separation keeps the user experience snappy even when the worker fleet is busy.
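On the server side, the /search handler mostly translates the filter panel into a parameterized Postgres query. The sketch below shows only the query-building step so it runs without a database. The tier boundaries come from the filter panel; the `region` column, the function name `build_search_query`, and the `%s` placeholder style (as used by psycopg) are assumptions.

```python
# Follower ranges per tier, matching the options in the filter panel.
TIERS = {
    "nano": (10_000, 100_000),
    "mid": (100_000, 1_000_000),
    "macro": (1_000_000, None),
}


def build_search_query(tier, region=None, er_min=None, limit=100):
    """Return (sql, params) for a parameterized creators query."""
    low, high = TIERS[tier]
    clauses, params = ["followers >= %s"], [low]
    if high is not None:
        clauses.append("followers < %s")
        params.append(high)
    if region:
        clauses.append("region = %s")
        params.append(region)
    if er_min is not None:
        clauses.append("er_30 >= %s")
        params.append(er_min)
    sql = (
        "SELECT handle, followers, er_30, niche_fit FROM creators WHERE "
        + " AND ".join(clauses)
        + " ORDER BY niche_fit DESC LIMIT %s"
    )
    params.append(limit)
    return sql, params
```

Keeping user input in `params` rather than interpolating it into the SQL string avoids injection, and the same `(sql, params)` pair can feed both the result table and the CSV export.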
Add a CSV export button that streams from the same query, plus an outreach panel that updates a status column per shortlisted creator. Saved searches are stored as JSON filter blobs; a nightly cron re-runs them, diffs against last run, and pings Slack with "12 new creators match Skincare-EU-Mid". Track outreach status with a simple state machine: Pending to Contacted to Replied to Booked, with a Declined terminal state.
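The outreach state machine and the saved-search diff are both small enough to sketch directly. Two assumptions are baked in here that the text does not spell out: Declined is reachable from any non-terminal state, and Booked is terminal. The names `TRANSITIONS`, `advance`, and `new_matches` are hypothetical.

```python
# Allowed moves per status; empty sets mark terminal states.
TRANSITIONS = {
    "Pending": {"Contacted", "Declined"},
    "Contacted": {"Replied", "Declined"},
    "Replied": {"Booked", "Declined"},
    "Booked": set(),
    "Declined": set(),
}


def advance(current, new):
    """Return the new status if the move is allowed, else raise ValueError."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new


def new_matches(previous_handles, current_handles):
    """Handles in the latest run that were absent from the last run, in order."""
    previous = set(previous_handles)
    return [h for h in current_handles if h not in previous]
```

The nightly job would call `new_matches` with the stored handles from the previous run and post only the length and contents of the returned list to Slack.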
For the team, this turns a 4-hour scrolling session into a 30-second filter change. For the agency owner, it turns a vague "we have a roster" into a queryable asset. Quarterly board decks pull from the same Postgres - no more screenshots from a SaaS dashboard you do not control.
You will burn credits fast if you re-enrich the same creators. Three layers of cache:
handle, store the /userinfo-by-username/ result. Refresh on demand from the UI if a user clicks "Re-fetch".
Implement with Redis SETEX and a thin wrapper around your request function. Cache misses hit the API; cache hits do not. Add a single metrics counter for hit-rate and you will see it climb from 0% on day one to north of 70% within a week as your candidate database matures.
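The thin wrapper with a hit-rate counter might look like the sketch below. It is deliberately backed by a dict with expiry timestamps so it runs without a Redis server; in production the store would be Redis `SETEX` and `GET` with JSON-serialized values. The class name `TTLCache` and the injectable `clock` are hypothetical conveniences for testing.

```python
import time


class TTLCache:
    """In-memory stand-in for Redis SETEX/GET with a hit-rate counter."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get_or_fetch(self, key, ttl_seconds, fetch):
        """Return the cached value, or call fetch() and cache its result."""
        entry = self._store.get(key)
        now = self._clock()
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = fetch()
        self._store[key] = (now + ttl_seconds, value)
        return value

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A profile lookup would then be wrapped as `cache.get_or_fetch(f"profile:{handle}", 86400, lambda: fetch_profile(handle))`, where `fetch_profile` stands for whatever function performs the real API call.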
Every TikLiveAPI request costs one credit. For 1,000 audited creators:
/userinfo-by-username/ per creator = 1,000 credits
/user-posts/ per creator = 1,000 credits
/user-followers/ for audience sampling = 4,000 credits
Total ~6,500 credits per 1,000 audited creators, before caching. With the 24h profile cache hitting on warm runs, the marginal cost of a re-search drops to roughly the new-creator delta - typically 10-15% of the cold-run cost. Check current pricing on the pricing page; new accounts get 100 free credits to prototype before committing.
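The arithmetic is easy to keep in a helper so budgets can be re-run as sampling depth changes. Note that the three itemized calls come to 6 credits per creator, or 6,000 per 1,000 creators; the ~6,500 total quoted above is higher, and the difference is not itemized in the text. The function names below are hypothetical, and the warm-run estimate simply applies the 10-15% new-creator share quoted above.

```python
def cold_run_credits(creators, follower_pages=4):
    """Credits for the itemized per-creator calls at one credit per request."""
    per_creator = 1 + 1 + follower_pages  # profile + posts + follower sampling
    return creators * per_creator


def warm_run_credits(cold_credits, new_creator_share):
    """Approximate re-search cost when only new creators miss the cache."""
    return round(cold_credits * new_creator_share)
```

For example, `cold_run_credits(1000)` gives 6,000 for the itemized calls, and `warm_run_credits(6500, 0.10)` gives 650 credits for a re-search where a tenth of the candidates are new.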
Three rules you cannot skip:
Off-the-shelf influencer platforms are excellent if your needs match their schema. They fall short on three axes:
niche_score.
The trade-off: you maintain the system. For an agency running more than 20 campaigns a year, the math favors building. For an agency running three, buy the SaaS. There is no shame in either path - just pick the one your team can actually staff.
The minimum viable version of this tool is a Python script plus a 200-line FastAPI service. You can have seed sourcing and enrichment working in an afternoon. Spin up an account, grab your API key, and start with the interactive playground to feel out the response shapes before wiring them into code. Questions? Contact us - real humans answer.
Roughly 6-7 credits with audience sampling: 1 for the profile, 1 for recent posts, and 4 for follower pagination. Skip the follower sample and it drops to 2.
Yes. Swap in a sentence-embedding model (MiniLM, E5-small) and store a vector per creator. Cosine-similarity at query time. Same architecture, different math.
Every TikLiveAPI response is fetched live from TikTok at request time with sub-second latency on most endpoints. Cache TTL is yours to choose; the upstream is real-time.
No. Keep the X-Api-Key header server-side in your FastAPI gateway. The browser only ever talks to your gateway, never directly to api.tikliveapi.com.
The architecture is portable - swap the seed-sourcing and enrichment workers per platform. TikLiveAPI covers the TikTok side; other platforms need their own data sources but the Postgres schema, scoring, and UI stay the same.