If you have ever run a TikTok data pipeline at scale, you know the bottleneck is rarely the network. It is the garbage collector pausing your worker pool, the dynamic typing that lets a missing JSON field crash you at 3 AM, and the deployment story of shipping an interpreter plus a dependency tree to every node. Rust solves all three. You get predictable throughput, a compile-time guarantee that your response structs match what you actually use, and a single static binary you can copy onto any Linux box without thinking about runtimes.
This tutorial walks through building a production-ready TikTok user data scraper against the TikLiveAPI service. We will fetch user profiles, resolve numeric user IDs, paginate through posts, handle rate-limited retries, run requests concurrently with a semaphore, and finish with a real example that tracks daily follower counts and writes them to CSV. The base URL throughout is https://api.tikliveapi.com, and authentication is a single X-Api-Key header.
You need Rust 1.75 or newer (any recent stable toolchain works), Cargo, and an API key. New accounts get 100 free credits the moment they verify their email, so you can build this entire tutorial on the free tier before topping up at /pricing/. Sign in at /profile/ to copy your key.
The crates we will use:
reqwest with rustls-tls for HTTPS without OpenSSL headaches
serde and serde_json for typed JSON
tokio as the async runtime
tokio-stream and futures for pagination streams
anyhow for ergonomic error handling in application code
csv for the daily tracker example
Create a new binary crate and add the dependencies. The rustls-tls feature on reqwest avoids linking against system OpenSSL, which keeps your single-binary deployment story clean.
[package]
name = "tiktok-scraper"
version = "0.1.0"
edition = "2021"
[dependencies]
reqwest = { version = "0.12", default-features = false, features = ["json", "rustls-tls"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
tokio = { version = "1", features = ["full"] }
tokio-stream = "0.1"
futures = "0.3"
anyhow = "1"
csv = "1.3"
chrono = { version = "0.4", features = ["clock"] }
Never hardcode credentials. Read the key once at startup and pass an Arc<TikLiveClient> around. The reqwest Client inside it already pools connections, so cloning the Arc is cheap and every task shares the same pool.
use std::env;
use std::sync::Arc;
use reqwest::{Client, header};
pub struct TikLiveClient {
http: Client,
api_key: String,
}
impl TikLiveClient {
pub fn from_env() -> anyhow::Result<Arc<Self>> {
let api_key = env::var("TIKLIVE_API_KEY")
.map_err(|_| anyhow::anyhow!("TIKLIVE_API_KEY not set"))?;
let http = Client::builder()
.user_agent("tiktok-scraper-rs/0.1")
.timeout(std::time::Duration::from_secs(30))
.build()?;
Ok(Arc::new(Self { http, api_key }))
}
fn auth_headers(&self) -> header::HeaderMap {
let mut h = header::HeaderMap::new();
h.insert("X-Api-Key", self.api_key.parse().unwrap());
h
}
}
Export the key in your shell before running: export TIKLIVE_API_KEY="your_key_here".
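One caveat about auth_headers above: .parse().unwrap() panics if the key contains a byte that is not legal in an HTTP header value, and a trailing newline from a copied secret is a common cause. A minimal sketch of a startup check follows. It assumes keys are plain printable ASCII tokens, and validate_api_key is a hypothetical helper that is not part of the original client.

```rust
/// Hypothetical helper (not in the original client): reject keys that would
/// make the header conversion fail, so the problem surfaces as an error at
/// startup instead of a panic on the first request.
pub fn validate_api_key(raw: &str) -> Result<String, String> {
    // Strip surrounding whitespace, including a trailing newline.
    let key = raw.trim();
    if key.is_empty() {
        return Err("TIKLIVE_API_KEY is empty".to_string());
    }
    // Conservative rule: allow only printable, non-space ASCII (0x21..=0x7E).
    // Assumption: real keys are plain ASCII tokens.
    if let Some(bad) = key.chars().find(|c| !c.is_ascii_graphic()) {
        return Err(format!(
            "TIKLIVE_API_KEY contains invalid character {:?}",
            bad
        ));
    }
    Ok(key.to_string())
}
```

Calling this on the value returned by env::var inside from_env, and converting the Err into an anyhow error, would keep the failure mode consistent with the "not set" case.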
This is where Rust shines compared to dynamic scripting. We model the real response shapes from TikLiveAPI as typed structs. The TikTok payloads mix camelCase counters (followerCount, hasMore) with snake_case fields on flat objects (create_time, play_count). Use #[serde(rename = "...")] when you want idiomatic Rust naming on a non-matching wire format.
use serde::Deserialize;
#[derive(Debug, Deserialize)]
pub struct UserInfoResponse {
pub user: UserProfile,
pub stats: UserStats,
}
#[derive(Debug, Deserialize)]
pub struct UserProfile {
pub id: String,
#[serde(rename = "uniqueId")]
pub unique_id: String,
pub nickname: String,
pub signature: String,
pub verified: bool,
#[serde(rename = "secUid")]
pub sec_uid: String,
#[serde(rename = "privateAccount")]
pub private_account: bool,
}
#[derive(Debug, Deserialize)]
pub struct UserStats {
#[serde(rename = "followerCount")]
pub follower_count: u64,
#[serde(rename = "followingCount")]
pub following_count: u64,
#[serde(rename = "heartCount")]
pub heart_count: u64,
#[serde(rename = "videoCount")]
pub video_count: u64,
#[serde(rename = "diggCount")]
pub digg_count: u64,
}
The /userinfo-by-username/ endpoint takes a single required username query parameter and returns the nested user + stats shape we just modelled. With reqwest's .json() method, deserialization happens automatically.
const BASE_URL: &str = "https://api.tikliveapi.com";
impl TikLiveClient {
pub async fn user_info(&self, username: &str) -> anyhow::Result<UserInfoResponse> {
let url = format!("{}/userinfo-by-username/", BASE_URL);
let resp = self.http
.get(&url)
.headers(self.auth_headers())
.query(&[("username", username)])
.send()
.await?
.error_for_status()?
.json::<UserInfoResponse>()
.await?;
Ok(resp)
}
}
#[tokio::main]
async fn main() -> anyhow::Result<()> {
let client = TikLiveClient::from_env()?;
let info = client.user_info("tiktok").await?;
println!("{} - {} followers", info.user.unique_id, info.stats.follower_count);
Ok(())
}
Most downstream endpoints (posts, followers, following) take the numeric userid, not the handle. The /userid/ endpoint returns a flat single-field object. One struct, one field, done.
#[derive(Debug, Deserialize)]
pub struct UserIdResponse {
pub id: String,
}
impl TikLiveClient {
pub async fn resolve_userid(&self, username: &str) -> anyhow::Result<String> {
let url = format!("{}/userid/", BASE_URL);
let resp: UserIdResponse = self.http
.get(&url)
.headers(self.auth_headers())
.query(&[("username", username)])
.send()
.await?
.error_for_status()?
.json()
.await?;
Ok(resp.id)
}
}
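The /user-posts/ endpoint returns a videos array plus a cursor string and a hasMore boolean. Notice that hasMore needs an explicit rename, while create_time and play_count inside each video are already snake_case and map automatically. Use #[serde(default)] on fields that might be missing so an absent key does not blow up your run. Note that default only covers a key that is missing entirely; if the API can send an explicit null, declare the field as Option<T> instead, because null will not deserialize into a String or a Vec.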
#[derive(Debug, Deserialize)]
pub struct PostsResponse {
#[serde(default)]
pub videos: Vec<Video>,
#[serde(default)]
pub cursor: String,
#[serde(rename = "hasMore", default)]
pub has_more: bool,
}
#[derive(Debug, Deserialize)]
pub struct Video {
pub video_id: String,
pub title: String,
pub create_time: i64,
pub play_count: u64,
pub digg_count: u64,
pub comment_count: u64,
pub share_count: u64,
pub duration: u32,
}
impl TikLiveClient {
pub async fn user_posts(&self, userid: &str, count: u32, cursor: &str)
-> anyhow::Result<PostsResponse>
{
let url = format!("{}/user-posts/", BASE_URL);
let count_str = count.to_string();
let resp = self.http
.get(&url)
.headers(self.auth_headers())
.query(&[("userid", userid), ("count", &count_str), ("cursor", cursor)])
.send()
.await?
.error_for_status()?
.json::<PostsResponse>()
.await?;
Ok(resp)
}
}
The maximum count per call is 35. If you ask for 100, you still only get 35 back. Paginate.
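As a rough illustration of that advice, the sketch below walks a cursor until has_more turns false. The Page struct and the fetch closure are hypothetical stand-ins for PostsResponse and user_posts, kept synchronous so the control flow is easy to follow.

```rust
/// Hypothetical stand-in for one page of results.
pub struct Page {
    pub items: Vec<u64>,
    pub cursor: String,
    pub has_more: bool,
}

/// Per-call ceiling described in the text.
pub const MAX_PAGE_SIZE: usize = 35;

/// Collects up to `wanted` items by calling `fetch(count, cursor)` repeatedly.
pub fn collect_pages<F>(wanted: usize, mut fetch: F) -> Vec<u64>
where
    F: FnMut(u32, &str) -> Page,
{
    let mut out = Vec::new();
    let mut cursor = String::from("0");
    while out.len() < wanted {
        // Never ask for more than the ceiling; the extra would be ignored.
        let count = (wanted - out.len()).min(MAX_PAGE_SIZE) as u32;
        let page = fetch(count, &cursor);
        let got_nothing = page.items.is_empty();
        out.extend(page.items);
        // Stop on the last page, or on an empty page that would loop forever.
        if !page.has_more || got_nothing {
            break;
        }
        cursor = page.cursor;
    }
    out.truncate(wanted);
    out
}
```

Asking for 80 items through this loop makes three calls with counts of 35, 35, and 10.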
Manually looping with a mutable cursor works, but exposing the paginator as an async Stream lets callers use .take(n), .filter(), and .try_collect() without rewriting loop logic. The async-stream crate is one option; here is a minimal hand-rolled version using futures::stream::unfold.
use futures::stream::{self, Stream, StreamExt};
pub fn all_posts(
client: Arc<TikLiveClient>,
userid: String,
) -> impl Stream<Item = anyhow::Result<Video>> {
let init = (client, userid, String::from("0"), true);
stream::unfold(init, |(client, userid, cursor, has_more)| async move {
if !has_more { return None; }
match client.user_posts(&userid, 35, &cursor).await {
Ok(page) => {
let next = (client, userid, page.cursor, page.has_more);
Some((Ok(page.videos), next))
}
Err(e) => Some((Err(e), (client.clone(), String::new(), String::new(), false))),
}
})
.flat_map(|res| match res {
Ok(vids) => stream::iter(vids.into_iter().map(Ok).collect::<Vec<_>>()),
Err(e) => stream::iter(vec![Err(e)]),
})
}
For followers and following, swap the field name. The /user-followers/ endpoint paginates with time (a timestamp), not cursor, and /user-following/ uses the same pagination key plus a top-level followings array (note the plural with the s). A correct following struct looks like this:
#[derive(Debug, Deserialize)]
pub struct FollowingResponse {
#[serde(default)]
pub followings: Vec<serde_json::Value>,
#[serde(default)]
pub total: u64,
#[serde(default)]
pub time: i64,
#[serde(rename = "hasMore", default)]
pub has_more: bool,
}
Networks fail. Upstream rate limits fire. Wrap your fetches in an exponential backoff helper. The pattern is small enough that you do not need a crate, although backoff or tokio-retry work if you prefer.
use std::time::Duration;
use tokio::time::sleep;
pub async fn with_retry<F, Fut, T>(mut op: F) -> anyhow::Result<T>
where
F: FnMut() -> Fut,
Fut: std::future::Future<Output = anyhow::Result<T>>,
{
let mut delay = Duration::from_millis(500);
for attempt in 1..=5 {
match op().await {
Ok(v) => return Ok(v),
Err(e) if attempt == 5 => return Err(e),
Err(_) => {
sleep(delay).await;
delay *= 2;
}
}
}
unreachable!()
}
// usage
let posts = with_retry(|| client.user_posts(&userid, 35, "0")).await?;
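To make the timing of that helper concrete, the sketch below reproduces its delay schedule as plain arithmetic. backoff_schedule and total_backoff are hypothetical helpers introduced here for illustration; the 500 ms base and five attempts mirror with_retry above.

```rust
use std::time::Duration;

/// Delays slept between attempts: `attempts - 1` sleeps, doubling each time.
pub fn backoff_schedule(base: Duration, attempts: u32) -> Vec<Duration> {
    (0..attempts.saturating_sub(1))
        .map(|i| base * 2u32.pow(i))
        .collect()
}

/// Worst-case time spent sleeping before the final error is returned.
pub fn total_backoff(base: Duration, attempts: u32) -> Duration {
    backoff_schedule(base, attempts).into_iter().sum()
}
```

With the values used above, a request that fails five times sleeps 0.5 s, 1 s, 2 s, and 4 s, which is 7.5 seconds in total, before the error is returned. Adding random jitter to each delay is a common refinement when many workers retry at the same moment.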
Spawning a thousand tokio::spawn tasks against an HTTP API is the fastest way to get your traffic throttled. Bound parallelism with a tokio::sync::Semaphore. The TikLiveAPI service averages 750 ms per response, so a window of 10-20 concurrent requests gets you most of the throughput without overwhelming anything.
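use tokio::sync::Semaphore;
pub async fn batch_user_info(
    client: Arc<TikLiveClient>,
    usernames: Vec<String>,
) -> Vec<anyhow::Result<UserInfoResponse>> {
    // At most 15 requests are in flight at any moment.
    let sem = Arc::new(Semaphore::new(15));
    let mut handles = Vec::with_capacity(usernames.len());
    for u in usernames {
        let client = client.clone();
        // Wait for a free slot before spawning, so tasks never pile up.
        let permit = sem.clone().acquire_owned().await.unwrap();
        handles.push(tokio::spawn(async move {
            let _p = permit; // released when the task finishes
            client.user_info(&u).await
        }));
    }
    // Await the handles in spawn order so results line up with `usernames`.
    // FuturesUnordered would yield them in completion order instead, which
    // breaks any caller that zips the results with the input list.
    let mut out = Vec::with_capacity(handles.len());
    for handle in handles {
        out.push(handle.await.unwrap_or_else(|e| Err(anyhow::anyhow!(e))));
    }
    out
}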
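As a back-of-the-envelope check on that window size, steady-state throughput is roughly concurrency divided by average latency (Little's law). The helpers below are hypothetical illustrations; the 750 ms figure is the average quoted above, and real latency will vary.

```rust
/// Approximate requests per second for `concurrency` in-flight requests
/// that each take `avg_latency_ms` on average.
pub fn estimated_rps(concurrency: u32, avg_latency_ms: f64) -> f64 {
    concurrency as f64 / (avg_latency_ms / 1000.0)
}

/// Approximate seconds needed to fetch `total` items at that rate.
pub fn estimated_seconds(total: u32, concurrency: u32, avg_latency_ms: f64) -> f64 {
    total as f64 / estimated_rps(concurrency, avg_latency_ms)
}
```

With 15 permits and 750 ms per response this works out to about 20 requests per second, so a list of 1,000 usernames would take on the order of 50 seconds, ignoring retries.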
Time to tie it all together. This binary takes a list of usernames, hits /userinfo-by-username/ for each in parallel, and appends one row per user per day to a CSV. Drop it in cron at midnight and you have a longitudinal dataset.
use chrono::Utc;
use std::fs::OpenOptions;
#[derive(serde::Serialize)]
struct Row<'a> {
date: String,
username: &'a str,
follower_count: u64,
heart_count: u64,
video_count: u64,
}
#[tokio::main]
async fn main() -> anyhow::Result<()> {
let client = TikLiveClient::from_env()?;
let targets: Vec<String> = std::env::args().skip(1).collect();
if targets.is_empty() {
anyhow::bail!("usage: tracker username1 username2 ...");
}
let results = batch_user_info(client, targets.clone()).await;
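// Write the header row only when the file is new or empty; otherwise every
// run would append a duplicate header line to the dataset.
let path = "followers.csv";
let needs_header = std::fs::metadata(path).map(|m| m.len() == 0).unwrap_or(true);
let file = OpenOptions::new()
.create(true).append(true).open(path)?;
let mut wtr = csv::WriterBuilder::new()
.has_headers(needs_header)
.from_writer(file);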
let today = Utc::now().format("%Y-%m-%d").to_string();
for (username, result) in targets.iter().zip(results) {
match result {
Ok(info) => {
wtr.serialize(Row {
date: today.clone(),
username,
follower_count: info.stats.follower_count,
heart_count: info.stats.heart_count,
video_count: info.stats.video_count,
})?;
}
Err(e) => eprintln!("skip {}: {}", username, e),
}
}
wtr.flush()?;
Ok(())
}
Run it: cargo run --release -- tiktok charlidamelio mrbeast. Each invocation appends rows. After thirty days you have a clean dataset to load into Polars or DuckDB for growth analysis.
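Before reaching for a dataframe library, a quick sanity check on the file is possible with the standard library alone. The sketch below assumes the column order written by Row above (date, username, follower_count, and so on), rows in chronological order, and no commas inside usernames. follower_deltas is a hypothetical helper, not part of the tracker.

```rust
use std::collections::HashMap;

/// Returns (date, username, change since that user's previous row).
pub fn follower_deltas(csv_text: &str) -> Vec<(String, String, i64)> {
    let mut last_seen: HashMap<String, i64> = HashMap::new();
    let mut deltas = Vec::new();
    for line in csv_text.lines() {
        let cols: Vec<&str> = line.split(',').collect();
        if cols.len() < 3 {
            continue;
        }
        // Header rows and malformed lines fail to parse and are skipped.
        let Ok(count) = cols[2].trim().parse::<i64>() else {
            continue;
        };
        let user = cols[1].trim().to_string();
        // `insert` returns the previous count for this user, if any.
        if let Some(prev) = last_seen.insert(user.clone(), count) {
            deltas.push((cols[0].trim().to_string(), user, count - prev));
        }
    }
    deltas
}
```

Feeding it the contents of followers.csv (for example via std::fs::read_to_string) yields one entry per user per day after the first, which is enough to spot a broken run before loading the file elsewhere.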
The same struct-plus-async pattern extends to every endpoint in the service. For individual videos, /post-detail/ returns a flat snake_case object with three download URLs side by side: play is no-watermark SD, wmplay is watermarked, and hdplay is no-watermark HD, with matching size, wm_size, and hd_size fields. The /post-comments/ endpoint returns each comment with an id field (not cid like some legacy TikTok wrappers use), and replies live behind a separate video_id + comment_id call. Hashtag lookups via /challenge-info-name/ return the hashtag string in cha_name, not name, even though the input parameter is called name. Read the full schema for each of the 37 endpoints in the documentation, kick the tyres without writing code in the playground, or grab the Postman collection if you want a starting point in another tool.
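To illustrate how the three download fields might be used together, here is a small sketch that picks a URL under a size budget. The field names follow the description above; the Option wrappers, the integer types, and the assumption that the size fields are byte counts are guesses that should be checked against the documentation. PostDownloads and pick_download are hypothetical names.

```rust
/// Hypothetical subset of a /post-detail/ response. Field names follow the
/// text above; the Option wrappers and u64 sizes are assumptions.
pub struct PostDownloads {
    pub play: Option<String>,
    pub wmplay: Option<String>,
    pub hdplay: Option<String>,
    pub size: Option<u64>,
    pub hd_size: Option<u64>,
}

/// Prefer no-watermark HD when it fits the budget, then no-watermark SD,
/// then the watermarked file as a last resort.
pub fn pick_download(p: &PostDownloads, max_bytes: u64) -> Option<&str> {
    // An unknown size is treated as acceptable.
    let fits = |size: Option<u64>| size.map_or(true, |s| s <= max_bytes);
    if let (Some(url), true) = (p.hdplay.as_deref(), fits(p.hd_size)) {
        return Some(url);
    }
    if let (Some(url), true) = (p.play.as_deref(), fits(p.size)) {
        return Some(url);
    }
    p.wmplay.as_deref()
}
```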
Hyper is the lower-level engine reqwest is built on. Use hyper when you need raw control over connection lifetimes, custom HTTP/2 settings, or are writing infrastructure. For a scraping client where you want JSON deserialization, connection pooling, and TLS to just work, reqwest is the right altitude.
You can apply #[serde(rename_all = "camelCase")] at the struct level for any object that is entirely camelCase, like the stats block. Then field names like follower_count in Rust map to followerCount on the wire automatically. Mixed objects still need per-field renames, but most response sub-objects are internally consistent.
The API returns standard HTTP status codes. A 429 surfaces as an error from .error_for_status(), which your retry wrapper catches and backs off on. The dashboard itself does not proxy traffic or deduct credits, so you are interacting directly with the upstream service and standard HTTP semantics apply. Monitor your remaining balance on /profile/.
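Note that the with_retry helper shown earlier retries every error, including ones that will never succeed, such as a 401 from a bad key. One possible refinement, sketched here with hypothetical names, is to classify the status code before deciding to sleep. The set of retryable codes below is an assumption, not something the API documents in this article.

```rust
/// What a retry loop should do with a given HTTP status.
#[derive(Debug, PartialEq, Eq)]
pub enum RetryDecision {
    Retry,
    GiveUp,
}

/// Assumption: timeouts, 429, and 5xx are transient; other codes are caller
/// errors that another attempt will not fix.
pub fn classify_status(status: u16) -> RetryDecision {
    match status {
        408 | 429 | 500..=599 => RetryDecision::Retry,
        _ => RetryDecision::GiveUp,
    }
}
```

With reqwest, the code can be read from a failed response through reqwest::Error::status(), which returns an Option because connection-level failures carry no status; treating those as retryable is a reasonable default.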
Shipping the scraper as a single static binary works well, and that is the headline advantage. Build with --target x86_64-unknown-linux-musl, copy the resulting binary into a scratch or distroless image, and you have a sub-10 MB container with no runtime dependencies. The rustls-tls feature on reqwest is what makes this possible because there is no system OpenSSL to link against.
To cover an endpoint not shown here, define a struct for the response shape and a method on TikLiveClient that hits the relevant path. The pattern is identical for every endpoint. Browse categories like search, music, and challenge in the blog for worked examples, or open a ticket via /contact/ if you want guidance on a specific pipeline.