```clojure
{:paths ["src" "resources"]
 :deps {org.clojure/clojure    {:mvn/version "1.11.3"}
        clj-http/clj-http      {:mvn/version "3.12.3"}
        cheshire/cheshire      {:mvn/version "5.13.0"}
        org.clojure/data.csv   {:mvn/version "1.1.0"}
        org.clojure/core.async {:mvn/version "1.6.681"}
        com.datomic/peer       {:mvn/version "1.0.7075"}}
 :aliases
 {:uberjar {:replace-deps {com.github.seancorfield/depstar {:mvn/version "2.1.303"}}
            :exec-fn hf.depstar/uberjar
            :exec-args {:jar "target/tiktok-ingest.jar"
                        ;; AOT-compile the :gen-class main namespace so `java -jar` can find it
                        :aot true
                        :main-class tiktok.ingest}}}}
```

Export your credentials so they never touch source control:

```bash
export TIKLIVE_API_KEY="sk_live_..."
```

## A tiny HTTP layer
Every endpoint uses `https://api.tikliveapi.com` as the base URL and authenticates via the `X-Api-Key` header. Each request costs one credit, and credits never expire. Wrap that once and you never have to think about it again.

```clojure
(ns tiktok.client
  (:require [clj-http.client :as http]
            [cheshire.core :as json]))

(def base-url "https://api.tikliveapi.com")

(defn- api-key []
  (or (System/getenv "TIKLIVE_API_KEY")
      (throw (ex-info "TIKLIVE_API_KEY not set" {}))))

(defn GET
  "Call a TikLiveAPI endpoint. path is e.g. \"/userinfo-by-username/\"."
  [path query]
  (let [{:keys [status body]}
        (http/get (str base-url path)
                  {:headers {"X-Api-Key" (api-key)}
                   :query-params query
                   :throw-exceptions false   ; inspect the status ourselves
                   :as :string
                   :connection-timeout 5000  ; ms to establish the connection
                   :socket-timeout 15000})]  ; ms to wait for data
    (if (= 200 status)
      (json/parse-string body true)          ; true keywordizes keys
      (throw (ex-info "TikLiveAPI error"
                      {:status status :path path :body body})))))
```

Passing `true` as the second argument to `json/parse-string` keywordizes the keys, so you can destructure responses as ordinary Clojure maps. Note that the API mixes snake_case and camelCase across endpoints, so the keywords you destructure with mirror that casing exactly.
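
If the mixed casing gets in the way, one option is to normalize keys once at the edge. The sketch below is not part of the API or of cheshire: `->kebab` and `normalize-keys` are hypothetical helper names, and it relies only on `clojure.string` and `clojure.walk`.

```clojure
(require '[clojure.string :as str]
         '[clojure.walk :as walk])

(defn ->kebab
  "Turn :followerCount or :aweme_id into :follower-count / :aweme-id."
  [k]
  (-> (name k)
      (str/replace #"([a-z0-9])([A-Z])" "$1-$2") ; split camelCase humps
      (str/replace "_" "-")                      ; snake_case to kebab-case
      str/lower-case
      keyword))

(defn normalize-keys
  "Recursively kebab-case every keyword key in a parsed response."
  [m]
  (walk/postwalk
   (fn [x]
     (if (map? x)
       (into {} (map (fn [[k v]] [(if (keyword? k) (->kebab k) k) v])) x)
       x))
   m))

(normalize-keys {:user {:uniqueId "a"} :videos [{:aweme_id "1" :play_count 2}]})
;; => {:user {:unique-id "a"}, :videos [{:aweme-id "1", :play-count 2}]}
```

The trade-off is that your keywords no longer match the API reference one to one, so the rest of this post keeps the raw casing.
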
## Resolving a username to a user id
Most user-scoped endpoints take a numeric `userid`, not a handle. The `/userid/` endpoint returns a flat object with a single string `id`:

```clojure
(ns tiktok.users
  (:require [tiktok.client :as c]))

(defn resolve-user-id [username]
  (-> (c/GET "/userid/" {:username username})
      :id))

;; REPL:
;; (resolve-user-id "tiktok")
;; => "107955"
```

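
Because every request costs a credit, it may be worth caching handle-to-id lookups within a single run. A minimal sketch using `clojure.core/memoize`; `fake-resolve` and `lookup-count` are stand-ins (not part of the original code) so the example runs without network access.

```clojure
;; Stand-in for resolve-user-id so the sketch runs offline.
(def lookup-count (atom 0))

(defn fake-resolve [username]
  (swap! lookup-count inc)
  (str "id-for-" username))

;; In real code this would be: (def resolve-user-id* (memoize resolve-user-id))
(def resolve-user-id* (memoize fake-resolve))

(resolve-user-id* "tiktok")
(resolve-user-id* "tiktok") ; served from the cache, no second lookup
@lookup-count
;; => 1
```

`memoize` keeps entries for the life of the process and does not cache calls that throw, so a failed lookup is simply retried on the next call.
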
## Fetching a profile with idiomatic destructuring
`/userinfo-by-username/` returns a nested shape: a `user` map (camelCase: `uniqueId`, `nickname`, `avatarLarger`, `secUid`, `verified`, `privateAccount`, `bioLink`) and a `stats` map (`followerCount`, `followingCount`, `videoCount`, `heartCount`, `diggCount`). Pull what you need with `:keys`:

```clojure
(defn fetch-profile [username]
  (let [{{:keys [id uniqueId nickname secUid verified avatarLarger bioLink]} :user
         {:keys [followerCount followingCount videoCount heartCount]} :stats}
        (c/GET "/userinfo-by-username/" {:username username})]
    {:tiktok/id        id
     :tiktok/handle    uniqueId
     :tiktok/nickname  nickname
     :tiktok/sec-uid   secUid
     :tiktok/verified? verified
     :tiktok/avatar    avatarLarger
     :tiktok/bio-link  bioLink
     :tiktok/followers followerCount
     :tiktok/following followingCount
     :tiktok/videos    videoCount
     :tiktok/hearts    heartCount}))
```

If you already have the numeric id, hit `/userinfo-by-id/` with `userid` instead. Same nested shape, different lookup key.
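
Since both lookups return the same nested shape, the shaping step can be pulled into a pure function and shared. This is a sketch, not the original code: `profile->entity` is a hypothetical name, it keeps only a subset of the fields, and the sample response is made up.

```clojure
(defn profile->entity
  "Shape a parsed /userinfo-by-username/ or /userinfo-by-id/ response
  into a flat, namespaced map."
  [{{:keys [id uniqueId nickname]} :user
    {:keys [followerCount videoCount heartCount]} :stats}]
  {:tiktok/id        id
   :tiktok/handle    uniqueId
   :tiktok/nickname  nickname
   :tiktok/followers followerCount
   :tiktok/videos    videoCount
   :tiktok/hearts    heartCount})

;; Hypothetical wiring, assuming c/GET from tiktok.client:
;; (defn fetch-profile-by-id [userid]
;;   (profile->entity (c/GET "/userinfo-by-id/" {:userid userid})))

;; Made-up response, only to show the shape:
(profile->entity
 {:user  {:id "1" :uniqueId "demo" :nickname "Demo"}
  :stats {:followerCount 10 :videoCount 2 :heartCount 30}})
;; => #:tiktok{:id "1", :handle "demo", :nickname "Demo",
;;             :followers 10, :videos 2, :hearts 30}
```

A pure shaping function is also easy to test at the REPL without spending credits.
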
## Pagination as a lazy sequence
The user-posts endpoint at `/user-posts/` uses a `cursor` (a millisecond timestamp, passed as a string) and a `hasMore` flag. The items in `videos` are flat snake_case maps: `aweme_id`, `region`, `title`, `cover`, `play`, `wmplay`, `play_count`, `digg_count`, `comment_count`, `share_count`, `create_time`, plus a nested `author` and `music_info`. A lazy seq turns the cursor walk into a single seqable value that any downstream transducer can consume:

```clojure
(defn user-posts
  "Lazy seq of all posts by userid, paginated server-side."
  ([userid] (user-posts userid nil))
  ([userid cursor]
   (lazy-seq
    ;; Bind the response cursor to next-cursor so it does not shadow
    ;; the cursor argument used to build this request.
    (let [{:keys [videos hasMore] next-cursor :cursor}
          (c/GET "/user-posts/" (cond-> {:userid userid :count 30}
                                  cursor (assoc :cursor cursor)))]
      (concat videos
              (when hasMore
                (user-posts userid next-cursor)))))))

;; (->> (user-posts "107955")
;;      (take 100)
;;      (map (juxt :aweme_id :play_count :digg_count)))
```

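
Since the project pins Clojure 1.11, the same cursor walk could also be written with `clojure.core/iteration`, which was added in that release for exactly this kind of paged source. The sketch below is an alternative, not the original implementation: `all-posts` is a hypothetical name, and `fake-pages` is a made-up stand-in for the HTTP call so the example runs offline.

```clojure
;; Made-up pages keyed by the cursor that requests them (nil = first page).
(def fake-pages
  {nil {:videos [{:aweme_id "a"} {:aweme_id "b"}] :cursor "1" :hasMore true}
   "1" {:videos [{:aweme_id "c"}]                 :cursor "2" :hasMore true}
   "2" {:videos [{:aweme_id "d"}]                 :cursor "3" :hasMore false}})

(defn all-posts
  "Walk every page via fetch-page (a fn of cursor) and flatten the videos."
  [fetch-page]
  (->> (iteration fetch-page
                  :initk nil                      ; first call gets a nil cursor
                  :vf :videos                     ; value of each page
                  :kf (fn [{:keys [cursor hasMore]}]
                        (when hasMore cursor)))   ; nil key stops the walk
       (sequence cat)))

(map :aweme_id (all-posts fake-pages))
;; => ("a" "b" "c" "d")

;; Hypothetical wiring against the real endpoint:
;; (all-posts #(c/GET "/user-posts/" (cond-> {:userid userid :count 30}
;;                                     % (assoc :cursor %))))
```

The result of `iteration` is also reducible, so `(into [] cat (iteration ...))` works when you want everything eagerly.
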
Followers use a different pagination key. `/user-followers/` paginates by `time` (unix seconds) instead of `cursor`, and items are snake_case (`id`, `unique_id`, `sec_uid`, `nickname`, `follower_count`, `aweme_count`):

```clojure
(defn user-followers [userid]
  (letfn [(step [t]
            (lazy-seq
             (let [{:keys [followers time hasMore]}
                   (c/GET "/user-followers/"
                          (cond-> {:userid userid :count 50}
                            t (assoc :time t)))]
               (concat followers (when hasMore (step time))))))]
    (step nil)))
```

The mirror endpoint `/user-following/` has the same `time`+`hasMore` pagination, but the top key is `followings` (plural), not `following`. Easy to miss.
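
Because the two endpoints differ only in path and top-level key, one way to avoid the typo is to parameterize both. This is a sketch under that assumption: `time-paged` is a hypothetical helper, and `fake-fetch` is a made-up stand-in for `c/GET` so the example runs offline.

```clojure
(defn time-paged
  "Lazy seq over a `time`-paginated endpoint. fetch is a fn of [path query]
  (e.g. c/GET); items-key is :followers or :followings."
  [fetch path items-key userid]
  (letfn [(step [t]
            (lazy-seq
             (let [{:keys [hasMore] next-t :time :as page}
                   (fetch path (cond-> {:userid userid :count 50}
                                 t (assoc :time t)))]
               (concat (get page items-key)
                       (when hasMore (step next-t))))))]
    (step nil)))

;; Offline stand-in for c/GET with two made-up pages.
(defn fake-fetch [_path {:keys [time]}]
  (if (nil? time)
    {:followings [{:unique_id "a"} {:unique_id "b"}] :time 1700000000 :hasMore true}
    {:followings [{:unique_id "c"}] :time 1699990000 :hasMore false}))

(map :unique_id (time-paged fake-fetch "/user-following/" :followings "107955"))
;; => ("a" "b" "c")
```

With the real client, `(time-paged c/GET "/user-followers/" :followers userid)` would cover the followers case as well.
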
## Retries that compose
Network calls fail. Wrap `GET` with a backoff function rather than scattering try/catch through your transforms:

```clojure
(defn with-retry
  "Invoke f up to max-attempts times, sleeping base-ms * 2^(attempt - 1)
  milliseconds between attempts."
  [f {:keys [max-attempts base-ms] :or {max-attempts 5 base-ms 500}}]
  (loop [attempt 1]
    (let [result (try {:ok (f)}
                      (catch Exception e {:err e}))]
      (cond
        ;; contains? rather than (:ok result), so a nil or false
        ;; return value from f still counts as success.
        (contains? result :ok) (:ok result)
        (>= attempt max-attempts) (throw (:err result))
        :else
        (do (Thread/sleep (* base-ms (long (Math/pow 2 (dec attempt)))))
            (recur (inc attempt)))))))

(defn safe-GET [path query]
  (with-retry #(c/GET path query) {:max-attempts 5 :base-ms 500}))
```

Swap `c/GET` for `safe-GET` inside your pagination functions when you go to production.
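
Retrying every failure also retries requests that can never succeed, such as a bad path, and each attempt may cost a credit. Since `GET` attaches `:status` to its `ex-info`, a variant can inspect `ex-data` and retry selectively. This is a sketch: `retryable?` and `with-selective-retry` are hypothetical names, and treating 429 and 5xx as retryable is an assumption about how the API signals transient errors, not something the API documents here.

```clojure
(defn retryable?
  "Assumption: 429 and 5xx are transient; a missing status means a
  network-level failure (timeout, connection reset), also worth retrying."
  [e]
  (let [status (:status (ex-data e))]
    (or (nil? status)
        (= 429 status)
        (<= 500 status 599))))

(defn with-selective-retry
  "Like with-retry, but rethrows immediately when retry? says no."
  [f {:keys [max-attempts base-ms retry?]
      :or {max-attempts 5 base-ms 500 retry? retryable?}}]
  (loop [attempt 1]
    (let [result (try {:ok (f)}
                      (catch Exception e {:err e}))]
      (if (contains? result :ok)
        (:ok result)
        (let [e (:err result)]
          (if (or (>= attempt max-attempts) (not (retry? e)))
            (throw e)
            (do (Thread/sleep (* base-ms (long (Math/pow 2 (dec attempt)))))
                (recur (inc attempt)))))))))

;; Hypothetical usage with the client from this post:
;; (with-selective-retry #(c/GET "/userid/" {:username "tiktok"}) {})
```
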
## Concurrency: pmap and core.async
If you want to enrich a batch of usernames, `pmap` gives you cheap parallelism without leaving the seq world:

```clojure
(defn enrich-handles [handles]
  (->> handles
       (pmap fetch-profile)
       (remove nil?)))
```

`pmap` is fine until you need backpressure or rate limiting. The standard limit is 200 requests per minute (increasable on request, see /contact/), so a bounded `core.async` pipeline is a better fit for sustained throughput:

```clojure
(require '[clojure.core.async :as a])

(defn pipeline-profiles
  "Concurrently fetch profiles with bounded parallelism."
  [handles parallelism]
  (let [in  (a/to-chan! handles)
        out (a/chan 64)
        xf  (map fetch-profile)]
    (a/pipeline-blocking parallelism out xf in)
    (a/<!! (a/into [] out))))
```

`pipeline-blocking` is the right primitive here because HTTP calls block a thread. Tune `parallelism` so you stay under your rate budget.
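
As a rough starting point for `parallelism`: if each request takes about L milliseconds, p workers complete roughly p × 60000 / L requests per minute, so staying under a limit means p ≤ limit × L / 60000. The helper below is a back-of-the-envelope sketch under that steady-state assumption; `max-parallelism` is a hypothetical name, and real latency varies, so leave headroom.

```clojure
(defn max-parallelism
  "Rough upper bound on worker count for a per-minute request limit,
  given an average request latency in milliseconds. Clamped to 1:
  when the raw estimate is below 1, even a single worker can exceed
  the limit and you would need an explicit delay between requests."
  [limit-per-min avg-latency-ms]
  (max 1 (long (Math/floor (/ (* limit-per-min avg-latency-ms) 60000.0)))))

(max-parallelism 200 600)  ;; => 2
(max-parallelism 200 1500) ;; => 5
(max-parallelism 200 100)  ;; => 1 (clamped; add a delay at this latency)
```
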
## Persisting to CSV
`clojure.data.csv` works with seqs of vectors. Combine that with a transducer and you can stream millions of rows through constant memory:

```clojure
(require '[clojure.data.csv :as csv]
         '[clojure.java.io :as io])

(def post-cols
  [:aweme_id :create_time :region :title
   :play_count :digg_count :comment_count :share_count])

(defn dump-posts [userid out-path]
  (with-open [w (io/writer out-path)]
    (csv/write-csv w [(map name post-cols)])   ; header row
    (csv/write-csv w
                   (eduction
                    (map (apply juxt post-cols)) ; post map -> row vector
                    (user-posts userid)))))
```

The `eduction` keeps the lazy seq lazy across the CSV writer - rows are pulled, transformed, and written one at a time.
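
To see what the row transform does on its own, here is `(apply juxt post-cols)` applied to a single post. The sample map is made up, and one column is left out on purpose to show how gaps behave.

```clojure
(def post-cols
  [:aweme_id :create_time :region :title
   :play_count :digg_count :comment_count :share_count])

;; Made-up post with :share_count missing.
(def sample-post
  {:aweme_id "123" :create_time 1700000000 :region "US" :title "hello"
   :play_count 10 :digg_count 2 :comment_count 1})

((apply juxt post-cols) sample-post)
;; => ["123" 1700000000 "US" "hello" 10 2 1 nil]
```

A missing key yields `nil` in that position, which should come out as an empty field in the CSV, so the columns stay aligned with the header.
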
## Daily ingestion: cron + uberjar
Build a `-main` entry point that takes handles as command-line arguments and writes a dated CSV for each:

```clojure
(ns tiktok.ingest
  (:gen-class)
  (:require [tiktok.users :as u]
            [clojure.java.io :as io]))

(defn -main [& handles]
  (doseq [h handles]
    (let [uid (u/resolve-user-id h)
          out (format "out/%s-%tF.csv" h (java.util.Date.))]
      (io/make-parents out)
      ;; Assumes dump-posts is defined in tiktok.users next to user-posts.
      (u/dump-posts uid out)
      (println "wrote" out))))
```

Build the jar with `clojure -X:uberjar` (or your tool of choice) and schedule it from cron:

```
15 3 * * * /usr/bin/java -jar /opt/tiktok-ingest.jar tiktok charlidamelio
```

## Snapshotting into Datomic
Daily snapshots are where Datomic shines: every transaction is a point in time, so you get follower-growth history without rolling your own audit table. Define a minimal schema once:

```clojure
(def schema
  [{:db/ident :tiktok/id        :db/valueType :db.type/string
    :db/cardinality :db.cardinality/one :db/unique :db.unique/identity}
   {:db/ident :tiktok/handle    :db/valueType :db.type/string
    :db/cardinality :db.cardinality/one :db/index true}
   {:db/ident :tiktok/followers :db/valueType :db.type/long
    :db/cardinality :db.cardinality/one}
   {:db/ident :tiktok/videos    :db/valueType :db.type/long
    :db/cardinality :db.cardinality/one}
   {:db/ident :tiktok/hearts    :db/valueType :db.type/long
    :db/cardinality :db.cardinality/one}])
```

Then transact a daily snapshot. Because `:tiktok/id` is `:db.unique/identity`, the same entity gets updated across days while history is preserved automatically:
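
```clojure
(require '[datomic.api :as d])

;; Only the attributes declared in the schema. fetch-profile returns more
;; keys (nickname, avatar, ...), and transacting an undeclared attribute
;; would fail the whole transaction.
(def snapshot-attrs
  [:tiktok/id :tiktok/handle :tiktok/followers :tiktok/videos :tiktok/hearts])

(defn snapshot! [conn handles]
  @(d/transact conn
               (->> handles
                    (pmap u/fetch-profile)
                    (mapv #(select-keys % snapshot-attrs)))))

;; Query growth across all of history. The trailing `true` keeps only
;; assertions; a history db also contains the matching retractions.
;; (d/q '[:find ?handle ?inst ?followers
;;        :where
;;        [?e :tiktok/handle ?handle]
;;        [?e :tiktok/followers ?followers ?tx true]
;;        [?tx :db/txInstant ?inst]]
;;      (d/history (d/db conn)))
```
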
That single query gives you the full follower timeline per handle - no extra columns, no extra writes.
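
The query returns a set of `[handle inst followers]` tuples in no particular order. A small pure function can turn them into a per-handle timeline with day-over-day deltas. This is a sketch: `follower-deltas` is a hypothetical name and the sample tuples are made up.

```clojure
(defn follower-deltas
  "Given tuples of [handle inst followers], return a map of handle to a
  time-ordered vector of {:inst .. :followers .. :delta ..}. The first
  entry for each handle has a nil :delta."
  [tuples]
  (into {}
        (map (fn [[handle rows]]
               (let [sorted (sort-by second rows)]
                 [handle
                  (mapv (fn [[_ inst followers] prev]
                          {:inst inst
                           :followers followers
                           :delta (when prev (- followers prev))})
                        sorted
                        ;; previous follower count for each row, nil for the first
                        (cons nil (map #(nth % 2) sorted)))])))
        (group-by first tuples)))

;; Made-up tuples in the shape the query returns:
(-> (follower-deltas #{["demo" #inst "2024-01-02" 120]
                       ["demo" #inst "2024-01-01" 100]})
    (get "demo")
    (->> (map :delta)))
;; => (nil 20)
```
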
## Other endpoints worth knowing
The same pattern (GET + destructure + lazy-seq) applies to the rest of the surface. A few worth highlighting:
- `/post-detail/` returns a flat snake_case object with `aweme_id`, `play`, `wmplay`, and `hdplay` for watermark-free downloads (no `data{}` wrapper).
- `/post-comments/` uses `comments[]` with `id` (not `cid`), `text`, `digg_count`, and `reply_total`; pair with `/post-comment-replies/` for threaded replies.
- `/search-video/` accepts `publish_time` (0/1/7/30/90/180) and `sort_by` (0 relevance / 1 likes / 2 date).
- `/region-list/` returns the supported region codes you can pass to search and challenge endpoints.
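
For `/search-video/`, the numeric codes are easy to get wrong, so a small builder that validates them can help. This is a sketch covering only the two parameters listed above: `search-filters` and its option names are hypothetical, and the search-term parameter is not shown because its name is not given here.

```clojure
(def publish-time-values #{0 1 7 30 90 180})
(def sort-by-values {:relevance 0 :likes 1 :date 2})

(defn search-filters
  "Build the publish_time / sort_by part of a /search-video/ query.
  Either option may be omitted; unsupported values throw."
  [{:keys [publish-time order]}]
  (when (and (some? publish-time)
             (not (contains? publish-time-values publish-time)))
    (throw (ex-info "Unsupported publish_time" {:publish-time publish-time})))
  (when (and (some? order)
             (not (contains? sort-by-values order)))
    (throw (ex-info "Unsupported sort_by" {:order order})))
  (cond-> {}
    (some? publish-time) (assoc :publish_time publish-time)
    (some? order)        (assoc :sort_by (sort-by-values order))))

(search-filters {:publish-time 7 :order :likes})
;; => {:publish_time 7, :sort_by 1}
```

Merge the result into the rest of your query map before passing it to `GET`.
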
You can poke at any of them interactively on the playground before wiring them into Clojure.
## FAQ
**Do I need a TikTok login?** No. Authentication is just the `X-Api-Key` header. You never hand over a TikTok password.
**How are credits billed?** Pay-as-you-go: one request equals one credit, credits never expire, no subscription. See /pricing/.
**Will `pmap` get me rate-limited?** Possibly. The default ceiling is 200 requests per minute. Use `core.async/pipeline-blocking` with a bounded parallelism, or contact support to raise the limit.
**Why lazy-seq instead of reduce?** Lazy seqs let downstream consumers (transducers, CSV writers, `core.async` channels) drive backpressure. You only pay for the pages you actually consume.
**Can I refund unused credits?** Yes, as long as no credits from that purchase have been used. See the blog or reach out via /contact/.
That is the whole loop: REPL-driven exploration, lazy pagination, retry-wrapped HTTP, bounded concurrency, CSV or Datomic at the tail. Add endpoints by copy-pasting one function. The shape of your pipeline stays the same. Ready to put what you read into code? Try our endpoints live or grab the full reference.