How to Scrape TikTok User Data with Clojure and REPL

By TikLiveAPI Team · Published on May 29, 2026 · Updated on July 9, 2026

Why Clojure for TikTok scraping

If you already think in maps, sequences, and transducers, a TikTok scraping job is a natural fit for Clojure. The data coming back from TikLiveAPI is plain JSON, which Cheshire decodes straight into idiomatic nested maps. The REPL lets you iterate on a single endpoint call until the shape is exactly what your downstream pipeline expects. Immutable values mean a paginated cursor walk is a simple recursion over explicit cursor values, not a tangle of mutable state. And when you are ready to scale, the same functions can fan out across threads with pmap or feed a core.async pipeline with very little rewriting.

This tutorial walks through building a small but production-shaped pipeline against the TikLiveAPI service: resolve a username to a numeric user id, fetch the user profile, paginate posts and followers, retry on transient failures, persist to CSV, and finally transact daily snapshots into Datomic.

Prerequisites

You need:

Clojure 1.11 or newer (for update-keys, parse-long, and friends)
A deps.edn project
A TikLiveAPI key from your profile (sign up on the registration page, top up with a pay-as-you-go credit package)

Create deps.edn:

{:paths ["src" "resources"]
 :deps  {org.clojure/clojure       {:mvn/version "1.11.3"}
         clj-http/clj-http         {:mvn/version "3.12.3"}
         cheshire/cheshire         {:mvn/version "5.13.0"}
         org.clojure/data.csv      {:mvn/version "1.1.0"}
         org.clojure/core.async    {:mvn/version "1.6.681"}
         com.datomic/peer          {:mvn/version "1.0.7075"}}
 :aliases
 {:uberjar {:replace-deps {com.github.seancorfield/depstar {:mvn/version "2.1.303"}}
            :exec-fn      hf.depstar/uberjar
            :exec-args    {:jar "target/tiktok-ingest.jar"
                           :aot true ; compile the :gen-class namespace so java -jar can find the main class
                           :main-class tiktok.ingest}}}}

Export your credentials so they never touch source control:

export TIKLIVE_API_KEY="sk_live_..."

A tiny HTTP layer

Everything in the API uses https://api.tikliveapi.com as the base URL and authenticates via the X-Api-Key header. Each request costs one credit, and credits never expire. Wrap that once and you never have to think about it again.

(ns tiktok.client
  (:require [clj-http.client :as http]
            [cheshire.core   :as json]))

(def base-url "https://api.tikliveapi.com")

(defn- api-key []
  (or (System/getenv "TIKLIVE_API_KEY")
      (throw (ex-info "TIKLIVE_API_KEY not set" {}))))

(defn GET
  "Call a TikLiveAPI endpoint. path is e.g. \"/userinfo-by-username/\"."
  [path query]
  (let [{:keys [status body]}
        (http/get (str base-url path)
                  {:headers          {"X-Api-Key" (api-key)}
                   :query-params     query
                   :throw-exceptions false
                   :as               :string
                   :conn-timeout     5000
                   :socket-timeout   15000})]
    (if (= 200 status)
      (json/parse-string body true)
      (throw (ex-info "TikLiveAPI error"
                      {:status status :path path :body body})))))

Passing true as the second argument to Cheshire's parse-string turns JSON keys into keywords, so you can destructure responses as Clojure maps. Note that the API mixes snake_case and camelCase across endpoints, so the keywords you destructure with mirror that exactly.
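If the mixed key styles get in the way downstream, one option is to normalize them at the edge. The sketch below is not part of the TikLiveAPI client; ->kebab and normalize-keys are hypothetical helper names, and update-keys (Clojure 1.11) only renames top-level keys.

```clojure
(require '[clojure.string :as str])

(defn ->kebab
  "Turn a camelCase or snake_case keyword into a kebab-case keyword."
  [k]
  (-> (name k)
      (str/replace #"([a-z0-9])([A-Z])" "$1-$2") ; followerCount -> follower-Count
      (str/replace "_" "-")                      ; play_count    -> play-count
      str/lower-case
      keyword))

(defn normalize-keys
  "Rename the top-level keys of m; nested maps are left untouched."
  [m]
  (update-keys m ->kebab))

(normalize-keys {:followerCount 10 :play_count 3})
;; => {:follower-count 10, :play-count 3}
```

The rest of this tutorial keeps the API's own key names, so treat this as an optional layer rather than something the later code depends on.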

Resolving a username to a user id

Most user-scoped endpoints take a numeric userid, not a handle. The /userid/ endpoint returns a flat object with a single string id:

(ns tiktok.users
  (:require [tiktok.client :as c]))

(defn resolve-user-id [username]
  (-> (c/GET "/userid/" {:username username})
      :id))

;; REPL:
;; (resolve-user-id "tiktok")
;; => "107955"
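
Since every request costs a credit, it can be worth caching handle lookups within a run. The sketch below uses clojure.core/memoize around a stand-in for resolve-user-id so it runs without network access; fake-resolve-user-id and resolve-user-id-cached are illustrative names, and the approach assumes a handle's numeric id does not change during the process lifetime.

```clojure
;; Stand-in for resolve-user-id; counts how often the "API" is hit.
(def calls (atom 0))

(defn fake-resolve-user-id [username]
  (swap! calls inc)
  (str "id-for-" username))

;; memoize caches by argument, so a repeated handle costs one call.
;; A call that throws is not cached and will be attempted again.
(def resolve-user-id-cached (memoize fake-resolve-user-id))

(resolve-user-id-cached "tiktok")
(resolve-user-id-cached "tiktok")
@calls
;; => 1
```

In the real namespace the same idea would be (def resolve-user-id-cached (memoize resolve-user-id)).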

Fetching a profile with idiomatic destructuring

/userinfo-by-username/ returns a nested shape: a user map (camelCase: uniqueId, nickname, avatarLarger, secUid, verified, privateAccount, bioLink) and a stats map (followerCount, followingCount, videoCount, heartCount, diggCount). Pull what you need with :keys:

(defn fetch-profile [username]
  (let [{{:keys [id uniqueId nickname secUid verified avatarLarger bioLink]} :user
         {:keys [followerCount followingCount videoCount heartCount]}        :stats}
        (c/GET "/userinfo-by-username/" {:username username})]
    {:tiktok/id        id
     :tiktok/handle    uniqueId
     :tiktok/nickname  nickname
     :tiktok/sec-uid   secUid
     :tiktok/verified? verified
     :tiktok/avatar    avatarLarger
     :tiktok/bio-link  bioLink
     :tiktok/followers followerCount
     :tiktok/following followingCount
     :tiktok/videos    videoCount
     :tiktok/hearts    heartCount}))

If you already have the numeric id, hit /userinfo-by-id/ with userid instead. Same nested shape, different lookup key.
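
Because both endpoints return the same nested shape, one way to reuse the destructuring is to split the pure shaping step from the HTTP call. The following is a sketch, not the tutorial's own code: shape-profile is a hypothetical name, and sample-response is hand-written in the documented shape with invented values.

```clojure
(defn shape-profile
  "Pure transform from a decoded profile response to a flat namespaced map."
  [{{:keys [id uniqueId nickname verified]}                      :user
    {:keys [followerCount followingCount videoCount heartCount]} :stats}]
  {:tiktok/id        id
   :tiktok/handle    uniqueId
   :tiktok/nickname  nickname
   :tiktok/verified? verified
   :tiktok/followers followerCount
   :tiktok/following followingCount
   :tiktok/videos    videoCount
   :tiktok/hearts    heartCount})

;; Invented values, for REPL experiments only.
(def sample-response
  {:user  {:id "107955" :uniqueId "tiktok" :nickname "TikTok" :verified true}
   :stats {:followerCount 100 :followingCount 5 :videoCount 10 :heartCount 1000}})

(:tiktok/followers (shape-profile sample-response))
;; => 100

;; With the client from earlier, either lookup feeds the same transform:
;; (shape-profile (c/GET "/userinfo-by-id/" {:userid "107955"}))
```

Keeping the transform pure means you can iterate on it at the REPL against a saved response without spending credits.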

Pagination as a lazy sequence

The user-posts endpoint at /user-posts/ uses a cursor (string millisecond timestamp) and a hasMore flag. The items in videos are flat snake_case maps: aweme_id, region, title, cover, play, wmplay, play_count, digg_count, comment_count, share_count, create_time, plus a nested author and music_info. A lazy-seq turns the cursor walk into a single seqable value any downstream transducer can consume:

(defn user-posts
  "Lazy seq of all posts by userid, paginated server-side."
  ([userid] (user-posts userid nil))
  ([userid cursor]
   (lazy-seq
     (let [{:keys [videos cursor hasMore]}
           (c/GET "/user-posts/" (cond-> {:userid userid :count 30}
                                   cursor (assoc :cursor cursor)))]
       (concat videos
               (when hasMore
                 (user-posts userid cursor)))))))

;; (->> (user-posts "107955")
;;      (take 100)
;;      (map (juxt :aweme_id :play_count :digg_count)))

Followers use a different pagination key. /user-followers/ paginates by time (unix seconds) instead of cursor, and items are snake_case (id, unique_id, sec_uid, nickname, follower_count, aweme_count):

(defn user-followers [userid]
  (letfn [(step [t]
            (lazy-seq
              (let [{:keys [followers time hasMore]}
                    (c/GET "/user-followers/"
                           (cond-> {:userid userid :count 50}
                             t (assoc :time t)))]
                (concat followers (when hasMore (step time))))))]
    (step nil)))

The mirror endpoint /user-following/ has the same time+hasMore pagination, but the top key is followings (plural), not following. Easy to miss.
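
The cursor walks above differ only in which key holds the items and which key holds the next cursor, so they can be expressed through one helper. This is a sketch under that assumption; paged-seq and the fetch-page argument are hypothetical names, and the stub pages below are made up so the example runs offline.

```clojure
(defn paged-seq
  "Lazy seq over all pages. fetch-page is called with the current cursor
   (nil for the first page) and must return a decoded response map."
  [fetch-page items-key cursor-key]
  (letfn [(step [cur]
            (lazy-seq
              (let [page (fetch-page cur)]
                (concat (get page items-key)
                        (when (:hasMore page)
                          (step (get page cursor-key)))))))]
    (step nil)))

;; Stub standing in for /user-following/: two pages keyed by the time cursor.
(def stub-pages
  {nil {:followings [{:id "1"} {:id "2"}] :time 10 :hasMore true}
   10  {:followings [{:id "3"}]           :time 20 :hasMore false}})

(mapv :id (paged-seq #(get stub-pages %) :followings :time))
;; => ["1" "2" "3"]

;; Against the real client this might look like:
;; (paged-seq #(c/GET "/user-following/"
;;                    (cond-> {:userid userid :count 50} % (assoc :time %)))
;;            :followings :time)
```

Passing the items key explicitly makes the followings versus followers difference visible at the call site instead of buried in a destructuring form.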

Retries that compose

Network calls fail. Wrap GET with a backoff function rather than scattering try/catch through your transforms:

(defn with-retry
  "Invoke f up to max-attempts times with exponential backoff (base-ms doubles each retry)."
  [f {:keys [max-attempts base-ms] :or {max-attempts 5 base-ms 500}}]
  (loop [attempt 1]
    (let [result (try {:ok (f)}
                      (catch Exception e {:err e}))]
      (cond
        ;; contains? rather than (:ok result), so a nil or false return still counts as success
        (contains? result :ok)    (:ok result)
        (>= attempt max-attempts) (throw (:err result))
        :else
        (do (Thread/sleep (* base-ms (long (Math/pow 2 (dec attempt)))))
            (recur (inc attempt)))))))

(defn safe-GET [path query]
  (with-retry #(c/GET path query) {:max-attempts 5 :base-ms 500}))

Swap c/GET for safe-GET inside your pagination functions when you go to production. For a deeper treatment of backoff, budgets, and queueing, see the guide on building resilient pipelines around TikTok API rate limits.
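
with-retry retries every exception, including client errors such as a bad key or an unknown user, where another attempt only wastes time. A possible refinement is to inspect the :status that the GET wrapper puts into ex-data. The sketch below is an assumption about what is worth retrying (no status at all, 429, or 5xx), not a documented TikLiveAPI policy; retryable? and with-selective-retry are hypothetical names.

```clojure
(defn retryable?
  "Treat network-level failures (no :status), 429, and 5xx as transient."
  [e]
  (let [status (:status (ex-data e))]
    (or (nil? status)
        (= 429 status)
        (<= 500 status 599))))

(defn with-selective-retry
  "Like with-retry, but rethrows immediately on non-transient errors."
  [f {:keys [max-attempts base-ms] :or {max-attempts 5 base-ms 500}}]
  (loop [attempt 1]
    (let [result (try {:ok (f)}
                      (catch Exception e {:err e}))]
      (if (contains? result :ok)
        (:ok result)
        (let [e (:err result)]
          (if (or (>= attempt max-attempts) (not (retryable? e)))
            (throw e)
            (do (Thread/sleep (* base-ms (long (Math/pow 2 (dec attempt)))))
                (recur (inc attempt)))))))))

;; Usage with the client from earlier:
;; (with-selective-retry #(c/GET "/userid/" {:username "tiktok"}) {})
```

With this variant a 404 surfaces after one call, while a 503 is still retried with the same doubling delay.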

Concurrency: pmap and core.async

If you want to enrich a batch of usernames, pmap gives you cheap parallelism without leaving the seq world:

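(defn enrich-handles [handles]
  (->> handles
       ;; fetch-profile throws on failure; map failures to nil so one
       ;; bad handle does not abort the whole batch
       (pmap #(try (fetch-profile %) (catch Exception _ nil)))
       (remove nil?)))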

pmap is fine until you need backpressure or rate limiting. The standard limit is 200 requests per minute (increasable on request via the contact page), so a bounded core.async pipeline is a better fit for sustained throughput:

(require '[clojure.core.async :as a])

(defn pipeline-profiles
  "Concurrently fetch profiles with bounded parallelism."
  [handles parallelism]
  (let [in  (a/to-chan! handles)
        out (a/chan 64)
        xf  (map fetch-profile)]
    (a/pipeline-blocking parallelism out xf in)
    (a/<!! (a/into [] out))))

pipeline-blocking is the right primitive here because HTTP calls block a thread. Tune parallelism so you stay under your rate budget.
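
One rough way to pick a starting value for parallelism is to work backwards from the rate limit: N workers that each take L milliseconds per call produce about N * 60000 / L requests per minute. The helper below is a back-of-the-envelope sketch; max-parallelism is a hypothetical name and avg-latency-ms is a number you would have to measure yourself.

```clojure
(defn max-parallelism
  "Largest worker count that keeps N * 60000 / avg-latency-ms at or below
   requests-per-minute, with a floor of one worker."
  [requests-per-minute avg-latency-ms]
  (max 1 (quot (* requests-per-minute avg-latency-ms) 60000)))

(max-parallelism 200 600)
;; => 2
(max-parallelism 200 1500)
;; => 5
```

Use a conservative (low) latency estimate, since faster responses mean more requests per minute from the same workers. When calls are quick enough that the floor of one worker applies, even a single worker can exceed the limit, and an explicit delay between requests is needed instead.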

Persisting to CSV

clojure.data.csv works with seqs of vectors. Combine that with a transducer and you can stream rows to disk without first building the whole result in memory:

(require '[clojure.data.csv :as csv]
         '[clojure.java.io  :as io])

(def post-cols
  [:aweme_id :create_time :region :title
   :play_count :digg_count :comment_count :share_count])

(defn dump-posts [userid out-path]
  (with-open [w (io/writer out-path)]
    (csv/write-csv w [(map name post-cols)])
    (csv/write-csv w
      (eduction
        (map (apply juxt post-cols))
        (user-posts userid)))))

The eduction applies the transform as write-csv walks the rows, so pages are fetched, shaped, and written incrementally rather than collected up front.
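
To see what the row transform does on its own, you can apply it to a single post at the REPL. The sample post below is invented for illustration; it deliberately omits share_count to show how a missing key behaves.

```clojure
(def post-cols
  [:aweme_id :create_time :region :title
   :play_count :digg_count :comment_count :share_count])

;; juxt over keywords builds a function that looks up each column in order.
(def ->row (apply juxt post-cols))

;; Invented values; extra keys such as :author are simply ignored.
(def sample-post
  {:aweme_id "1" :create_time 1700000000 :region "US" :title "hello"
   :play_count 10 :digg_count 2 :comment_count 1
   :author {:unique_id "someone"}})

(->row sample-post)
;; => ["1" 1700000000 "US" "hello" 10 2 1 nil]
```

The nil for the missing column should end up as an empty cell in the CSV, which keeps every row the same width as the header.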

Daily ingestion: cron + uberjar

Build a -main entry point that takes handles as command-line arguments and writes a dated CSV for each:

(ns tiktok.ingest
  (:gen-class)
  (:require [tiktok.users :as u]
            [clojure.java.io :as io]))

;; Assumes user-posts and dump-posts from the sections above live in tiktok.users.
(defn -main [& handles]
  (doseq [h handles]
    (let [uid (u/resolve-user-id h)
          out (format "out/%s-%tF.csv" h (java.util.Date.))]
      (io/make-parents out)
      (u/dump-posts uid out)
      (println "wrote" out))))

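Build the jar with clojure -X:uberjar (the alias defines an :exec-fn, so it runs through -X), or your tool of choice, and schedule it from cron. Cron does not inherit the exports from your interactive shell, so make sure TIKLIVE_API_KEY is set in the crontab or in a wrapper script, and remember that the relative out/ path resolves against cron's working directory: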

15 3 * * * /usr/bin/java -jar /opt/tiktok-ingest.jar tiktok charlidamelio

Snapshotting into Datomic

Daily snapshots are where Datomic shines: every transaction is a point in time, so you get follower-growth history without rolling your own audit table. Define a minimal schema once:

(def schema
  [{:db/ident :tiktok/id        :db/valueType :db.type/string
    :db/cardinality :db.cardinality/one :db/unique :db.unique/identity}
   {:db/ident :tiktok/handle    :db/valueType :db.type/string
    :db/cardinality :db.cardinality/one :db/index true}
   {:db/ident :tiktok/followers :db/valueType :db.type/long
    :db/cardinality :db.cardinality/one}
   {:db/ident :tiktok/videos    :db/valueType :db.type/long
    :db/cardinality :db.cardinality/one}
   {:db/ident :tiktok/hearts    :db/valueType :db.type/long
    :db/cardinality :db.cardinality/one}])

Then transact a daily snapshot. Because :tiktok/id is :db.unique/identity, the same entity gets updated across days while history is preserved automatically:

(require '[datomic.api :as d])

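(defn snapshot! [conn handles]
  ;; Only transact attributes defined in the schema above;
  ;; Datomic rejects attributes it does not know.
  @(d/transact conn
     (->> (pmap u/fetch-profile handles)
          (mapv #(select-keys % [:tiktok/id :tiktok/handle :tiktok/followers
                                 :tiktok/videos :tiktok/hearts])))))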

;; Query growth across all of history. The trailing true keeps assertions
;; and drops the retractions that a history database also contains:
;; (d/q '[:find ?handle ?inst ?followers
;;        :where
;;        [?e :tiktok/handle ?handle]
;;        [?e :tiktok/followers ?followers ?tx true]
;;        [?tx :db/txInstant ?inst]]
;;      (d/history (d/db conn)))

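That single query gives you the follower timeline per handle, one row per change in the count, with no extra columns and no extra writes. Days on which the count did not change add no new datom, so they do not appear.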
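
Once you have those [handle instant followers] tuples, turning them into growth figures is plain sequence work. The sketch below is one way to do it; follower-deltas is a hypothetical name and the sample rows are invented.

```clojure
(defn follower-deltas
  "Given [handle inst followers] tuples, return {handle [delta ...]},
   where each delta is the change between consecutive snapshots."
  [rows]
  (->> rows
       (group-by first)
       (into {}
             (map (fn [[handle hs]]
                    [handle (->> (sort-by second hs)   ; oldest first
                                 (map #(nth % 2))      ; follower counts
                                 (partition 2 1)       ; consecutive pairs
                                 (mapv (fn [[a b]] (- b a))))])))))

;; Invented rows, intentionally out of order.
(follower-deltas
  [["a" #inst "2026-06-01" 100]
   ["a" #inst "2026-06-03" 130]
   ["a" #inst "2026-06-02" 110]
   ["b" #inst "2026-06-01" 5]])
;; => {"a" [10 20], "b" []}
```

A handle with a single snapshot yields an empty vector, since there is nothing to compare it against yet.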

Other endpoints worth knowing

The same pattern (GET + destructure + lazy-seq) applies to the rest of the surface. A few worth highlighting:

/post-detail/ returns a flat snake_case object with aweme_id, play, wmplay, and hdplay for watermark-free downloads (no data{} wrapper).
/post-comments/ uses comments[] with id (not cid), text, digg_count, and reply_total; pair with /post-comment-replies/ for threaded replies.
/search-video/ accepts publish_time (0/1/7/30/90/180) and sort_by (0 relevance / 1 likes / 2 date).
/region-list/ returns the supported region codes you can pass to search and challenge endpoints.
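
The numeric codes for /search-video/ are easy to mix up, so a small lookup that turns readable options into query parameters can help. This is a sketch built only from the values listed above; search-filters, sort-codes, and publish-times are hypothetical names, and the search term parameter is left out because its name is not given here.

```clojure
(def sort-codes    {:relevance 0 :likes 1 :date 2})
(def publish-times #{0 1 7 30 90 180})

(defn search-filters
  "Build the publish_time and sort_by query params, rejecting values
   outside the documented sets."
  [{:keys [publish-time order]}]
  (when-not (contains? publish-times publish-time)
    (throw (ex-info "Unsupported publish_time" {:publish-time publish-time})))
  (when-not (contains? sort-codes order)
    (throw (ex-info "Unsupported sort order" {:order order})))
  {:publish_time publish-time
   :sort_by      (sort-codes order)})

(search-filters {:publish-time 7 :order :date})
;; => {:publish_time 7, :sort_by 2}
```

The resulting map can be merged into the query map you pass to GET, so a typo fails locally instead of costing a credit.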

You can poke at any of them interactively on the playground before wiring them into Clojure. And if your team is polyglot, the same pipeline is covered for Go with goroutines and plain PHP.

FAQ

Do I need a TikTok login? No. Authentication is just the X-Api-Key header. You never hand over a TikTok password.

How are credits billed? Pay-as-you-go: one request equals one credit, credits never expire, no subscription. See current credit pricing.

Will pmap get me rate-limited? Possibly. The default ceiling is 200 requests per minute. Use core.async/pipeline-blocking with a bounded parallelism, or contact support to raise the limit.

Why lazy-seq instead of reduce? Lazy seqs let downstream consumers (transducers, CSV writers, core.async channels) drive backpressure. You only pay for the pages you actually consume.

Can I refund unused credits? Yes, as long as no credits from that purchase have been used. See the blog or reach out to the support team.

That is the whole loop: REPL-driven exploration, lazy pagination, retry-wrapped HTTP, bounded concurrency, CSV or Datomic at the tail. Add endpoints by copy-pasting one function. The shape of your pipeline stays the same.