How to Scrape TikTok User Data with Clojure and the REPL

Published on May 29, 2026

## Why Clojure for TikTok scraping

If you already think in maps, sequences, and transducers, a TikTok scraping job is a natural fit for Clojure. The data coming back from TikLiveAPI is plain JSON, which Cheshire decodes straight into idiomatic nested maps. The REPL lets you iterate on a single endpoint call until the shape is exactly what your downstream pipeline expects. Immutable values turn a paginated cursor walk into a simple recursion instead of a tangle of mutable state. And when you are ready to scale, the same functions can fan out across cores with `pmap` or feed a `core.async` pipeline with little or no rewriting.

This tutorial walks through building a small but production-shaped pipeline against the TikLiveAPI service: resolve a username to a numeric user id, fetch the user profile, paginate posts and followers, retry on transient failures, persist to CSV, and finally transact daily snapshots into Datomic.

## Prerequisites

You need:

- Clojure 1.11 or newer (for `update-keys`, `parse-long`, and friends)
- A `deps.edn` project
- A TikLiveAPI key from your profile (sign up on /register/, top up on /pricing/)

Create `deps.edn`:
```clojure
{:paths ["src" "resources"]
 :deps  {org.clojure/clojure       {:mvn/version "1.11.3"}
         clj-http/clj-http         {:mvn/version "3.12.3"}
         cheshire/cheshire         {:mvn/version "5.13.0"}
         org.clojure/data.csv      {:mvn/version "1.1.0"}
         org.clojure/core.async    {:mvn/version "1.6.681"}
         com.datomic/peer          {:mvn/version "1.0.7075"}}
 :aliases
 {;; Run with `clj -X:uberjar`. :aot true compiles tiktok.ingest (which uses
  ;; :gen-class) so that :main-class works with `java -jar`.
  :uberjar {:replace-deps {com.github.seancorfield/depstar {:mvn/version "2.1.303"}}
            :exec-fn      hf.depstar/uberjar
            :exec-args    {:jar        "target/tiktok-ingest.jar"
                           :aot        true
                           :main-class tiktok.ingest}}}}
```
Export your credentials so they never touch source control:

```shell
export TIKLIVE_API_KEY="sk_live_..."
```
## A tiny HTTP layer

Every endpoint lives under the base URL `https://api.tikliveapi.com` and authenticates via the `X-Api-Key` header. Each request costs one credit, and credits never expire. Wrap that once and you never have to think about it again.
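```clojure
(ns tiktok.client
  (:require [clj-http.client :as http]
            [cheshire.core   :as json]))

(def base-url "https://api.tikliveapi.com")

(defn- api-key
  "Read the API key from the environment, failing fast if it is missing."
  []
  (or (System/getenv "TIKLIVE_API_KEY")
      (throw (ex-info "TIKLIVE_API_KEY not set" {}))))

(defn GET
  "Call a TikLiveAPI endpoint and return the decoded JSON body as a map
  with keyword keys. path is e.g. \"/userinfo-by-username/\"."
  [path query]
  (let [{:keys [status body]}
        (http/get (str base-url path)
                  {:headers            {"X-Api-Key" (api-key)}
                   :query-params       query
                   :throw-exceptions   false   ; we handle non-200 ourselves
                   :as                 :string
                   :connection-timeout 5000    ; ms to establish the connection
                   :socket-timeout     15000})] ; ms to wait for data
    (if (= 200 status)
      (json/parse-string body true)            ; true => keywordize keys
      ;; Keep status and body in ex-data so callers (e.g. retries) can inspect them.
      (throw (ex-info "TikLiveAPI error"
                      {:status status :path path :body body})))))
```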
Passing `true` as the second argument to `json/parse-string` keywordizes the keys, so you can destructure responses as ordinary Clojure maps. Note that the API mixes snake_case and camelCase across endpoints, so the keywords you destructure with must mirror each endpoint's casing exactly.

## Resolving a username to a user id

Most user-scoped endpoints take a numeric `userid`, not a handle. The /userid/ endpoint returns a flat object with a single string `id`:
```clojure
(ns tiktok.users
  (:require [tiktok.client :as c]))

(defn resolve-user-id [username]
  (-> (c/GET "/userid/" {:username username})
      :id))

;; REPL:
;; (resolve-user-id "tiktok")
;; => "107955"
```
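Since every request costs a credit, repeated handle-to-id lookups within one run are worth caching. A minimal sketch using `clojure.core/memoize`, with a hypothetical stand-in `fake-resolve` in place of `resolve-user-id` so the example runs offline. `memoize` never evicts entries, and it does not cache thrown exceptions, so a failed lookup is simply retried on the next call.

```clojure
;; Counts simulated API calls; each would cost one credit against the real API.
(def api-calls (atom 0))

(defn fake-resolve
  "Hypothetical offline stand-in for resolve-user-id."
  [username]
  (swap! api-calls inc)
  (str "id-for-" username))

;; In real code this would be: (def resolve-user-id-cached (memoize resolve-user-id))
(def resolve-user-id-cached (memoize fake-resolve))
```

Calling `(resolve-user-id-cached "tiktok")` twice costs one underlying call, not two.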
## Fetching a profile with idiomatic destructuring

`/userinfo-by-username/` returns a nested shape: a `user` map (camelCase: `uniqueId`, `nickname`, `avatarLarger`, `secUid`, `verified`, `privateAccount`, `bioLink`) and a `stats` map (`followerCount`, `followingCount`, `videoCount`, `heartCount`, `diggCount`). Pull what you need with `:keys`:
```clojure
(defn fetch-profile [username]
  (let [{{:keys [id uniqueId nickname secUid verified avatarLarger bioLink]} :user
         {:keys [followerCount followingCount videoCount heartCount]}        :stats}
        (c/GET "/userinfo-by-username/" {:username username})]
    {:tiktok/id        id
     :tiktok/handle    uniqueId
     :tiktok/nickname  nickname
     :tiktok/sec-uid   secUid
     :tiktok/verified? verified
     :tiktok/avatar    avatarLarger
     :tiktok/bio-link  bioLink
     :tiktok/followers followerCount
     :tiktok/following followingCount
     :tiktok/videos    videoCount
     :tiktok/hearts    heartCount}))
```
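One way to make that destructuring easy to iterate on is to split the pure mapping out of the HTTP call, so you can test it at the REPL against a saved response without spending credits. A sketch with a hypothetical `profile-response->entity` (covering a subset of the keys above) and a hand-written sample map whose values are invented:

```clojure
(defn profile-response->entity
  "Pure half of fetch-profile: map an already-decoded /userinfo-by-username/
  response onto :tiktok/* keys. No HTTP, so it is cheap to test."
  [{{:keys [id uniqueId nickname verified]} :user
    {:keys [followerCount videoCount]}      :stats}]
  {:tiktok/id        id
   :tiktok/handle    uniqueId
   :tiktok/nickname  nickname
   :tiktok/verified? verified
   :tiktok/followers followerCount
   :tiktok/videos    videoCount})

;; Hand-written sample in the documented shape; the values are invented.
(def sample-response
  {:user  {:id "123" :uniqueId "example_user" :nickname "Example" :verified false}
   :stats {:followerCount 42 :followingCount 7 :videoCount 3 :heartCount 99}})
```

With that split, the network-facing function reduces to `(profile-response->entity (c/GET "/userinfo-by-username/" {:username username}))`.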
If you already have the numeric id, call `/userinfo-by-id/` with `userid` instead: same nested shape, different lookup key.

## Pagination as a lazy sequence

The user-posts endpoint at /user-posts/ paginates with a `cursor` (a millisecond timestamp, sent as a string) and a `hasMore` flag. The items in `videos` are flat snake_case maps: `aweme_id`, `region`, `title`, `cover`, `play`, `wmplay`, `play_count`, `digg_count`, `comment_count`, `share_count`, `create_time`, plus a nested `author` and `music_info`. A `lazy-seq` turns the cursor walk into a single seqable value that any downstream transducer can consume:
```clojure
(defn user-posts
  "Lazy seq of all posts by userid. Each page is fetched only when the
  consumer walks past the end of the previous one."
  ([userid] (user-posts userid nil))
  ([userid cursor]
   (lazy-seq
     ;; Bind the response's cursor as next-cursor to avoid shadowing the argument.
     (let [{:keys [videos hasMore] next-cursor :cursor}
           (c/GET "/user-posts/" (cond-> {:userid userid :count 30}
                                   cursor (assoc :cursor cursor)))]
       (concat videos
               (when hasMore
                 (user-posts userid next-cursor)))))))

;; (->> (user-posts "107955")
;;      (take 100)
;;      (map (juxt :aweme_id :play_count :digg_count)))
```
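The walk itself does not depend on the endpoint, only on which keys hold the items, the next cursor, and the more flag. A sketch of that factoring as a hypothetical `paginate` helper, exercised against in-memory pages (the fake data and call counter are illustrative) to show that pages are fetched only as the consumer needs them:

```clojure
(defn paginate
  "Hypothetical generic cursor walker. fetch-page takes a cursor (nil for the
  first page) and returns a decoded response map; items-key, cursor-key and
  more-key name the fields holding the items, the next cursor and the flag."
  [fetch-page {:keys [items-key cursor-key more-key]}]
  (letfn [(step [cursor]
            (lazy-seq
              (let [page (fetch-page cursor)]
                (concat (get page items-key)
                        (when (get page more-key)
                          (step (get page cursor-key)))))))]
    (step nil)))

;; In-memory stand-in for /user-posts/: three pages keyed by cursor.
(def fake-pages
  {nil  {:videos [{:aweme_id "a"} {:aweme_id "b"}] :cursor "t1" :hasMore true}
   "t1" {:videos [{:aweme_id "c"} {:aweme_id "d"}] :cursor "t2" :hasMore true}
   "t2" {:videos [{:aweme_id "e"}]                 :cursor "t3" :hasMore false}})

(def calls (atom 0))

(defn fake-fetch [cursor]
  (swap! calls inc)                  ; count simulated HTTP requests
  (get fake-pages cursor))

(def posts
  (paginate fake-fetch {:items-key :videos :cursor-key :cursor :more-key :hasMore}))
```

Taking the first three posts triggers two fetches; walking the whole seq triggers the third. Against the real API, `fetch-page` would wrap `c/GET`, e.g. `#(c/GET "/user-posts/" (cond-> {:userid uid :count 30} % (assoc :cursor %)))`.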
Followers use a different pagination key. /user-followers/ paginates by `time` (unix seconds) instead of `cursor`, and items are snake_case (`id`, `unique_id`, `sec_uid`, `nickname`, `follower_count`, `aweme_count`):
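```clojure
(defn user-followers
  "Lazy seq of all followers of userid, paginated by the `time` field."
  [userid]
  (letfn [(step [t]
            (lazy-seq
              ;; next-time rather than `time`, which would shadow clojure.core/time.
              (let [{:keys [followers hasMore] next-time :time}
                    (c/GET "/user-followers/"
                           (cond-> {:userid userid :count 50}
                             t (assoc :time t)))]
                (concat followers (when hasMore (step next-time))))))]
    (step nil)))
```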
The mirror endpoint /user-following/ uses the same `time` + `hasMore` pagination, but its top-level key is `followings` (plural), not `following`. Easy to miss.

## Retries that compose

Network calls fail. Wrap `GET` in a backoff function rather than scattering try/catch through your transforms:
```clojure
(defn with-retry
  "Invoke f up to max-attempts times, sleeping base-ms * 2^(attempt-1) ms
  between attempts. Rethrows the last exception if every attempt fails."
  [f {:keys [max-attempts base-ms] :or {max-attempts 5 base-ms 500}}]
  (loop [attempt 1]
    (let [result (try {:ok (f)}
                      (catch Exception e {:err e}))]
      (cond
        ;; contains? rather than (:ok result), so a nil or false return is a success.
        (contains? result :ok)    (:ok result)
        (>= attempt max-attempts) (throw (:err result))
        :else
        (do (Thread/sleep (long (* base-ms (Math/pow 2 (dec attempt)))))
            (recur (inc attempt)))))))

(defn safe-GET [path query]
  (with-retry #(c/GET path query) {:max-attempts 5 :base-ms 500}))
```
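`with-retry` retries every exception, including errors such as a 401 for a bad key or a 404, which will not succeed on a second try and only add delay. One refinement is to retry only transient failures. A sketch, assuming (as `tiktok.client/GET` does) that HTTP errors carry `:status` in their `ex-data`; `transient-failure?` and `with-retry-when` are hypothetical names:

```clojure
(defn transient-failure?
  "Hypothetical predicate: retry HTTP 429 and 5xx (read from the ex-data that
  tiktok.client/GET attaches) and any exception without a :status, such as
  a socket timeout."
  [^Throwable e]
  (let [status (:status (ex-data e))]
    (or (nil? status)
        (= 429 status)
        (<= 500 status 599))))

(defn with-retry-when
  "Like with-retry, but rethrows immediately when (retry? e) is false."
  [f retry? {:keys [max-attempts base-ms] :or {max-attempts 5 base-ms 500}}]
  (loop [attempt 1]
    (let [result (try {:ok (f)}
                      (catch Exception e {:err e}))]
      (cond
        (contains? result :ok)
        (:ok result)

        (or (>= attempt max-attempts)
            (not (retry? (:err result))))
        (throw (:err result))

        :else
        (do (Thread/sleep (long (* base-ms (Math/pow 2 (dec attempt)))))
            (recur (inc attempt)))))))
```

Usage mirrors `safe-GET`: `(with-retry-when #(c/GET path query) transient-failure? {:max-attempts 5 :base-ms 500})`.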
Swap `c/GET` for `safe-GET` inside your pagination functions when you go to production.

## Concurrency: pmap and core.async

If you want to enrich a batch of usernames, `pmap` gives you cheap parallelism without leaving the seq world:
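```clojure
(defn enrich-handles
  "Fetch profiles in parallel, dropping handles whose lookup fails."
  [handles]
  (->> handles
       (pmap #(try (fetch-profile %)
                   (catch Exception _ nil)))   ; a failed lookup becomes nil
       (remove nil?)))
```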
`pmap` is fine until you need backpressure or rate limiting. The standard limit is 200 requests per minute (increasable on request, see /contact/), so a bounded `core.async` pipeline is a better fit for sustained throughput:
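```clojure
(require '[clojure.core.async :as a])

(defn pipeline-profiles
  "Concurrently fetch profiles with bounded parallelism."
  [handles parallelism]
  (let [in  (a/to-chan! handles)   ; closes itself once all handles are put
        out (a/chan 64)
        ;; An exception in fetch-profile goes to the default ex-handler, which
        ;; reports it and drops that handle from the results.
        xf  (map fetch-profile)]
    (a/pipeline-blocking parallelism out xf in)  ; closes out when in is drained
    (a/<!! (a/into [] out))))
```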
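If call latency varies, tuning `parallelism` alone is a fragile way to respect the 200 requests per minute limit. A minimal sketch of a client-side limiter, assuming a single JVM process: `make-rate-limiter` is a hypothetical helper, not part of core.async or TikLiveAPI. It hands out call slots at least `60000 / per-minute` ms apart, atomically, so it can be shared by every pipeline thread.

```clojure
(defn make-rate-limiter
  "Hypothetical helper. Returns a zero-arg fn that blocks until the caller's
  slot arrives, spacing calls at least (quot 60000 per-minute) ms apart.
  Thread-safe, but it only limits the JVM it runs in."
  [per-minute]
  (let [interval-ms (quot 60000 per-minute)
        next-slot   (atom 0)]            ; earliest ms at which the next call may start
    (fn []
      (let [now  (System/currentTimeMillis)
            ;; Reserve a slot atomically: the later of now and next-slot.
            slot (- (swap! next-slot #(+ (max % now) interval-ms))
                    interval-ms)
            wait (- slot now)]
        (when (pos? wait)
          (Thread/sleep (long wait)))))))

;; 200 requests/minute => at most one call every 300 ms.
(def limit! (make-rate-limiter 200))
```

To use it, call the limiter at the start of each unit of work, e.g. `(map (fn [h] (limit!) (fetch-profile h)))` as the pipeline's transducer; the pipeline then stays under budget regardless of `parallelism`.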
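`pipeline-blocking` (rather than `pipeline`) fits the profile fetch because each HTTP call blocks a thread. Keep in mind that `parallelism` caps how many calls are in flight, not how many start per minute: throughput is roughly `parallelism` divided by average call latency. At 200 requests per minute (about 3.3 per second), a parallelism of 3 stays under budget only if each call takes about a second or longer, so tune it against the latency you actually observe.

## Persisting to CSV

`clojure.data.csv` writes seqs of row vectors. Combine that with a transducer and you can stream a large result set to disk without realizing it all in memory: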
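```clojure
;; Evaluated in tiktok.users, next to user-posts. For a file-based build,
;; add these requires to that namespace's ns form instead.
(require '[clojure.data.csv :as csv]
         '[clojure.java.io  :as io])

(def post-cols
  [:aweme_id :create_time :region :title
   :play_count :digg_count :comment_count :share_count])

(defn dump-posts
  "Write every post by userid to out-path as CSV, one row per post."
  [userid out-path]
  (with-open [w (io/writer out-path)]
    (csv/write-csv w [(map name post-cols)])     ; header row
    (csv/write-csv w
      (eduction
        (map (apply juxt post-cols))             ; post map => row vector
        (user-posts userid)))))
```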
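`post-cols` only reaches top-level keys, but the post items also carry a nested `author` map. A sketch of a column spec built from `[header extractor]` pairs, where `(comp :unique_id :author)` reads a nested field. The names `post-columns`, `header-row`, and `row-fn` are hypothetical, and it assumes that `author` carries a `unique_id` like follower items do.

```clojure
;; Hypothetical column spec: [header extractor] pairs. Keywords act as
;; extractor fns; (comp :unique_id :author) reads the nested author map,
;; assuming author has a unique_id field.
(def post-columns
  [["aweme_id"   :aweme_id]
   ["author"     (comp :unique_id :author)]
   ["play_count" :play_count]])

(defn header-row
  "Header strings in column order, for the first write-csv call."
  [columns]
  (mapv first columns))

(defn row-fn
  "Returns a fn turning one post map into a row vector, for use as
  (map (row-fn columns)) inside the eduction."
  [columns]
  (apply juxt (map second columns)))
```

In `dump-posts`, `(header-row post-columns)` would replace `(map name post-cols)` and `(row-fn post-columns)` would replace `(apply juxt post-cols)`.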
The `eduction` keeps the pipeline lazy: the CSV writer pulls rows in small chunks, transforms them, and writes them as it goes, so pages are fetched only as the writer needs them.

## Daily ingestion: cron + uberjar

Build a `-main` entry point that takes handles as command-line arguments and writes one dated CSV per handle:
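```clojure
(ns tiktok.ingest
  (:gen-class)
  (:require [tiktok.users :as u]   ; resolve-user-id and dump-posts live here
            [clojure.java.io :as io]))

(defn -main
  "Write out/<handle>-<yyyy-MM-dd>.csv for each handle given on the command line."
  [& handles]
  (doseq [h handles]
    (let [uid (u/resolve-user-id h)
          out (format "out/%s-%tF.csv" h (java.util.Date.))]
      (io/make-parents out)          ; create out/ if it does not exist
      (u/dump-posts uid out)
      (println "wrote" out))))
```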
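Build the jar with `clj -X:uberjar` (depstar is archived; if you are starting fresh, tools.build is its maintained successor) and schedule it from cron. Cron does not read your shell profile, so set `TIKLIVE_API_KEY` in the crontab itself. The relative `out/` path resolves against cron's working directory, usually the user's home directory:

```
TIKLIVE_API_KEY=sk_live_...
15 3 * * * /usr/bin/java -jar /opt/tiktok-ingest.jar tiktok charlidamelio
```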
## Snapshotting into Datomic

Daily snapshots are where Datomic shines: every transaction is a point in time, so you get follower-growth history without rolling your own audit table. Define a minimal schema once:
```clojure
(def schema
  [{:db/ident :tiktok/id        :db/valueType :db.type/string
    :db/cardinality :db.cardinality/one :db/unique :db.unique/identity}
   {:db/ident :tiktok/handle    :db/valueType :db.type/string
    :db/cardinality :db.cardinality/one :db/index true}
   {:db/ident :tiktok/followers :db/valueType :db.type/long
    :db/cardinality :db.cardinality/one}
   {:db/ident :tiktok/videos    :db/valueType :db.type/long
    :db/cardinality :db.cardinality/one}
   {:db/ident :tiktok/hearts    :db/valueType :db.type/long
    :db/cardinality :db.cardinality/one}])
```
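Transact the schema once with `@(d/transact conn schema)`, then transact a daily snapshot. Because `:tiktok/id` is `:db.unique/identity`, each day's map upserts into the same entity while history is preserved automatically. `fetch-profile` returns more keys than this minimal schema defines, so each map is trimmed to the schema's attributes first:

```clojure
(require '[datomic.api :as d]
         '[tiktok.users :as u])

;; Only the attributes declared in `schema`; transacting an undeclared
;; attribute such as :tiktok/nickname would fail.
(def schema-attrs (mapv :db/ident schema))

(defn snapshot! [conn handles]
  @(d/transact conn
     (vec (pmap #(select-keys (u/fetch-profile %) schema-attrs) handles))))

;; Query growth across all of history. The trailing `true` keeps only
;; assertions; a history db also holds the retraction of each old value.
;; (d/q '[:find ?handle ?inst ?followers
;;        :where
;;        [?e :tiktok/handle ?handle]
;;        [?e :tiktok/followers ?followers ?tx true]
;;        [?tx :db/txInstant ?inst]]
;;      (d/history (d/db conn)))
```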
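The history query returns an unordered set of `[handle inst followers]` tuples. A sketch of turning that set into per-handle, time-ordered growth with plain Clojure; `follower-growth` is a hypothetical helper, and each delta is measured against the previous observation in the result, not necessarily the previous calendar day.

```clojure
(defn follower-growth
  "Group [handle inst followers] tuples by handle, sort each group by inst,
  and attach the change since the previous observation (nil for the first)."
  [tuples]
  (into {}
        (for [[handle rows] (group-by first tuples)]
          (let [sorted (sort-by second rows)]     ; java.util.Date is Comparable
            [handle
             ;; Pair each row with its predecessor: (cons nil sorted) lags by one.
             (mapv (fn [[_ inst followers] prev]
                     {:inst      inst
                      :followers followers
                      :delta     (when prev (- followers (nth prev 2)))})
                   sorted
                   (cons nil sorted))]))))
```

Usage: `(follower-growth (d/q ... (d/history (d/db conn))))` yields a map from handle to a vector of `{:inst :followers :delta}` maps.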
The history query gives you the follower timeline per handle with no extra columns and no extra writes. Because Datomic does not record a new datom when an asserted value is unchanged, days on which the follower count stayed the same do not appear in the results.

## Other endpoints worth knowing

The same pattern (GET + destructure + lazy-seq) applies to the rest of the API surface. A few worth highlighting:

- /post-detail/ returns a flat snake_case object (no `data{}` wrapper) with `aweme_id` and the video URLs `play`, `wmplay`, and `hdplay`; use `play` or `hdplay` for watermark-free downloads.
- /post-comments/ returns `comments[]`, whose items carry `id` (not `cid`), `text`, `digg_count`, and `reply_total`; pair it with `/post-comment-replies/` for threaded replies.
- `/search-video/` accepts `publish_time` (0/1/7/30/90/180) and `sort_by` (0 relevance / 1 likes / 2 date).
- `/region-list/` returns the supported region codes you can pass to search and challenge endpoints.

You can poke at any of them interactively on the playground before wiring them into Clojure.

## FAQ

**Do I need a TikTok login?** No. Authentication is just the `X-Api-Key` header. You never hand over a TikTok password.

**How are credits billed?** Pay-as-you-go: one request costs one credit, credits never expire, and there is no subscription. See /pricing/.

**Will `pmap` get me rate-limited?** Possibly. The default ceiling is 200 requests per minute. Use `clojure.core.async/pipeline-blocking` with bounded parallelism, or contact support to raise the limit.

**Why lazy-seq instead of reduce?** Lazy seqs let downstream consumers (transducers, CSV writers, `core.async` channels) set the pace of fetching. You only pay for the pages you actually consume.

**Can I get a refund for unused credits?** Yes, as long as no credits from that purchase have been used. See the blog or reach out via /contact/.

That is the whole loop: REPL-driven exploration, lazy pagination, retry-wrapped HTTP, bounded concurrency, and CSV or Datomic at the tail. Adding an endpoint means writing one more small function; the shape of your pipeline stays the same.
