How to Scrape TikTok User Data with Elixir and Phoenix

Published on May 29, 2026

If you have ever tried to keep a TikTok dashboard fresh in real time, you already know the shape of the problem: dozens of profiles to poll, cursor-paginated feeds, sporadic rate limits, and a UI that should never block. This is exactly the kind of workload the BEAM was designed for. With Elixir's lightweight processes, OTP supervision, and Phoenix LiveView, you can build a TikTok ingestion service that gives every tracked profile its own process, recovers cleanly from upstream failures, and pushes updates to the browser without a single page reload.

In this tutorial we will wire TikLiveAPI into a Phoenix project: typed structs for the response shapes, a Stream.unfold/2 pagination loop, a GenServer-per-username poller supervised by a DynamicSupervisor, batched bulk fetches over Task.Supervisor, ETS caching, retries, an Oban scheduler, and a small LiveView dashboard. The full code is idiomatic Elixir 1.16+ and assumes you are comfortable with Mix and OTP basics.

Why Elixir for TikTok ingestion

TikTok scraping is mostly I/O bound HTTP work with a long tail of latency. Three Elixir features make it a near-perfect match:

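  • BEAM concurrency. Each tracked username can live in its own process at almost zero cost, so holding 5,000 pollers in memory is routine. The practical ceiling is your request budget rather than the process count, because every poll spends one request against the 200 RPM limit.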
  • Supervision trees. If a single poller crashes because a profile went private or the upstream returned malformed JSON, the supervisor restarts that one process and the rest of the system keeps running.
  • Phoenix LiveView. Push fresh follower counts to the browser over a WebSocket the moment ETS is updated. No client-side state, no SPA.
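Because pollers cost almost nothing, the number worth computing is the refresh interval your request budget allows. A minimal sketch, assuming one request per poll and that pollers are the only consumers of the 200 RPM budget (PollBudget is a hypothetical helper, not part of the project):

```elixir
defmodule PollBudget do
  @moduledoc """
  Back-of-the-envelope check: how often can each tracked account be refreshed
  if every poll costs one request and all pollers share one RPM budget?
  """

  # Shortest per-account interval in milliseconds that keeps `accounts`
  # pollers (one request per tick each) at or under `rpm` requests per minute.
  # Uses ceiling division so the result never exceeds the budget.
  def min_interval_ms(accounts, rpm) when accounts > 0 and rpm > 0 do
    div(accounts * 60_000 + rpm - 1, rpm)
  end
end

IO.puts(PollBudget.min_interval_ms(5_000, 200))
# prints 1500000
IO.puts(PollBudget.min_interval_ms(200, 200))
# prints 60000
```

At 5,000 accounts that works out to 1,500,000 ms, or one refresh per account every 25 minutes; the 60-second interval used by the poller later in this post fits at most 200 tracked accounts.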

TikLiveAPI exposes 37 endpoints covering users, posts, music, challenges, search, playlists, downloads, collections, region, and ads. We will focus on the user-centric ones, but the patterns transfer directly to the rest.

Prerequisites

  • Elixir 1.16+ and Erlang/OTP 26+
  • An API key from tikliveapi.com (top up at /pricing/; 1 request = 1 credit, credits never expire, 200 RPM standard limit)
  • A new Phoenix project: mix phx.new tiklive_dash (LiveView is included by default since Phoenix 1.7)

Add the dependencies in mix.exs:

defp deps do
  [
    {:phoenix, "~> 1.7"},
    {:phoenix_live_view, "~> 0.20"},
    {:req, "~> 0.5"},
    {:jason, "~> 1.4"},
    {:oban, "~> 2.17"}
  ]
end

Then export your key (never hardcode it) and read it in config/runtime.exs:

# shell
export TIKLIVE_API_KEY="sk_live_xxx"

# config/runtime.exs
config :tiklive_dash, :api,
  key: System.fetch_env!("TIKLIVE_API_KEY"),
  base_url: "https://api.tikliveapi.com"

The HTTP client: Req

Tesla and HTTPoison both work, but Req (by Wojtek Mach) is the modern default: it is built on Finch, ships with retries and JSON decoding, and needs little configuration. Wrap it once so every call sends the X-Api-Key header and decodes JSON with atom keys.

defmodule TikliveDash.Client do
  @moduledoc "Thin HTTP wrapper around api.tikliveapi.com."

  def new do
    cfg = Application.fetch_env!(:tiklive_dash, :api)

    Req.new(
      base_url: cfg[:base_url],
      headers: [{"x-api-key", cfg[:key]}, {"accept", "application/json"}],
      receive_timeout: 15_000,
      retry: :transient,
      max_retries: 3,
      retry_delay: &jittered_backoff/1,
      decode_json: [keys: :atoms]
    )
  end

  def get(path, params \\ %{}) do
    case Req.get(new(), url: path, params: params) do
      {:ok, %Req.Response{status: 200, body: body}} -> {:ok, body}
      {:ok, %Req.Response{status: s, body: b}} -> {:error, {:http, s, b}}
      {:error, reason} -> {:error, reason}
    end
  end

  defp jittered_backoff(attempt) do
    base = :math.pow(2, attempt) |> round() |> Kernel.*(500)
    base + :rand.uniform(250)
  end
end

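Req's built-in retry: :transient retries on 408, 429, 500, 502, 503, and 504 responses, plus transport errors such as timeouts and refused or closed connections, which covers the common failure modes. The jittered_backoff/1 helper adds up to 250 ms of randomness so 1,000 pollers do not retry in lockstep when the upstream blips.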
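To see what that schedule looks like, here is the same formula lifted into a standalone module. It assumes, per Req's documentation, that the retry_delay function receives the retry count starting at 0; BackoffDemo is just a name for this sketch.

```elixir
defmodule BackoffDemo do
  # Same formula as jittered_backoff/1 in TikliveDash.Client:
  # 500 ms * 2^attempt, plus 1..250 ms of random jitter.
  def jittered_backoff(attempt) do
    base = :math.pow(2, attempt) |> round() |> Kernel.*(500)
    base + :rand.uniform(250)
  end

  # {attempt, min_delay_ms, max_delay_ms} for the first `n` retries.
  def schedule(n) do
    for attempt <- 0..(n - 1) do
      base = 500 * Integer.pow(2, attempt)
      {attempt, base + 1, base + 250}
    end
  end
end

# With max_retries: 3 the waits fall in 501..750, 1001..1250 and 2001..2250 ms.
for {attempt, lo, hi} <- BackoffDemo.schedule(3) do
  delay = BackoffDemo.jittered_backoff(attempt)
  IO.puts("retry #{attempt}: #{delay} ms (range #{lo}..#{hi})")
end
```

Assuming every delay comes from this function, a request that fails on all three retries spends at most about 4.25 seconds (750 + 1,250 + 2,250 ms) on backoff before the error reaches the caller.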

Structs that match the response shapes

TikLiveAPI mixes snake_case and camelCase across endpoints, which is normal for a TikTok-derived schema. Document the shape in structs so you fail fast on changes. For /userinfo-by-username/ the body is a nested object with user and stats keys:

defmodule TikliveDash.User do
  @enforce_keys [:id, :unique_id, :nickname]
  defstruct [:id, :unique_id, :nickname, :avatar, :signature,
             :verified, :sec_uid, :private_account, :bio_link,
             :follower_count, :following_count, :video_count,
             :heart_count, :digg_count]

  def from_userinfo(%{user: u, stats: s}) do
    %__MODULE__{
      id: u.id,
      unique_id: u.uniqueId,
      nickname: u.nickname,
      avatar: u[:avatarMedium] || u[:avatarThumb],
      signature: u[:signature],
      verified: u[:verified],
      sec_uid: u[:secUid],
      private_account: u[:privateAccount],
      bio_link: u[:bioLink],
      follower_count: s.followerCount,
      following_count: s.followingCount,
      video_count: s.videoCount,
      heart_count: s.heartCount,
      digg_count: s.diggCount
    }
  end
end
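The split between dot access (u.uniqueId) and Access (u[:bioLink]) is what makes the struct fail fast: a missing required key raises KeyError, while a missing optional key becomes nil. A trimmed-down sketch with hypothetical sample bodies (not real API output) shows both paths:

```elixir
defmodule FailFastDemo do
  # Trimmed-down version of TikliveDash.User.from_userinfo/1: required fields
  # use strict dot access (raises KeyError when missing), optional fields use
  # Access (returns nil when missing).
  def from_userinfo(%{user: u, stats: s}) do
    %{
      unique_id: u.uniqueId,
      bio_link: u[:bioLink],
      follower_count: s.followerCount
    }
  end
end

# Hypothetical bodies: the second simulates a renamed upstream field.
ok_body = %{user: %{uniqueId: "someone"}, stats: %{followerCount: 42}}
renamed = %{user: %{unique_id: "someone"}, stats: %{followerCount: 42}}

IO.inspect(FailFastDemo.from_userinfo(ok_body))

try do
  FailFastDemo.from_userinfo(renamed)
rescue
  e in KeyError -> IO.puts("schema drift caught: " <> Exception.message(e))
end
```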

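Now a single call resolves a profile. Put it in the top-level TikliveDash module (lib/tiklive_dash.ex), since the rest of the code calls it as TikliveDash.fetch_user/1:

defmodule TikliveDash do
  alias TikliveDash.{Client, User}

  def fetch_user(username) do
    with {:ok, body} <- Client.get("/userinfo-by-username/", %{username: username}) do
      {:ok, User.from_userinfo(body)}
    end
  end
end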

Cursor pagination as Stream.unfold

The /user-posts/ endpoint returns videos, a cursor (a millisecond timestamp sent as a string), and a hasMore flag. The natural Elixir expression is a lazy stream that the caller can Enum.take/2 from, batch, or pipe into Ecto. Note that each video is a flat snake_case map (aweme_id, play_count, digg_count, create_time, and so on) with a nested author object.

defmodule TikliveDash.Posts do
  alias TikliveDash.Client

  @page_size 35

  def stream_user_posts(user_id) do
    Stream.unfold("0", fn
      :done ->
        nil

      cursor ->
        case Client.get("/user-posts/", %{userid: user_id, count: @page_size, cursor: cursor}) do
          {:ok, %{videos: videos, cursor: next, hasMore: true}} ->
            {videos, next}

          {:ok, %{videos: videos}} ->
            {videos, :done}

          {:error, _} ->
            nil
        end
    end)
    |> Stream.flat_map(& &1)
  end
end

The same shape works for /user-followers/, except the cursor field is called time (Unix seconds) and the list is followers. Add it to TikliveDash.Posts so the Client alias resolves:

def stream_followers(user_id) do
  Stream.unfold(0, fn
    :done -> nil
    time ->
      case Client.get("/user-followers/", %{userid: user_id, count: 50, time: time}) do
        {:ok, %{followers: list, time: next, hasMore: true}} -> {list, next}
        {:ok, %{followers: list}} -> {list, :done}
        {:error, _} -> nil
      end
  end)
  |> Stream.flat_map(& &1)
end
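Since posts and followers differ only in field names, the unfold loop can be factored into one helper that takes a page function. A sketch, assuming a hypothetical Paginator module whose fetch_page callback returns {:ok, items, next_cursor, has_more?} or {:error, reason}; a fake three-page feed stands in for the API:

```elixir
defmodule Paginator do
  @moduledoc """
  Generic cursor paginator (hypothetical). `fetch_page` maps a cursor to
  {:ok, items, next_cursor, has_more?} or {:error, reason}.
  """

  def stream(first_cursor, fetch_page) do
    Stream.unfold(first_cursor, fn
      :done ->
        nil

      cursor ->
        case fetch_page.(cursor) do
          {:ok, items, next, true} -> {items, next}
          {:ok, items, _next, false} -> {items, :done}
          # As in stream_user_posts/1, an error simply ends the stream.
          {:error, _reason} -> nil
        end
    end)
    |> Stream.flat_map(& &1)
  end
end

# Fake three-page feed keyed by cursor (hypothetical data).
pages = %{
  0 => {:ok, [1, 2], 10, true},
  10 => {:ok, [3, 4], 20, true},
  20 => {:ok, [5], 30, false}
}

Paginator.stream(0, &Map.fetch!(pages, &1)) |> Enum.to_list() |> IO.inspect()
# prints [1, 2, 3, 4, 5]
```

For followers, the page function would call Client.get("/user-followers/", ...) and map the followers, time, and hasMore fields into that tuple. As in the originals, an error ends the stream silently, so callers cannot tell a failed page from the last one; emit an error element instead if that distinction matters.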

And mind the trap on /user-following/: the top-level key is followings (plural), not following. Comments on a post use the id field on each item (not cid), with hasMore as the continuation flag.

A GenServer poller per tracked username

One process per username is the cleanest model: state is isolated, failures are local, and you can pause individual pollers without touching others. The poller schedules itself via Process.send_after/3 and writes results to ETS.

defmodule TikliveDash.Poller do
  use GenServer
  require Logger

  def start_link(username), do: GenServer.start_link(__MODULE__, username, name: via(username))
  defp via(u), do: {:via, Registry, {TikliveDash.PollerRegistry, u}}

  @impl true
  def init(username) do
    send(self(), :tick)
    {:ok, %{username: username, interval: 60_000, last_followers: nil}}
  end

  @impl true
  def handle_info(:tick, %{username: u} = state) do
    case TikliveDash.fetch_user(u) do
      {:ok, user} ->
        :ets.insert(:tiklive_users, {u, user, System.system_time(:second)})
        Phoenix.PubSub.broadcast(TikliveDash.PubSub, "users", {:user_updated, user})
        Process.send_after(self(), :tick, state.interval)
        {:noreply, %{state | last_followers: user.follower_count}}

      {:error, reason} ->
        Logger.warning("poll failed for #{u}: #{inspect(reason)}")
        Process.send_after(self(), :tick, state.interval * 2)
        {:noreply, state}
    end
  end
end

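Register the supervision tree in lib/tiklive_dash/application.ex. The Repo, PubSub, and Oban entries are needed by the snapshot worker, the pollers, and the LiveView; keep any other children the generator added (such as telemetry):

children = [
  TikliveDash.Repo,
  {Phoenix.PubSub, name: TikliveDash.PubSub},
  # Cache owns the :tiklive_users ETS table, so start it before its readers
  TikliveDash.Cache,
  {Oban, Application.fetch_env!(:tiklive_dash, Oban)},
  {Registry, keys: :unique, name: TikliveDash.PollerRegistry},
  {DynamicSupervisor, name: TikliveDash.PollerSup, strategy: :one_for_one},
  {Task.Supervisor, name: TikliveDash.TaskSup},
  TikliveDashWeb.Endpoint
]

Supervisor.start_link(children, strategy: :one_for_one, name: TikliveDash.Supervisor)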

Now DynamicSupervisor.start_child(TikliveDash.PollerSup, {TikliveDash.Poller, "charlidamelio"}) spawns a tracked profile. If it crashes, only that poller restarts.
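The naming and restart behavior can be checked without any HTTP at all. This sketch uses a hypothetical PerKeyDemo.Worker that holds a stubbed result in place of TikliveDash.Poller, registered through a unique Registry the same way:

```elixir
defmodule PerKeyDemo.Worker do
  use GenServer

  # One process per key, named through a unique Registry via-tuple,
  # mirroring TikliveDash.Poller. The API call is replaced by a stub string.
  def start_link(key), do: GenServer.start_link(__MODULE__, key, name: via(key))
  def via(key), do: {:via, Registry, {PerKeyDemo.Registry, key}}

  def last_value(key), do: GenServer.call(via(key), :last_value)

  @impl true
  def init(key), do: {:ok, %{key: key, last: "stub result for #{key}"}}

  @impl true
  def handle_call(:last_value, _from, state), do: {:reply, state.last, state}
end

{:ok, _} = Registry.start_link(keys: :unique, name: PerKeyDemo.Registry)
{:ok, _} = DynamicSupervisor.start_link(name: PerKeyDemo.Sup, strategy: :one_for_one)

{:ok, pid} = DynamicSupervisor.start_child(PerKeyDemo.Sup, {PerKeyDemo.Worker, "alice"})

# A second start for the same key is rejected by the unique Registry.
{:error, {:already_started, ^pid}} =
  DynamicSupervisor.start_child(PerKeyDemo.Sup, {PerKeyDemo.Worker, "alice"})

# Killing one worker restarts only that worker, under a new pid.
Process.exit(pid, :kill)
Process.sleep(50)
[{new_pid, _}] = Registry.lookup(PerKeyDemo.Registry, "alice")
IO.inspect(new_pid != pid, label: "restarted with a new pid")
```

One caveat: a DynamicSupervisor's restart intensity (by default 3 restarts within 5 seconds) is shared by all of its children, so a poller that crashes on every tick can exhaust it and take every other poller down with the supervisor. Raising max_restarts or handling errors inside handle_info/2 keeps such failures local.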

Batched bulk fetch with Task.Supervisor

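For one-shot enrichment, say resolving 500 usernames pasted into a form, run each fetch as a supervised task with Task.Supervisor.async_stream_nolink/4 (the function below can live in the TikliveDash module). Note that max_concurrency caps how many requests are in flight, not how many go out per minute: at 8 concurrent calls that each take about a second, you would send roughly 480 requests per minute, well over the 200 RPM limit, so keep the concurrency low or put a rate limiter in front of the client.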

def bulk_fetch(usernames) do
  TikliveDash.TaskSup
  |> Task.Supervisor.async_stream_nolink(
       usernames,
       &TikliveDash.fetch_user/1,
       max_concurrency: 8,
       timeout: 20_000,
       on_timeout: :kill_task
     )
  |> Enum.map(fn
       {:ok, {:ok, user}} -> {:ok, user}
       {:ok, {:error, reason}} -> {:error, reason}
       {:exit, reason} -> {:error, {:crash, reason}}
     end)
end

async_stream_nolink keeps the caller alive if a task dies, which matters when you are processing user input.
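To enforce the 200 RPM ceiling explicitly rather than relying on concurrency alone, one option is a small process that hands out evenly spaced request slots. A minimal sketch; RateLimiter is a hypothetical module, not part of Req or TikLiveAPI:

```elixir
defmodule RateLimiter do
  @moduledoc """
  Hypothetical request spacer: hands out one slot every `interval_ms`.
  60_000 / 200 = 300 ms between slots holds one node to 200 requests/minute.
  """
  use GenServer

  def start_link(interval_ms),
    do: GenServer.start_link(__MODULE__, interval_ms, name: __MODULE__)

  # Reserve a slot, then sleep in the caller (not the server) until it arrives.
  def acquire do
    wait_ms = GenServer.call(__MODULE__, :reserve)
    if wait_ms > 0, do: Process.sleep(wait_ms)
    :ok
  end

  @impl true
  def init(interval_ms), do: {:ok, %{interval: interval_ms, next_slot: now()}}

  @impl true
  def handle_call(:reserve, _from, %{interval: interval, next_slot: next} = state) do
    current = now()
    slot = max(current, next)
    {:reply, slot - current, %{state | next_slot: slot + interval}}
  end

  defp now, do: System.monotonic_time(:millisecond)
end

# Demo with 50 ms spacing: five acquisitions take at least ~200 ms.
{:ok, _} = RateLimiter.start_link(50)
{elapsed_us, :ok} = :timer.tc(fn -> Enum.each(1..5, fn _ -> RateLimiter.acquire() end) end)
IO.puts("5 acquisitions took #{div(elapsed_us, 1000)} ms")
```

In bulk_fetch/1 each task would call RateLimiter.acquire() before TikliveDash.fetch_user/1, with {RateLimiter, 300} added to the supervision tree; pollers share the same budget, so they should acquire a slot too. Because the server only does arithmetic and the waiting happens in callers, one slow sleeper never blocks the others. This is a per-node limit; a cluster would need a shared counter or a per-node share of the budget.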

ETS for in-memory cache

For read-mostly profile data, ETS is faster than any external store and trivially shared across processes.

defmodule TikliveDash.Cache do
  use GenServer

  def start_link(_), do: GenServer.start_link(__MODULE__, [], name: __MODULE__)

  @impl true
  def init(_) do
    :ets.new(:tiklive_users, [:set, :public, :named_table, read_concurrency: true])
    {:ok, %{}}
  end

  def get(username) do
    case :ets.lookup(:tiklive_users, username) do
      [{^username, user, ts}] -> {:hit, user, ts}
      [] -> :miss
    end
  end
end

Pair this with a TTL check in the LiveView: if System.system_time(:second) - ts > 300, fall back to TikliveDash.fetch_user/1.
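That TTL check can live in a small fetch-through helper. A sketch, assuming the same {username, user, unix_seconds} rows that TikliveDash.Cache stores; TtlCache is a hypothetical module and fetch_fun stands in for TikliveDash.fetch_user/1:

```elixir
defmodule TtlCache do
  # Serve a fresh ETS row if one exists; otherwise fetch, store, and return.
  def fetch(table, key, ttl_s, fetch_fun) do
    now = System.system_time(:second)

    case :ets.lookup(table, key) do
      # Fresh hit: no API credit spent.
      [{^key, value, ts}] when now - ts <= ttl_s ->
        {:ok, value}

      # Miss or stale row: fetch, and cache only successful results.
      _ ->
        with {:ok, value} <- fetch_fun.(key) do
          :ets.insert(table, {key, value, now})
          {:ok, value}
        end
    end
  end
end

table = :ets.new(:tiklive_users, [:set, :public, :named_table, read_concurrency: true])
calls = :counters.new(1, [])

# Fake fetch that counts how often it would have hit the API.
fake_fetch = fn username ->
  :counters.add(calls, 1, 1)
  {:ok, %{unique_id: username, follower_count: 42}}
end

{:ok, _} = TtlCache.fetch(table, "someone", 300, fake_fetch)
{:ok, _} = TtlCache.fetch(table, "someone", 300, fake_fetch)
IO.puts("API calls made: #{:counters.get(calls, 1)}")
# prints API calls made: 1
```

Only successful fetches are cached, so an upstream error is retried on the next read instead of being pinned for five minutes.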

Phoenix LiveView dashboard

The LiveView subscribes to the "users" PubSub topic and re-renders on every push from the pollers. No custom JavaScript and no client-side polling.

defmodule TikliveDashWeb.DashboardLive do
  use TikliveDashWeb, :live_view
  alias TikliveDash.{Poller, PollerSup}

  @impl true
  def mount(_params, _session, socket) do
    # Subscribe only on the connected (WebSocket) mount, not the static render.
    if connected?(socket), do: Phoenix.PubSub.subscribe(TikliveDash.PubSub, "users")
    {:ok, assign(socket, users: load_all(), form: to_form(%{"username" => ""}))}
  end

  @impl true
  def handle_event("track", %{"username" => u}, socket) do
    DynamicSupervisor.start_child(PollerSup, {Poller, String.trim(u)})
    {:noreply, socket}
  end

  @impl true
  def handle_info({:user_updated, user}, socket) do
    {:noreply, update(socket, :users, &Map.put(&1, user.unique_id, user))}
  end

  # Key by unique_id, the same key handle_info/2 uses, so rows loaded from
  # ETS and live updates land on the same map entry.
  defp load_all do
    :ets.tab2list(:tiklive_users)
    |> Map.new(fn {_username, user, _ts} -> {user.unique_id, user} end)
  end
end

The template renders a table of usernames, follower counts, and a verified badge. Because PubSub delivers updates as messages, every connected tab sees the new numbers within milliseconds of the poller writing to ETS.
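The delivery path is plain message passing. As a stdlib-only illustration (a swapped-in stand-in, not the Phoenix.PubSub API), a hypothetical FanOutDemo keeps subscribers in a :duplicate Registry and sends each one the same {:user_updated, user} tuple a LiveView would receive in handle_info/2:

```elixir
defmodule FanOutDemo do
  # Subscribers register under a topic in a :duplicate Registry; a broadcast
  # sends every registered process a plain message.
  def subscribe(topic), do: Registry.register(FanOutDemo.Registry, topic, [])

  def broadcast(topic, message) do
    Registry.dispatch(FanOutDemo.Registry, topic, fn entries ->
      for {pid, _value} <- entries, do: send(pid, message)
    end)
  end
end

{:ok, _} = Registry.start_link(keys: :duplicate, name: FanOutDemo.Registry)
parent = self()

# Two "tabs": each subscribes, confirms, then reports the first update it sees.
for tab <- [:tab_a, :tab_b] do
  spawn(fn ->
    {:ok, _} = FanOutDemo.subscribe("users")
    send(parent, {:subscribed, tab})

    receive do
      {:user_updated, user} -> send(parent, {tab, user.follower_count})
    end
  end)
end

# Wait until both subscriptions exist before broadcasting.
for _ <- 1..2 do
  receive do
    {:subscribed, _tab} -> :ok
  end
end

FanOutDemo.broadcast("users", {:user_updated, %{unique_id: "someone", follower_count: 43}})

results =
  for _ <- 1..2 do
    receive do
      {tab, count} when tab in [:tab_a, :tab_b] -> {tab, count}
    after
      1_000 -> :timeout
    end
  end

IO.inspect(Enum.sort(results))
```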

Scheduled snapshots with Oban

Pollers are great for live counters, but historical analytics need durable records. Oban writes jobs to Postgres, survives restarts, and supports cron expressions out of the box.

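# config/config.exs
config :tiklive_dash, Oban,
  repo: TikliveDash.Repo,
  queues: [snapshots: 5],
  # Cron-inserted jobs carry empty args; the worker fans out per username.
  plugins: [{Oban.Plugins.Cron, crontab: [{"0 * * * *", TikliveDash.SnapshotWorker}]}]

# lib/tiklive_dash/snapshot_worker.ex
defmodule TikliveDash.SnapshotWorker do
  use Oban.Worker, queue: :snapshots, max_attempts: 5

  # Per-username job: fetch the profile and persist one snapshot row.
  @impl true
  def perform(%Oban.Job{args: %{"username" => u}}) do
    with {:ok, user} <- TikliveDash.fetch_user(u) do
      TikliveDash.Repo.insert(%TikliveDash.Snapshot{
        username: u,
        follower_count: user.follower_count,
        video_count: user.video_count,
        # Truncate: a :utc_datetime column rejects microseconds on insert.
        captured_at: DateTime.utc_now() |> DateTime.truncate(:second)
      })
    end
  end

  # Hourly cron job (empty args): enqueue one job per cached username.
  def perform(%Oban.Job{args: %{}}) do
    :ets.select(:tiklive_users, [{{:"$1", :_, :_}, [], [:"$1"]}])
    |> Enum.map(&new(%{"username" => &1}))
    |> Oban.insert_all()

    :ok
  end
end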

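Oban's exponential backoff plus Req's transient retry give you two layers of resilience. If TikLiveAPI returns a 429, Req retries with jitter; if the job still fails, Oban reschedules the whole job, up to five attempts in total. With Oban's default backoff those attempts are spread over a few minutes, not hours, so to ride out a longer outage raise max_attempts or override backoff/1 in the worker.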

Try it in the playground

Before writing any Elixir, sanity-check the exact JSON shapes you will be decoding. The playground calls every endpoint live through a server-side proxy that injects your key, so you can paste a username, inspect the nested user/stats object, and confirm that the field names match your structs. Your profile page shows your current credit balance and request volume; use the contact page if you need a higher rate limit, and follow the blog for new endpoints as they ship.

FAQ

Do I need a TikTok account or cookies? No. TikLiveAPI authenticates with a single X-Api-Key header; you never hand over a TikTok password or session.

What happens if I exceed 200 requests per minute? You get a 429 response. Req's :transient retry policy backs off and tries again, but the cleaner fix is to stay under the limit: lower max_concurrency in the async stream, space requests with a rate limiter, or stagger your pollers. Higher limits are available on request.

Why use Stream.unfold instead of recursion? Lazy streams compose. stream_user_posts(id) |> Stream.take_while(&recent?/1) |> Enum.to_list() stops paging as soon as you hit an old post, without writing a single conditional in the fetch loop.
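The FAQ's recent?/1 is not defined in the text, so here it is a hypothetical 24-hour window, and an in-memory page function that counts its calls stands in for /user-posts/ (one post per hour, ten per page). The count shows that paging stops as soon as take_while sees the first old post:

```elixir
# Counter standing in for API credits: one increment per page fetched.
calls = :counters.new(1, [])
now = 1_000_000

# Infinite fake feed: post k is k hours old; page p holds posts 10p..10p+9.
fetch_page = fn page ->
  :counters.add(calls, 1, 1)
  posts = for i <- 0..9, do: %{create_time: now - (page * 10 + i) * 3_600}
  {posts, page + 1}
end

# Hypothetical predicate: posted within the last 24 hours.
recent? = fn post -> post.create_time > now - 24 * 3_600 end

recent_posts =
  Stream.unfold(0, fetch_page)
  |> Stream.flat_map(& &1)
  |> Stream.take_while(recent?)
  |> Enum.to_list()

IO.puts("#{length(recent_posts)} recent posts, #{:counters.get(calls, 1)} pages fetched")
# prints 24 recent posts, 3 pages fetched
```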

One GenServer per user, really? Yes. The BEAM happily runs hundreds of thousands of processes; a freshly spawned process takes roughly 2.5 to 3 KB on a 64-bit VM, plus whatever state it holds. Process-per-entity is the canonical Elixir pattern and makes per-user concerns (intervals, pause/resume, last-seen state) trivial.
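Per-process overhead is easy to measure on your own VM. This sketch spawns 10,000 idle processes and reads each one's footprint with Process.info/2; exact numbers depend on the OTP version and word size:

```elixir
# Spawn idle processes that just wait for :stop.
pids =
  for _ <- 1..10_000 do
    spawn(fn ->
      receive do
        :stop -> :ok
      end
    end)
  end

# Process.info/2 reports total memory (heap, stack, internal structures) in bytes.
{:memory, one} = Process.info(hd(pids), :memory)
total = Enum.reduce(pids, 0, fn pid, acc -> acc + elem(Process.info(pid, :memory), 1) end)

IO.puts("one idle process: #{one} bytes")
IO.puts("10,000 idle processes: #{div(total, 1024)} KB")

# Clean up.
Enum.each(pids, &send(&1, :stop))
```

A GenServer uses somewhat more than a bare process because of its state and gen_server bookkeeping, but the order of magnitude is the same.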

How do I download videos without the watermark? Call /post-detail/ with a TikTok URL. The flat snake_case response includes play (no watermark), wmplay (watermarked), and hdplay (high-definition no-watermark) URLs you can stream directly to the client or persist to object storage.

Do credits expire? No. The pay-as-you-go model on /pricing/ charges 1 credit per request, and unused credits stay on your account indefinitely.
