
Building a SERP Rank Tracker: Architecture Guide

Ahrefs, SEMrush, Sistrix, Serpstat — the rank-tracker industry is worth billions because the problem is harder than it looks. Here's the architecture that makes tracking millions of keywords a day tractable.

15 min read · Queue design, proxy rotation, storage, cost modelling · Last updated: April 2026

A rank tracker's job is simple to describe — for each keyword in a customer's project, query Google from the right location, parse the SERP, record the rank for the customer's domain, and raise an alert if anything moves meaningfully. At 100 keywords it's a cron job. At 10 million it's a distributed system with proxy pools, retry logic, parser versioning, and time-series storage measured in TB. The architecture stays the same; the components scale independently.

1. Core Components

Component | Typical choice | Responsibility
Scheduler | Cron / Airflow | Decides which keywords are due for checking
Keyword queue | Redis / SQS / RabbitMQ | Work queue for scraper workers
Scraper workers | Python / Go, stateless | Execute queries, return raw HTML
Proxy pool | Mobile proxies + rotation API | Supplies IPs, rotates on request or failure
Parser | Versioned library | Converts HTML → structured ranks + SERP features
Database | PostgreSQL / TimescaleDB | Stores rank history as time series
Alerting | Event bus + rule engine | Notifies on rank drops, new SERP features
CAPTCHA fallback | 2Captcha / CapSolver | Handles challenges that slip past the proxy layer

Each component scales horizontally on its own axis. Bottlenecks move as you grow — typically parser first (regex too slow), then database (write throughput), then proxy pool (429s under load).

2. Queue & Scheduling Design

Three scheduling dimensions matter: cadence (how often each keyword is re-checked), locality (which geo to query from), and device (desktop vs mobile SERP).

  • Tiered cadence: daily for tracked keywords; weekly for long-tail; hourly for news/trending. Commercial tools expose this tiering via plan price.
  • Round-robin by geo: each worker picks up jobs for the IP pool it's attached to. A US-mobile-IP worker only takes US queries.
  • Spread load across the day: don't run every daily job at 00:00 — bucket jobs across 24 hours based on a hash of keyword_id. Smooths proxy pool load.
  • Deduplication: if three customers track the same keyword in the same geo, query once and fan out the rank lookup.
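The hash-bucketing and deduplication rules above fit in a few lines. A sketch — the 24-bucket count and the dedup key shape are illustrative choices, not fixed conventions:

```python
import hashlib

def schedule_bucket(keyword_id: int, buckets: int = 24) -> int:
    """Map a keyword to one of 24 hourly buckets, stable across runs.

    Using a cryptographic hash (rather than Python's hash()) keeps the
    assignment identical across processes and restarts.
    """
    digest = hashlib.sha256(str(keyword_id).encode()).hexdigest()
    return int(digest, 16) % buckets

def dedup_key(keyword: str, gl: str, device: str) -> str:
    """Customers tracking the same (keyword, geo, device) share one query."""
    return f"{keyword.lower().strip()}|{gl}|{device}"
```

Workers enqueue a job only when the current hour matches the keyword's bucket, and the scheduler collapses jobs with identical dedup keys before pushing to the queue, fanning the single result back out to every subscribed project.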

3. Proxy Rotation Strategy

Google rewards consistency within a session (NID cookie persistence) and punishes repetition across sessions (same IP + same fingerprint = throttle). The compromise: sticky per-query, rotate between queries.

  • One query = one IP: per-keyword mobile IP assignment, full request cycle on that IP
  • Rotate between queries: hit the rotation endpoint, pick up a fresh IP for the next keyword
  • Retry on fresh IP: any 429 / 503 / sorry-redirect triggers immediate IP rotation and a short back-off (exponential, jittered)
  • Budget per IP: cap each IP at ~30 queries/hour before forcing rotation, even without failures
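The per-IP budget rule is a sliding-window counter. A minimal sketch, assuming the worker calls `record_query()` after each fetch and checks `should_rotate()` before the next one; the 30/hour cap mirrors the guideline above:

```python
import time

class IpBudget:
    """Force rotation after `limit` queries in a sliding one-hour window."""

    def __init__(self, limit: int = 30, window: float = 3600.0):
        self.limit = limit
        self.window = window
        self.timestamps: list[float] = []

    def record_query(self) -> None:
        self.timestamps.append(time.monotonic())

    def should_rotate(self) -> bool:
        # Drop timestamps that have aged out of the window, then compare.
        cutoff = time.monotonic() - self.window
        self.timestamps = [t for t in self.timestamps if t > cutoff]
        return len(self.timestamps) >= self.limit
```

Resetting the tracker after each rotation (a fresh `IpBudget` per IP) keeps the counter honest even when the pool hands back a previously used address.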

Mobileproxies.org exposes rotation via API at https://buy.mobileproxies.org/ — a single HTTP call to the rotation endpoint returns a new carrier IP assigned to your port. Workers call it between queries.

4. Time-Series Storage

Rank history is effectively append-only time-series data. Two shapes to pick from:

  • PostgreSQL with partitioning: native, cheap, great up to ~1B rows. Partition by month on a rank_history table — drop old partitions in one statement.
  • TimescaleDB: Postgres extension with hypertables, continuous aggregates, automatic compression. The standard choice once you pass ~100M new rows/month.
  • ClickHouse: if you need sub-second aggregate queries across billions of rows (SEMrush-scale), columnar storage wins.

Store both the rank and the full SERP snapshot (compressed HTML or structured JSON of all result blocks). Customers frequently ask "what did the SERP look like on the day our rank dropped?" — you want that answer.
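Compressing snapshots before they hit the database pays for itself quickly — repetitive SERP HTML typically shrinks several-fold under zlib. A sketch of the row shape, with the column names hypothetical:

```python
import time
import zlib

def pack_snapshot(html: str) -> bytes:
    """Compress raw SERP HTML for a BYTEA/blob column."""
    return zlib.compress(html.encode("utf-8"), level=6)

def unpack_snapshot(blob: bytes) -> str:
    return zlib.decompress(blob).decode("utf-8")

def rank_row(project_id: int, keyword: str, rank, html: str) -> dict:
    """One rank_history row: the rank plus the full SERP snapshot."""
    return {
        "project_id": project_id,
        "keyword": keyword,
        "rank": rank,          # None when the domain is outside the window
        "ts": int(time.time()),
        "serp_snapshot": pack_snapshot(html),
    }
```

TimescaleDB's native compression can then squeeze the numeric columns further; the snapshot column is already opaque bytes, so it passes through untouched.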

5. Alerting on Movement

  • Rank drops: threshold alert (e.g., dropped > 5 positions day-over-day)
  • Lost snippet: previously owned Featured Snippet now points elsewhere
  • New SERP feature: a Local Pack, Shopping row, or Knowledge Graph appeared for a query that didn't have one
  • New competitor: a domain entered the top 10 for the first time
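Each rule above reduces to a comparison between yesterday's and today's parsed SERP state. A sketch of the drop and new-competitor checks, with the 5-position threshold as an illustrative default:

```python
def rank_drop_alert(prev_rank, curr_rank, threshold: int = 5):
    """Fire when a tracked domain drops more than `threshold` positions."""
    if prev_rank is None or curr_rank is None:
        return None  # entered/left the window entirely — handle separately
    delta = curr_rank - prev_rank  # positive = moved down the page
    if delta > threshold:
        return {"type": "rank_drop", "delta": delta}
    return None

def new_competitor_alert(prev_top10: set, curr_top10: set):
    """Fire for each domain entering the top 10 for the first time."""
    entered = curr_top10 - prev_top10
    return [{"type": "new_competitor", "domain": d} for d in sorted(entered)]
```

The lost-snippet and new-SERP-feature rules are the same set-difference pattern applied to the feature blocks the parser extracts, which is one reason to store the structured snapshot alongside the rank.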

6. Cost Model: 100K Keywords / Day

Line item | Scale
Queries | 100K/day × 30 days = 3M/month
Proxy traffic | ~300 KB HTML per query → ~900 GB/month mobile bandwidth
Workers | ~3-5 concurrent Python workers per 100K/day slice
Storage growth | 3M rows/month in rank_history; roughly 100-200 GB/month of compressed SERP snapshots (from ~900 GB raw)

Third-party SERP APIs charge $1-3 per 1K queries. 3M/month through a SERP API is $3K-9K/month. Equivalent in-house with mobile proxies is usually half to a third of that once bandwidth and worker infra are counted — and you own the parser, which matters when Google ships SERP feature changes you want tracked before the vendor updates.
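The arithmetic behind that comparison is easy to parameterise. A sketch — the per-1K API price and per-GB proxy price below are illustrative placeholders, not quotes from any vendor:

```python
def monthly_costs(keywords_per_day: int,
                  kb_per_query: int = 300,
                  api_price_per_1k: float = 2.0,
                  proxy_price_per_gb: float = 3.0) -> dict:
    """Rough monthly cost model for SERP API vs in-house scraping."""
    queries = keywords_per_day * 30
    gb = queries * kb_per_query / 1_000_000  # KB → GB (decimal)
    return {
        "queries_per_month": queries,
        "bandwidth_gb": round(gb),
        "serp_api_usd": queries / 1000 * api_price_per_1k,
        "proxy_bandwidth_usd": round(gb * proxy_price_per_gb, 2),
    }
```

Worker compute and storage are missing from this model deliberately — at this scale they are small next to bandwidth and API fees, but they stop being negligible once the snapshot archive grows past a few TB.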

7. Minimal Producer / Consumer Skeleton

The moving parts in one file — Redis queue, mobile proxy rotation, rank lookup, storage. Good enough to run in a single container for a few thousand keywords; structurally the same shape you'd scale up.

import requests, redis, json, time, random
from bs4 import BeautifulSoup
from urllib.parse import quote_plus

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
ROTATE_URL = "https://buy.mobileproxies.org/api/rotate"  # placeholder — see docs
PROXY = "http://user:pass@proxy.mobileproxies.org:8000"
proxies = {"http": PROXY, "https": PROXY}
HEADERS = {
    "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_5 like Mac OS X) "
                  "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1",
    "Accept-Language": "en-US,en;q=0.9",
}

def rotate_ip():
    # Call your provider's rotation endpoint; mobileproxies exposes one per port.
    requests.post(ROTATE_URL, auth=("user", "pass"), timeout=10)

def fetch_serp(query, gl="us", hl="en"):
    url = f"https://www.google.com/search?q={quote_plus(query)}&num=50&gl={gl}&hl={hl}"
    for attempt in range(3):
        try:
            resp = requests.get(url, headers=HEADERS, proxies=proxies, timeout=20)
            if resp.status_code == 200 and "/sorry/" not in resp.url:
                return resp.text
        except requests.RequestException:
            pass
        rotate_ip()
        time.sleep((2 ** attempt) + random.random())
    return None

def find_rank(html, target_domain):
    # NOTE: "div.g" is the classic desktop organic-result container; Google's
    # markup changes frequently and mobile SERPs often differ. Version this
    # parser and alert on sudden drops in parse yield.
    soup = BeautifulSoup(html, "lxml")
    for i, block in enumerate(soup.select("div.g"), start=1):
        link = block.select_one("a[href]")
        if link and target_domain in link.get("href", ""):
            return i
    return None

def worker():
    while True:
        raw = r.blpop("keyword_queue", timeout=0)
        if not raw:
            continue
        job = json.loads(raw[1])  # {keyword, domain, gl, project_id}
        html = fetch_serp(job["keyword"], gl=job["gl"])
        if html is None:
            r.rpush("dead_letter", json.dumps(job))
            continue
        rank = find_rank(html, job["domain"])
        r.rpush("rank_results", json.dumps({
            "project_id": job["project_id"],
            "keyword": job["keyword"],
            "rank": rank,
            "ts": int(time.time()),
        }))
        time.sleep(random.uniform(3, 6))
        rotate_ip()

if __name__ == "__main__":
    worker()

From here, a second worker drains rank_results into Postgres/TimescaleDB, and a third compares yesterday's rank with today's to emit alert events. Each worker scales independently.
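The drain worker reduces to popping JSON payloads and batching them into one INSERT. A sketch of the transform step — table and column names are hypothetical, and the surrounding loop would `lpop` from `rank_results` and hand the tuples to `executemany()`:

```python
import json

def to_rows(raw_messages):
    """Convert rank_results JSON payloads into tuples for a batch INSERT."""
    rows = []
    for raw in raw_messages:
        j = json.loads(raw)
        rows.append((j["project_id"], j["keyword"], j["rank"], j["ts"]))
    return rows

# Hypothetical target table; ts arrives as a Unix timestamp.
INSERT_SQL = (
    "INSERT INTO rank_history (project_id, keyword, rank, ts) "
    "VALUES (%s, %s, %s, to_timestamp(%s))"
)
```

Batching (a few hundred rows per transaction) matters more than the driver choice — per-row commits are what kill write throughput first on the Postgres side.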

Related Guides

Proxy Pool for Your Rank Tracker

13-geo mobile IP coverage, API rotation, sticky sessions. The proxy layer Ahrefs-class infrastructure runs on.