SEO Intelligence

Google SERP Scraping Deep Dive (2026)

Google's SERP has fractured into 13+ feature blocks. Here's what you can actually extract, how Google detects bots in 2026, and working Python code that survives the filters by using mobile proxies.

14 min read · Covers organic, snippets, PAA, knowledge graph, local pack, shopping · Last updated: April 2026

1. What Google's SERP Actually Returns

Google's result page is no longer a list of ten links. For competitive queries it renders a mosaic of interactive blocks, each with its own HTML structure. A complete scraper has to parse them all — or decide which ones it cares about.

Feature | Where | Purpose
Organic results | Main column | 10 blue links (the classic list)
Featured Snippet | Top, above position 1 | Extracted paragraph/list/table answering the query
People Also Ask | Usually positions 2-5 | Expandable related questions (load more on click)
Knowledge Graph | Right rail (desktop) | Entity card: person, company, place, product
Local Pack | Inline, 3 results | Map + top 3 local businesses for local-intent queries
Image Pack | Inline carousel | Horizontal image thumbnails with source URLs
Video Pack | Inline block | YouTube and third-party video results
Top Stories | Inline, news-heavy queries | News articles with timestamps and publishers
Shopping | Top or right rail | Product listings: price, merchant, rating
Ads (top/bottom) | Above and below organic | Paid listings marked "Sponsored"
Recipes / Events / Jobs | Inline, vertical-specific | Structured-data-driven rich results
Related Searches | Bottom of page | 8 query refinements

Whether a feature appears depends on query intent, location, and device — the same query scraped from a US mobile IP and a German desktop IP can return completely different SERPs. That geographic variance is why consistent IP placement matters for rank tracking.

2. Google's 2026 Anti-Bot Stack

Google doesn't publish its bot detection layers, but the behaviour is observable. A scraper running from a datacenter IP typically hits a block within 20-80 queries. From a residential IP, it's more like 150-400. Here's what's happening.

IP reputation & ASN filtering

Requests from AWS, GCP, Hetzner, OVH, and other datacenter ASNs are rate-limited aggressively. Google also cross-references commercial MaxMind-style IP reputation data.

Session cookies — NID, CONSENT, SOCS

A fresh session has no NID cookie. Google issues one on the first request and expects it back on subsequent requests. Scrapers that never persist cookies look identical across queries — a tell.

reCAPTCHA challenges

v2 (checkbox + image) and v3 (invisible score 0.0-1.0) both fire on suspicious sessions. The "sorry/index" redirect is Google's soft block.
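Because the soft block arrives as a redirect rather than an error status, it's worth checking the final URL as well as the status code. A minimal detection helper (the exact redirect path can vary by region, so treat the `/sorry/` substring as an assumption to verify):

```python
def is_soft_blocked(status_code: int, final_url: str) -> bool:
    """Return True when Google has served a rate-limit or CAPTCHA page.

    A 200 on a /sorry/ URL means the session was redirected to the
    reCAPTCHA interstitial; 429 and 503 are harder rate-limit signals.
    """
    if status_code in (429, 503):
        return True
    return "/sorry/" in final_url
```

With `requests`, pass `r.status_code` and `r.url` (the post-redirect URL) so the interstitial is caught even though it returns 200.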

Per-IP rate limits

Undocumented, but roughly 30-60 queries/hour from one IP before warning screens start appearing. Mobile IPs tolerate more because Google can't distinguish one scraper from the hundreds of real users behind the same CGNAT IP.
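Those limits translate directly into capacity planning. Assuming the conservative ~30 queries/hour ceiling holds (an estimate from the observed behaviour above, not a documented quota), a rough sketch of how many mobile IPs a daily target requires:

```python
import math

def ips_needed(queries_per_day: int,
               per_ip_per_hour: int = 30,
               active_hours: int = 24) -> int:
    """IPs required to stay under a per-IP hourly ceiling.

    per_ip_per_hour=30 is the conservative target discussed above;
    tune it downward if warning screens start appearing.
    """
    per_ip_per_day = per_ip_per_hour * active_hours
    return math.ceil(queries_per_day / per_ip_per_day)

# 100,000 queries/day at 30/hour around the clock:
print(ips_needed(100_000))  # 139
```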

TLS + HTTP/2 fingerprinting

Plain Python requests produces a JA4 fingerprint that's nothing like Chrome. Google's edge compares the TLS fingerprint to the User-Agent — inconsistencies get flagged.
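One common mitigation, offered here as an option rather than a requirement, is to send requests through a client that impersonates a real browser's TLS stack. The third-party curl_cffi package does this; plain `requests` cannot:

```python
# pip install curl_cffi  (third-party; not part of the scraper above)
from curl_cffi import requests as curl_requests

def fetch_like_chrome(url: str, proxies: dict | None = None):
    # impersonate="chrome" makes the TLS/HTTP2 fingerprint match a real
    # Chrome build, so it no longer contradicts a Chrome User-Agent.
    return curl_requests.get(url, impersonate="chrome",
                             proxies=proxies, timeout=20)
```

The trade-off: curl_cffi's API mirrors `requests` closely, so swapping it into the scraper below is mostly a one-line import change.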

3. Why Mobile Proxies Survive Google's Filters

Mobile proxies route traffic through real 4G/5G carrier networks. The egress IP belongs to an ASN like AS7018 (AT&T), AS21928 (T-Mobile), or AS12430 (Vodafone). Three properties matter:

  • CGNAT sharing: hundreds of real consumer devices share the same public IP. Google can't block it without blocking real users.
  • IP churn: carriers reassign IPs frequently. A flagged IP from yesterday may be fresh tomorrow.
  • Consumer reputation: the same ASN powers the Google Maps and YouTube usage of millions of customers. IP reputation data treats it as clean traffic.

This doesn't make the scraper invisible — it means the penalty for behaving badly is a soft rate-limit, not a permanent block. Combined with respectful pacing, mobile proxies keep a SERP pipeline stable for months at a time.

4. Working Python Scraper

Minimal requests + BeautifulSoup implementation that pulls organic results through a mobile proxy. Suitable for hundreds-per-hour scale — for millions per day, move to a queue-based architecture (covered in the rank-tracker article below).

import requests
from bs4 import BeautifulSoup
from urllib.parse import quote_plus
import time, random

PROXY_USER = "your-username"
PROXY_PASS = "your-password"
PROXY_HOST = "proxy.mobileproxies.org"
PROXY_PORT = 8000

proxies = {
    "http":  f"http://{PROXY_USER}:{PROXY_PASS}@{PROXY_HOST}:{PROXY_PORT}",
    "https": f"http://{PROXY_USER}:{PROXY_PASS}@{PROXY_HOST}:{PROXY_PORT}",
}

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (iPhone; CPU iPhone OS 17_5 like Mac OS X) "
        "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 "
        "Mobile/15E148 Safari/604.1"
    ),
    "Accept-Language": "en-US,en;q=0.9",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}

def search_google(query, num=10, hl="en", gl="us"):
    url = (
        f"https://www.google.com/search?"
        f"q={quote_plus(query)}&num={num}&hl={hl}&gl={gl}"
    )
    session = requests.Session()  # persist NID/CONSENT cookies
    r = session.get(url, headers=HEADERS, proxies=proxies, timeout=20)
    r.raise_for_status()
    return parse_serp(r.text)

def parse_serp(html):
    soup = BeautifulSoup(html, "lxml")
    results = []
    # CSS selectors drift — validate against today's SERP before production use.
    for block in soup.select("div.g"):
        title_el = block.select_one("h3")
        link_el = block.select_one("a[href]")
        snippet_el = block.select_one("div.VwiC3b, span.aCOpRe")
        if not title_el or not link_el:
            continue
        results.append({
            "title": title_el.get_text(strip=True),
            "url": link_el["href"],
            "snippet": snippet_el.get_text(strip=True) if snippet_el else None,
        })
    return results

if __name__ == "__main__":
    for q in ["mobile proxies", "CGNAT explained", "JA4 fingerprint"]:
        print(q)
        for row in search_google(q):
            print("  ", row["title"][:60])
        time.sleep(random.uniform(3, 6))  # respectful pacing

Selector drift: div.g and h3 have been stable for years, but the obfuscated class names (like .VwiC3b for snippets) rotate every few months. Production scrapers use multiple fallback selectors and alert on parse-rate drops.
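Fallback selectors are easiest to manage as an ordered list tried in sequence. The selector names below illustrate the rotation pattern (`.IsZvec` is an older snippet class used as an example); validate the current ones before relying on them:

```python
from bs4 import BeautifulSoup

# Ordered fallbacks: the current selector first, older generations after.
SNIPPET_SELECTORS = ["div.VwiC3b", "span.aCOpRe", "div.IsZvec"]

def select_first(block, selectors):
    """Return the first element matched by any selector, else None."""
    for sel in selectors:
        el = block.select_one(sel)
        if el:
            return el
    return None

html = '<div class="g"><span class="aCOpRe">old-style snippet</span></div>'
block = BeautifulSoup(html, "html.parser").select_one("div.g")
print(select_first(block, SNIPPET_SELECTORS).get_text())  # old-style snippet
```

Pair this with a parse-rate counter: if the share of SERPs yielding zero results rises, a selector has rotated and the fallback list needs a new entry.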

5. Pagination, Rate Limits, Session Cookies

  • Pagination: add &start=10 for page 2, &start=20 for page 3, etc. Google caps results around start=300.
  • Rate limit target: cap each mobile IP at ~30 queries/hour if you want to stay well under the warning threshold.
  • Jitter: 2-5 second randomized delays between requests. Uniform 3-second intervals are themselves a bot signal.
  • Sessions: use requests.Session() so NID and CONSENT cookies persist across queries from the same IP.
  • Rotate on failure: 429, 503, or a redirect to /sorry/index all mean rotate the mobile IP and back off for several minutes.
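The pagination and rotate-on-failure rules above can be sketched as three small helpers. The rotate trigger mirrors the signals listed; the exact jitter bounds are judgment calls, not Google-documented thresholds:

```python
import random
from urllib.parse import quote_plus

def page_url(query: str, page: int, hl: str = "en", gl: str = "us") -> str:
    """URL for a 10-result page: page 1 -> start=0, page 2 -> start=10."""
    start = (page - 1) * 10
    return (f"https://www.google.com/search?q={quote_plus(query)}"
            f"&start={start}&hl={hl}&gl={gl}")

def should_rotate(status_code: int, final_url: str) -> bool:
    """429, 503, or a /sorry/ redirect: rotate the mobile IP and back off."""
    return status_code in (429, 503) or "/sorry/" in final_url

def jittered_delay(low: float = 2.0, high: float = 5.0) -> float:
    """Randomized inter-request delay; a fixed interval is itself a bot signal."""
    return random.uniform(low, high)

print(page_url("mobile proxies", 3))
# https://www.google.com/search?q=mobile+proxies&start=20&hl=en&gl=us
```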

6. When a SERP API Makes More Sense

Roll-your-own isn't always the right call. SERP APIs wrap the scraping, proxy rotation, and parsing into a single endpoint that returns JSON.

Provider | Strength | Endpoint style
SerpAPI | Cleanest JSON, great for one-offs | /search?engine=google&q=...
DataForSEO | Batch + live, cheap at scale | /serp/google/organic/live/advanced
Bright Data SERP API | Highest volume, enterprise | Proxy endpoint: sends your raw query
Oxylabs SERP Scraper API | Parser included | /v1/queries (source: google_search)

Rule of thumb: < 100K queries/month, the APIs are usually cheaper than engineering time. Beyond that, build your own with mobile proxies — margins improve fast and you control the parsing layer for custom features.
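The crossover is easy to sanity-check with your own numbers. Every price below is a placeholder for illustration, not a quote from any provider:

```python
def monthly_cost(queries: int,
                 api_price_per_1k: float = 2.50,   # placeholder API rate
                 diy_fixed: float = 500.0,         # placeholder proxies + servers
                 diy_price_per_1k: float = 0.20) -> dict:
    """Compare a per-query SERP API against DIY with a fixed base cost."""
    return {
        "api": queries / 1000 * api_price_per_1k,
        "diy": diy_fixed + queries / 1000 * diy_price_per_1k,
    }

# Under these placeholder rates, the API wins at 100K/month
# and DIY wins at 1M/month:
print(monthly_cost(100_000))
print(monthly_cost(1_000_000))
```

Plug in real quotes and your engineering cost; the shape of the comparison (linear API cost vs. fixed-plus-small-marginal DIY cost) is the point, not the specific figures.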

Related Guides

Run Your Own SERP Pipeline

Mobile IPs on real carrier ASNs. Sticky sessions, API rotation, clean JA4. Test before you commit.