Google SERP Scraping Deep Dive (2026)
Google's SERP has fractured into 13+ feature blocks. Here's what you can actually extract, how Google detects bots in 2026, and Python code that survives using mobile proxies.
1. What Google's SERP Actually Returns
Google's result page is no longer a list of ten links. For competitive queries it renders a mosaic of interactive blocks, each with its own HTML structure. A complete scraper has to parse them all — or decide which ones it cares about.
| Feature | Where | Purpose |
|---|---|---|
| Organic results | Main column | 10 blue links (the classic list) |
| Featured Snippet | Top, above position 1 | Extracted paragraph/list/table answering the query |
| People Also Ask | Usually positions 2-5 | Expandable related questions (load more on click) |
| Knowledge Graph | Right rail (desktop) | Entity card — person, company, place, product |
| Local Pack | Inline, 3 results | Map + top 3 local businesses for local-intent queries |
| Image Pack | Inline carousel | Horizontal image thumbnails with source URLs |
| Video Pack | Inline block | YouTube and third-party video results |
| Top Stories | Inline, news-heavy queries | News articles with timestamps and publishers |
| Shopping | Top or right rail | Product listings — price, merchant, rating |
| Ads (top/bottom) | Above organic, below organic | Paid listings marked "Sponsored" |
| Recipes / Events / Jobs | Inline, vertical-specific | Structured-data-driven rich results |
| Related Searches | Bottom of page | 8 query refinements |
Whether a feature appears depends on query intent, location, and device — the same query scraped from a US mobile IP and a German desktop IP can return completely different SERPs. That geographic variance is why consistent IP placement matters for rank tracking.
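"Deciding which blocks you care about" can start as something very small: map each SERP feature to a marker string that tends to appear in its HTML and report which features a fetched page contains. The marker strings below are illustrative assumptions, not guaranteed selectors — Google's class names and data attributes drift, so verify each one against a live SERP before relying on it.

```python
# Hypothetical marker strings per feature - verify against a live SERP,
# since Google's class names and data attributes change regularly.
FEATURE_MARKERS = {
    "organic": 'class="g"',
    "people_also_ask": "related-question-pair",
    "knowledge_graph": "kp-wholepage",
    "top_stories": "Top stories",
    "ads": "Sponsored",
}

def detect_features(html: str) -> list[str]:
    """Return the SERP features whose marker appears in the raw HTML."""
    return [name for name, marker in FEATURE_MARKERS.items() if marker in html]
```

Running this over every fetched page also gives you a cheap per-query log of which features Google served — useful for spotting the geographic variance described above.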
2. Google's 2026 Anti-Bot Stack
Google doesn't publish its bot detection layers, but the behaviour is observable. A scraper running from a datacenter IP typically hits a block within 20-80 queries. From a residential IP, it's more like 150-400. Here's what's happening.
IP reputation & ASN filtering
Requests from AWS, GCP, Hetzner, OVH and other datacenter ASNs are rate-limited aggressively. Google also cross-references MaxMind-style reputation data.
Session cookies — NID, CONSENT, SOCS
A fresh session has no NID cookie. Google issues one on the first request and expects it back on subsequent requests. Scrapers that never persist cookies look identical across queries — a tell.
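One way to avoid the fresh-session tell is to persist cookies to disk between runs, so the NID cookie issued on the first request survives process restarts. A stdlib sketch using `http.cookiejar` (the filename is an arbitrary choice for this example):

```python
import os
from http.cookiejar import MozillaCookieJar

COOKIE_FILE = "google_cookies.txt"  # arbitrary path for this sketch

def load_cookies(path: str = COOKIE_FILE) -> MozillaCookieJar:
    """Load previously saved cookies, or start an empty jar on first run."""
    jar = MozillaCookieJar(path)
    if os.path.exists(path):
        jar.load(ignore_discard=True)  # keep session cookies like NID
    return jar

def save_cookies(jar: MozillaCookieJar) -> None:
    """Write the jar back to disk after a scraping run."""
    jar.save(ignore_discard=True)
```

With `requests`, assign the jar to the session (`session.cookies = load_cookies()`) and call `save_cookies(session.cookies)` when the run finishes.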
reCAPTCHA challenges
v2 (checkbox + image) and v3 (invisible score 0.0-1.0) both fire on suspicious sessions. The "sorry/index" redirect is Google's soft block.
Per-IP rate limits
Undocumented but roughly: 30-60 queries/hour from one IP before warning screens start appearing. Mobile IPs tolerate more because Google can't tell two real users behind the same CGNAT IP from one scraper.
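Staying under a per-IP ceiling is easy to enforce mechanically. The sketch below keeps a sliding one-hour window of timestamps per proxy IP; the 30/hour default reflects this article's estimate, not a published Google limit.

```python
import time
from collections import defaultdict, deque

class PerIPRateLimiter:
    """Allow at most `limit` queries per `window` seconds for each proxy IP."""

    def __init__(self, limit=30, window=3600.0):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # ip -> deque of timestamps

    def allow(self, ip, now=None):
        now = time.time() if now is None else now
        hits = self._hits[ip]
        while hits and now - hits[0] >= self.window:  # drop stale timestamps
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # budget spent: rotate the IP or wait
        hits.append(now)
        return True
```

Call `allow(ip)` before every query; a `False` means switch IPs or sleep until the window slides.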
TLS + HTTP/2 fingerprinting
Plain Python `requests` produces a JA4 fingerprint that looks nothing like Chrome's. Google's edge compares the TLS fingerprint against the User-Agent header; inconsistencies get flagged.
3. Why Mobile Proxies Survive Google's Filters
Mobile proxies route traffic through real 4G/5G carrier networks. The egress IP belongs to an ASN like AS7018 (AT&T), AS21928 (T-Mobile), or AS12430 (Vodafone). Three properties matter:
- CGNAT sharing: hundreds of real consumer devices share the same public IP. Google can't block it without blocking real users.
- IP churn: carriers reassign IPs frequently. A flagged IP from yesterday may be fresh tomorrow.
- Consumer reputation: the same ASN carries the Google Maps and YouTube usage of millions of customers. IP reputation data treats it as clean traffic.
This doesn't make the scraper invisible — it means the penalty for behaving badly is a soft rate-limit, not a permanent block. Combined with respectful pacing, mobile proxies keep a SERP pipeline stable for months at a time.
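How rotation is actually driven varies by provider, but a common convention is to encode a session token in the proxy username: reuse the token to keep a sticky egress IP, mint a new one to rotate. The `-session-` username syntax and the host below are hypothetical placeholders — check your provider's documentation for the real format.

```python
import uuid

PROXY_HOST = "proxy.mobileproxies.org"  # placeholder host from this article
PROXY_PORT = 8000

def proxy_url(user, password, session_token=None):
    """Build a proxy URL; a stable token keeps a sticky IP, a new one rotates."""
    if session_token:
        user = f"{user}-session-{session_token}"  # hypothetical username syntax
    return f"http://{user}:{password}@{PROXY_HOST}:{PROXY_PORT}"

def new_session_token():
    """Fresh random token: the provider assigns a new egress IP."""
    return uuid.uuid4().hex[:8]
```

On a soft block (429, 503, or a /sorry/index redirect), call `new_session_token()`, rebuild the proxy URL, and back off before retrying.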
4. Working Python Scraper
A minimal requests + BeautifulSoup implementation that pulls organic results through a mobile proxy. It is suitable for hundreds of queries per hour; for millions per day, move to a queue-based, distributed architecture.
```python
import random
import time
from urllib.parse import quote_plus

import requests
from bs4 import BeautifulSoup

PROXY_USER = "your-username"
PROXY_PASS = "your-password"
PROXY_HOST = "proxy.mobileproxies.org"
PROXY_PORT = 8000

proxies = {
    "http": f"http://{PROXY_USER}:{PROXY_PASS}@{PROXY_HOST}:{PROXY_PORT}",
    "https": f"http://{PROXY_USER}:{PROXY_PASS}@{PROXY_HOST}:{PROXY_PORT}",
}

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (iPhone; CPU iPhone OS 17_5 like Mac OS X) "
        "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 "
        "Mobile/15E148 Safari/604.1"
    ),
    "Accept-Language": "en-US,en;q=0.9",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}

# One session for the whole run, so NID/CONSENT cookies persist across queries.
session = requests.Session()
session.headers.update(HEADERS)

def search_google(query, num=10, hl="en", gl="us"):
    url = (
        "https://www.google.com/search?"
        f"q={quote_plus(query)}&num={num}&hl={hl}&gl={gl}"
    )
    r = session.get(url, proxies=proxies, timeout=20)
    r.raise_for_status()
    return parse_serp(r.text)

def parse_serp(html):
    soup = BeautifulSoup(html, "lxml")
    results = []
    # CSS selectors drift - validate against today's SERP before production use.
    for block in soup.select("div.g"):
        title_el = block.select_one("h3")
        link_el = block.select_one("a[href]")
        snippet_el = block.select_one("div.VwiC3b, span.aCOpRe")
        if not title_el or not link_el:
            continue
        results.append({
            "title": title_el.get_text(strip=True),
            "url": link_el["href"],
            "snippet": snippet_el.get_text(strip=True) if snippet_el else None,
        })
    return results

if __name__ == "__main__":
    for q in ["mobile proxies", "CGNAT explained", "JA4 fingerprint"]:
        print(q)
        for row in search_google(q):
            print("  ", row["title"][:60])
        time.sleep(random.uniform(3, 6))  # respectful pacing
```
Selector drift: `div.g` and `h3` have been stable for years, but the obfuscated class names (like `.VwiC3b` for snippets) rotate every few months. Production scrapers use multiple fallback selectors and alert on parse-rate drops.
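"Alert on parse-rate drops" can be as simple as tracking the fraction of fetched pages that yield at least one parsed result over a rolling window. A sketch (the 100-page window and 0.8 threshold are arbitrary starting points):

```python
from collections import deque

class ParseRateMonitor:
    """Track parse success over the last `window` pages; flag likely selector drift."""

    def __init__(self, window=100, threshold=0.8):
        self.outcomes = deque(maxlen=window)  # True if a page parsed to >= 1 result
        self.threshold = threshold

    def record(self, results):
        self.outcomes.append(len(results) > 0)

    def selectors_look_broken(self):
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough data yet
        rate = sum(self.outcomes) / len(self.outcomes)
        return rate < self.threshold
```

Call `record()` with every `parse_serp()` output and page an engineer when `selectors_look_broken()` flips to `True` — an abrupt drop almost always means a class name rotated, not that Google stopped ranking anything.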
5. Pagination, Rate Limits, Session Cookies
- Pagination: add `&start=10` for page 2, `&start=20` for page 3, etc. Google caps results around `start=300`.
- Rate limit target: cap each mobile IP at ~30 queries/hour if you want to stay well under the warning threshold.
- Jitter: 2-5 second randomized delays between requests. Uniform 3-second intervals are themselves a bot signal.
- Sessions: use `requests.Session()` so NID and CONSENT cookies persist across queries from the same IP.
- Rotate on failure: a 429, a 503, or a redirect to `/sorry/index` all mean rotate the mobile IP and back off for several minutes.
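The pagination rule above, as a small generator that yields page URLs and stops at Google's ~300-result cap:

```python
from urllib.parse import quote_plus

def serp_page_urls(query, pages, num=10, hl="en", gl="us"):
    """Yield SERP URLs for pages 1..pages, honoring the ~start=300 cap."""
    base = f"https://www.google.com/search?q={quote_plus(query)}&num={num}&hl={hl}&gl={gl}"
    for page in range(pages):
        start = page * num
        if start > 300:  # Google stops serving results around here
            break
        yield base if start == 0 else f"{base}&start={start}"
```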
6. When a SERP API Makes More Sense
Roll-your-own isn't always the right call. SERP APIs wrap the scraping, proxy rotation, and parsing into a single endpoint that returns JSON.
| Provider | Strength | Endpoint style |
|---|---|---|
| SerpAPI | Cleanest JSON, great for one-offs | /search?engine=google&q=... |
| DataForSEO | Batch + live, cheap at scale | /serp/google/organic/live/advanced |
| Bright Data SERP API | Highest volume, enterprise | Proxy endpoint — sends your raw query |
| Oxylabs SERP Scraper API | Parser included | /v1/queries (source: google_search) |
Rule of thumb: below roughly 100K queries/month, the APIs are usually cheaper than the engineering time. Beyond that, build your own with mobile proxies: margins improve fast and you control the parsing layer for custom features.
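To make the rule of thumb concrete, here is a tiny break-even helper. Every price in it is an input you supply from current vendor quotes; none of the figures below come from this article.

```python
def monthly_costs(queries, api_cpm, proxy_base, proxy_cpm, eng_hours, hourly_rate):
    """Compare a SERP API vs. a self-built pipeline for one month.

    api_cpm / proxy_cpm: cost per 1,000 queries; proxy_base: fixed proxy
    subscription; eng_hours: amortized build/maintenance time per month.
    """
    api_cost = queries / 1000 * api_cpm
    diy_cost = proxy_base + queries / 1000 * proxy_cpm + eng_hours * hourly_rate
    return {"api": api_cost, "diy": diy_cost, "build_your_own": diy_cost < api_cost}
```

Because the engineering time is a fixed cost while API pricing scales linearly with volume, DIY wins past some query count — exactly where depends entirely on the numbers you plug in.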