Technical Deep-Dive

How Websites Detect Proxies in 2026

Modern bot protection stacks use 7+ detection layers simultaneously. Here's exactly how they catch proxy traffic — and why mobile carrier IPs remain the hardest to detect.

15 min read · Based on research from Cloudflare, DataDome, MaxMind, and Salesforce

Detecting proxy traffic isn't one test; it's a stack of them. Network-level checks run at the edge, before your request even reaches application code, while client-side scripts collect browser and behavioral signals. Cloudflare, DataDome, PerimeterX (HUMAN), and Akamai Bot Manager all evaluate multiple signals simultaneously. A single mismatch can flag your session.

This guide walks through each layer and the real tools behind it. At the end, we'll show why mobile proxies on carrier networks remain the hardest traffic to classify as "proxy": not because they're invisible, but because blocking them costs websites more than it saves.

1. IP Reputation Databases

The first filter is usually a lookup against commercial IP reputation databases. These are updated continuously from honeypots, botnet logs, customer reports, and ASN ownership data.

| Service | Classifications | Update frequency |
|---|---|---|
| MaxMind Anonymous Plus | VPN, residential proxy, hosting, Tor exit, public proxy, with confidence score and provider name | Daily, plus a "last seen" ISO date |
| IPQualityScore | 25+ data points: honeypot traps, ML models, request velocity, abuse history | Real-time |
| Spur.us | ~60M suspect IPs, 1,000+ known VPN/proxy services, session shift detection | Daily |
| IP2Proxy | Classes: VPN, PUB, WEB, TOR, DCH, SES, RES, CPN, EPN | Daily |

Datacenter proxies appear in these databases almost immediately. Residential proxies take longer but eventually surface as abuse patterns accumulate. Mobile carrier IPs are reassigned frequently by the carrier, so a flag on any one address goes stale quickly, and MaxMind's "last seen" date lets consumers discount those stale entries.
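To make the "last seen" point concrete, here is a minimal Python sketch of how a consumer of a reputation feed might discount stale entries. The record shape, the classification names, and the half-life values are all hypothetical assumptions, not MaxMind's schema: in this sketch hosting ranges never decay, while carrier IPs lose half their weight every two days.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional

# Hypothetical half-lives in days per classification (not from any real feed).
# None means "never decays": datacenter ranges rarely change hands, while
# carrier-assigned IPs are reassigned quickly.
HALF_LIFE_DAYS: Dict[str, Optional[float]] = {
    "hosting": None,
    "residential_proxy": 30.0,
    "mobile_carrier": 2.0,
}
DEFAULT_HALF_LIFE = 14.0  # hypothetical fallback for other classifications


@dataclass
class ReputationRecord:
    """Hypothetical record shape, loosely modeled on commercial IP feeds."""
    ip: str
    classification: str
    confidence: float  # 0-100, as reported by the feed
    last_seen: date


def effective_risk(rec: ReputationRecord, today: date) -> float:
    """Discount the feed's confidence exponentially by days since last seen."""
    half_life = HALF_LIFE_DAYS.get(rec.classification, DEFAULT_HALF_LIFE)
    if half_life is None:
        return rec.confidence
    age_days = max((today - rec.last_seen).days, 0)
    return rec.confidence * 0.5 ** (age_days / half_life)


def should_challenge(rec: ReputationRecord, today: date, threshold: float = 50.0) -> bool:
    return effective_risk(rec, today) >= threshold
```

With these assumed numbers, a carrier IP flagged at 90% confidence four days ago scores 22.5 and passes, while a datacenter IP flagged months ago still scores 90.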

2. ASN-Based Detection

Every publicly routed IP address is announced by an Autonomous System, identified by its Autonomous System Number (ASN). Bot protection classifies ASNs before any application logic runs, and AWS WAF even exposes ASN matching as a first-class rule primitive.

Flagged Datacenter ASNs

  • AWS (AS16509, AS14618)
  • Google Cloud (AS15169, AS396982)
  • Hetzner (AS24940)
  • OVH (AS16276)
  • DigitalOcean (AS14061)
  • Linode/Akamai (AS63949)

→ Auto-flagged, rate-limited, CAPTCHA'd

Protected Carrier ASNs

  • AT&T (AS7018, AS20057)
  • T-Mobile (AS21928)
  • Verizon (AS22394, AS6167)
  • Vodafone (AS12430)
  • Orange, O2, EE, and others

→ Cannot be blocked without blocking millions of real users

This is the foundational reason mobile proxies work: carrier ASNs front real consumer traffic at massive scale. A website that blocks AS21928 loses every T-Mobile customer.
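A sketch of ASN-first triage in Python, assuming a prefix-to-ASN table is already loaded. The ASNs come from the lists above; the prefixes are RFC 5737 documentation ranges used as hypothetical placeholders, and the policy names are invented for illustration. The lookup uses longest-prefix match, so a more specific announcement wins over a covering one.

```python
import ipaddress
from typing import Dict, Optional

# ASNs from the lists above.
DATACENTER_ASNS = {16509, 14618, 15169, 396982, 24940, 16276, 14061, 63949}
CARRIER_ASNS = {7018, 20057, 21928, 22394, 6167, 12430}

# Hypothetical prefix-to-ASN table using documentation ranges as placeholders.
# A real deployment would load this from a BGP-derived IP-to-ASN feed.
PREFIX_TO_ASN: Dict[ipaddress.IPv4Network, int] = {
    ipaddress.ip_network("192.0.2.0/24"): 16509,       # pretend: AWS
    ipaddress.ip_network("198.51.100.0/24"): 21928,    # pretend: T-Mobile
    ipaddress.ip_network("198.51.100.128/25"): 64500,  # more specific, documentation ASN
    ipaddress.ip_network("203.0.113.0/24"): 64501,     # documentation ASN, unclassified
}


def lookup_asn(ip: str) -> Optional[int]:
    """Longest-prefix match of an address against PREFIX_TO_ASN."""
    addr = ipaddress.ip_address(ip)
    best_net = None
    best_asn = None
    for net, asn in PREFIX_TO_ASN.items():
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_asn = net, asn
    return best_asn


def asn_policy(ip: str) -> str:
    """Hypothetical policy names: challenge datacenters, go easy on carriers."""
    asn = lookup_asn(ip)
    if asn in DATACENTER_ASNS:
        return "challenge"
    if asn in CARRIER_ASNS:
        return "rate_limit_only"  # blocking would hit real subscribers
    return "score_other_signals"
```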

3. TLS Fingerprinting (JA3/JA4)

Before any HTTP data is sent, the TLS ClientHello message reveals a client fingerprint. Sophisticated detection reads it below the application layer and compares it to the claimed User-Agent.

JA3 (Salesforce, 2017)

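Hashes five TLS ClientHello fields (version, cipher suites, extensions, elliptic curves, EC point formats) into a 32-character MD5 fingerprint. Weakness: JA3 preserves field order, so Chrome's TLS extension-order randomization (permute-extensions, rolled out in Chrome 110 in early 2023) broke stable JA3 hashes. GREASE values, by contrast, are already dropped by standard JA3 implementations.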
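The JA3 construction and its order sensitivity fit in a few lines. This sketch takes already-parsed ClientHello values (no packet parsing), and the sample values are illustrative. Permuting the extension list, as Chrome now does, yields a different hash for the same client.

```python
import hashlib
from typing import Sequence


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) are 0x0a0a, 0x1a1a, ..., 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version: int, ciphers: Sequence[int], extensions: Sequence[int],
               curves: Sequence[int], point_formats: Sequence[int]) -> str:
    """Fields joined by ',', values by '-', GREASE dropped, original order kept."""
    def join(values: Sequence[int]) -> str:
        return "-".join(str(v) for v in values if not is_grease(v))

    return ",".join([str(version), join(ciphers), join(extensions),
                     join(curves), join(point_formats)])


def ja3_hash(ja3: str) -> str:
    """32-character hex MD5 of the JA3 string."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()


# Same client, two extension orders (values are illustrative).
original = ja3_string(771, [0x1A1A, 4865, 4866], [0, 23, 65281, 10, 11], [0x2A2A, 29, 23], [0])
permuted = ja3_string(771, [0x1A1A, 4865, 4866], [23, 0, 65281, 10, 11], [0x2A2A, 29, 23], [0])
print(original)                                  # 771,4865-4866,0-23-65281-10-11,29-23,0
print(ja3_hash(original) == ja3_hash(permuted))  # False
```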

JA4 (FoxIO)

Produces a 36-character fingerprint and sorts cipher suites and extensions before hashing, so extension-order randomization no longer changes it. The full JA4+ suite:

| Fingerprint | What it covers |
|---|---|
| JA4 | TLS ClientHello fingerprint (normalized) |
| JA4S | TLS ServerHello response |
| JA4H | HTTP client fingerprint (headers, order) |
| JA4T | TCP fingerprint |
| JA4X | X.509 certificate fingerprint |
| JA4L | Latency-based fingerprint |
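The core normalization trick can be sketched as: drop GREASE, sort, then hash with SHA-256 truncated to 12 hex characters. This is a simplified illustration of the idea, not the full JA4 algorithm. It omits the readable prefix (protocol, TLS version, SNI flag, counts, ALPN) and the signature-algorithm handling, and the exclusion of SNI and ALPN from the extension hash reflects our reading of the spec. Use FoxIO's reference implementation for real fingerprints.

```python
import hashlib
from typing import Iterable

SNI, ALPN = 0x0000, 0x0010  # TLS extension type codes


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) are 0x0a0a, 0x1a1a, ..., 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def sorted_hash(values: Iterable[int]) -> str:
    """SHA-256 over sorted, comma-joined 4-hex-digit values, truncated to 12 chars."""
    items = sorted(f"{v:04x}" for v in values if not is_grease(v))
    return hashlib.sha256(",".join(items).encode("ascii")).hexdigest()[:12]


def ja4_like_hashes(ciphers: Iterable[int], extensions: Iterable[int]) -> str:
    """Order-independent cipher and extension hashes (simplified; not full JA4)."""
    cipher_part = sorted_hash(ciphers)
    ext_part = sorted_hash(e for e in extensions if e not in (SNI, ALPN))
    return f"{cipher_part}_{ext_part}"


a = ja4_like_hashes([4865, 4866], [0, 23, 65281, 10, 11, 16])
b = ja4_like_hashes([4866, 4865], [11, 16, 10, 0, 65281, 23])
print(a == b)  # True: permutation no longer changes the result
```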

Cloudflare implements JA4 at the edge using a Rust-based parser and exposes matching primitives like cf.bot_management.ja4 for WAF rules.

DataDome processes ~3 trillion signals daily. For each TLS fingerprint, they record the share of traffic from known bots, IP quality, and the associated OS. Two patterns trigger blocks: (a) fingerprints tied to known bots, and (b) inconsistent combinations, such as a curl JA4 paired with an iPhone User-Agent.
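Both patterns reduce to a table lookup plus a comparison. In this sketch the fingerprint keys are placeholders rather than real JA4 values, the User-Agent parsing is deliberately crude, and both lookup tables are hypothetical stand-ins for what a vendor would learn from traffic statistics.

```python
from typing import Dict, Set, Tuple

# Hypothetical lookup tables; keys are placeholders, not real JA4 values.
FINGERPRINT_FAMILY: Dict[str, str] = {
    "fp_curl_placeholder": "curl",
    "fp_ios_safari_placeholder": "ios",
    "fp_desktop_chrome_placeholder": "chrome",
}
KNOWN_BOT_FINGERPRINTS: Set[str] = {"fp_bot_placeholder"}


def claimed_family(user_agent: str) -> str:
    """Deliberately crude User-Agent parsing, for illustration only."""
    ua = user_agent.lower()
    if ua.startswith("curl/"):
        return "curl"
    if "iphone" in ua or "ipad" in ua:
        return "ios"
    if "chrome/" in ua:
        return "chrome"
    return "unknown"


def evaluate(fingerprint: str, user_agent: str) -> Tuple[str, str]:
    """Return (action, reason) for pattern (a) known bots and (b) mismatches."""
    if fingerprint in KNOWN_BOT_FINGERPRINTS:
        return "block", "fingerprint tied to known bots"
    observed = FINGERPRINT_FAMILY.get(fingerprint)
    claimed = claimed_family(user_agent)
    if observed is not None and claimed != "unknown" and observed != claimed:
        return "block", f"TLS says {observed}, User-Agent says {claimed}"
    return "allow", "no inconsistency found"
```

Unknown fingerprints fall through to "allow" here; a production system would hand them to the other layers rather than trusting them.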

4. TCP/IP Stack Fingerprinting

Even deeper than TLS: the TCP handshake itself leaks OS information. p0f v3 (by lcamtuf) passively analyzes packets to identify the originating OS from the transport layer.

Signature fields

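  • Initial TTL: Windows=128, Linux=64 (macOS/iOS also use 64), routers=255; the observed value is lower by the number of hops
  • Window size: OS-specific default
  • MSS: maximum segment size advertised
  • TCP options: which options appear, and their exact order
  • DF flag: the Don't Fragment bit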

If your User-Agent claims iOS 17 but the TCP stack matches Ubuntu Linux (as it would for a Python requests script or headless Chrome on a VPS), the lie collapses. The same goes for Go, Node.js, or any other client: each inherits the TCP stack of the host OS it runs on.
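A simplified p0f-style check, assuming the SYN's TTL, option layout, and DF bit have already been extracted from the packet. The signatures are illustrative values modeled on typical Linux, Apple, and Windows SYNs, not authoritative p0f database entries. Because Linux and Apple stacks both start at TTL 64, the option order carries most of the signal.

```python
from typing import Dict, Sequence, Tuple

COMMON_INITIAL_TTLS = (32, 64, 128, 255)

# Illustrative signatures (initial TTL, option order, DF bit); NOT real p0f entries.
KNOWN_STACKS: Dict[str, Tuple[int, Tuple[str, ...], bool]] = {
    "linux": (64, ("mss", "sok", "ts", "nop", "ws"), True),
    "apple": (64, ("mss", "nop", "ws", "nop", "nop", "ts", "sok", "eol"), True),
    "windows": (128, ("mss", "nop", "ws", "nop", "nop", "sok"), True),
}


def infer_initial_ttl(observed_ttl: int) -> int:
    """Each router hop decrements TTL, so round up to the nearest common start value."""
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial
    raise ValueError(f"invalid TTL {observed_ttl}")


def classify_syn(observed_ttl: int, options: Sequence[str], df: bool) -> str:
    """Match a SYN against the illustrative signature table."""
    initial = infer_initial_ttl(observed_ttl)
    for name, (ttl, opts, want_df) in KNOWN_STACKS.items():
        if ttl == initial and tuple(options) == opts and df == want_df:
            return name
    return "unknown"


def ua_consistent(ua_os: str, syn_stack: str) -> bool:
    """Hypothetical mapping from the User-Agent's OS to the stack we expect."""
    expected = {"ios": "apple", "macos": "apple", "windows": "windows", "linux": "linux"}
    return expected.get(ua_os) == syn_stack
```

A request claiming iOS whose SYN classifies as "linux" fails `ua_consistent`, which is exactly the VPS case described above.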

5. HTTP Header Inconsistencies

Many proxy servers insert revealing headers — often without the client's knowledge. Their presence is a direct giveaway.

Headers that leak proxy usage

  • Via (RFC 9110)
  • X-Forwarded-For
  • X-Real-IP
  • Forwarded (RFC 7239)
  • X-Proxy-ID
  • X-Proxy-Connection
  • Proxy-Connection
  • Client-IP
  • X-BlueCoat-Via
  • X-Forwarded-Host
  • X-Forwarded-Proto
  • X-Cache

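Any of these arriving at your edge on a request that should have come straight from the client is high-confidence proxy evidence (headers added by your own CDN or load balancer excluded). The values themselves are another matter: F5's published guidance treats X-Forwarded-For as untrusted for security decisions, because any client can forge it.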
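A minimal header scan might look like this. It assumes you inspect headers as they arrive at your outermost edge, or explicitly exclude those your own infrastructure adds (the `set_by_own_edge` parameter is a hypothetical convention); otherwise every request would look proxied.

```python
from typing import AbstractSet, List, Mapping

# Header names from the list above, compared case-insensitively.
PROXY_HEADERS = frozenset(name.lower() for name in (
    "Via", "X-Forwarded-For", "X-Real-IP", "Forwarded", "X-Proxy-ID",
    "X-Proxy-Connection", "Proxy-Connection", "Client-IP", "X-BlueCoat-Via",
    "X-Forwarded-Host", "X-Forwarded-Proto", "X-Cache",
))


def proxy_header_evidence(headers: Mapping[str, str],
                          set_by_own_edge: AbstractSet[str] = frozenset()) -> List[str]:
    """Return the (lowercased) proxy-revealing header names present on a request.

    `set_by_own_edge` holds lowercase names your own CDN or load balancer adds.
    """
    return sorted(
        name.lower() for name in headers
        if name.lower() in PROXY_HEADERS and name.lower() not in set_by_own_edge
    )
```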

6. WebRTC & DNS Leaks

WebRTC uses RTCPeerConnection to query STUN servers over UDP, gathering local and public IP candidates that JavaScript can read silently. Modern browsers usually mask local addresses behind mDNS hostnames, but the public (server-reflexive) candidate is still exposed. Since HTTP(S) proxies tunnel only TCP, WebRTC's UDP traffic bypasses the proxy entirely and reveals the real public IP.
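Server-side, the check reduces to comparing the public WebRTC candidate against the IP that made the HTTP request. This sketch assumes a page script has already collected the candidate strings from RTCPeerConnection's icecandidate events and posted them back. The parser handles only the basic RFC 8839 candidate grammar, and the sample addresses are documentation ranges.

```python
import ipaddress
from typing import Iterable, List, NamedTuple


class IceCandidate(NamedTuple):
    transport: str
    address: str
    port: int
    cand_type: str  # "host", "srflx", "prflx" or "relay"


def parse_candidate(line: str) -> IceCandidate:
    """Parse 'candidate:<foundation> <component> <transport> <priority> <addr> <port> typ <type> ...'."""
    if line.startswith("a="):
        line = line[2:]
    fields = line.split()
    if len(fields) < 8 or not fields[0].startswith("candidate:") or fields[6] != "typ":
        raise ValueError(f"not an ICE candidate line: {line!r}")
    return IceCandidate(fields[2].lower(), fields[4], int(fields[5]), fields[7])


def webrtc_leak(http_ip: str, candidate_lines: Iterable[str]) -> List[str]:
    """Public (server-reflexive) addresses that differ from the IP seen over HTTP."""
    leaks = []
    for line in candidate_lines:
        cand = parse_candidate(line)
        if cand.cand_type != "srflx":
            continue  # host candidates are usually mDNS names in modern browsers
        if ipaddress.ip_address(cand.address) != ipaddress.ip_address(http_ip):
            leaks.append(cand.address)
    return leaks
```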

Mitigations: the Firefox setting media.peerconnection.enabled=false, Chrome's WebRTC Network Limiter extension, or an antidetect browser that spoofs at the API level.

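DNS resolver mismatch: a site can make the browser look up a unique hostname and note which recursive resolver queries its authoritative name server. If the visible client IP geolocates to city A but the resolver's IP, or the EDNS Client Subnet (ECS) it forwards, points to city B, the geo claim is inconsistent. This is a common tell for split-tunnel proxy setups.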
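The DNS check is a join between two logs keyed by a unique hostname. In this sketch the probe zone, the geolocation table, and the city names are hypothetical; a real deployment would use its own authoritative DNS logs and a GeoIP database. When the resolver forwards ECS, the client subnet is a better location signal than the resolver's own address.

```python
import ipaddress
import secrets
from typing import Dict, Optional

# Hypothetical geolocation table using documentation ranges.
GEO: Dict[ipaddress.IPv4Network, str] = {
    ipaddress.ip_network("198.51.100.0/24"): "Chicago",
    ipaddress.ip_network("203.0.113.0/24"): "Frankfurt",
}


def geolocate(ip_or_subnet: str) -> Optional[str]:
    """Look up an address or CIDR subnet in the hypothetical GEO table."""
    net = ipaddress.ip_network(ip_or_subnet, strict=False)
    for known, city in GEO.items():
        if net.version == known.version and net.subnet_of(known):
            return city
    return None


def new_probe_hostname(zone: str = "probe.example.com") -> str:
    """Unique label per session so DNS and HTTP logs can be joined (hypothetical zone)."""
    return f"{secrets.token_hex(8)}.{zone}"


def dns_geo_consistent(http_client_ip: str, resolver_ip: str,
                       ecs_subnet: Optional[str] = None) -> bool:
    """Compare the HTTP client's location with where its DNS lookup came from.

    Prefers ECS when present; unknown locations count as consistent (no evidence).
    """
    client_city = geolocate(http_client_ip)
    dns_city = geolocate(ecs_subnet or resolver_ip)
    if client_city is None or dns_city is None:
        return True
    return client_city == dns_city
```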

7. Behavioral Signals

Even with a perfect IP + TLS + TCP story, behavioral analysis catches automation. DataDome, PerimeterX (HUMAN), and Akamai Bot Manager monitor:

  • Mouse dynamics: cursor curvature, micro-jitter, velocity/acceleration distributions, click-point entropy
  • Keystroke timing: dwell time (key-down → key-up), flight time (key-up → next key-down), rhythm variance
  • Scroll velocity & momentum: deceleration curves, inertia patterns
  • Mobile-specific: touch pressure, device orientation events, battery API, ambient light
  • Session-level: request timing entropy, navigation plausibility, form-fill timing
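Keystroke dwell and flight times, as defined above, are easy to compute; the hard part is the model on top. This sketch substitutes a single crude heuristic for that model: typing whose rhythm barely varies is flagged. The 0.1 coefficient-of-variation threshold is a hypothetical placeholder, not a published vendor value.

```python
from statistics import mean, pstdev
from typing import List, NamedTuple


class KeyEvent(NamedTuple):
    key: str
    down_ms: float
    up_ms: float


def dwell_times(events: List[KeyEvent]) -> List[float]:
    """Dwell: key-down to key-up of the same key."""
    return [e.up_ms - e.down_ms for e in events]


def flight_times(events: List[KeyEvent]) -> List[float]:
    """Flight: key-up of one key to key-down of the next (negative if keys overlap)."""
    return [nxt.down_ms - cur.up_ms for cur, nxt in zip(events, events[1:])]


def coefficient_of_variation(values: List[float]) -> float:
    """Spread relative to the mean; infinite when the mean is zero."""
    m = mean(values)
    return pstdev(values) / abs(m) if m else float("inf")


def looks_scripted(events: List[KeyEvent], min_cv: float = 0.1) -> bool:
    """Flag typing whose dwell AND flight rhythm are both suspiciously regular."""
    if len(events) < 3:
        return False  # not enough data to judge
    return (coefficient_of_variation(dwell_times(events)) < min_cv
            and coefficient_of_variation(flight_times(events)) < min_cv)
```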

DataDome has publicly stated that fingerprint spoofing alone is insufficient — behavioral signals carry equal weight.

Why Mobile Proxies Beat Every Layer

Mobile proxies aren't invisible — they're impractical to block. The economics flip against the website.

Layer 1 (IP reputation)

Carrier-assigned IPs churn frequently, so MaxMind-style "last seen" dates go stale fast. The IP you use today isn't the IP flagged last week.

Layer 2 (ASN)

Blocking AS21928 blocks every T-Mobile customer. Cloudflare's own data shows CGNAT IPs get rate-limited 3× more but are rarely blocked outright.

Layer 3 (TLS/JA4)

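A TLS session tunneled through the proxy (HTTP CONNECT or SOCKS) runs end-to-end between your client and the site, so the JA3/JA4 fingerprint is your client's, not the phone's. When that client is a real phone browser on a real carrier, the fingerprint is identical to other consumer devices because it IS a consumer device; a script's fingerprint stays a script's, whatever the IP.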

Layer 4 (TCP stack)

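The site's TCP connection is opened by the device running the proxy, not by your VPS, so p0f sees the handset's stack (iOS, or Android's Linux kernel) rather than a server's. Keep the User-Agent consistent with that device: an iPhone User-Agent over an Android handset is the same mismatch described in section 4.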

Layer 5 (headers)

No gateway injects Via, X-Forwarded-For, or X-Proxy-* headers. The egress is the phone's modem itself — same as direct traffic.

Layer 6 (leaks)

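WebRTC through a mobile carrier network exposes... a mobile carrier IP, provided WebRTC's UDP traffic actually leaves through the carrier connection (or WebRTC is disabled); over a plain HTTP(S) proxy it still leaks, as described in section 6. DNS resolves through the carrier's resolver, so geography stays consistent.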

Layer 7 (behavior)

This is still on you. Behavioral signals must look human regardless of IP quality. Mobile proxies handle infrastructure; behavior is the operator's job.
