Exponential Backoff for Proxy Retries: Production-Ready Patterns
Rate limits and transient failures are inevitable when scraping at scale. Retrying naively causes thundering-herd spikes and wastes proxy bandwidth. Here's the production pattern AWS and Google Cloud both recommend.
The Four Canonical Formulas
From Marc Brooker's AWS Architecture Blog post on exponential backoff and jitter. Pick one based on how much variance your workload can tolerate.
| Strategy | Formula |
|---|---|
| 1. Exponential (no jitter) | sleep = min(cap, base * 2^attempt) |
| 2. Full Jitter (AWS default) | sleep = random(0, min(cap, base * 2^attempt)) |
| 3. Equal Jitter | temp = min(cap, base * 2^attempt); sleep = temp/2 + random(0, temp/2) |
| 4. Decorrelated Jitter (AWS SDK adaptive) | sleep = min(cap, random(base, last_sleep * 3)) |
Variables: base = 100ms–1s, cap = 20s–60s for interactive workloads, attempt = zero-based retry count, last_sleep = previous sleep (seeded with base).
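The four formulas translate almost directly into Python (times in seconds; the decorrelated variant carries its previous sleep forward instead of an attempt counter):

```python
import random

def expo(base, cap, attempt):
    """1. Exponential, no jitter."""
    return min(cap, base * 2 ** attempt)

def full_jitter(base, cap, attempt):
    """2. Full jitter: uniform over the whole backoff window."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

def equal_jitter(base, cap, attempt):
    """3. Equal jitter: half the window fixed, half random."""
    temp = min(cap, base * 2 ** attempt)
    return temp / 2 + random.uniform(0, temp / 2)

def decorrelated_jitter(base, cap, last_sleep):
    """4. Decorrelated jitter: next sleep depends on the previous one."""
    return min(cap, random.uniform(base, last_sleep * 3))

# Sleeps for attempts 0..4 with base=0.5s, cap=30s
print([expo(0.5, 30, a) for a in range(5)])  # [0.5, 1.0, 2.0, 4.0, 8.0]
```

Note that `decorrelated_jitter` can shrink as well as grow between attempts; on average it grows, which is what spreads correlated clients apart.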
Why Jitter Matters
Without jitter, clients all retry at t=1s, t=2s, t=4s — causing contention spikes that amplify the original failure. Full jitter spreads retries evenly across the backoff window, minimizing server load.
Marc Brooker's simulations show full jitter completes work faster AND with less total server load than no-jitter exponential. The counter-intuitive result: randomness is cheaper than order.
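The contention effect is easy to demonstrate. Take 1,000 clients all retrying attempt 3 with base = 0.1s: without jitter every client wakes at exactly 0.8s, while full jitter spreads the wake-ups across the whole [0, 0.8s] window:

```python
import random

base, cap, attempt = 0.1, 30, 3
clients = 1000

no_jitter = [min(cap, base * 2 ** attempt) for _ in range(clients)]
with_jitter = [random.uniform(0, min(cap, base * 2 ** attempt)) for _ in range(clients)]

# No jitter: one distinct wake-up time -> all 1,000 requests land at once
print(len(set(no_jitter)))    # 1
# Full jitter: effectively all distinct -> load spread over the window
print(len(set(with_jitter)))
```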
Python Implementation with Retry-After
Honors Retry-After headers, falls back to full jitter, handles 429s and 5xx separately, and re-raises on transport errors after max attempts.
```python
import random
import time

import requests


def retry_with_backoff(request_fn, max_attempts=5, base=0.5, cap=30):
    """Full jitter exponential backoff with Retry-After support."""
    for attempt in range(max_attempts):
        try:
            response = request_fn()
            if response.status_code == 429:
                # Honor Retry-After if present as integer seconds;
                # HTTP-date values fall through to jittered backoff
                retry_after = response.headers.get("Retry-After")
                if retry_after and retry_after.isdigit():
                    sleep_time = int(retry_after)
                else:
                    # Full jitter
                    sleep_time = random.uniform(0, min(cap, base * (2 ** attempt)))
                time.sleep(sleep_time)
                continue
            if response.status_code >= 500:
                sleep_time = random.uniform(0, min(cap, base * (2 ** attempt)))
                time.sleep(sleep_time)
                continue
            return response
        except (requests.ConnectionError, requests.Timeout):
            if attempt == max_attempts - 1:
                raise
            sleep_time = random.uniform(0, min(cap, base * (2 ** attempt)))
            time.sleep(sleep_time)
    raise Exception(f"Max retries ({max_attempts}) exceeded")
```

Node.js Implementation
Same pattern adapted to async/await. Works with any fetch-compatible client (native fetch, undici); axios needs a thin adapter, since its response headers are a plain object without `.get()`.
```javascript
async function retryWithBackoff(fn, { maxAttempts = 5, base = 500, cap = 30000 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const response = await fn();
      if (response.status === 429) {
        // parseInt yields NaN for a missing header or HTTP-date value,
        // in which case we fall back to jittered backoff
        const retryAfter = parseInt(response.headers.get('retry-after'), 10);
        const sleep = Number.isNaN(retryAfter)
          ? Math.random() * Math.min(cap, base * (2 ** attempt))
          : retryAfter * 1000;
        await new Promise(r => setTimeout(r, sleep));
        continue;
      }
      if (response.status >= 500) {
        const sleep = Math.random() * Math.min(cap, base * (2 ** attempt));
        await new Promise(r => setTimeout(r, sleep));
        continue;
      }
      return response;
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err;
      const sleep = Math.random() * Math.min(cap, base * (2 ** attempt));
      await new Promise(r => setTimeout(r, sleep));
    }
  }
  throw new Error(`Max retries (${maxAttempts}) exceeded`);
}
```

Integration with Mobile Proxy IP Rotation
Once a request has failed more than twice, the current IP is probably the problem. Rotate it before continuing — backoff alone won't fix a flagged IP.
```python
if attempt >= 2:
    # Rotate mobile proxy IP
    requests.post(
        f"https://buy.mobileproxies.org/api/v1/proxies/{SLOT_ID}/switch",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    time.sleep(10)  # wait for modem reconnection
```

The 10-second sleep is the actual LTE modem reconnect time. Shorter waits will hit the same IP because the new one hasn't negotiated yet.
Provider-Specific Retry-After Handling
- OpenAI API: returns `retry-after` on 429. Honor it.
- Anthropic API: returns `retry-after` plus `anthropic-ratelimit-*` headers. Honor `retry-after`; check the `*-reset` headers for advanced scheduling.
- Most CDNs: return `retry-after` in seconds (integer) or HTTP-date format. Parse both.
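Handling both formats takes a small helper. This is a sketch with a hypothetical name (`parse_retry_after`, not part of any SDK), using the stdlib's HTTP-date parser:

```python
import time
from email.utils import parsedate_to_datetime


def parse_retry_after(value, default=1.0):
    """Return seconds to sleep from a Retry-After header value.

    Accepts integer seconds ("120") or an HTTP-date
    ("Wed, 21 Oct 2026 07:28:00 GMT"); falls back to `default`
    when the value is missing or unparseable.
    """
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        dt = parsedate_to_datetime(value)
        # A date in the past means "retry now"
        return max(0.0, dt.timestamp() - time.time())
    except (TypeError, ValueError, AttributeError):
        return default


print(parse_retry_after("120"))  # 120.0
```

Drop it in wherever the earlier examples read the header, clamping the result to your cap so a hostile or buggy server can't park a worker for hours.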
Anti-Patterns to Avoid
- Fixed delays (no exponential growth)
- No jitter — guarantees thundering herds on large fleets
- Unbounded retry counts — need a circuit breaker beyond 3–5 attempts
- Ignoring retry-after headers — the server knows better than you
- Retrying non-idempotent POST requests without idempotency keys
Pair backoff with a circuit breaker: after 5 consecutive failures across attempts, stop retrying that endpoint for 30–60 seconds. Keeps a single bad upstream from consuming your whole worker pool.
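A minimal sketch of that breaker, using the 5-failure / 30-second numbers from above (class and method names are illustrative, not from any library):

```python
import time


class CircuitBreaker:
    """Open after `threshold` consecutive failures; reject until `cooldown` passes."""

    def __init__(self, threshold=5, cooldown=30):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            # Cooldown elapsed: close and let a probe request through
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()


breaker = CircuitBreaker()
for _ in range(5):
    breaker.record_failure()
print(breaker.allow())  # False: open for the next 30 seconds
```

Call `allow()` before each attempt and skip the endpoint when it returns False; call `record_success()` or `record_failure()` after each response so the breaker tracks consecutive failures, not totals.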