Mobile Proxy for Scrapy
Scrapy's built-in HttpProxyMiddleware reads request.meta['proxy'] and routes the outbound socket. Wire it up with a custom downloader middleware that injects the mobile proxy URL and calls our switch API when responses degrade.
Prerequisites
- Python 3.10+ with scrapy, requests, and python-dotenv installed.
- Mobile proxy slot and API key from mobileproxies.org.
- An existing Scrapy project, or scaffold one with scrapy startproject myscraper.
Step-by-Step Configuration
Move credentials to environment variables
# .env (loaded with python-dotenv in settings.py)
MP_HOST=proxy.mobileproxies.org
MP_PORT=8000
MP_USER=u_4a9c
MP_PASS=p_2X7q...
MP_API_KEY=YOUR_API_KEY
MP_SLOT=us-mob-01
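Because settings.py reads each variable with os.environ[...], a missing one surfaces as a bare KeyError on the first lookup. The sketch below is one possible fail-fast check that reports every missing variable at once. The helper name missing_mp_vars is hypothetical, and the variable list simply mirrors the .env file above.

```python
import os

# Names mirror the .env file in this guide.
REQUIRED_MP_VARS = (
    "MP_HOST", "MP_PORT", "MP_USER", "MP_PASS", "MP_API_KEY", "MP_SLOT",
)


def missing_mp_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_MP_VARS if not env.get(name)]
```

Called in settings.py right after load_dotenv(), a non-empty result could be turned into a single error message that names all the gaps.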
settings.py
# settings.py
import os
from dotenv import load_dotenv
load_dotenv()
BOT_NAME = "myscraper"
# Be polite — mobile slot is a single carrier IP
CONCURRENT_REQUESTS = 4
CONCURRENT_REQUESTS_PER_DOMAIN = 2
DOWNLOAD_DELAY = 1.5
RANDOMIZE_DOWNLOAD_DELAY = True
RETRY_TIMES = 3
RETRY_HTTP_CODES = [429, 500, 502, 503, 504, 522, 524, 408]
# Cookies persist across requests through the same proxy
COOKIES_ENABLED = True
USER_AGENT = ("Mozilla/5.0 (iPhone; CPU iPhone OS 17_5 like Mac OS X) "
              "AppleWebKit/605.1.15 Mobile/15E148 Safari/604.1")

DOWNLOADER_MIDDLEWARES = {
    # Inject the proxy URL into every request
    "myscraper.middlewares.MobileProxyMiddleware": 350,
    # Stock HttpProxyMiddleware reads request.meta["proxy"]
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 400,
    # React to bans and trigger rotation
    "myscraper.middlewares.RotateOnBanMiddleware": 410,
}

MP = {
    "host": os.environ["MP_HOST"],
    "port": int(os.environ["MP_PORT"]),
    "user": os.environ["MP_USER"],
    "pass": os.environ["MP_PASS"],
    "api_key": os.environ["MP_API_KEY"],
    "slot": os.environ["MP_SLOT"],
}

middlewares.py: proxy injection
# myscraper/middlewares.py
from urllib.parse import quote


class MobileProxyMiddleware:
    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.settings.get("MP"))

    def __init__(self, mp):
        # safe="" percent-encodes every reserved character (including "/"),
        # so credentials cannot break the proxy URL.
        user = quote(mp["user"], safe="")
        password = quote(mp["pass"], safe="")
        self.proxy = f"http://{user}:{password}@{mp['host']}:{mp['port']}"

    def process_request(self, request, spider):
        request.meta["proxy"] = self.proxy
        # Default to a 30 s download timeout unless the request sets its own
        request.meta.setdefault("download_timeout", 30)

middlewares.py: rotation on ban
# myscraper/middlewares.py (continued)
import threading
import time

import requests


class RotateOnBanMiddleware:
    BAN_CODES = {403, 429, 503}
    COOLDOWN = 30  # seconds; don't hammer the switch endpoint
    # Assumption: cap added so a permanently banned URL cannot retry forever
    MAX_BAN_RETRIES = 3

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.settings.get("MP"))

    def __init__(self, mp):
        self.mp = mp
        self._last_rotate = 0
        self._lock = threading.Lock()

    def process_response(self, request, response, spider):
        if response.status in self.BAN_CODES:
            retries = request.meta.get("ban_retries", 0)
            if retries >= self.MAX_BAN_RETRIES:
                return response  # give up and let the spider see the ban
            self._rotate(spider)
            # Retry this request after rotation
            retry = request.replace(dont_filter=True)
            retry.meta["ban_retries"] = retries + 1
            return retry
        return response

    def _rotate(self, spider):
        with self._lock:
            if time.time() - self._last_rotate < self.COOLDOWN:
                return
            self._last_rotate = time.time()
        r = requests.post(
            f"https://buy.mobileproxies.org/api/v1/proxies/{self.mp['slot']}/switch",
            headers={"Authorization": f"Bearer {self.mp['api_key']}"},
            timeout=10,
        )
        spider.logger.info(f"rotate → {r.status_code}")
        # Let the new IP bind; this blocks the reactor, so all requests pause
        time.sleep(4)

Spider: example using both
# myscraper/spiders/ip_check.py
import scrapy


class IpCheckSpider(scrapy.Spider):
    name = "ip_check"
    start_urls = ["https://api.ipify.org?format=json"] * 5

    def parse(self, response):
        yield {"egress_ip": response.json()["ip"]}

Run: scrapy crawl ip_check -o ips.jsonl and confirm the IPs are carrier-owned.
Verify It Works
The 5 requests in start_urls should all return the same mobile IP within a single spider run (sticky session). Trigger a rotation manually mid-run and the next request should show a different IP — both from carrier ASNs, never from datacenter ranges.
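One way to check the sticky-session behaviour is to count the distinct addresses in the exported file. This is a small sketch, assuming the ips.jsonl output from the run command above with one JSON object per line. The function names are made up for this example, and the script does not check ASN ownership, which needs an external lookup.

```python
import json
from collections import Counter


def distinct_egress_ips(lines):
    """Count how often each egress_ip appears in an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        counts[json.loads(line)["egress_ip"]] += 1
    return counts


def check_file(path="ips.jsonl"):
    """Read the feed export and return the per-IP counts."""
    with open(path, encoding="utf-8") as fh:
        return distinct_egress_ips(fh)
```

A run without rotation should produce a single key. A run with one manual rotation should produce two.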
Pool of Slots (Higher Throughput)
One mobile slot caps at a few requests/sec. For higher throughput, allocate several slots and round-robin them in the proxy middleware:
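# Variation: cycle through a list of slots
import itertools
from urllib.parse import quote


class PoolProxyMiddleware:
    @classmethod
    def from_crawler(cls, crawler):
        # Assumption: MP_SLOTS is a hypothetical setting holding a list of
        # dicts shaped like MP (user, pass, host, port).
        return cls(crawler.settings.get("MP_SLOTS", []))

    def __init__(self, slots):
        self.cycle = itertools.cycle([
            f"http://{quote(s['user'], safe='')}:{quote(s['pass'], safe='')}"
            f"@{s['host']}:{s['port']}"
            for s in slots
        ])

    def process_request(self, request, spider):
        request.meta["proxy"] = next(self.cycle)

Common Errors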
"TunnelError: Could not open CONNECT tunnel"
Auth failure or wrong port. Run a quick requests.get(..., proxies=...) sanity check outside Scrapy to isolate the credentials issue.
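The sanity check might look like the sketch below. It takes the same values as the MP_* variables in the .env file and reuses the ipify URL from the example spider; the helper names are illustrative. Building the proxy mapping is kept separate from the network call so the URL can be inspected on its own.

```python
from urllib.parse import quote


def build_proxies(user, password, host, port):
    """Return a requests-style proxies mapping for an authenticated HTTP proxy."""
    url = f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"
    # The same HTTP proxy carries both http:// and https:// targets.
    return {"http": url, "https": url}


def check_proxy(proxies, timeout=15):
    """Fetch the egress IP through the proxy; raises on auth or connect errors."""
    # Imported here so build_proxies has no third-party dependency.
    import requests

    resp = requests.get(
        "https://api.ipify.org?format=json", proxies=proxies, timeout=timeout
    )
    resp.raise_for_status()
    return resp.json()["ip"]
```

If check_proxy fails with a proxy or connection error here, the problem most likely lies in the credentials, host, or port rather than in the Scrapy configuration.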
Cookies don't persist across rotations
Expected — a new egress IP looks like a new visitor. If you need session continuity, do the work on one IP and rotate between sessions, not within them.
RotateOnBanMiddleware hammers the switch endpoint
Raising COOLDOWN isn't enough under burst load, because each Scrapy process tracks its own last-rotation time. Use a cross-process rotation lock (e.g. Redis SETNX) so multiple Scrapy workers can't all rotate at once.
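A sketch of such a lock, using the atomic SET command with the NX and EX options (the single-command equivalent of SETNX plus an expiry). The key name, the TTL, and the function name are assumptions for illustration. The client argument is any object exposing redis-py's set(name, value, nx=..., ex=...) method, for example a redis.Redis() instance.

```python
def try_acquire_rotation_lock(client, slot, ttl_seconds=30):
    """Return True if this worker may rotate `slot` now.

    SET ... NX EX succeeds for exactly one caller until the key expires,
    so the TTL doubles as a cooldown shared by every worker.
    """
    key = f"mp:rotate:{slot}"  # hypothetical key naming scheme
    # redis-py returns True when the key was set and None when it already existed.
    return bool(client.set(key, "1", nx=True, ex=ttl_seconds))
```

In _rotate, a worker would call this first and skip the switch request whenever it returns False.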