RAG Pipelines with Mobile Proxies
RAG systems retrieve fresh data at query time. That retrieval step often involves scraping — and scraping in production means dealing with IP blocks, rate limits, and geo-gated content. Here is how to wire mobile proxies into LangChain and LlamaIndex without ceremony.
Retrieval-Augmented Generation closes the LLM's biggest gap: up-to-date, authoritative context. Instead of relying on what the model memorized during training, RAG fetches relevant documents at query time and hands them to the LLM. In production, this creates a scraping dependency — and a scraping dependency means your pipeline lives or dies by whether your fetches succeed.
1. RAG Architecture Overview
A textbook RAG pipeline has four stages. Each has its own failure modes.
| Stage | What happens | Common tooling |
|---|---|---|
| Ingestion | Scrape, chunk, embed source documents | LangChain loaders, LlamaIndex readers, Unstructured |
| Vector store | Persist embeddings with metadata | Pinecone, Chroma, Weaviate, Qdrant, pgvector |
| Retrieval | Semantic search (optional hybrid with BM25) | LangChain retrievers, LlamaIndex indexes |
| Generation | LLM call with retrieved context | OpenAI, Anthropic, any chat-completion model |
2. Where Mobile Proxies Fit In
Three integration points. Not every RAG deployment needs all three.
- Ingestion. The initial scrape of source sites before embedding. One-shot or batch, runs offline. Most users hit their first block here.
- Live retrieval. For domains that change hourly (news, pricing, inventory), scrape inside the request path and pass directly to the LLM. Higher latency, always fresh.
- Re-indexing. Periodic background jobs that re-scrape and re-embed. Weekly for stable docs, hourly for news, minutely for markets.
3. LangChain Document Loaders with Proxies
LangChain's `WebBaseLoader` uses `requests` under the hood, which respects the standard `HTTP_PROXY` / `HTTPS_PROXY` environment variables.
```python
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
import os

os.environ["HTTPS_PROXY"] = "http://USER:PASS@hostname:http_port"
os.environ["HTTP_PROXY"] = "http://USER:PASS@hostname:http_port"

# 1. Load
loader = WebBaseLoader([
    "https://example.com/docs/intro",
    "https://example.com/docs/api",
])
docs = loader.load()

# 2. Chunk
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=80)
chunks = splitter.split_documents(docs)

# 3. Embed + persist
store = Chroma.from_documents(
    chunks,
    embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
    persist_directory="./rag_store",
)
```

The environment variables apply to every `requests` call in the process. For per-loader control, pass `requests_kwargs={"proxies": ...}` to `WebBaseLoader`.
4. LlamaIndex Equivalent
LlamaIndex ships `SimpleWebPageReader` for the same pattern. It also respects `HTTPS_PROXY`.
```python
from llama_index.readers.web import SimpleWebPageReader
from llama_index.core import VectorStoreIndex, Settings
from llama_index.embeddings.openai import OpenAIEmbedding
import os

os.environ["HTTPS_PROXY"] = "http://USER:PASS@hostname:http_port"

Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

docs = SimpleWebPageReader(html_to_text=True).load_data([
    "https://example.com/docs/intro",
    "https://example.com/docs/api",
])

index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()
print(query_engine.query("How do I authenticate?"))
```

For JavaScript-rendered pages, swap `SimpleWebPageReader` for `PlaywrightWebReader`; the proxy semantics are the same, and Playwright also accepts a proxy argument directly.
5. Freshness Patterns
Match the re-ingestion strategy to the volatility of the source:
| Pattern | When to use | Trade-off |
|---|---|---|
| Full re-scrape | Weekly or monthly, stable docs | Simple, expensive, embeddings grow stale between runs |
| Incremental | Sitemap with lastmod, updated_at field | Efficient; requires change detection |
| Live scrape | News, pricing, inventory — always fresh | Latency in the query path; scraper must be fast |
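The incremental pattern can be sketched with the standard library: parse the sitemap, compare each `lastmod` against the previous run, and re-scrape only what changed. `changed_urls` is a hypothetical helper, and it assumes date-only `lastmod` values:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Sitemaps use a default XML namespace; ElementTree needs it spelled out.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def changed_urls(sitemap_xml: str, last_run: datetime) -> list[str]:
    """Return URLs whose <lastmod> is newer than the previous ingest run."""
    root = ET.fromstring(sitemap_xml)
    out = []
    for url in root.iter(f"{SITEMAP_NS}url"):
        loc = url.findtext(f"{SITEMAP_NS}loc")
        lastmod = url.findtext(f"{SITEMAP_NS}lastmod")
        if loc and lastmod:
            modified = datetime.fromisoformat(lastmod).replace(tzinfo=timezone.utc)
            if modified > last_run:
                out.append(loc)
    return out
```

Everything `changed_urls` returns goes back through the loader; everything else keeps its existing embeddings.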
6. Caching and Rate Limit Strategy
A production RAG ingester should never re-fetch an unchanged URL, and should never hammer a single host. Two caching layers and two throttling safeguards help:
- HTTP-level cache. `requests-cache` or `httpx-cache` respects ETag and Last-Modified. Cuts bandwidth and proxy usage dramatically on re-ingest runs.
- Content-hash cache. Store the SHA-256 of cleaned text per URL. If the hash matches the previous run, skip re-chunking and re-embedding; embedding calls are expensive at OpenAI and Voyage AI pricing.
- Per-domain token bucket. Use `asyncio-throttle` or `aiolimiter` to cap requests per second per host. Rotating mobile IPs does not absolve you of polite crawling.
- Retry with backoff. See our backoff guide. Respect 429 Retry-After.
Reliable Retrieval Starts at the Network Layer
Carrier-grade mobile IPs for RAG ingestion and live retrieval. Try it for $5.