RAG Pipelines with Mobile Proxies
RAG systems retrieve fresh data at query time. That retrieval step often involves scraping — and scraping in production means dealing with IP blocks, rate limits, and geo-gated content. Here is how to wire mobile proxies into LangChain and LlamaIndex without ceremony.
Retrieval-Augmented Generation closes the LLM's biggest gap: up-to-date, authoritative context. Instead of relying on what the model memorized during training, RAG fetches relevant documents at query time and hands them to the LLM. In production, this creates a scraping dependency — and a scraping dependency means your pipeline lives or dies by whether your fetches succeed.
1. RAG Architecture Overview
A textbook RAG pipeline has four stages. Each has its own failure modes.
| Stage | What happens | Common tooling |
|---|---|---|
| Ingestion | Scrape, chunk, embed source documents | LangChain loaders, LlamaIndex readers, Unstructured |
| Vector store | Persist embeddings with metadata | Pinecone, Chroma, Weaviate, Qdrant, pgvector |
| Retrieval | Semantic search (optional hybrid with BM25) | LangChain retrievers, LlamaIndex indexes |
| Generation | LLM call with retrieved context | OpenAI, Anthropic, any chat-completion model |
2. Where Mobile Proxies Fit In
Three integration points. Not every RAG deployment needs all three.
- Ingestion. The initial scrape of source sites before embedding. One-shot or batch, runs offline. Most users hit their first block here.
- Live retrieval. For domains that change hourly (news, pricing, inventory), scrape inside the request path and pass directly to the LLM. Higher latency, always fresh.
- Re-indexing. Periodic background jobs that re-scrape and re-embed. Weekly for stable docs, hourly for news, minutely for markets.
3. LangChain Document Loaders with Proxies
LangChain's `WebBaseLoader` uses `requests` under the hood, which respects the standard `HTTP_PROXY` / `HTTPS_PROXY` environment variables.
```python
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
import os

os.environ["HTTPS_PROXY"] = "http://USER:PASS@hostname:http_port"
os.environ["HTTP_PROXY"] = "http://USER:PASS@hostname:http_port"

# 1. Load
loader = WebBaseLoader([
    "https://example.com/docs/intro",
    "https://example.com/docs/api",
])
docs = loader.load()

# 2. Chunk
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=80)
chunks = splitter.split_documents(docs)

# 3. Embed + persist
store = Chroma.from_documents(
    chunks,
    embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
    persist_directory="./rag_store",
)
```

The environment variables apply to every `requests` call in the process. For per-loader control, pass `requests_kwargs={"proxies": ...}` to `WebBaseLoader`.
4. LlamaIndex Equivalent
LlamaIndex ships `SimpleWebPageReader` for the same pattern. It also respects `HTTPS_PROXY`.
```python
from llama_index.readers.web import SimpleWebPageReader
from llama_index.core import VectorStoreIndex, Settings
from llama_index.embeddings.openai import OpenAIEmbedding
import os

os.environ["HTTPS_PROXY"] = "http://USER:PASS@hostname:http_port"

Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

docs = SimpleWebPageReader(html_to_text=True).load_data([
    "https://example.com/docs/intro",
    "https://example.com/docs/api",
])

index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()
print(query_engine.query("How do I authenticate?"))
```

For JavaScript-rendered pages, swap `SimpleWebPageReader` for `PlaywrightWebReader`; the proxy semantics are the same, and Playwright also accepts a proxy argument directly.
5. Freshness Patterns
Match the re-ingestion strategy to the volatility of the source:
| Pattern | When to use | Trade-off |
|---|---|---|
| Full re-scrape | Weekly or monthly, stable docs | Simple, expensive, embeddings grow stale between runs |
| Incremental | Sitemap with lastmod, updated_at field | Efficient; requires change detection |
| Live scrape | News, pricing, inventory — always fresh | Latency in the query path; scraper must be fast |
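The incremental pattern can be sketched with the standard library: parse the sitemap, compare each `lastmod` against the previous run, and re-scrape only what changed. `changed_urls` is a hypothetical helper, and it assumes date-only `lastmod` values:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Sitemaps use a default XML namespace; ElementTree needs it spelled out.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def changed_urls(sitemap_xml: str, last_run: datetime) -> list[str]:
    """Return URLs whose <lastmod> is newer than the previous ingest run."""
    root = ET.fromstring(sitemap_xml)
    out = []
    for url in root.iter(f"{SITEMAP_NS}url"):
        loc = url.findtext(f"{SITEMAP_NS}loc")
        lastmod = url.findtext(f"{SITEMAP_NS}lastmod")
        if loc and lastmod:
            modified = datetime.fromisoformat(lastmod).replace(tzinfo=timezone.utc)
            if modified > last_run:
                out.append(loc)
    return out
```

Everything `changed_urls` returns goes back through the loader; everything else keeps its existing embeddings.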
6. Caching and Rate Limit Strategy
A production RAG ingester should never re-fetch an unchanged URL, and should never hammer a single host. Two caching layers and two throttling safeguards help:
- HTTP-level cache. `requests-cache` or `httpx-cache` respects ETag and Last-Modified. Cuts bandwidth and proxy usage dramatically on re-ingest runs.
- Content-hash cache. Store the SHA-256 of cleaned text per URL. If the hash matches the previous run, skip re-chunking and re-embedding; embedding calls are expensive at OpenAI and Voyage AI pricing.
- Per-domain token bucket. Use `asyncio-throttle` or `aiolimiter` to cap requests per second per host. Rotating mobile IPs does not absolve you of polite crawling.
- Retry with backoff. See our backoff guide. Respect 429 Retry-After.
Reliable Retrieval Starts at the Network Layer
Carrier-grade mobile IPs for RAG ingestion and live retrieval. Try it for $5.