
RAG Pipelines with Mobile Proxies

RAG systems retrieve fresh data at query time. That retrieval step often involves scraping — and scraping in production means dealing with IP blocks, rate limits, and geo-gated content. Here is how to wire mobile proxies into LangChain and LlamaIndex without ceremony.

11 min read · LangChain, LlamaIndex, Pinecone, Chroma, Weaviate, pgvector · Last updated: April 2026

Retrieval-Augmented Generation closes the LLM's biggest gap: up-to-date, authoritative context. Instead of relying on what the model memorized during training, RAG fetches relevant documents at query time and hands them to the LLM. In production, this creates a scraping dependency — and a scraping dependency means your pipeline lives or dies by whether your fetches succeed.

1. RAG Architecture Overview

A textbook RAG pipeline has four stages. Each has its own failure modes.

| Stage | What happens | Common tooling |
| --- | --- | --- |
| Ingestion | Scrape, chunk, embed source documents | LangChain loaders, LlamaIndex readers, Unstructured |
| Vector store | Persist embeddings with metadata | Pinecone, Chroma, Weaviate, Qdrant, pgvector |
| Retrieval | Semantic search (optional hybrid with BM25) | LangChain retrievers, LlamaIndex indexes |
| Generation | LLM call with retrieved context | OpenAI, Anthropic, any chat-completion model |

2. Where Mobile Proxies Fit In

Three integration points. Not every RAG deployment needs all three.

  • Ingestion. The initial scrape of source sites before embedding. One-shot or batch, runs offline. Most users hit their first block here.
  • Live retrieval. For domains that change hourly (news, pricing, inventory), scrape inside the request path and pass directly to the LLM. Higher latency, always fresh.
  • Re-indexing. Periodic background jobs that re-scrape and re-embed. Weekly for stable docs, hourly for news, minutely for markets.

3. LangChain Document Loaders with Proxies

LangChain's WebBaseLoader uses requests under the hood, which respects the standard HTTP_PROXY / HTTPS_PROXY environment variables.

```python
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
import os

os.environ["HTTPS_PROXY"] = "http://USER:PASS@hostname:http_port"
os.environ["HTTP_PROXY"] = "http://USER:PASS@hostname:http_port"

# 1. Load
loader = WebBaseLoader([
    "https://example.com/docs/intro",
    "https://example.com/docs/api",
])
docs = loader.load()

# 2. Chunk
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=80)
chunks = splitter.split_documents(docs)

# 3. Embed + persist
store = Chroma.from_documents(
    chunks,
    embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
    persist_directory="./rag_store",
)
```

The environment variables apply to every requests call in the process. For per-loader control, pass requests_kwargs={"proxies": ...} to WebBaseLoader instead.
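A minimal sketch of that per-loader pattern. The credentials and the gateway.example.net host are placeholders, and the helper names are my own; the langchain import is deferred into the factory so the proxy logic stands alone:

```python
def proxy_mapping(user: str, password: str, host: str, port: int) -> dict:
    """Build a requests-style proxies dict routing both schemes through one gateway."""
    gateway = f"http://{user}:{password}@{host}:{port}"
    return {"http": gateway, "https": gateway}

def make_loader(urls: list[str], proxies: dict):
    """Construct a WebBaseLoader that scrapes through the given proxies only,
    leaving the rest of the process's HTTP traffic untouched."""
    from langchain_community.document_loaders import WebBaseLoader  # deferred import
    return WebBaseLoader(urls, requests_kwargs={"proxies": proxies})

# Placeholder credentials; substitute your real proxy gateway.
proxies = proxy_mapping("USER", "PASS", "gateway.example.net", 8080)
```

This keeps other outbound calls (embedding APIs, the LLM itself) off the proxy, which the environment-variable approach cannot do.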

4. LlamaIndex Equivalent

LlamaIndex ships SimpleWebPageReader for the same pattern. It also respects HTTPS_PROXY.

```python
from llama_index.readers.web import SimpleWebPageReader
from llama_index.core import VectorStoreIndex, Settings
from llama_index.embeddings.openai import OpenAIEmbedding
import os

os.environ["HTTPS_PROXY"] = "http://USER:PASS@hostname:http_port"

Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

docs = SimpleWebPageReader(html_to_text=True).load_data([
    "https://example.com/docs/intro",
    "https://example.com/docs/api",
])

index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()
print(query_engine.query("How do I authenticate?"))
```

For JavaScript-rendered pages, swap SimpleWebPageReader for PlaywrightWebReader. The proxy semantics are the same, and Playwright additionally accepts a proxy argument directly.
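If you drive Playwright yourself, the proxy goes into the browser launch call. A sketch using the sync API, with placeholder credentials and a hypothetical gateway host (fetch_rendered is an illustrative helper, not a library function):

```python
# Placeholder proxy config; Playwright takes server/username/password keys.
PROXY = {
    "server": "http://gateway.example.net:8080",
    "username": "USER",
    "password": "PASS",
}

def fetch_rendered(url: str, proxy: dict = PROXY) -> str:
    """Return fully rendered HTML for `url`, routed through `proxy`."""
    from playwright.sync_api import sync_playwright  # deferred: only needed on use
    with sync_playwright() as p:
        browser = p.chromium.launch(proxy=proxy)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="networkidle")  # wait for JS to settle
            return page.content()
        finally:
            browser.close()
```

The returned HTML can then be handed to any loader or text splitter exactly as if it came from requests.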

5. Freshness Patterns

Match the re-ingestion strategy to the volatility of the source:

| Pattern | When to use | Trade-off |
| --- | --- | --- |
| Full re-scrape | Weekly or monthly; stable docs | Simple but expensive; embeddings grow stale between runs |
| Incremental | Sitemap with lastmod, or an updated_at field | Efficient; requires change detection |
| Live scrape | News, pricing, inventory; always fresh | Latency in the query path; scraper must be fast |
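The incremental pattern needs nothing beyond the standard library: parse the sitemap, compare each lastmod against the timestamp of the last ingest run, and re-scrape only what changed. The sitemap below is a made-up sample for illustration:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def changed_urls(sitemap_xml: str, since: datetime) -> list[str]:
    """Return sitemap URLs whose <lastmod> is strictly newer than `since`."""
    root = ET.fromstring(sitemap_xml)
    fresh = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if loc and lastmod and datetime.fromisoformat(lastmod) > since:
            fresh.append(loc)
    return fresh

SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/intro</loc><lastmod>2026-04-01T09:00:00+00:00</lastmod></url>
  <url><loc>https://example.com/docs/api</loc><lastmod>2026-03-01T09:00:00+00:00</lastmod></url>
</urlset>"""

print(changed_urls(SITEMAP, datetime(2026, 3, 15, tzinfo=timezone.utc)))
# -> ['https://example.com/docs/intro']
```

Feed only the returned URLs back into the loader from section 3 and you skip re-embedding the unchanged ones entirely.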

6. Caching and Rate Limit Strategy

A production RAG ingester should never re-fetch an unchanged URL. Two caching layers, backed by rate-limit discipline, keep it that way:

  • HTTP-level cache. requests-cache or httpx-cache respects ETag and Last-Modified. Cuts bandwidth and proxy usage dramatically on re-ingest runs.
  • Content-hash cache. Store SHA-256 of cleaned text per URL. If the hash matches the previous run, skip re-chunking and re-embedding — embedding calls are expensive at OpenAI and Voyage AI pricing.
  • Per-domain token bucket. Use asyncio-throttle or aiolimiter to cap requests per second per host. Rotating mobile IPs does not absolve you of polite crawling.
  • Retry with backoff. Use jittered exponential backoff on transient failures and honor the Retry-After header on 429 responses. See our backoff guide.
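The content-hash layer is a few lines of standard library. A sketch, assuming you persist the cache dict (for example as JSON) between ingest runs; the names are illustrative:

```python
import hashlib

def content_hash(text: str) -> str:
    """SHA-256 digest of the cleaned page text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def needs_reembed(url: str, text: str, cache: dict) -> bool:
    """True when the page changed since the last run; records the new digest either way."""
    digest = content_hash(text)
    unchanged = cache.get(url) == digest
    cache[url] = digest
    return not unchanged

cache = {}  # persist this dict between ingest runs
print(needs_reembed("https://example.com/docs/intro", "Welcome.", cache))     # -> True  (first sight)
print(needs_reembed("https://example.com/docs/intro", "Welcome.", cache))     # -> False (unchanged: skip chunk + embed)
print(needs_reembed("https://example.com/docs/intro", "Welcome v2.", cache))  # -> True  (changed: re-embed)
```

Gate the chunk-and-embed step on this check and an unchanged re-ingest run costs you only the fetches, most of which the HTTP-level cache absorbs anyway.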


Reliable Retrieval Starts at the Network Layer

Carrier-grade mobile IPs for RAG ingestion and live retrieval. Try it for $5.