Dropshipping Product Research Automation
Finding winning products manually takes hours per niche. Automation with proxies scales this to thousands of products per hour across AliExpress, Amazon Best Sellers, and Shopify stores.
1. Sources Worth Scraping
Not all data sources give equal signal. The high-value ones:
| Source | Signal | Path |
|---|---|---|
| AliExpress Best Sellers | Orders count, price, supplier rating | /w/wholesale-*.html |
| Amazon Movers & Shakers | 24-hour rank change %, category | /gp/movers-and-shakers |
| Amazon Best Sellers | Daily rank, price, rating count | /Best-Sellers |
| Shopify stores | Catalog via /products.json, recent drops | /products.json |
| TikTok trending (TikTok Shop) | Product hashtags, view velocity | Public /shop/discover |
2. AliExpress Anti-Bot
AliExpress ships Alibaba Group's in-house bot detection. The tells:
- Slider captcha ("nc_acb") that triggers quickly on datacenter IPs, especially on search and category pages
- IP reputation weighting with a strong penalty on Chinese-registered datacenter blocks and any ASN marked as hosting
- JS-only price rendering in some regions, so headless Chrome or Playwright is sometimes needed
- Region cookies (aep_usuc_f) that control currency and shipping display. Set them explicitly so you scrape what consumers in the target market actually see, not whatever the exit IP's region defaults to
US carrier mobile IPs pass the reputation check easily and, because Alibaba sees tons of legitimate US mobile traffic, rarely get slider-challenged.
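Pinning the region cookie explicitly could look like the sketch below. The field names inside the cookie value (site, c_tp, region, b_locale) are an assumption based on values commonly seen in browser sessions, not something documented by AliExpress, so check them against a real session before relying on them. `region_cookie` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def region_cookie(region="US", currency="USD", locale="en_US", site="glo"):
    """Build a Cookie header value that pins region and currency display.

    ASSUMPTION: the aep_usuc_f value is a query-string-style list of fields
    named site, c_tp, region and b_locale. Verify in your own browser session.
    """
    fields = {
        "site": site,        # storefront variant (assumed field name)
        "c_tp": currency,    # display currency (assumed field name)
        "region": region,    # shipping destination (assumed field name)
        "b_locale": locale,  # UI locale (assumed field name)
    }
    return "aep_usuc_f=" + urlencode(fields)


# Merge into the request headers you already send
request_headers = {"Cookie": region_cookie(region="US", currency="USD")}
```

Building the value from named parameters keeps the target market visible in the calling code instead of buried in an opaque cookie string.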
3. Python: Tracking Product Velocity
The signal that separates winners from everything else is velocity: how fast orders/reviews/rank are growing. Snapshot daily and diff the orders_count:
```python
import datetime
import json
import pathlib

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15",
    "Accept-Language": "en-US,en;q=0.9",
}
proxies = {
    "http": "http://USER:PASS@hostname:http_port",
    "https": "http://USER:PASS@hostname:http_port",
}

STORE = pathlib.Path("velocity.json")


def snapshot(products):
    """Record today's orders and price. products: [{product_id, orders_count, price}]"""
    today = datetime.date.today().isoformat()
    state = json.loads(STORE.read_text()) if STORE.exists() else {}
    for p in products:
        pid = str(p["product_id"])
        state.setdefault(pid, {})[today] = {
            "orders": p["orders_count"],
            "price": p["price"],
        }
    STORE.write_text(json.dumps(state, indent=2))


def velocity(product_id, days=7):
    """Average new orders per day across the most recent `days` snapshots."""
    if not STORE.exists():
        return None
    state = json.loads(STORE.read_text())
    history = state.get(str(product_id), {})
    dates = sorted(history.keys())[-days:]
    if len(dates) < 2:
        return None
    start = history[dates[0]]["orders"]
    end = history[dates[-1]]["orders"]
    # Divide by calendar days elapsed, not snapshot count, so missed days don't skew the rate
    elapsed = (datetime.date.fromisoformat(dates[-1]) - datetime.date.fromisoformat(dates[0])).days
    return (end - start) / elapsed  # avg new orders per day
```

Products with a rising 7-day velocity AND a stable-to-rising price are the candidates worth testing. That combination indicates demand outpacing supply, not a race to the bottom.
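One way to turn "rising velocity and stable-to-rising price" into a screen is to compare the order rate in the first half of the window with the rate in the second half. This is a minimal sketch, not a tuned rule: `is_candidate` and the `price_tolerance` parameter are hypothetical, and the input is a single product's history in the same `{date: {"orders", "price"}}` shape the snapshot store uses.

```python
from datetime import date


def is_candidate(history, days=7, price_tolerance=0.02):
    """Return True when order velocity is accelerating and price is holding.

    history: {iso_date: {"orders": int, "price": float}} for one product.
    price_tolerance: allowed fractional price drop (illustrative default).
    """
    dates = sorted(history)[-days:]
    if len(dates) < 3:
        return False  # need at least two intervals to compare

    def rate(first, last):
        # New orders per calendar day between two snapshot dates
        span = (date.fromisoformat(last) - date.fromisoformat(first)).days
        return (history[last]["orders"] - history[first]["orders"]) / span

    mid = len(dates) // 2
    early = rate(dates[0], dates[mid])
    late = rate(dates[mid], dates[-1])

    first_price = history[dates[0]]["price"]
    last_price = history[dates[-1]]["price"]
    price_holding = last_price >= first_price * (1 - price_tolerance)

    return late > early and price_holding
```

Splitting the window in half is a deliberately crude acceleration test; a fitted slope would be less sensitive to a single noisy day.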
4. Price Arbitrage Detection
Classic dropshipping arbitrage: supplier price on AliExpress vs. retail price on Amazon/Shopify. The interesting ratio is usually:
margin_ratio = (retail_price - supplier_price - shipping) / retail_price
Products with a margin ratio of 0.4+ and a rising velocity on both the retail side and the supplier side are strong candidates. Products with high margin but falling supplier velocity are saturated.
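The ratio and the screening rule above can be written out as a small sketch. The 0.4 threshold comes from the text; the function names and the idea of passing each side's velocity change as a signed number (`retail_trend`, `supplier_trend`) are assumptions made for illustration.

```python
def margin_ratio(retail_price, supplier_price, shipping=0.0):
    """Fraction of the retail price left after supplier cost and shipping."""
    if retail_price <= 0:
        raise ValueError("retail_price must be positive")
    return (retail_price - supplier_price - shipping) / retail_price


def classify(retail_price, supplier_price, shipping,
             retail_trend, supplier_trend, min_ratio=0.4):
    """Label a product using margin plus velocity direction on both sides.

    retail_trend / supplier_trend: change in velocity over the window,
    positive when rising (hypothetical inputs computed elsewhere).
    """
    ratio = margin_ratio(retail_price, supplier_price, shipping)
    if ratio < min_ratio:
        return "thin margin"
    if supplier_trend < 0:
        return "saturated"
    if retail_trend > 0 and supplier_trend > 0:
        return "strong candidate"
    return "watch"


# Worked example: $30 retail, $12 supplier, $6 shipping -> (30 - 12 - 6) / 30
example_ratio = margin_ratio(30.0, 12.0, 6.0)
```

Ad spend, payment fees, and returns are left out here just as they are in the formula, so treat the ratio as an upper bound on what you keep.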
Image similarity matching (perceptual hashes like pHash) is how you link an AliExpress listing to the same product on Amazon — titles and SKUs rarely match exactly, but hero images usually do.
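To show the matching idea without extra dependencies, the sketch below uses an average hash (aHash), a simpler relative of pHash rather than pHash itself. It assumes each hero image has already been decoded, converted to grayscale, and downscaled to an 8x8 grid by an image library of your choice; `max_distance=5` is an illustrative threshold, not a calibrated one.

```python
def average_hash(pixels):
    """64-bit average hash from an 8x8 grid of grayscale values (0-255).

    Each bit is 1 when the pixel is brighter than the grid's mean.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(hash_a, hash_b):
    """Number of differing bits between two hashes."""
    return bin(hash_a ^ hash_b).count("1")


def same_product(hash_a, hash_b, max_distance=5):
    """Treat two listings as the same product when their hashes nearly match."""
    return hamming(hash_a, hash_b) <= max_distance
```

Because the hash compares each pixel to the image's own mean, uniform brightness shifts and mild recompression leave it mostly unchanged, while crops and added watermarks can still push the distance up.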
5. Shopify Spy via /products.json
For any individual Shopify competitor, scrape /products.json daily and diff. New products that appear with a large number of variants typically signal a store testing a winner. Sort by created_at to see the freshest drops:

```python
def recent_drops(store_url, days=7):
    """Products created in the last `days` days, newest first (first page only)."""
    url = f"{store_url.rstrip('/')}/products.json?limit=250&page=1"
    r = requests.get(url, headers=headers, proxies=proxies, timeout=30)
    r.raise_for_status()
    products = r.json().get("products", [])
    # Timezone-aware cutoff; assumes created_at carries a UTC offset or trailing Z
    cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=days)

    def created(p):
        return datetime.datetime.fromisoformat(p["created_at"].replace("Z", "+00:00"))

    return sorted((p for p in products if created(p) > cutoff), key=created, reverse=True)
```

Public trending-store lists (Shopify Stars, Stores.watch) give you the store pool. Crawl that pool daily with a rotating mobile IP and a modest per-store request rate.
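The daily diff itself can be sketched as a comparison of two saved snapshots of the same store. This assumes each snapshot is the list found under "products" in a /products.json response and that every product carries an "id" and a "variants" list; `new_products` is a hypothetical helper name.

```python
def new_products(previous, current):
    """Products present in today's snapshot but not yesterday's.

    previous, current: lists of product dicts from two /products.json pulls.
    Results are ordered by variant count, largest first, since a new product
    launched with many variants is the signal described above.
    """
    seen_ids = {p["id"] for p in previous}
    fresh = [p for p in current if p["id"] not in seen_ids]
    return sorted(fresh, key=lambda p: len(p.get("variants", [])), reverse=True)


# Example with two tiny hand-written snapshots
yesterday = [{"id": 1, "title": "Lamp", "variants": [{}]}]
today = [
    {"id": 1, "title": "Lamp", "variants": [{}]},
    {"id": 2, "title": "Desk mat", "variants": [{}, {}, {}]},
    {"id": 3, "title": "Cable clip", "variants": [{}]},
]
drops = new_products(yesterday, today)
```

Diffing on id rather than title means a renamed product is not reported as a new drop.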
6. Ethical & Legal Considerations
- Don't clone listings. Copying titles, descriptions, and photos verbatim can be copyright infringement. Use scraped data as a research signal, not as final listing content.
- Respect robots.txt as intent. Even where scraping is legal (see hiQ Labs v. LinkedIn), pay attention to what a site is asking of automated clients.
- Pace yourself. Aggressive request rates cost small stores real money in bandwidth. Pace at a rate a human browsing the same store would generate.
- Verify trademark & MAP before listing. A good arbitrage spread doesn't matter if the brand has an active MAP policy or enforced trademark.
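The robots.txt and pacing points can be enforced in code before any request goes out. The sketch below is one possible shape, using only the standard library: `Throttle` and `allowed` are hypothetical names, the interval you pick is up to you, and the robots.txt text is assumed to have been fetched separately.

```python
import time
from urllib.robotparser import RobotFileParser


class Throttle:
    """Enforce a minimum gap between requests to the same host."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between hits per host
        self._clock = clock               # injectable for testing
        self._sleep = sleep
        self._last = {}                   # host -> time of last request

    def wait(self, host):
        """Block until it is polite to hit `host` again, then record the hit."""
        now = self._clock()
        last = self._last.get(host)
        if last is not None:
            remaining = self.min_interval - (now - last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last[host] = now


def allowed(robots_txt, user_agent, url):
    """Check a URL against already-downloaded robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Call `throttle.wait(host)` immediately before each request and skip any URL for which `allowed(...)` returns False. Tracking the gap per host lets a crawl over many stores stay fast overall while each individual store sees a slow, human-like rate.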