How Job Market Data Is Collected (2026)
Job and recruitment data produced the single most cited web-scraping precedent in US law — hiQ v. LinkedIn. Here's the full, accurate timeline, what the official APIs allow now, and how labor-analytics firms operate within the rules.
Quick Answer
Job market data (postings, salaries, headcount flows, sentiment) is collected from sites such as LinkedIn, Indeed, and Glassdoor, and from employer career pages. In the landmark hiQ v. LinkedIn case, the Ninth Circuit held that scraping publicly available data likely does not violate the CFAA, but the same case ended with hiQ losing on a breach-of-contract claim. Official job-search APIs have largely been sunset, so analytics firms collect public postings directly.
- →hiQ v. LinkedIn: public-data scraping ≠ CFAA violation, but ToS is an enforceable contract
- →Indeed and LinkedIn have sunset their public job-search APIs
- →Firms like Revelio Labs and LinkUp build legal labor-data products from public postings
Labor-market data drives recruiting tools, sales intelligence, and economic analytics: which companies are hiring, what roles pay, where headcount is growing or shrinking. Because the official APIs have closed, much of this is built from public job postings — which is exactly what put it at the center of the defining scraping lawsuit.
What's collected
- →Job postings: title, company, location, description, posting date — the core signal
- →Salary data: posted ranges and modeled compensation estimates
- →Headcount flows & layoffs: hiring velocity, role mix, workforce changes over time
- →Sentiment: employee reviews and ratings (e.g., Glassdoor)
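As a rough illustration of how the fields above might be held in code, here is a minimal Python sketch. The `JobPosting` class, its field names, and the `postings_per_month` helper are hypothetical names chosen for this example, not a standard schema or any vendor's data model.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class JobPosting:
    """One public job posting. Field names are illustrative, not a standard."""
    title: str
    company: str
    location: str
    description: str
    posted_on: date
    salary_min: Optional[float] = None  # posted range, if the employer disclosed one
    salary_max: Optional[float] = None

    def salary_midpoint(self) -> Optional[float]:
        """Midpoint of the posted range, or None when no full range was posted."""
        if self.salary_min is None or self.salary_max is None:
            return None
        return (self.salary_min + self.salary_max) / 2


def postings_per_month(postings: Iterable[JobPosting]) -> Counter:
    """Count postings per (company, 'YYYY-MM'): a crude hiring-velocity signal."""
    return Counter((p.company, p.posted_on.strftime("%Y-%m")) for p in postings)
```

Counting postings per company per month is only one simple proxy for hiring velocity; real headcount-flow products model far more than raw posting counts.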
hiQ Labs v. LinkedIn — the full timeline
hiQ built workforce analytics from public LinkedIn profiles. When LinkedIn moved to block it, the five-year fight became the most cited scraping precedent in US law. The nuance that most summaries miss: hiQ won on the CFAA question but ultimately lost on contract.
- Aug 14, 2017: The N.D. Cal. district court grants hiQ a preliminary injunction, ordering LinkedIn to stop blocking hiQ's access to public profiles.
- Sept 9, 2019: The Ninth Circuit affirms, holding that the CFAA likely does not bar scraping publicly available data.
- June 14, 2021: The U.S. Supreme Court grants cert, then vacates and remands (“GVR”) for reconsideration in light of Van Buren v. United States.
- Apr 18, 2022: The Ninth Circuit reaffirms (No. 17-16783): scraping public data likely does not violate the CFAA.
- Nov 2022: On remand, the district court rules that hiQ had breached LinkedIn's User Agreement. This is a contract claim, distinct from the CFAA.
- Dec 7, 2022: Stipulated judgment, reported as a $500,000 judgment against hiQ plus an injunction; effectively a settlement (hiQ had wound down).
The official APIs have closed
- →Indeed: deprecated its Publisher Jobs API; the legacy Sponsored Jobs API was sunset January 1, 2022 (XML integration March 31, 2022), with further endpoint decommissions through June 2024. There is no public job-search API today.
- →LinkedIn: has no public profile or job-search query API. Its Jobs API is posting-only for approved ATS/enterprise partners; legacy and Content APIs were sunset February 28, 2023.
- →Glassdoor: broad public API access is restricted; the current specifics were not verified for this guide.
With sanctioned query APIs gone, labor-data products are built from public postings — which is why the hiQ contract distinction matters so much for anyone in this space.
How labor-analytics firms operate
A whole industry turns public job data into structured intelligence:
- →Revelio Labs standardizes 1.1B+ public employment records (profiles, postings, sentiment, layoff notices) into workforce-intelligence feeds for investors, governments, and HR, and publishes Revelio Public Labor Statistics from 100M+ US profiles.
- →LinkUp indexes millions of job listings daily directly from employer websites — not aggregators — positioning on data accuracy.
The common thread: collect public data, structure it, and respect terms — turning raw postings into a defensible analytics product.
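To make the "structure it" step concrete, here is a minimal Python sketch that pulls structured fields out of a public job page. It assumes the employer page embeds schema.org `JobPosting` markup as JSON-LD, which some career sites do and others do not. It is not a description of how Revelio Labs or LinkUp work, and the `extract_job_postings` function and its output keys are hypothetical.

```python
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_job_postings(html: str) -> list:
    """Return a list of dicts for each JSON-LD item typed as JobPosting."""
    collector = _JsonLdCollector()
    collector.feed(html)
    postings = []
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than guessing at their content
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict) or item.get("@type") != "JobPosting":
                continue
            org = item.get("hiringOrganization")
            postings.append({
                "title": item.get("title"),
                # hiringOrganization may be an object or a plain string
                "company": org.get("name") if isinstance(org, dict) else org,
                "date_posted": item.get("datePosted"),
            })
    return postings
```

Pages without this markup simply yield an empty list, so a real pipeline would need other extraction paths; this sketch covers only the JSON-LD case.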
Where mobile & geo IPs fit
Job markets are inherently regional: postings, salaries, and availability differ by country and metro, and the big sites localize content by IP. Geo-distributed mobile IPs let you capture region-specific labor data as a local visitor would see it and spread request volume across addresses, which matters on sites that deploy heavy anti-bot defenses.
They're infrastructure for legitimate, geo-distributed collection of public data — not a way around a site's Terms. The hiQ case is the reminder: public-data access can be lawful while contract terms still bind. Respect the Terms, throttle, and collect only public data.
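The "throttle" advice can be sketched in a few lines of Python. This is a minimal example under stated assumptions: it uses the standard library's `urllib.robotparser` to check a URL against robots.txt text you have already downloaded, and a hypothetical `Throttle` class to enforce a minimum delay between requests. Note that robots.txt is a separate, machine-readable signal; passing this check does not mean a site's Terms permit collection, so the Terms still need to be read.

```python
import time
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against already-downloaded robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class Throttle:
    """Enforces a minimum delay between consecutive requests."""

    def __init__(self, min_interval_s: float) -> None:
        self.min_interval_s = min_interval_s
        self._last = None  # monotonic timestamp of the previous request

    def wait(self) -> None:
        """Sleep just long enough to respect the minimum interval."""
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

In use, a collector would call `is_allowed(...)` before requesting a page and `throttle.wait()` before every request. The interval is a judgment call per site; the value you pick should err on the slow side.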