
Proxies for Academic and Market Research Data Collection

The route is part of the method. Record it the same way you record the source list and the sampling window.

Route selection · Audit metadata · Updated 2026-06-12

The route belongs in the method section

A research crawl without route metadata is not reproducible. When a reviewer asks why Spain has more rows than Germany, the answer needs to come from a log — proxy route, country token, collection window, exclusion rules — not from the analyst's memory. A run that cannot be described cannot be defended.

The collector can be Scrapy, Playwright, a requests script, or a vendor export. What matters is that status codes, timestamps, and the proxy configuration are saved during collection, not reconstructed after the fact.
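Whatever the collector, the logging step can be as small as one JSON line per request attempt. The sketch below is one way to do it, not a prescribed format: the `log_request` helper and its field names are hypothetical, and an in-memory buffer stands in for the log file.

```python
import io
import json
from datetime import datetime, timezone

def log_request(log_file, url, status, proxy_username, error=None):
    """Append one JSON line describing a single request attempt."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),  # written at request time
        "url": url,
        "status": status,              # None when no response came back
        "proxy_user": proxy_username,  # routing tokens only, never the password
        "error": error,
    }
    log_file.write(json.dumps(record) + "\n")
    return record

# In a real run log_file would be something like open("run.jsonl", "a").
buf = io.StringIO()
log_request(buf, "https://example.org/page/1", 200, "user-plan-datacenter")
log_request(buf, "https://example.org/page/2", None, "user-plan-datacenter",
            error="timeout")
```

Because the username carries the routing tokens, storing it with each record ties every row to the route that fetched it.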

Start with datacenter; upgrade to residential on geo-filtered sources

Datacenter is the right starting point for open public records, public archives, and plain catalog pages. It costs less, and if the source accepts it, there is no reason to pay for residential. Switch only when the question depends on what a user in a specific country sees, or when the source starts blocking datacenter ASNs.

Volume Residential at $0.89/GB is the standard first step when residential is needed. Premium Residential at $5.00/GB is for targets that consistently waste retries on the cheaper tier: the higher rate is only justified when failed requests are already eating into the Volume saving. Static ISP proxies carry a stable public IP address and fit cases where the source needs a consistent footprint across a long collection window rather than rotating exits.

The gateway for all pool types is http://proxynade.net:2555 with username and password auth. The expanded username carries routing options inline: base user, plan token (volume, premium, or datacenter), an optional country code, and an optional rotation window. A residential run scoped to Germany with a 20-minute sticky window uses a username like rt97db6958d9-plan-volume-country-de-lifetime-20. Datacenter lines skip the lifetime token.
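The username format above can be assembled in code so the same values feed both the request and the run log. This is a minimal sketch: the helper names are hypothetical, and the token order is inferred from the single example username rather than from a published grammar. Static ISP lines are not covered.

```python
from urllib.parse import quote

GATEWAY_HOST = "proxynade.net"
GATEWAY_PORT = 2555
PLANS = {"volume", "premium", "datacenter"}

def build_username(base_user, plan, country=None, lifetime_minutes=None):
    """Expand the base user with plan, country, and sticky-window tokens."""
    if plan not in PLANS:
        raise ValueError(f"unknown plan token: {plan!r}")
    parts = [base_user, "plan", plan]
    if country:
        parts += ["country", country.lower()]
    # Datacenter lines skip the lifetime token.
    if lifetime_minutes is not None and plan != "datacenter":
        parts += ["lifetime", str(int(lifetime_minutes))]
    return "-".join(parts)

def build_proxy_url(username, password):
    """Return the gateway URL with percent-encoded credentials."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"http://{user}:{pwd}@{GATEWAY_HOST}:{GATEWAY_PORT}"

username = build_username("rt97db6958d9", "volume", country="de",
                          lifetime_minutes=20)
proxy_url = build_proxy_url(username, "PASSWORD")  # placeholder password
# Shape expected by the `proxies=` argument of the requests library.
proxies = {"http": proxy_url, "https": proxy_url}
```

Percent-encoding the credentials keeps a password containing `@` or `:` from breaking the URL.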

Sticky per source, hard rotation between sources

When one source requires several paginated requests, keep the exit steady for the duration of that source, then rotate before moving to the next source or market segment. A route change mid-pagination can return different sort orders or break session state, which introduces variance that is hard to separate from real signal.

The lifetime-<minutes> token in the username sets the sticky window. A source that takes roughly 15 minutes to paginate through uses lifetime-15. When the window expires, the pool assigns a new exit automatically. For hard rotation between sources, construct a fresh username or omit the lifetime token and let the pool rotate each connection.
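Choosing the window is simple arithmetic: page count times per-page delay, rounded up to whole minutes. The sketch below adds a small safety margin so the window does not expire on the last page. The two-minute margin and both helper names are assumptions for illustration, not vendor guidance.

```python
import math

def sticky_minutes(pages, seconds_per_page, margin_minutes=2):
    """Estimate a sticky window in whole minutes that covers one source."""
    estimated = math.ceil(pages * seconds_per_page / 60)
    return max(1, estimated + margin_minutes)

def lifetime_token(pages, seconds_per_page):
    """Format the estimate as the username token."""
    return f"lifetime-{sticky_minutes(pages, seconds_per_page)}"

# Hypothetical sources: name -> (page count, seconds between requests).
sources = {"catalog-a": (156, 5), "catalog-b": (40, 3)}
tokens = {name: lifetime_token(pages, delay)
          for name, (pages, delay) in sources.items()}
```

Here 156 pages at 5 seconds each is 13 minutes of pagination, so the token for `catalog-a` comes out as `lifetime-15` once the margin is added.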

| Scenario | Rotation mode | Username token |
| --- | --- | --- |
| Single source, multiple pages | Sticky for source duration | lifetime-15 (or appropriate minutes) |
| New source or new market segment | Hard rotation | No lifetime token, or new username each call |
| Long-window archive crawl | Static ISP | Per-IP pricing, no rotation token |

App bytes and proxy bytes measure different things

The collection script counts saved records, written files, and maybe screenshots. The proxy meter counts everything it carried: redirect chains, failed TLS attempts, 403s, 429s, blocked HTML, retried requests, and discarded records. The two figures will not match, and both belong in the method notes.

The Proxynade dashboard network logs show host, outcome, latency, and byte totals per request. Usage logs export as CSV. Cross-referencing the two figures shows how much bandwidth went to successful records versus overhead — useful for estimating cost on the next run and for documenting that a high drop rate was a source issue, not a parser bug.
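The cross-reference can be sketched as a sum of proxy bytes per outcome, followed by the share that did not go to successful responses. The column names `outcome` and `bytes` are assumptions: the actual CSV export headers are not documented here, so adjust them to match the file. The sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict

def bytes_by_outcome(csv_text):
    """Sum proxy-side bytes per outcome from a usage-log CSV."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["outcome"]] += int(row["bytes"])
    return dict(totals)

def overhead_share(totals, ok_outcomes=("200",)):
    """Fraction of carried bytes that did not belong to successful responses."""
    total = sum(totals.values())
    ok = sum(v for k, v in totals.items() if k in ok_outcomes)
    return 0.0 if total == 0 else (total - ok) / total

# Illustrative rows only; header names are hypothetical.
sample = """host,outcome,latency_ms,bytes
example.org,200,310,48000
example.org,403,120,2000
example.org,429,95,1000
example.org,200,290,49000
"""
totals = bytes_by_outcome(sample)
share = overhead_share(totals)
```

Recording `totals` next to the app-side record count gives the method notes both figures from one pass over the log.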

Access rights are not a proxy setting

The proxy handles IP routing. It does not resolve access questions. Subscriptions, paywalls, login walls, private datasets, API contracts, robots.txt restrictions (RFC 9309), personal data rules, and IRB review requirements are not proxy configuration items. If access requires approval, resolve that before touching IP rotation.
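Of the items above, robots.txt is the one that can be checked mechanically before a run. The sketch below uses Python's standard `urllib.robotparser`. The `allowed_by_robots` helper, the `research-bot` user agent, and the sample rules are hypothetical. A passing check says nothing about terms of service, contracts, or IRB requirements.

```python
from urllib.robotparser import RobotFileParser

def allowed_by_robots(robots_txt, user_agent, url):
    """Check one URL against robots.txt rules that were already fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Sample rules for illustration; a real run would use the source's own file.
robots_txt = """User-agent: *
Disallow: /private/
"""

catalog_ok = allowed_by_robots(
    robots_txt, "research-bot", "https://example.org/catalog/1")
private_ok = allowed_by_robots(
    robots_txt, "research-bot", "https://example.org/private/report")
```

With these rules, `catalog_ok` is true and `private_ok` is false, so the second URL would be dropped from the source list and noted under exclusion rules.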

The Proxynade AUP at /acceptable-use prohibits unauthorized account creation and ban evasion. A well-scoped research crawl collects from public sources the operator is authorized to access — the proxy configuration does not change that boundary.

Metadata checklist for a replicable dataset

  • Source URL list with access date for each.
  • Proxy route: plan type, country code, rotation mode, lifetime window if sticky.
  • Collection script version or commit hash.
  • Sampling window: start and end timestamps.
  • Exclusion rules: what was dropped and why.
  • App-side byte or record count alongside proxy dashboard byte total.
  • HTTP status code distribution from the run (200s, 403s, 429s, timeouts).

A run documented to that level can be rerun by another analyst without a conversation about what happened the first time.
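One way to keep the checklist honest is to write it as a single manifest file next to the dataset. The sketch below is a possible shape, not a standard: the `RunManifest` class and its field names are hypothetical, and every value in the example is a placeholder.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass, field
from typing import Optional

@dataclass
class RunManifest:
    sources: dict                    # source URL -> access date
    plan: str                        # volume, premium, or datacenter
    country: Optional[str]
    rotation_mode: str               # "sticky" or "hard"
    lifetime_minutes: Optional[int]  # only set for sticky routes
    script_version: str              # version tag or commit hash
    window_start: str
    window_end: str
    exclusion_rules: list
    app_bytes: int
    proxy_bytes: int
    status_counts: dict = field(default_factory=dict)

def status_distribution(statuses):
    """Count status codes; None stands for a request that timed out."""
    return dict(Counter("timeout" if s is None else str(s) for s in statuses))

# Placeholder values throughout.
manifest = RunManifest(
    sources={"https://example.org/catalog": "2026-01-01"},
    plan="volume",
    country="de",
    rotation_mode="sticky",
    lifetime_minutes=15,
    script_version="PLACEHOLDER_COMMIT",
    window_start="2026-01-01T00:00:00Z",
    window_end="2026-01-01T06:00:00Z",
    exclusion_rules=["dropped rows with empty price field"],
    app_bytes=97000,
    proxy_bytes=100000,
    status_counts=status_distribution([200, 200, 403, None]),
)
manifest_json = json.dumps(asdict(manifest), indent=2)
```

Saving `manifest_json` alongside the data means the route, window, and drop rate travel with the dataset instead of living in a separate notebook.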

Research proxy FAQ

When should I use datacenter proxies instead of residential for research?
Use datacenter when the target is an open public record, archive, or plain catalog page that does not geo-filter or fingerprint the request. Switch to residential only when the source returns different content by region or starts blocking datacenter ASNs.

What is the difference between sticky and hard rotation for a research crawl?
Sticky keeps one exit IP for the duration of one source so pagination stays consistent. Hard rotation assigns a new exit before each new source or market segment to avoid one route biasing the sample.

Why does my app show fewer bytes than the proxy dashboard?
The app counts saved records. The proxy meter counts everything it carried: redirects, retries, 403s, 429s, rejected HTML, and discarded records. Both figures belong in the method notes.

What metadata should I record alongside a research dataset?
Source URL list, proxy route and country used, collection script version, sampling window (start and end timestamps), exclusion rules applied, and the byte totals from both the app and the proxy dashboard.

Does using a proxy make a research crawl legal?
The proxy handles IP routing only. Access rights depend on the source's terms, robots.txt, any applicable data protection rules, and whether the data involves personal information requiring IRB review. Resolve those questions before configuring any rotation.