Datacenter works on boring targets
Datacenter is the route to test first on plain public work: product catalogs, public directories, open HTML endpoints, status pages. Not the starting point for logins, checkout flows, fingerprint-aware pages, or anything that already shows it wants residential behavior. When a sample starts heading there, stop calling it a datacenter candidate.
The price difference is the reason to test it. Datacenter proxies on Proxynade are billed per transferred byte, as are Volume Residential ($0.89/GB) and Premium Residential ($5.00/GB), so any per-GB gap between routes scales directly with volume. That gap matters at 100 GB or 300 GB. It matters less if the target produces enough empty pages, 403s, and retries that the run has to be repeated. A cheap route that runs twice is just delayed spend.
Build the sample to match the real job
The first sample is small and intentionally representative. It should include a category page, a detail page, an empty result, an older URL if the site has them, and whichever page type the real collector will hit most. If the real job runs Scrapy with eight concurrent requests and one retry, the sample uses that exact shape. If the real job uses Puppeteer because content renders late, a plain requests check does not validate that path.
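For the Scrapy case described above, the sample can pin the same concurrency and retry settings the real run uses. This is a minimal sketch: `CONCURRENT_REQUESTS`, `RETRY_ENABLED`, and `RETRY_TIMES` are standard Scrapy setting names, while `SAMPLE_SETTINGS` and `settings_match` are hypothetical names introduced for illustration.

```python
# Hypothetical settings shared by the sample and the full run.
# The keys are standard Scrapy settings; the values mirror the
# "eight concurrent requests, one retry" example in the text.
SAMPLE_SETTINGS = {
    "CONCURRENT_REQUESTS": 8,  # same concurrency as the real job
    "RETRY_ENABLED": True,
    "RETRY_TIMES": 1,          # one retry after the first failed attempt
}


def settings_match(sample: dict, production: dict) -> bool:
    """Return True when every sample setting has the same value in production."""
    return all(production.get(key) == value for key, value in sample.items())
```

One way to apply this is to assign the dictionary to a spider's `custom_settings` for the sample run, then call `settings_match` against the production settings before trusting the result.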
Get the proxy URL from the dashboard for that run, not from a wiki page that was last updated in March. Pick Datacenter, pick HTTP or HTTPS, copy the generated output, and store the credential in an environment variable. Keep the protocol fixed for the whole sample; a test that starts with HTTPS and drifts into SOCKS5 halfway through leaves network logs that are harder to compare.
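The credential handling above could look something like the following sketch. The variable name `SAMPLE_PROXY_URL` and the function `load_proxy` are assumptions made for this example, not Proxynade names; the returned dictionary follows the shape that the `requests` library accepts for its `proxies` argument.

```python
import os
from urllib.parse import urlsplit

# Keep the protocol fixed for the whole sample: only HTTP or HTTPS proxy URLs pass.
ALLOWED_SCHEMES = {"http", "https"}


def load_proxy(env=os.environ, var="SAMPLE_PROXY_URL"):
    """Read the proxy URL from the environment and return a requests-style proxies dict."""
    url = env.get(var)
    if not url:
        raise RuntimeError(f"{var} is not set; copy the URL from the dashboard first")
    scheme = urlsplit(url).scheme.lower()
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unexpected proxy scheme {scheme!r}; expected http or https")
    # The same proxy endpoint is used for both plain and TLS target traffic.
    return {"http": url, "https": url}
```

Failing fast on an unexpected scheme is the design choice here: a sample that silently switches to SOCKS5 is caught before it produces logs that cannot be compared.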
Write down what the sample covers before it runs: target mix, client, protocol, sample size, and the stop thresholds for 403s and 429s. If that note feels tedious, the larger run is too early.
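The note does not need to be elaborate. As a rough sketch, it can be a small frozen record saved next to the run output. The class name, field names, and every value below are illustrative assumptions, not recommended thresholds.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SamplePlan:
    """What the sample covers, written down before the run starts."""
    target_mix: tuple
    client: str
    protocol: str
    sample_size: int
    max_403_rate: float  # stop threshold, as a fraction of requests
    max_429_rate: float  # stop threshold, as a fraction of requests


# Illustrative values only; substitute the real job's numbers.
plan = SamplePlan(
    target_mix=("category", "detail", "empty_result", "older_url"),
    client="scrapy, 8 concurrent requests, 1 retry",
    protocol="https",
    sample_size=500,
    max_403_rate=0.05,
    max_429_rate=0.02,
)

# Serialize the plan so it can be stored alongside the sample's output.
note = json.dumps(asdict(plan), indent=2)
```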
Read provider logs and app output together
After the sample, read the job output and the dashboard network logs side by side. The app might report "done" because it wrote a file, while the logs show timeouts, 403s, 429s, or a string of 200 responses that saved nothing useful. The fields that matter: host, outcome, HTTP status, latency, bytes transferred, and the app's kept record count.
The app counter lies by omission, not by design. It counts rows kept, files saved, screenshots written, or tasks completed. The provider meter also saw redirects, retries, blocked pages, JavaScript payloads, images, partial downloads, and responses the parser discarded. That is how a run that looks small inside the app still produces a real bandwidth bill.
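A side-by-side reading can be reduced to a few numbers. The sketch below assumes each log row has already been parsed into a dictionary with `status`, `outcome`, and `bytes` keys; those field names are hypothetical and will need mapping to whatever the dashboard actually exports.

```python
from collections import Counter


def summarize(log_rows, kept_records):
    """Combine provider log rows with the app's kept-record count."""
    total = len(log_rows)
    statuses = Counter(row["status"] for row in log_rows)
    total_bytes = sum(row["bytes"] for row in log_rows)
    return {
        "requests": total,
        "rate_403": statuses[403] / total if total else 0.0,
        "rate_429": statuses[429] / total if total else 0.0,
        "timeouts": sum(1 for row in log_rows if row["outcome"] == "timeout"),
        "bytes": total_bytes,
        # Metered bytes per record the app kept; infinite if nothing was kept.
        "bytes_per_kept": total_bytes / kept_records if kept_records else float("inf"),
    }
```

The `bytes_per_kept` figure is the one the app counter cannot produce on its own, because it includes traffic for responses the parser threw away.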
Once the sample has numbers, price the measured GB against the output you actually kept. Datacenter can be cheaper than Volume Residential and much cheaper than Premium Residential, but only when it keeps close to as much usable output per request as the residential route would. If the cheaper route turns one run into two, the rate card difference did not save the money it appeared to.
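The comparison comes down to cost per kept record. Here is a minimal sketch of that arithmetic, assuming a decimal gigabyte (10^9 bytes); confirm how the provider defines a GB before relying on it. The two residential rates are the ones quoted earlier. The datacenter rate is left as an argument because it should be read from the current rate card.

```python
BYTES_PER_GB = 1_000_000_000  # assumption: decimal GB; check the provider's definition

VOLUME_RESIDENTIAL_PER_GB = 0.89
PREMIUM_RESIDENTIAL_PER_GB = 5.00


def cost_per_kept_record(metered_bytes, rate_per_gb, kept_records):
    """Dollars spent per record kept, using provider-metered bytes."""
    if kept_records <= 0:
        return float("inf")  # nothing kept: the spend bought no output
    return (metered_bytes / BYTES_PER_GB) * rate_per_gb / kept_records


def cheaper_route(cost_a, cost_b):
    """Return 'a' or 'b' for the lower cost per kept record ('a' on a tie)."""
    return "a" if cost_a <= cost_b else "b"
```

Running the same function over each route's metered bytes and kept count puts the routes on equal footing. A route that has to run twice shows up as doubled bytes for the same kept count.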
Domain blocklists cut waste — verify first
A domain blocklist is useful after the sample reveals obvious waste. Matching HTTP and HTTPS targets can be rejected at the router, logged with a blocked outcome, and charged zero bytes. That trims media hosts, third-party trackers, and oversized assets the collector never needed.
Before adding a rule, export the usage CSV and group by host. Check which hosts produced kept output. A block rule can make a batch look cheaper simply because it stopped requesting a page that mattered — and that is a slow, downstream way to discover the mistake.
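The group-by-host check can be sketched as below. The CSV column names `host` and `bytes` are assumptions about the export format, and `kept_by_host` stands in for whatever per-host count the collector can report.

```python
import csv
import io
from collections import defaultdict


def bytes_by_host(csv_text, kept_by_host):
    """Total metered bytes per host, flagging hosts that produced no kept output."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["host"]] += int(row["bytes"])
    report = []
    # Largest consumers first, so the obvious waste is at the top.
    for host, total in sorted(totals.items(), key=lambda item: item[1], reverse=True):
        kept = kept_by_host.get(host, 0)
        report.append({"host": host, "bytes": total, "kept": kept,
                       "block_candidate": kept == 0})
    return report
```

A host flagged as `block_candidate` is only a candidate. A host with zero kept records may still serve something the page needs in order to render, so it is worth re-running the sample after adding the rule and comparing kept counts.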
Set stop rules before scaling
Set thresholds before volume goes up: 403 rate, 429 rate, timeout rate, parser-failure rate, and provider-metered GB per kept record. If datacenter fails those rules, move the target to a residential test or drop it for now. Keep route classes separated in any reporting — blended averages are how bad runs hide inside good ones.
| Signal | What it means | Next step |
|---|---|---|
| Rising 403 rate | Target is filtering datacenter ASNs | Run the same sample on Volume Residential before scaling further |
| 429 responses | Request rate is too high for that target | Reduce concurrency; if it persists at low concurrency, add delays |
| High bytes / kept record | Retries, redirects, or discarded assets dominate | Review blocklist; check if retry settings are too aggressive |
| Timeouts with low 4xx | Network or proxy route issue, not a target block | Check proxy credentials and dashboard latency logs |
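Stop rules are easiest to enforce when they are checked mechanically. This sketch assumes the run's metrics and the thresholds are both plain dictionaries keyed by the same hypothetical names; the threshold values shown are illustrative, not recommendations.

```python
# Illustrative thresholds; set the real ones before volume goes up.
THRESHOLDS = {
    "rate_403": 0.05,
    "rate_429": 0.02,
    "timeout_rate": 0.03,
    "parser_failure_rate": 0.05,
    "gb_per_kept_record": 0.0005,
}


def broken_rules(metrics, thresholds=THRESHOLDS):
    """Return the sorted names of every threshold the run exceeded."""
    return sorted(
        name for name, limit in thresholds.items()
        if metrics.get(name, 0.0) > limit
    )


def should_stop(metrics, thresholds=THRESHOLDS):
    """A run stops as soon as any single rule is broken."""
    return bool(broken_rules(metrics, thresholds))
```

Calling `broken_rules` once per route class, rather than on blended totals, keeps a bad datacenter run from hiding inside a good residential one.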
Datacenter collection FAQ
When should I use datacenter proxies instead of residential?
Use datacenter for plain public targets: product catalogs, public directories, open HTML endpoints. Switch to residential when the target consistently returns blocks, 403s, or empty pages under datacenter ASNs.
Why does my app report fewer bytes than the provider meter?
The provider meter counts all transferred bytes including redirects, retries, blocked pages, and assets the parser discarded. The app counts only what it kept. That gap is normal and expected.
What stop rules should I set before scaling up?
Set thresholds on 403 rate, 429 rate, timeout rate, parser-failure rate, and provider-metered GB per kept record before volume goes up. If any threshold breaks, stop and reassess the route.
How do I build the right sample for datacenter proxies?
Sample with the same client and concurrency the real job will use. Include a category page, a detail page, an empty result, and the page type the real job hits most. A curl check does not validate a Puppeteer job.
Do domain blocklists actually reduce bandwidth costs?
They can, but verify first. Export the usage CSV and group by host. A block rule can look cheaper because it stopped collecting a page that mattered, so check kept output counts before and after.
Pre-scale checklist
- Sample uses the real client and concurrency settings.
- Proxy URL is from the current dashboard run, not a stale config.
- Provider network logs reviewed alongside app output.
- Stop thresholds set for 403 rate, 429 rate, and GB per kept record.
- Route classes kept separate in reporting.