Large jobs fail quietly
Small scraping jobs fail loudly. Large ones fail quietly for an hour and leave you with a clean-looking mess. The first warning is rarely a crash. It is accepted rows dropping, retry counts climbing, or detail pages returning empty shells while the worker keeps saying "done."
Rotating proxies help when each request can stand on its own. Public search pages, product pages, and directory pages usually fit that pattern, provided you are authorized to collect them. Rotation changes the exit IP. It does not make a bad target good, and it does not authorize access to targets that deny automated collection.
Split the queue before adding workers
Discovery pages find URLs. Detail pages confirm fields. Retry jobs sit in a separate lane with a hard budget. A useful first batch is one domain, one page type, two or three workers, and a stop trigger that fires when retries rise faster than accepted rows. That shape catches bad scale before it gets expensive.
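The stop trigger described above can be sketched as a comparison between two samples of cumulative counters. This is a minimal sketch under stated assumptions: `Counters` and `should_stop` are hypothetical names, and it assumes the worker exposes cumulative accepted-row and retry counts that you sample at a fixed interval.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Cumulative counters sampled from one batch (hypothetical shape)."""
    accepted_rows: int
    retries: int


def should_stop(previous: Counters, current: Counters) -> bool:
    """Return True when retries grew by more than accepted rows did
    between the two samples."""
    retry_growth = current.retries - previous.retries
    accepted_growth = current.accepted_rows - previous.accepted_rows
    return retry_growth > accepted_growth
```

Sampling every minute or so is one reasonable choice. A very short interval makes the trigger noisy, while a long one lets a bad batch run further before it stops.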
Country targeting belongs in the test run, not after the crawl is already large. If a target changes price, stock, language, or availability by country, run a separate queue for that market with the appropriate routing token. On a Proxynade pool, add country-<cc> to the expanded username. For example, rt97db6958d9-plan-volume-country-de routes through German exits.
Sticky sessions are for flows that need a consistent exit across several steps. Broad collection should not inherit sticky settings just because one checkout flow needed them. On a Proxynade pool, the lifetime-<minutes> token in the username sets the rotation window for residential plans. Leave it off when you want a fresh exit per request.
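One way to keep routing tokens out of copied strings is to build the username per queue. The sketch below is illustrative only: `build_username` is a hypothetical helper, and it assumes tokens are appended with hyphens in the same way as the country example above. The position of the lifetime token relative to the country token is also an assumption, so compare the result with the line the dashboard generates before relying on it.

```python
from typing import Optional


def build_username(
    base: str,
    country: Optional[str] = None,
    lifetime_minutes: Optional[int] = None,
) -> str:
    """Append routing tokens to an expanded username.

    Assumption: tokens are hyphen-joined, as in the country example.
    """
    parts = [base]
    if country:
        parts.append(f"country-{country.lower()}")
    # Leave lifetime off entirely to get a fresh exit per request.
    if lifetime_minutes is not None:
        parts.append(f"lifetime-{lifetime_minutes}")
    return "-".join(parts)


# Broad collection: country only, no sticky window.
broad = build_username("rt97db6958d9-plan-volume", country="de")
```

Keeping the sticky setting as an explicit argument means a broad-collection queue cannot inherit it by accident.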
Measure accepted rows, not request count
A crawler that fires 200,000 requests and accepts 18,000 rows is not healthy just because the graph goes up. It is buying retries. The numbers that matter are accepted rows per minute, retries per accepted row, p95 latency, bytes per accepted row, and duplicate rate.
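The five numbers above can be computed from a handful of counters. This is a rough sketch, not a prescribed implementation: `RunStats`, `p95`, and `health` are hypothetical names, the duplicate rate is assumed to mean duplicates divided by parsed rows, and p95 uses the nearest-rank method.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RunStats:
    """Counters for one run (hypothetical shape)."""
    accepted_rows: int
    retries: int
    duplicates: int
    parsed_rows: int          # rows seen before deduplication
    proxy_bytes: int          # bytes as counted by the proxy meter
    minutes: float            # wall-clock duration, assumed > 0
    latencies_ms: Sequence[float]


def p95(values: Sequence[float]) -> float:
    """Nearest-rank 95th percentile."""
    if not values:
        raise ValueError("p95 needs at least one latency sample")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def health(stats: RunStats) -> dict:
    """Compute the per-accepted-row metrics for one run."""
    accepted = stats.accepted_rows
    no_rows = float("inf")  # nothing accepted: every ratio is unbounded
    return {
        "accepted_per_minute": accepted / stats.minutes,
        "retries_per_accepted": stats.retries / accepted if accepted else no_rows,
        "bytes_per_accepted": stats.proxy_bytes / accepted if accepted else no_rows,
        "duplicate_rate": (
            stats.duplicates / stats.parsed_rows if stats.parsed_rows else 0.0
        ),
        "p95_latency_ms": p95(stats.latencies_ms),
    }
```

Dividing by accepted rows rather than by requests is the point of the exercise: a run that only produces retries shows up as an unbounded ratio, not as a rising graph.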
The acceptance gate should be strict enough to be annoying. A good row needs the expected status code, a sane response size, the page marker your parser expects, and a dedupe key. For search pages, store the page URL, market, page number or cursor, parser version, and row count. For detail pages, store the source URL, entity ID when available, timestamp, parser version, and the field set you actually used.
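A gate with those four checks might look like the sketch below. The function name, the rejection labels, and the default size bounds are all assumptions chosen for illustration; tune the bounds to the page type being collected.

```python
from typing import Set


def check_row(
    status: int,
    body: str,
    dedupe_key: str,
    seen_keys: Set[str],
    marker: str,
    expected_status: int = 200,
    min_bytes: int = 512,          # assumed lower bound for a real page
    max_bytes: int = 5_000_000,    # assumed upper bound
) -> str:
    """Return "ok" or the first reason the row was rejected."""
    if status != expected_status:
        return "bad_status"
    size = len(body.encode("utf-8"))
    if not (min_bytes <= size <= max_bytes):
        return "bad_size"
    if marker not in body:
        # Soft blocks and empty shells often pass the status check
        # but lack the element the parser depends on.
        return "missing_marker"
    if not dedupe_key:
        return "missing_key"
    if dedupe_key in seen_keys:
        return "duplicate"
    seen_keys.add(dedupe_key)
    return "ok"
```

Returning a reason instead of a boolean lets you count rejections by cause, which makes a parser break distinguishable from a block.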
Keep the proxy line in environment variables
Generate the proxy line from the Proxynade dashboard for the plan and mode you are testing. The gateway is http://proxynade.net:2555 with username/password auth. Keep the line in environment variables, not the repo and not the job log. One generated line per plan is easier to debug than a dozen copied strings where nobody remembers which one had country targeting or sticky mode set.
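As a minimal sketch of that setup, the helper below assembles the proxy URL from environment variables and provides a redacted form for logging. The variable names `PROXY_USERNAME`, `PROXY_PASSWORD`, and `PROXY_GATEWAY` are hypothetical, as are both function names; only the gateway host and port come from the text above.

```python
import os
from typing import Mapping
from urllib.parse import quote, urlsplit


def proxy_url_from_env(env: Mapping[str, str] = os.environ) -> str:
    """Build the proxy URL from environment variables (names are assumed)."""
    user = env["PROXY_USERNAME"]
    password = env["PROXY_PASSWORD"]
    gateway = env.get("PROXY_GATEWAY", "proxynade.net:2555")
    # Percent-encode credentials so characters like "@" or ":" stay valid.
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{gateway}"


def redact(proxy_url: str) -> str:
    """Return a form of the URL that is safe to write to a job log."""
    parts = urlsplit(proxy_url)
    return f"{parts.scheme}://***@{parts.hostname}:{parts.port}"
```

Most HTTP clients accept the resulting URL as their HTTP and HTTPS proxy setting. Logging only the redacted form keeps the username tokens and password out of job output.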
The Proxynade dashboard network logs show host, outcome, latency, and byte totals per request. Usage logs export as CSV. Those two sources together let you join proxy bytes to saved records and find which domains are burning bandwidth on retries.
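The join can be sketched in a few lines. The CSV column names `host` and `bytes` below are assumptions, not the documented export format, so adjust them to match the header row of the actual usage export. `bytes_per_saved_row` is a hypothetical helper.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable
from urllib.parse import urlsplit


def bytes_per_saved_row(
    usage_csv_text: str, saved_urls: Iterable[str]
) -> Dict[str, float]:
    """Join proxy bytes per host to saved records per host."""
    proxy_bytes: Counter = Counter()
    for row in csv.DictReader(io.StringIO(usage_csv_text)):
        # Assumed column names: "host" and "bytes".
        proxy_bytes[row["host"]] += int(row["bytes"])

    saved: Counter = Counter(urlsplit(url).hostname for url in saved_urls)

    # Hosts that carried bytes but saved nothing get infinity,
    # which sorts them to the top of any "worst offenders" list.
    return {
        host: (total / saved[host]) if saved[host] else float("inf")
        for host, total in proxy_bytes.items()
    }
```

Sorting the result in descending order puts the domains that spend the most bandwidth per saved record first.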
The proxy meter counts more than your app does
Your scraper may save a 3 KB row. The provider meter counts redirects, blocked pages, retries, images, fonts, scripts, and browser warm-up traffic. Ten thousand pages at 150 KB each sounds like 1.5 GB. Add retries, soft-block loops, and a browser renderer, and metered usage passes that estimate before the job finishes.
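The arithmetic is easy to rerun with your own numbers. In this sketch, `attempts_per_page` and `overhead_kb_per_attempt` are hypothetical knobs standing in for retries and renderer traffic. The values to plug in should come from a measured test run, not from this example, and sizes use decimal units (1 GB = 1,000,000 KB).

```python
def estimated_gb(
    pages: int,
    kb_per_page: float,
    attempts_per_page: float = 1.0,       # 1.0 means no retries at all
    overhead_kb_per_attempt: float = 0.0, # redirects, assets, warm-up
) -> float:
    """Estimate metered traffic in GB for a batch."""
    kb_per_attempt = kb_per_page + overhead_kb_per_attempt
    total_kb = pages * attempts_per_page * kb_per_attempt
    return total_kb / 1_000_000


# The naive figure from the text: 10,000 pages at 150 KB each.
naive = estimated_gb(10_000, 150)
```

With the defaults, `naive` evaluates to 1.5, matching the figure above. Raising `attempts_per_page` or adding per-attempt overhead shows how quickly the metered total moves away from it.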
App-level counters report what the app cared about: saved rows, parsed bodies, successful responses. The proxy meter reports what the provider carried. That gap is why a run can look clean in the collector log while the bandwidth bill does not match.
Stop before adding more rotation
Raise workers only while accepted rows rise faster than retries and bytes per accepted row stays flat. If bytes per accepted row doubles, stop. If one host slows while others stay normal, stop. If 403, 407, 429 Too Many Requests, or empty-page responses become the main output, stop. More rotation will not fix a parser, policy, or queue problem.
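Those rules can be turned into a small decision function evaluated between ramp steps. This is a sketch under assumptions: `Sample` and `scaling_decision` are hypothetical names, each sample holds per-window counts, "flat" is taken to mean within 10 percent, and "main output" is taken to mean more than half of responses. The per-host slowdown check is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Per-window counts for one ramp step (hypothetical shape)."""
    accepted_rows: int
    retries: int
    proxy_bytes: int
    blocked_or_empty: int  # 403, 407, 429, or empty-page responses
    responses: int


def scaling_decision(prev: Sample, curr: Sample) -> str:
    """Return "add", "hold", or "stop" for the next ramp step."""
    if curr.accepted_rows == 0 or prev.accepted_rows == 0:
        return "stop"
    if curr.responses and curr.blocked_or_empty / curr.responses > 0.5:
        return "stop"  # block pages have become the main output

    prev_bpr = prev.proxy_bytes / prev.accepted_rows
    curr_bpr = curr.proxy_bytes / curr.accepted_rows
    if curr_bpr >= 2 * prev_bpr:
        return "stop"  # bytes per accepted row doubled

    accepted_rise = curr.accepted_rows - prev.accepted_rows
    retry_rise = curr.retries - prev.retries
    flat = curr_bpr <= prev_bpr * 1.1  # assumed tolerance for "flat"
    return "add" if accepted_rise > retry_rise and flat else "hold"
```

Returning "hold" as a third state keeps the worker count where it is when the signals are mixed, so the ramp does not oscillate between adding and stopping.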
| Signal | What it usually means | Next check |
|---|---|---|
| Retries rising faster than accepted rows | Parser, target block, or policy change | Run a single URL manually; confirm the parser still matches the page shape. |
| Bytes per accepted row doubles | Retries and block pages burning bandwidth | Check the retry lane; confirm the acceptance gate is actually filtering. |
| 407 Proxy Authentication Required | Credentials failed at the proxy | Check username format, password, and account balance. |
| All workers slow simultaneously | Network or target rate limit | Reduce concurrency; check dashboard latency logs for the affected host. |