Bandwidth notes

Bandwidth burn: why the proxy bill exceeds your app's count

The meter counts bytes your parser never wanted.

Provider-metered bytes | Usage logs | Updated 2026-06-12

The meter and the app see different traffic

Bandwidth burn is proxy bandwidth that crossed the gateway but produced nothing useful. These are not failed requests in the network-error sense. They are the extra traffic attached to a page: browser assets, tracking scripts, fonts, images, redirect chains, challenge pages, and retries that kept moving bytes after the useful part was gone.

App dashboards undercount this routinely. The app counts accepted responses or saved rows. The proxy meter counts provider-metered bytes at the tunnel layer. If a browser pulled a 200-status block page, three font files, two image variants, and a redirect chain before the parser discarded the page, those bytes still crossed the proxy. That is why "we only downloaded 3 GB" and "the panel says 8 GB" can both be accurate from their own vantage points.

The log row worth keeping before changing anything: route name, target URL, status code, provider bytes, saved rows, proxy line used. Without it, every later cost argument is guesswork.
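One way to capture that row is a small helper that emits one JSON line per page fetch. This is only a sketch: `makeUsageRow`, `toJsonLine`, and the field names are hypothetical, the sample values are made up, and where the provider-byte figure comes from depends on what your provider's usage export exposes.

```javascript
// Hypothetical helper: builds one usage-log row with the six fields listed above.
// Field names are illustrative, not a provider schema.
function makeUsageRow({ route, targetUrl, status, providerBytes, savedRows, proxyLine }) {
  if (!Number.isFinite(providerBytes) || providerBytes < 0) {
    throw new RangeError('providerBytes must be a non-negative number');
  }
  return {
    route,
    targetUrl,
    host: new URL(targetUrl).hostname, // extra field, handy for grouping rows by host
    status,
    providerBytes,
    savedRows,
    proxyLine,
  };
}

// Serialize as JSON Lines: one row per line, easy to append and to sort.
function toJsonLine(row) {
  return JSON.stringify(row) + '\n';
}

// Made-up example values.
const row = makeUsageRow({
  route: 'price-pages',
  targetUrl: 'https://shop.example.com/item/123',
  status: 200,
  providerBytes: 412000,
  savedRows: 1,
  proxyLine: 'volume-residential-01',
});
console.log(toJsonLine(row));
```

Keeping the host as its own field is a design choice: it lets the same file be grouped by host without re-parsing URLs.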

The gap shows up clearly in usage logs

A retail crawl in browser mode makes the gap concrete:

run: price pages, browser mode
plan: Volume Residential
accepted rows: fine, not amazing

app view:
  saved bodies: 3.2 GB
  failed pages: mostly ignored
  rows exported: 41k

provider view:
  metered: 8.6 GB
  top waste hosts:
    image CDN, not used by parser
    analytics bundle, loaded every page
    font host, small files but everywhere
    challenge page, status 200, no rows

note: don't scale workers until this is cleaned up

The exact hosts change by target. The shape does not. The crawl looks healthy in the app because rows are being saved, while the proxy bill is full of traffic the parser never touched.
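The size of that gap can be worked out from the numbers in the run above. The sketch below assumes decimal units (1 GB = 1e9 bytes) and treats "rows exported" as the accepted-row count. The gap is an upper bound on waste, not a measurement of it, since some of the extra bytes may turn out to be required.

```javascript
// Figures from the sample run above. Decimal GB is an assumption.
const meteredGb = 8.6;
const savedGb = 3.2;
const rowsExported = 41000;

const gapGb = meteredGb - savedGb;      // bytes the app never counted
const gapShare = gapGb / meteredGb;     // fraction of metered traffic that is gap
const kbPerRow = (meteredGb * 1e9) / rowsExported / 1e3; // metered KB per exported row

console.log(`gap: ${gapGb.toFixed(1)} GB (${(gapShare * 100).toFixed(0)}% of metered)`);
console.log(`metered per row: ${kbPerRow.toFixed(0)} KB`);
// prints:
// gap: 5.4 GB (63% of metered)
// metered per row: 210 KB
```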

Plan cost is not the first lever

Residential proxies make burn more painful because the wasted bytes are expensive bytes. Volume Residential at $0.89/GB can handle broad collection, but a noisy browser run depletes balance before anyone notices. Premium Residential at $5.00/GB may improve acceptance on stricter targets, but it also makes waste hurt more if the job is still pulling assets and retrying blocked pages.
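To make the rate difference concrete, the sketch below prices the sample run (8.6 GB metered, 3.2 GB saved) at both quoted rates. It assumes the byte volume would be identical on either plan and treats saved bodies as the useful portion, which is a simplification.

```javascript
// Per-GB rates quoted above; volumes come from the sample run.
const RATES_USD_PER_GB = { 'Volume Residential': 0.89, 'Premium Residential': 5.0 };

// Total cost of a run, and the part of it spent on bytes beyond the saved bodies.
function runCost(meteredGb, usefulGb, ratePerGb) {
  const total = meteredGb * ratePerGb;
  const useful = usefulGb * ratePerGb;
  return { total, overhead: total - useful };
}

for (const [plan, rate] of Object.entries(RATES_USD_PER_GB)) {
  const { total, overhead } = runCost(8.6, 3.2, rate);
  console.log(`${plan}: $${total.toFixed(2)} total, $${overhead.toFixed(2)} beyond saved bodies`);
}
// prints:
// Volume Residential: $7.65 total, $4.81 beyond saved bodies
// Premium Residential: $43.00 total, $27.00 beyond saved bodies
```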

Switching plans moves the per-GB rate but not the volume. The fix starts with the usage logs. Export them from the dashboard and sort by bytes, not request count. A host with ten large responses matters more than a host with five hundred small ones. Mark each top host as required, optional, or waste. Block the waste, then run a small sample before touching worker count.
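The sort-by-bytes step can be sketched as a small aggregation over exported rows. The `url` and `bytes` field names are assumptions about the export format, and the sample rows are made up purely to show the ordering.

```javascript
// Sum provider-metered bytes per host and sort descending by bytes.
// Assumes each exported row has `url` and `bytes` fields (hypothetical names).
function bytesByHost(rows) {
  const totals = new Map();
  for (const { url, bytes } of rows) {
    const host = new URL(url).hostname;
    const entry = totals.get(host) ?? { host, bytes: 0, requests: 0 };
    entry.bytes += bytes;
    entry.requests += 1;
    totals.set(host, entry);
  }
  return [...totals.values()].sort((a, b) => b.bytes - a.bytes);
}

// Made-up sample rows: few large image responses, many small font responses.
const sample = [
  { url: 'https://img.cdn.example/hero.jpg', bytes: 900000 },
  { url: 'https://img.cdn.example/hero-2x.jpg', bytes: 1800000 },
  { url: 'https://fonts.example/a.woff2', bytes: 20000 },
  { url: 'https://fonts.example/b.woff2', bytes: 20000 },
  { url: 'https://fonts.example/c.woff2', bytes: 20000 },
  { url: 'https://shop.example.com/item/123', bytes: 150000 },
];

const ranked = bytesByHost(sample);
// The label column starts empty; a person fills in required / optional / waste.
console.table(ranked.map(r => ({ ...r, label: '' })));
```

In this sample the image host ranks first with two requests, while the font host ranks last despite having the most requests, which is the reason to sort by bytes.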

Do not block blindly. Some CDN hosts carry required data. Some image URLs are the content. Some "analytics" endpoints gate the API response. The audit is slow, but guessing breaks a working scraper and calls it optimization.

Block assets in the browser, then read the logs again

Playwright is the usual place to reduce browser-mode burn because the browser causes most of it. Start with obvious targets, then watch what disappears from the provider logs.

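// First pass: tighten per target after this. Assumes `page` is an open Playwright Page.
await page.route('**/*', route => {
  const url = route.request().url();
  // Match the extension against the path only, so "logo.png?v=3" is still caught.
  const { pathname } = new URL(url);
  if (/\.(png|jpe?g|gif|webp|svg|woff2?)$/i.test(pathname)) return route.abort();
  // Coarse tracker match on the full URL; it can over-block, so verify row counts.
  if (/doubleclick|analytics|pixel|beacon/i.test(url)) return route.abort();
  return route.continue();
});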

If bytes drop and row counts stay flat, the block is working. If rows fall, the block was too broad or the target hid data in one of those asset URLs. Keep each change narrow enough that the failure is readable.
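That check can be written down as a comparison between two small sample runs, one before and one after a single block change. The `judgeBlock` helper and the 2% `rowTolerance` threshold for "rows stayed flat" are assumptions for illustration, as are the sample numbers.

```javascript
// Compare two sample runs: before and after one block change.
// `rowTolerance` is an assumed threshold for "rows stayed flat"; pick your own.
function judgeBlock(before, after, rowTolerance = 0.02) {
  const rowChange = (after.rows - before.rows) / before.rows;
  const byteChange = (after.bytes - before.bytes) / before.bytes;
  const bytesPerRow = r => (r.rows > 0 ? r.bytes / r.rows : Infinity);

  let verdict;
  if (rowChange < -rowTolerance) verdict = 'rows fell: block too broad, roll back';
  else if (byteChange < 0) verdict = 'bytes dropped, rows flat: keep the block';
  else verdict = 'no byte savings: block had no effect';

  return {
    verdict,
    rowChange,
    byteChange,
    bytesPerRowBefore: bytesPerRow(before),
    bytesPerRowAfter: bytesPerRow(after),
  };
}

// Made-up sample numbers for illustration.
console.log(judgeBlock({ bytes: 210e6, rows: 1000 }, { bytes: 95e6, rows: 995 }));
```

Checking rows before bytes is deliberate: a block that saves bytes but loses rows should be rolled back, not celebrated.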

Retrying block pages burns budget silently

A blocked page that returns status 200 with a challenge title looks like a success to retry logic that only checks the status code. The selector then fails to match, the code retries as if it had hit a network error, and the provider meters every attempt. The bill grows while the output stays flat.

Log failure categories separately: network timeout, HTTP error, selector miss, challenge-page title, empty body, parser reject. They should not share a retry policy. A timeout might warrant a route change. A challenge page returning the same title four times deserves a stop, not another residential GB.
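A minimal sketch of separate categories with separate policies might look like the following. The attempt limits, action names, and challenge-title patterns are placeholder assumptions, not recommendations; real challenge titles vary by target.

```javascript
// Failure categories from the list above, each with its own retry policy.
// Attempt counts and actions are placeholder assumptions.
const RETRY_POLICY = {
  network_timeout: { maxAttempts: 3, action: 'change-route' },
  http_error:      { maxAttempts: 2, action: 'retry' },
  selector_miss:   { maxAttempts: 1, action: 'stop' },
  challenge_page:  { maxAttempts: 1, action: 'stop' },
  empty_body:      { maxAttempts: 2, action: 'retry' },
  parser_reject:   { maxAttempts: 1, action: 'stop' },
};

// Hypothetical title patterns for challenge pages.
const CHALLENGE_TITLE = /just a moment|access denied|verify you are human/i;

// Returns a category name, or null when the page was accepted.
function classify({ timedOut, status, title, body, selectorMatched, parsed }) {
  if (timedOut) return 'network_timeout';
  if (status >= 400) return 'http_error';
  // Checked before the selector: a challenge page also fails the selector.
  if (CHALLENGE_TITLE.test(title ?? '')) return 'challenge_page';
  if (!body || body.trim() === '') return 'empty_body';
  if (!selectorMatched) return 'selector_miss';
  if (!parsed) return 'parser_reject';
  return null;
}

function shouldRetry(category, attemptsSoFar) {
  const policy = RETRY_POLICY[category];
  return Boolean(policy) && policy.action !== 'stop' && attemptsSoFar < policy.maxAttempts;
}

const outcome = classify({ status: 200, title: 'Just a moment...', body: '<html></html>', selectorMatched: false });
console.log(outcome, shouldRetry(outcome, 1));
```

The order of checks matters: without the title check ahead of the selector check, a status-200 challenge page would be logged as a selector miss and its real cause would stay hidden.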

The reduction loop is linear, not clever

Export usage logs. Sort by bytes. Compare provider-metered bytes against accepted rows. Block one waste host class. Run a small sample. Check bytes per accepted row. Repeat until the large waste is gone, then scale workers.

That is the whole loop. It does not involve trusting the app dashboard or swapping proxy plans first. The gateway log is the cost record, and the cost record includes all the browser traffic the scraper pretended not to see.

Bandwidth burn FAQ

Why does the proxy bill show more GB than my app logged? The proxy meter counts bytes at the tunnel layer, so redirects, font files, challenge pages, and failed attempts all count. The app only sees accepted responses or saved rows.

What is bandwidth burn? Proxy-metered bytes that crossed the gateway but produced no useful output: browser assets, tracking scripts, retried block pages, and redirect chains.

How do I find which hosts are wasting bandwidth? Export the dashboard usage logs, sort by bytes transferred, and mark each host as required, optional, or waste. The top few hosts usually account for most of the gap.

Should I switch proxy plans to reduce costs? Not first. Switching plans moves the per-GB rate but not the volume. Fix waste hosts and rerun a sample. If bytes per accepted row drops significantly, then compare plan costs.

Can blocking resources break a scraper? Yes. Some CDN hosts carry required data, not just assets. Block one class at a time, run a small sample, and verify row counts before scaling the block list.

Reduction checks

  • Sort usage logs by bytes, not request count.
  • Block nonessential asset hosts one class at a time.
  • Compare provider-metered bytes against accepted rows.
  • Log failure categories separately and give each its own retry policy.
  • Run a small sample after each block change before scaling.