How to know if you actually need mobile proxies (without buying any)

  • MyrinNew
    Senior Member
    • Feb 2024
    • 5175

    #1


    On every scraping project I start, the same question comes up: do I actually need mobile proxies for this target, or will residential or datacenter do?


    Picking wrong here is the most expensive mistake on a scraping project. Go too cheap and your requests get blocked, so you pay for traffic that achieves nothing. Go too expensive and your margins evaporate: mobile carrier IPs run roughly 5–10× the per-GB rate of datacenter ones. And the answer changes per target. A sitemap crawl on a documentation site doesn't need carrier-grade trust; the same scraper pointed at Nike's product pages will be rejected from a datacenter IP within a hundred requests.


    I got tired of doing this analysis manually (running curl -i against the target, grepping for the familiar markers, mentally mapping them to vendors), so I packaged the heuristic into a CLI.

    npx anti-bot-sniffer https://www.nike.com

    Nike delivers innovative products, experiences and services to inspire athletes.

    status 200 · 7 cookies set

    Detected
    ● Akamai Bot Manager
    via ak_bmsc cookie
    Enterprise-grade. Behavior + IP scoring; carrier ASN avoids
    most challenges.

    Recommended proxy tier
    ▶ MOBILE CARRIER

    The tool is open source (MIT) at github.com/atheris-ee/anti-bot-sniffer. It has zero runtime dependencies and needs Node 18+. The rest of this post is a quick tour of what it does and the reasoning behind the recommendations, since picking the right tier matters whether you use this tool or not.


    ## What the tool actually checks


    The tool sends a single GET request with a normal browser-like User-Agent, follows up to 5 redirects, reads the first 64KB of the response body, then matches what it sees against a signature catalog. It looks in three places:

    1. Response headers — cf-ray, server, x-dd-b, x-kpsdk-cd, and so on. CDN and
      WAF vendors leak identity here even when they don't mean to.
    2. Set-Cookie names — __cf_bm, _abck, _px3, incap_ses_*. Cookies set on
      the first response are the cleanest signal of what's running, because they're set
      before the page renders.
    3. HTML markers — js.datadome.co, challenges.cloudflare.com/turnstile,
      captcha.px-cdn.net. Vendor scripts embedded in the initial HTML.


    No JavaScript execution. The tool runs in milliseconds and doesn't spin up a browser.
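
To make the three-place lookup concrete, here is a minimal Python sketch of the idea. It is not the tool's actual code (the tool is a Node CLI). The `Signature` class, the `detect` function, and this mini-catalog are hypothetical names of mine, and the vendor attribution of each marker is my own reading of the markers listed above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    vendor: str
    where: str    # "header", "cookie", or "html"
    pattern: str  # regex, matched case-insensitively


# Hypothetical mini-catalog built from markers mentioned in this post.
# Header patterns are matched against "name: value" lines.
CATALOG = [
    Signature("Cloudflare (base CDN tier)", "header", r"^(cf-ray:|server: cloudflare)"),
    Signature("DataDome", "header", r"^x-dd-b:"),
    Signature("Kasada", "header", r"^x-kpsdk-cd:"),
    Signature("Cloudflare Bot Management", "cookie", r"^__cf_bm$"),
    Signature("Akamai Bot Manager", "cookie", r"^(_abck|ak_bmsc)$"),
    Signature("PerimeterX/HUMAN", "cookie", r"^_px3$"),
    Signature("Imperva/Incapsula", "cookie", r"^incap_ses_"),
    Signature("DataDome", "html", r"js\.datadome\.co"),
    Signature("Turnstile", "html", r"challenges\.cloudflare\.com/turnstile"),
    Signature("PerimeterX/HUMAN", "html", r"captcha\.px-cdn\.net"),
]

MAX_BODY_CHARS = 64 * 1024  # stand-in for "first 64KB of the body"


def detect(headers: dict[str, str], cookie_names: list[str], html: str) -> list[str]:
    """Return vendors whose signature matches, in catalog order, without duplicates."""
    haystacks = {
        "header": [f"{name}: {value}" for name, value in headers.items()],
        "cookie": cookie_names,
        "html": [html[:MAX_BODY_CHARS]],
    }
    found: list[str] = []
    for sig in CATALOG:
        rx = re.compile(sig.pattern, re.IGNORECASE)
        if any(rx.search(text) for text in haystacks[sig.where]):
            if sig.vendor not in found:
                found.append(sig.vendor)
    return found
```

Feed it the headers, the Set-Cookie names, and the body of one response, and you get back a list of vendor names. Everything is plain string matching on a single response, which is why no browser is involved.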


    ## What it can — and can't — see


    Catches the outer wall:
    • CDN / WAF identity (Cloudflare, Akamai, Imperva, AWS WAF, Sucuri…)
    • Bot management add-ons (Cloudflare BM, DataDome, PerimeterX/HUMAN, Kasada, Akamai Bot
      Manager, F5/Shape)
    • Challenge widgets (reCAPTCHA, hCaptcha, Turnstile)


    Doesn't catch:
    • Client-side JS fingerprinting (canvas, WebGL, AudioContext, behavior heuristics)
    • Anti-bot vendors that defer detection until specific user actions
    • Custom in-house systems with no public markers


    So if anti-bot-sniffer says "nothing detected," that doesn't guarantee the target is friendly to bots. It only tells you that no vendor in the signature catalog showed up between you and the document. That's enough information to start with datacenter and escalate if you see challenges, which is the right calibration for most workflows anyway.
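
The "start with datacenter and escalate" loop can be sketched roughly like this. Treat it as an illustration under assumptions, not part of the tool: `fetch` is a hypothetical callable you would wire to your own proxy pools, and counting 403/429 or a challenge-widget URL in the body as "challenged" is my own rule of thumb.

```python
from typing import Callable, Optional

TIERS = ["datacenter", "residential", "mobile"]  # cheapest first

# Assumption: these statuses and body markers mean "challenged". Tune per target.
CHALLENGE_STATUSES = {403, 429}
CHALLENGE_MARKERS = ("challenges.cloudflare.com/turnstile", "captcha.px-cdn.net")


def looks_challenged(status: int, body: str) -> bool:
    """Crude check for a block page or challenge widget in one response."""
    return status in CHALLENGE_STATUSES or any(m in body for m in CHALLENGE_MARKERS)


def cheapest_working_tier(
    fetch: Callable[[str], tuple[int, str]], start: str = "datacenter"
) -> Optional[str]:
    """Try tiers from `start` upward; return the first that is not challenged.

    `fetch(tier)` is a hypothetical callable that requests the target through
    a proxy of the given tier and returns (status_code, body).
    """
    for tier in TIERS[TIERS.index(start):]:
        status, body = fetch(tier)
        if not looks_challenged(status, body):
            return tier
    return None  # even mobile was challenged
```

A single probe per tier is optimistic. In practice you would sample a batch of requests at your real request rate before settling on a tier.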


    ## How the recommendations map to proxy tiers


    Three tiers, from strictest to most permissive:


    mobile: only real mobile carrier IPs reliably pass. Triggered by: Cloudflare Bot Management, DataDome, PerimeterX/HUMAN, Akamai Bot Manager, Kasada, F5/Shape. The reason mobile is the answer here isn't magic; it's CGNAT (carrier-grade NAT). Mobile carriers share each public IP among hundreds or thousands of subscribers, so IP-level reputation scoring is unreliable. Blocking one mobile IP would block hundreds of real customers, so anti-bot platforms treat carrier ASNs leniently by default.


    residential: a residential ISP pool usually works, though sometimes mobile is needed. Triggered by: AWS WAF, Imperva/Incapsula, base Cloudflare CDN without Bot Management. Residential IPs blend with real home traffic at the ISP-ASN layer. They are cheaper than mobile, but the well-known pool ASNs (the big-three residential providers' ranges) are increasingly flagged by anti-bot platforms that watch for concurrent-automation patterns.


    datacenter: datacenter IPs are usually fine. Triggered by: Sucuri, Wordfence, or no detected anti-bot. These are mostly application-rule WAFs that don't score IP class aggressively. A datacenter proxy at sane request rates passes most of them without challenges.
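
The mapping above boils down to a lookup table. Here is a small Python sketch of it. The vendor-to-tier pairs come straight from the three tiers described above, but two things are my assumptions rather than documented tool behavior: taking the strictest tier when several vendors are detected, and treating an unknown vendor as datacenter. `VENDOR_TIER` and `recommend_tier` are hypothetical names.

```python
STRICTNESS = {"datacenter": 0, "residential": 1, "mobile": 2}

# Vendor-to-tier pairs as listed in this post.
VENDOR_TIER = {
    "Cloudflare Bot Management": "mobile",
    "DataDome": "mobile",
    "PerimeterX/HUMAN": "mobile",
    "Akamai Bot Manager": "mobile",
    "Kasada": "mobile",
    "F5/Shape": "mobile",
    "AWS WAF": "residential",
    "Imperva/Incapsula": "residential",
    "Cloudflare (base CDN tier)": "residential",
    "Sucuri": "datacenter",
    "Wordfence": "datacenter",
}


def recommend_tier(detected: list[str]) -> str:
    """Pick the strictest tier among detected vendors (assumed rule).

    Unknown vendors and an empty detection list fall back to "datacenter".
    """
    tiers = [VENDOR_TIER.get(vendor, "datacenter") for vendor in detected]
    return max(tiers, key=STRICTNESS.__getitem__, default="datacenter")
```

For example, a site showing both base Cloudflare and Cloudflare Bot Management comes out as mobile, because the stricter of the two tiers wins.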


    I wrote a longer breakdown of when each tier is actually the right answer, including the cases where datacenter is correct despite being the cheapest, at "Mobile vs residential vs datacenter proxies: how to choose".


    ## Three sample probes


    To make the output concrete, here's what three well-known targets return:


    example.com (base Cloudflare CDN, no Bot Management):

    Detected
    ◐ Cloudflare (base CDN tier)
    via server: cloudflare

    Recommended proxy tier
    ▶ RESIDENTIAL

    www.cloudflare.com (running their own Bot Management):

    Detected
    ● Cloudflare Bot Management
    via __cf_bm cookie

    Recommended proxy tier
    ▶ MOBILE CARRIER

    example.org (no anti-bot detected):

    ◯ No anti-bot stack detected from HTTP signals.

    Recommended proxy tier
    ▶ DATACENTER (OK)

    The --json flag emits a stable structured shape, so you can pipe it into target-tracking spreadsheets, CI, or whatever else you use:

    $ npx anti-bot-sniffer nike.com --json | jq '.recommendedTier'
    "mobile"
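
If you'd rather consume the JSON from a script than from jq, something like the Python sketch below should work. The only field it relies on is `recommendedTier`, the one shown above; I'm not assuming anything else about the JSON shape. The helper names (`parse_tier`, `probe_tier`, `tier_changes`) are hypothetical, and the sketch assumes the CLI exits with status 0 on a successful probe.

```python
import json
import subprocess
from typing import Callable


def parse_tier(raw_json: str) -> str:
    """Extract the recommended tier from the CLI's --json output."""
    return json.loads(raw_json)["recommendedTier"]


def probe_tier(target: str) -> str:
    """Run the CLI for one target (needs Node and npx on PATH)."""
    result = subprocess.run(
        ["npx", "anti-bot-sniffer", target, "--json"],
        capture_output=True,
        text=True,
        check=True,  # assumption: non-zero exit means the probe failed
    )
    return parse_tier(result.stdout)


def tier_changes(
    expected: dict[str, str], probe: Callable[[str], str] = probe_tier
) -> dict[str, tuple[str, str]]:
    """Return {target: (expected_tier, current_tier)} for targets that changed."""
    changes = {}
    for target, old_tier in expected.items():
        new_tier = probe(target)
        if new_tier != old_tier:
            changes[target] = (old_tier, new_tier)
    return changes
```

In CI you could keep a small table of targets and their last known tier, then fail the job when `tier_changes` returns anything. That catches the day a target quietly adds bot management.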







    ## The honest gaps


    The signature catalog covers the major vendors but isn't exhaustive. Coverage I'd like in future versions that didn't land in v0.1: GeeTest, Friendly Captcha, Bot Master Lab, Reblaze, Radware. If you hit a target that should match a particular vendor and doesn't, drop a curl -iL snippet in an issue and I'll add the detection.


    I'd also welcome contributions on the recommendation logic itself. The tier mapping reflects 2025 industry consensus, but the right tier still varies per target. A site running Cloudflare base CDN often passes from datacenter at low request rates and trips at high ones. The tool can't tell you the request-rate boundary, only that the platform might enforce one. PRs that surface that nuance are welcome.


    ## Where this came from


    Disclosure: I run Atheris, a small mobile and residential proxy reseller in Estonia. This tool is independent, MIT-licensed, and works regardless of where you buy proxies. The recommendation logic deliberately tells you to use datacenter when datacenter is enough: we'd rather earn the customers whose workloads actually need mobile than upsell the ones whose workloads don't.


    I wrote it because every prospect's first question was the same one this tool answers, and forcing them to sign up for a paid plan just to find out whether mobile proxies were the right tool felt like the wrong friction to put first. Releasing it as OSS removes that friction permanently: people learn the answer, decide for themselves, and the ones who do need mobile can find us if they want.


    If you find it useful, a star on the repo would help others find it too. PRs and issues welcome.


    Further reading: Mobile vs residential vs datacenter proxies.



