How to Avoid Bot Detection While Scraping in Python


Anti-bot systems have improved dramatically. A vanilla Playwright script gets blocked outright by a large share of commercially protected sites. Here’s what’s being detected and how to address it.

What Gets Detected

1. The WebDriver Flag

Browsers automated via CDP expose navigator.webdriver = true. Every anti-bot system checks this.

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # This gets detected immediately
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://bot.sannysoft.com/")
    # Check the "WebDriver" row - it'll say "present"

Fix: Launch with automation detection disabled:

browser = p.chromium.launch(
    args=["--disable-blink-features=AutomationControlled"]
)

2. Navigator Properties

Anti-bot scripts probe various navigator properties looking for inconsistencies:

// What detection scripts check
navigator.webdriver        // Should be undefined
navigator.plugins.length   // Should be > 0
navigator.languages        // Should match Accept-Language header
navigator.platform         // Should match User-Agent
navigator.hardwareConcurrency  // 1 is suspicious
navigator.deviceMemory     // Should exist

Fix: Use a properly configured context:

context = browser.new_context(
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    viewport={"width": 1920, "height": 1080},
    locale="en-US",
    timezone_id="America/New_York",
    permissions=["geolocation"],
)

# Inject additional properties
page = context.new_page()
page.add_init_script("""
    Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
    Object.defineProperty(navigator, 'plugins', {
        get: () => [1, 2, 3, 4, 5]
    });
    Object.defineProperty(navigator, 'languages', {
        get: () => ['en-US', 'en']
    });
""")

3. Headless Detection

Headless Chrome has detectable differences from headed Chrome:

  • Missing chrome.runtime object
  • window.chrome behaves differently
  • Different WebGL renderer strings
  • Missing audio/video codecs

# Headed mode passes more checks but is slower
browser = p.chromium.launch(headless=False)

# New headless mode (Chrome 109+) is harder to detect.
# Note: headless=True makes Playwright pass its own legacy --headless flag,
# so opt in to the new implementation explicitly:
browser = p.chromium.launch(
    headless=False,
    args=["--headless=new"]  # Still runs headless, via the new implementation
)

4. TLS Fingerprinting (JA3/JA4)

Your TLS handshake has a fingerprint. Requests from Python’s TLS stack look different from Chrome’s.

This is why requests + fake User-Agent fails—the TLS fingerprint doesn’t match Chrome.

Playwright uses the real browser’s TLS stack, so this isn’t an issue for browser automation. It matters if you’re mixing requests with Playwright.
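To see why the fingerprint is so hard to fake from Python, note that JA3 is just the MD5 of five comma-joined fields extracted from the TLS ClientHello. A minimal sketch of how a fingerprinting service derives it (the field values below are illustrative placeholders, not captured from a real handshake):

```python
import hashlib

def ja3_hash(tls_version, ciphers, extensions, curves, point_formats):
    """JA3: MD5 over 'TLSVersion,Ciphers,Extensions,Curves,PointFormats',
    with the values inside each field joined by dashes."""
    fields = [
        str(tls_version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    return hashlib.md5(",".join(fields).encode()).hexdigest()

# Illustrative values only; a real fingerprint comes from the actual handshake
chrome_like = ja3_hash(771, [4865, 4866, 4867], [0, 23, 65281], [29, 23, 24], [0])
python_like = ja3_hash(771, [4866, 4867, 4865], [0, 23, 65281], [29, 23, 24], [0])
# Same cipher suites in a different order already yield a different fingerprint
```

Because the hash covers the exact order of ciphers and extensions, spoofing headers can’t change it; only the TLS stack itself can. If you need plain HTTP requests with a browser-grade fingerprint, libraries such as curl_cffi can impersonate Chrome’s handshake.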

5. Canvas and WebGL Fingerprinting

Browsers render graphics slightly differently. Anti-bot systems detect:

  • Identical canvas fingerprints across sessions
  • WebGL vendor/renderer that doesn’t match the User-Agent
  • Headless-specific rendering artifacts

# Check your canvas fingerprint
page.goto("https://browserleaks.com/canvas")

Partial fix: Inject noise into canvas operations:

page.add_init_script("""
    const originalToDataURL = HTMLCanvasElement.prototype.toDataURL;
    HTMLCanvasElement.prototype.toDataURL = function(type) {
        if (type === 'image/png') {
            const context = this.getContext('2d');
            const imageData = context.getImageData(0, 0, this.width, this.height);
            for (let i = 0; i < imageData.data.length; i += 4) {
                imageData.data[i] ^= Math.floor(Math.random() * 2);  // Tiny noise
            }
            context.putImageData(imageData, 0, 0);
        }
        return originalToDataURL.apply(this, arguments);
    };
""")

6. Timing and Behavior Patterns

Bots behave differently than humans:

  • Instant actions: Real users don’t click 0ms after page load
  • Linear scrolling: Humans scroll erratically, not in perfect increments
  • No mouse movement: Bots often click without moving the mouse there first

# These helpers use Playwright's async API
import random
import asyncio

async def human_like_click(page, selector):
    element = page.locator(selector)
    box = await element.bounding_box()

    # Jittered target point, reused for move and click so they stay consistent
    x = box["x"] + box["width"] / 2 + random.uniform(-5, 5)
    y = box["y"] + box["height"] / 2 + random.uniform(-5, 5)

    # Move mouse with slight randomness
    await page.mouse.move(x, y, steps=random.randint(10, 25))  # Multiple steps = curve

    # Random delay before click
    await asyncio.sleep(random.uniform(0.1, 0.3))

    await page.mouse.click(x, y)

async def human_like_type(page, selector, text):
    await page.locator(selector).click()
    for char in text:
        await page.keyboard.type(char)
        await asyncio.sleep(random.uniform(0.05, 0.15))  # Typing speed varies
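The `steps` parameter above interpolates the move linearly, which is still a straight line. Human mouse paths curve. One common approach, sketched here as a standalone helper, is to generate points along a quadratic Bezier curve with a randomized control point and feed them to `page.mouse.move` one by one:

```python
import random

def bezier_path(start, end, steps=20, jitter=3.0, seed=None):
    """Points along a quadratic Bezier curve from start to end,
    with a randomly offset control point and per-point jitter."""
    rng = random.Random(seed)
    (x0, y0), (x1, y1) = start, end
    # A control point offset from the midpoint is what bends the path
    cx = (x0 + x1) / 2 + rng.uniform(-80, 80)
    cy = (y0 + y1) / 2 + rng.uniform(-80, 80)
    points = []
    for i in range(steps + 1):
        t = i / steps
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x1
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y1
        if 0 < i < steps:  # keep the endpoints exact
            x += rng.uniform(-jitter, jitter)
            y += rng.uniform(-jitter, jitter)
        points.append((x, y))
    return points

path = bezier_path((100, 100), (500, 400), seed=42)
# Then: for x, y in path: await page.mouse.move(x, y)
```

Endpoints stay exact so the cursor lands where the click happens; everything in between wobbles.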

The Full Stealth Setup

Combining everything:

from playwright.sync_api import sync_playwright
import random

def create_stealth_browser():
    p = sync_playwright().start()

    browser = p.chromium.launch(
        headless=False,  # Or pair with --headless=new in args for the new headless mode
        args=[
            "--disable-blink-features=AutomationControlled",
            "--disable-dev-shm-usage",
            "--no-sandbox",
            "--disable-setuid-sandbox",
            "--disable-infobars",
            "--window-size=1920,1080",
            "--start-maximized",
        ]
    )

    context = browser.new_context(
        viewport={"width": 1920, "height": 1080},
        user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        locale="en-US",
        timezone_id="America/New_York",
        geolocation={"latitude": 40.7128, "longitude": -74.0060},
        permissions=["geolocation"],
    )

    page = context.new_page()

    # Stealth scripts
    page.add_init_script("""
        // Remove webdriver flag
        Object.defineProperty(navigator, 'webdriver', { get: () => undefined });

        // Fix plugins
        Object.defineProperty(navigator, 'plugins', {
            get: () => {
                const plugins = [
                    { name: 'Chrome PDF Plugin', filename: 'internal-pdf-viewer' },
                    { name: 'Chrome PDF Viewer', filename: 'mhjfbmdgcfjbbpaeojofohoefgiehjai' },
                    { name: 'Native Client', filename: 'internal-nacl-plugin' },
                ];
                plugins.length = 3;
                return plugins;
            }
        });

        // Fix languages
        Object.defineProperty(navigator, 'languages', {
            get: () => ['en-US', 'en']
        });

        // Fix chrome object
        window.chrome = {
            runtime: {},
            loadTimes: () => {},
            csi: () => {},
            app: {}
        };

        // Fix permissions (bind the original so its `this` isn't lost)
        const originalQuery = window.navigator.permissions.query.bind(window.navigator.permissions);
        window.navigator.permissions.query = (parameters) => (
            parameters.name === 'notifications' ?
            Promise.resolve({ state: Notification.permission }) :
            originalQuery(parameters)
        );
    """)

    return p, browser, page

# Usage
p, browser, page = create_stealth_browser()
page.goto("https://bot.sannysoft.com/")
# Check results - most should pass

IP Reputation: The Hard Problem

All the browser stealth in the world won’t help if your IP is flagged.

Datacenter IPs: Most are pre-flagged. AWS, GCP, Azure IP ranges are well-known.

Residential proxies: Look like real users but cost money.

# Using a proxy with Playwright
browser = p.chromium.launch(
    proxy={
        "server": "http://proxy.example.com:8080",
        "username": "user",
        "password": "pass"
    }
)

Proxy rotation: For high-volume scraping, rotate IPs per request or per session:

import random

proxies = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    # ...
]

def get_browser_with_proxy(p):
    # `p` is a running Playwright instance, e.g. from sync_playwright().start()
    proxy = random.choice(proxies)
    return p.chromium.launch(proxy={"server": proxy})
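Pure random choice will keep handing you proxies that have already started failing. A slightly more robust sketch rotates round-robin and retires a proxy after repeated failures (the proxy URLs are hypothetical placeholders):

```python
import itertools

class ProxyRotator:
    """Round-robin over a proxy pool, dropping proxies after repeated failures."""

    def __init__(self, proxies, max_failures=3):
        self.failures = {p: 0 for p in proxies}
        self.max_failures = max_failures
        self._cycle = itertools.cycle(list(proxies))

    def next(self):
        # Skip any proxy that has exceeded its failure budget
        for _ in range(len(self.failures)):
            proxy = next(self._cycle)
            if self.failures[proxy] < self.max_failures:
                return proxy
        raise RuntimeError("All proxies exhausted")

    def report_failure(self, proxy):
        self.failures[proxy] += 1

rotator = ProxyRotator([
    "http://user:pass@proxy1.example.com:8080",  # hypothetical endpoints
    "http://user:pass@proxy2.example.com:8080",
])
# browser = p.chromium.launch(proxy={"server": rotator.next()})
# On a block or captcha: rotator.report_failure(that_proxy)
```

Call `report_failure` whenever a request through a proxy gets blocked; once a proxy exceeds the budget, `next()` silently skips it.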

Testing Your Setup

Use these sites to check what’s being detected:

  1. https://bot.sannysoft.com/ - Comprehensive detection tests
  2. https://browserleaks.com/ - Fingerprinting details
  3. https://pixelscan.net/ - Commercial-grade detection
  4. https://www.browserscan.net/ - Browser fingerprint analysis

What Doesn’t Work

User-Agent alone: Changing UA without fixing other signals is useless.

Random delays only: If everything else screams “bot,” delays won’t help.

Selenium stealth libraries: Most are outdated. Detection has evolved.

Free proxies: They’re already flagged. You’ll hit captchas immediately.

The Reality

For simple sites, basic stealth works. For Cloudflare, PerimeterX, or DataDome protected sites, you need:

  1. Real browser fingerprints (not spoofed)
  2. Residential proxies
  3. Human-like behavior patterns
  4. Session management (cookies, local storage)

Managing all this is complex. That’s why browser-as-a-service platforms exist—they handle stealth infrastructure so you focus on scraping logic.

Summary

Detection Method      Difficulty to Bypass    Solution
WebDriver flag        Easy                    Launch args
Navigator properties  Easy                    Init scripts
Headless detection    Medium                  New headless mode
Canvas/WebGL          Medium                  Fingerprint spoofing
TLS fingerprint       N/A (real browser)      Use real browser
IP reputation         Hard                    Residential proxies
Behavior analysis     Hard                    Human-like timing

Start with the easy fixes. Add complexity only when you hit specific detection.