How to Avoid Bot Detection While Scraping in Python
Anti-bot systems have improved dramatically. A vanilla Playwright script fails on roughly 40% of commercial sites. Here’s what’s being detected and how to address it.
What Gets Detected
1. The WebDriver Flag
Browsers automated via CDP expose navigator.webdriver = true. Every anti-bot system checks this.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # This gets detected immediately
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://bot.sannysoft.com/")
    # Check the "WebDriver" row - it'll say "present"
Fix: Launch with automation detection disabled:
browser = p.chromium.launch(
    args=["--disable-blink-features=AutomationControlled"]
)
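To confirm that the flag is actually gone, you can read it back from the page. The following is a minimal sketch: the helper names `webdriver_exposed` and `launch_and_check` are made up for illustration, and the only Playwright calls used are `launch`, `new_page`, `goto`, `evaluate`, and `close`.

```python
def webdriver_exposed(page) -> bool:
    """Return True if the page still reports navigator.webdriver as truthy."""
    # page.evaluate runs the expression inside the page and returns the result;
    # JavaScript `undefined` arrives in Python as None.
    return bool(page.evaluate("navigator.webdriver"))


def launch_and_check(url: str = "about:blank") -> bool:
    """Launch Chromium with the argument shown above and report the flag."""
    # Imported lazily so webdriver_exposed can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(
            args=["--disable-blink-features=AutomationControlled"]
        )
        page = browser.new_page()
        page.goto(url)
        exposed = webdriver_exposed(page)
        browser.close()
        return exposed
```

If the launch argument took effect, `launch_and_check()` should return `False`.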
2. Navigator Properties
Anti-bot scripts probe various navigator properties looking for inconsistencies:
// What detection scripts check
navigator.webdriver // Should be undefined
navigator.plugins.length // Should be > 0
navigator.languages // Should match Accept-Language header
navigator.platform // Should match User-Agent
navigator.hardwareConcurrency // 1 is suspicious
navigator.deviceMemory // Should exist
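The same checks can be mirrored in Python so you can audit your own context before a site does. This is a rough sketch, not a reproduction of any vendor's detection logic: `SNAPSHOT_JS` and `navigator_inconsistencies` are hypothetical names, and the platform mapping only covers desktop Windows, macOS, and Linux User-Agents.

```python
# JavaScript that collects the properties listed above; pass it to page.evaluate().
SNAPSHOT_JS = """() => ({
    webdriver: navigator.webdriver,
    pluginsLength: navigator.plugins.length,
    languages: Array.from(navigator.languages),
    platform: navigator.platform,
    hardwareConcurrency: navigator.hardwareConcurrency,
    deviceMemory: navigator.deviceMemory,
})"""

# OS token in the User-Agent -> expected prefix of navigator.platform
PLATFORM_PREFIXES = {"Windows": "Win", "Macintosh": "Mac", "Linux": "Linux"}


def navigator_inconsistencies(nav: dict, user_agent: str, accept_language: str) -> list:
    """Return a list of problems found in a navigator snapshot."""
    problems = []
    if nav.get("webdriver"):
        problems.append("navigator.webdriver is truthy")
    if not nav.get("pluginsLength"):
        problems.append("navigator.plugins is empty")

    # The first navigator.languages entry should lead the Accept-Language header.
    languages = nav.get("languages") or []
    header_first = accept_language.split(",")[0].split(";")[0].strip()
    if not languages or languages[0].lower() != header_first.lower():
        problems.append("navigator.languages does not match Accept-Language")

    # navigator.platform should agree with the OS named in the User-Agent.
    platform = nav.get("platform") or ""
    for ua_token, prefix in PLATFORM_PREFIXES.items():
        if ua_token in user_agent and not platform.startswith(prefix):
            problems.append(f"platform {platform!r} does not match User-Agent")

    if (nav.get("hardwareConcurrency") or 0) <= 1:
        problems.append("hardwareConcurrency is 1 or missing")
    if nav.get("deviceMemory") is None:
        problems.append("navigator.deviceMemory is missing")
    return problems
```

Typical use would be `navigator_inconsistencies(page.evaluate(SNAPSHOT_JS), user_agent, "en-US,en;q=0.9")`, where the last two arguments are whatever your context actually sends.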
Fix: Use a properly configured context:
context = browser.new_context(
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    viewport={"width": 1920, "height": 1080},
    locale="en-US",
    timezone_id="America/New_York",
    permissions=["geolocation"],
)
# Inject additional properties
page = context.new_page()
page.add_init_script("""
    Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
    Object.defineProperty(navigator, 'plugins', {
        get: () => [1, 2, 3, 4, 5]
    });
    Object.defineProperty(navigator, 'languages', {
        get: () => ['en-US', 'en']
    });
""")
3. Headless Detection
Headless Chrome has detectable differences from headed Chrome:
- Missing chrome.runtime object
- window.chrome behaves differently
- Different WebGL renderer strings
- Missing audio/video codecs
# Headed mode passes more checks but is slower
browser = p.chromium.launch(headless=False)
# New headless mode (Chrome 109+) is harder to detect
browser = p.chromium.launch(
    headless=True,
    args=["--headless=new"]  # Uses new headless implementation
)
4. TLS Fingerprinting (JA3/JA4)
Your TLS handshake has a fingerprint. Requests from Python’s TLS stack look different from Chrome’s.
This is why requests with a fake User-Agent fails: the header claims Chrome, but the TLS fingerprint is still Python's.
Playwright uses the real browser’s TLS stack, so this isn’t an issue for browser automation. It matters if you’re mixing requests with Playwright.
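One way to avoid mixing stacks is to issue extra HTTP calls from inside the page, so they go through the browser's own network stack. The sketch below assumes the target URL is same-origin or CORS-enabled for the page you are on; `FETCH_JS` and `fetch_via_browser` are illustrative names, not part of Playwright.

```python
# JavaScript run in the page: fetch the URL and return status and body text.
FETCH_JS = """async (url) => {
    const response = await fetch(url, { credentials: 'include' });
    return { status: response.status, body: await response.text() };
}"""


def fetch_via_browser(page, url: str) -> dict:
    """Fetch `url` from within the page instead of from Python.

    The request is made by the browser itself, so it carries the browser's
    TLS handshake and the page's cookies. It is subject to the usual
    same-origin and CORS rules.
    """
    # page.evaluate passes `url` as the argument of the JavaScript function.
    return page.evaluate(FETCH_JS, url)
```

A call such as `fetch_via_browser(page, "https://example.com/api/items")` (a placeholder URL) returns a dict with `status` and `body` keys.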
5. Canvas and WebGL Fingerprinting
Browsers render graphics slightly differently. Anti-bot systems detect:
- Identical canvas fingerprints across sessions
- WebGL vendor/renderer that doesn’t match the User-Agent
- Headless-specific rendering artifacts
# Check your canvas fingerprint
page.goto("https://browserleaks.com/canvas")
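You can also hash the canvas output yourself, which makes it easy to compare sessions or to test whether a noise patch has any effect. This is a sketch under simple assumptions: the drawing in `CANVAS_JS` is arbitrary, and `canvas_fingerprint` and `fingerprints_differ` are hypothetical helper names.

```python
import hashlib

# JavaScript run in the page: draw some text and export the canvas as a PNG data URL.
CANVAS_JS = """() => {
    const canvas = document.createElement('canvas');
    canvas.width = 200;
    canvas.height = 50;
    const ctx = canvas.getContext('2d');
    ctx.font = '16px Arial';
    ctx.fillText('fingerprint test', 10, 30);
    return canvas.toDataURL('image/png');
}"""


def canvas_fingerprint(data_url: str) -> str:
    """Reduce a canvas data URL to a short, comparable SHA-256 hex digest."""
    return hashlib.sha256(data_url.encode("utf-8")).hexdigest()


def fingerprints_differ(page, samples: int = 2) -> bool:
    """Render the same canvas several times and report whether the hashes vary."""
    hashes = {canvas_fingerprint(page.evaluate(CANVAS_JS)) for _ in range(samples)}
    return len(hashes) > 1
```

Without any patch, the same browser should produce the same hash each time, so `fingerprints_differ(page)` returning `True` indicates that injected noise is reaching the output.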
Partial fix: Inject noise into canvas operations:
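page.add_init_script("""
    const originalToDataURL = HTMLCanvasElement.prototype.toDataURL;
    HTMLCanvasElement.prototype.toDataURL = function(type) {
        // toDataURL() defaults to PNG when called without a type
        const isPng = type === undefined || type === 'image/png';
        // getContext('2d') returns null if the canvas already has a WebGL context
        const context = this.getContext('2d');
        if (isPng && context && this.width > 0 && this.height > 0) {
            const imageData = context.getImageData(0, 0, this.width, this.height);
            for (let i = 0; i < imageData.data.length; i += 4) {
                imageData.data[i] ^= Math.floor(Math.random() * 2); // Randomly flip the low bit of the red channel
            }
            context.putImageData(imageData, 0, 0);
        }
        return originalToDataURL.apply(this, arguments);
    };
""")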
6. Timing and Behavior Patterns
Bots behave differently than humans:
- Instant actions: Real users don’t click 0ms after page load
- Linear scrolling: Humans scroll erratically, not in perfect increments
- No mouse movement: Bots often click without moving the mouse there first
import asyncio
import random

# These helpers use Playwright's async API, so `page` must come from playwright.async_api

async def human_like_click(page, selector):
    element = page.locator(selector)
    box = await element.bounding_box()
    if box is None:
        raise ValueError(f"{selector!r} is not visible")
    # Pick one target near the center so the move and the click land on the same point
    x = box["x"] + box["width"] / 2 + random.uniform(-5, 5)
    y = box["y"] + box["height"] / 2 + random.uniform(-5, 5)
    # steps sends intermediate mousemove events along the way instead of one jump
    await page.mouse.move(x, y, steps=random.randint(10, 25))
    # Random delay before click
    await asyncio.sleep(random.uniform(0.1, 0.3))
    await page.mouse.click(x, y)

async def human_like_type(page, selector, text):
    await page.locator(selector).click()
    for char in text:
        await page.keyboard.type(char)
        await asyncio.sleep(random.uniform(0.05, 0.15))  # Typing speed varies
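Playwright's `steps` argument interpolates along a straight line, and the list above also calls out linear scrolling. The sketch below generates a curved mouse path and uneven scroll increments as plain data; `curved_path` and `scroll_deltas` are hypothetical helpers, and the quadratic Bezier curve and the 80 to 240 pixel range are arbitrary choices rather than a model of real user behavior.

```python
import random


def curved_path(start, end, steps=20, spread=80.0, rng=random):
    """Return `steps` points along a quadratic Bezier curve from start to end."""
    (x0, y0), (x2, y2) = start, end
    # Control point: the midpoint, pushed off the straight line by a random offset.
    cx = (x0 + x2) / 2 + rng.uniform(-spread, spread)
    cy = (y0 + y2) / 2 + rng.uniform(-spread, spread)
    points = []
    for i in range(1, steps + 1):
        t = i / steps
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x2
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y2
        points.append((x, y))
    return points


def scroll_deltas(total, low=80, high=240, rng=random):
    """Split a scroll distance into uneven wheel deltas that sum to `total`."""
    deltas = []
    remaining = total
    while remaining > 0:
        step = min(remaining, rng.randint(low, high))
        deltas.append(step)
        remaining -= step
    return deltas
```

To replay these with the async API, loop over the points with `await page.mouse.move(x, y)` and over the deltas with `await page.mouse.wheel(0, delta)`, adding a short random sleep between calls.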
The Full Stealth Setup
Combining everything:
from playwright.sync_api import sync_playwright

def create_stealth_browser():
    p = sync_playwright().start()
    browser = p.chromium.launch(
        headless=False,  # Or headless=True with --headless=new
        args=[
            "--disable-blink-features=AutomationControlled",
            "--disable-dev-shm-usage",
            "--no-sandbox",
            "--disable-setuid-sandbox",
            "--disable-infobars",
            "--window-size=1920,1080",
            "--start-maximized",
        ],
    )
    context = browser.new_context(
        viewport={"width": 1920, "height": 1080},
        user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        locale="en-US",
        timezone_id="America/New_York",
        geolocation={"latitude": 40.7128, "longitude": -74.0060},
        permissions=["geolocation"],
    )
    page = context.new_page()
    # Stealth scripts
    page.add_init_script("""
        // Remove webdriver flag
        Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
        // Fix plugins
        Object.defineProperty(navigator, 'plugins', {
            get: () => [
                { name: 'Chrome PDF Plugin', filename: 'internal-pdf-viewer' },
                { name: 'Chrome PDF Viewer', filename: 'mhjfbmdgcfjbbpaeojofohoefgiehjai' },
                { name: 'Native Client', filename: 'internal-nacl-plugin' },
            ]
        });
        // Fix languages
        Object.defineProperty(navigator, 'languages', {
            get: () => ['en-US', 'en']
        });
        // Fix chrome object
        window.chrome = {
            runtime: {},
            loadTimes: () => {},
            csi: () => {},
            app: {}
        };
        // Fix permissions (bind keeps `this` valid; an unbound call throws "Illegal invocation")
        const originalQuery = window.navigator.permissions.query.bind(window.navigator.permissions);
        window.navigator.permissions.query = (parameters) => (
            parameters.name === 'notifications' ?
                Promise.resolve({ state: Notification.permission }) :
                originalQuery(parameters)
        );
    """)
    return p, browser, page

# Usage
p, browser, page = create_stealth_browser()
page.goto("https://bot.sannysoft.com/")
# Check results - most should pass
browser.close()
p.stop()
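Because `create_stealth_browser` returns the Playwright handle and the browser, cleanup is left to the caller. One possible way to make teardown reliable is a small context manager; `closing_playwright` is a hypothetical helper that relies only on `browser.close()` and the `stop()` method of the object returned by `sync_playwright().start()`.

```python
from contextlib import contextmanager


@contextmanager
def closing_playwright(p, browser):
    """Yield the browser, then close it and stop Playwright even if the body raises."""
    try:
        yield browser
    finally:
        try:
            browser.close()
        finally:
            # Stop the Playwright driver process even if closing the browser failed.
            p.stop()


# Example usage with the function defined above:
#
#     p, browser, page = create_stealth_browser()
#     with closing_playwright(p, browser):
#         page.goto("https://bot.sannysoft.com/")
```

This keeps stray browser processes from piling up when a scrape fails partway through.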
IP Reputation: The Hard Problem
All the browser stealth in the world won’t help if your IP is flagged.
Datacenter IPs: Most are pre-flagged. AWS, GCP, Azure IP ranges are well-known.
Residential proxies: Look like real users but cost money.
# Using a proxy with Playwright
browser = p.chromium.launch(
    proxy={
        "server": "http://proxy.example.com:8080",
        "username": "user",
        "password": "pass"
    }
)
Proxy rotation: For high-volume scraping, rotate IPs per request or per session:
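import random

# Playwright takes proxy credentials in separate username/password fields
proxies = [
    {"server": "http://proxy1.example.com:8080", "username": "user", "password": "pass"},
    {"server": "http://proxy2.example.com:8080", "username": "user", "password": "pass"},
    # ...
]

def get_browser_with_proxy(p):
    # p is the Playwright instance from sync_playwright()
    proxy = random.choice(proxies)
    return p.chromium.launch(proxy=proxy)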
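Proxy providers often hand out URLs with embedded credentials, while Playwright's `proxy` option has separate `server`, `username`, and `password` fields. The sketch below converts between the two and rotates in round-robin order; `to_playwright_proxy` and `proxy_cycle` are illustrative names, and the example hosts are placeholders.

```python
import itertools
from urllib.parse import unquote, urlsplit


def to_playwright_proxy(url: str) -> dict:
    """Convert 'scheme://user:pass@host:port' into a Playwright proxy dict."""
    parts = urlsplit(url)
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port is not None:
        server += f":{parts.port}"
    proxy = {"server": server}
    # Credentials in URLs are percent-encoded; decode them for Playwright.
    if parts.username:
        proxy["username"] = unquote(parts.username)
    if parts.password:
        proxy["password"] = unquote(parts.password)
    return proxy


def proxy_cycle(urls):
    """Return an endless round-robin iterator over Playwright proxy dicts."""
    return itertools.cycle([to_playwright_proxy(u) for u in urls])
```

Each new session can then take `next(cycle)` and pass it as `proxy=` to `p.chromium.launch`. Playwright also accepts a `proxy` option on `browser.new_context`, which lets you rotate per session without relaunching the browser.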
Testing Your Setup
Use these sites to check what’s being detected:
- https://bot.sannysoft.com/ - Comprehensive detection tests
- https://browserleaks.com/ - Fingerprinting details
- https://pixelscan.net/ - Commercial-grade detection
- https://www.browserscan.net/ - Browser fingerprint analysis
What Doesn’t Work
User-Agent alone: Changing UA without fixing other signals is useless.
Random delays only: If everything else screams “bot,” delays won’t help.
Selenium stealth libraries: Most are outdated. Detection has evolved.
Free proxies: They’re already flagged. You’ll hit captchas immediately.
The Reality
For simple sites, basic stealth works. For Cloudflare, PerimeterX, or DataDome protected sites, you need:
- Real browser fingerprints (not spoofed)
- Residential proxies
- Human-like behavior patterns
- Session management (cookies, local storage)
Managing all this is complex, which is why browser-as-a-service platforms exist: they handle the stealth infrastructure so you can focus on scraping logic.
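Of the items listed above, session management is the most mechanical to handle yourself. Playwright can write a context's cookies and local storage to a file with `context.storage_state(path=...)` and restore them through the `storage_state` option of `browser.new_context`. A minimal sketch, with `save_session` and `new_context_with_session` as hypothetical wrapper names:

```python
import os


def save_session(context, path: str) -> None:
    """Write the context's cookies and local storage to a JSON file."""
    context.storage_state(path=path)


def new_context_with_session(browser, path: str, **kwargs):
    """Create a context, restoring the saved session if the file exists.

    Extra keyword arguments (user_agent, locale, ...) are passed through
    to browser.new_context unchanged.
    """
    if os.path.exists(path):
        kwargs["storage_state"] = path
    return browser.new_context(**kwargs)
```

Calling `save_session(context, "state.json")` before closing, and `new_context_with_session(browser, "state.json", locale="en-US")` on the next run, lets a returning visit look like a continuation of the previous one rather than a fresh profile.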
Summary
| Detection Method | Difficulty to Bypass | Solution |
|---|---|---|
| WebDriver flag | Easy | Launch args |
| Navigator properties | Easy | Init scripts |
| Headless detection | Medium | New headless mode |
| Canvas/WebGL | Medium | Fingerprint spoofing |
| TLS fingerprint | N/A (real browser) | Use real browser |
| IP reputation | Hard | Residential proxies |
| Behavior analysis | Hard | Human-like timing |
Start with the easy fixes. Add complexity only when you hit specific detection.