Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/playwright (PlaywrightCrawler)
Issue description
Any time I try to run my PlaywrightCrawler with proxy urls, I get the following error:
ERROR PlaywrightCrawler: Request failed and reached maximum retries. Error: Detected a session error, rotating session...
page.goto: net::ERR_TUNNEL_CONNECTION_FAILED at [url]
Call log:
- navigating to "[url]", waiting until "load"
at PlaywrightCrawler._throwIfProxyError
at PlaywrightCrawler._handleNavigation
at runNextTicks (node:internal/process/task_queues:60:5)
at processImmediate (node:internal/timers:447:9)
at process.topLevelDomainCallback (node:domain:161:15)
at process.callbackTrampoline (node:internal/async_hooks:128:24)
at async PlaywrightCrawler._runRequestHandler
at async PlaywrightCrawler._runRequestHandler
at async wrap
When I remove the proxyConfiguration from the PlaywrightCrawler options it works as expected.
This is happening both locally and in my deployed container. I haven't changed anything in my code since last time this was working.
Solution attempts
Tested without proxies and it works fine, no errors
Upgraded crawlee + playwright to latest versions
Tried a different proxy provider and it still didn't work
Tested with PuppeteerCrawler and same error occurs
net::ERR_TUNNEL_CONNECTION_FAILED is an error coming from Chromium when it cannot connect to the target server using a proxy.
My first idea would be to try the proxies directly in your OS / browser to confirm they work with regular web browsers as expected. Note that Playwright / Puppeteer support HTTP(S) and SOCKSv5 proxies only.
If you can confirm that the proxies work in your browsers without Crawlee: your snippet appears to use puppeteer-extra-plugin-stealth (or similar?). This is a third-party library that Crawlee hasn't been tested with. Try disabling it and see whether the proxy-related behavior changes.
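As a complement to checking the proxy in a browser, the tunnel can also be probed from Node.js directly. The sketch below is only an illustration, assuming a plain HTTP proxy: it sends a single CONNECT request through the proxy and reports the status code the proxy answers with. The names `buildConnectOptions` and `checkTunnel`, the `PROXY_URL` environment variable, and the target `example.com:443` are hypothetical and not part of Crawlee or Playwright.

```typescript
import * as http from 'node:http';

// Shape of the request options used below (a subset of http.RequestOptions).
interface ConnectOptions {
    host: string;
    port: number;
    method: 'CONNECT';
    path: string;
    headers: Record<string, string>;
}

// Hypothetical helper: turn a proxy URL into options for an HTTP CONNECT request.
// Assumes an http:// proxy URL; credentials, if present, are sent as Basic auth.
export function buildConnectOptions(proxyUrl: string, target: string): ConnectOptions {
    const proxy = new URL(proxyUrl);
    const headers: Record<string, string> = { Host: target };
    if (proxy.username) {
        const credentials = `${decodeURIComponent(proxy.username)}:${decodeURIComponent(proxy.password)}`;
        headers['Proxy-Authorization'] = `Basic ${Buffer.from(credentials).toString('base64')}`;
    }
    return {
        host: proxy.hostname,
        port: Number(proxy.port) || 80, // URL.port is empty when the default port is used
        method: 'CONNECT',
        path: target, // "host:port" of the site to tunnel to
        headers,
    };
}

// Hypothetical helper: resolves with the status code of the proxy's CONNECT response.
// A 200 generally means the tunnel was established; anything else means it was refused.
export function checkTunnel(proxyUrl: string, target: string, timeoutMs = 10_000): Promise<number> {
    return new Promise((resolve, reject) => {
        const req = http.request(buildConnectOptions(proxyUrl, target));
        req.setTimeout(timeoutMs, () => req.destroy(new Error('Timed out waiting for the proxy')));
        req.once('connect', (res, socket) => {
            socket.destroy(); // only the status matters here, so close the tunnel right away
            resolve(res.statusCode ?? 0);
        });
        req.once('error', reject);
        req.end();
    });
}

// Runs only when PROXY_URL (an assumed variable name) is set.
const proxyUrlFromEnv = process.env.PROXY_URL;
if (proxyUrlFromEnv) {
    checkTunnel(proxyUrlFromEnv, 'example.com:443').then(
        (status) => console.log(`Proxy answered CONNECT with status ${status}`),
        (error) => console.error('CONNECT failed:', error),
    );
}
```

A non-200 status or a connection error here would point at the proxy itself or the credentials rather than at the browser or Crawlee.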
Tested proxies directly in my OS / browser and those worked as expected
My proxies are HTTP.
However, I found the cause, and it's really weird. I went back through my code, and the only thing I had added was Datadog tracing. When I remove it, everything works as expected.
// tracer.ts
import tracer from 'dd-trace';
tracer.init();
export default tracer;
// main.ts
import './tracer';
...
I haven't yet discovered why this causes an issue with Playwright browsers + proxies, but thought I'd put my findings here.
That's interesting 👀 Could you please confirm whether this causes the proxies to fail even with bare Playwright / Puppeteer (without Crawlee)? I assume it should, as Crawlee essentially picks a proxy URL from a list and launches a browser with it. Anyway, could you open an issue in the respective projects if this is the case?
If the error only appears in the DataDog + Crawlee configuration (but not with Playwright + DataDog), we're of course ready to look into it more :)
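A bare Playwright reproduction (without Crawlee) could look roughly like the sketch below. It is only a starting point under some assumptions: the `playwright` package is installed, the proxy URL is passed in a `PROXY_URL` environment variable, and the target page is `https://example.com`. The helper names `toPlaywrightProxy` and `reproduce` are made up for this example. To test the tracing case, add `import './tracer';` as the very first line, as in the main.ts snippet above, and compare the two runs.

```typescript
// Shape of Playwright's `proxy` launch option, as far as it is used here.
interface PlaywrightProxy {
    server: string;
    username?: string;
    password?: string;
}

// Hypothetical helper: split a proxy URL into server and credentials.
export function toPlaywrightProxy(proxyUrl: string): PlaywrightProxy {
    const url = new URL(proxyUrl);
    const proxy: PlaywrightProxy = { server: `${url.protocol}//${url.host}` };
    if (url.username) proxy.username = decodeURIComponent(url.username);
    if (url.password) proxy.password = decodeURIComponent(url.password);
    return proxy;
}

// Launches Chromium through the proxy and loads a single page.
export async function reproduce(proxyUrl: string, targetUrl: string): Promise<void> {
    // Imported lazily so the helper above can be used without Playwright installed.
    const { chromium } = await import('playwright');
    const browser = await chromium.launch({ proxy: toPlaywrightProxy(proxyUrl) });
    try {
        const page = await browser.newPage();
        const response = await page.goto(targetUrl);
        console.log('Navigation finished with status', response?.status());
    } finally {
        await browser.close();
    }
}

// Runs only when PROXY_URL (an assumed variable name) is set.
if (process.env.PROXY_URL) {
    reproduce(process.env.PROXY_URL, 'https://example.com').catch((error) => {
        console.error('Navigation failed:', error);
        process.exitCode = 1;
    });
}
```

If this fails with net::ERR_TUNNEL_CONNECTION_FAILED only when the tracer is imported, the problem can be reproduced without Crawlee. If it succeeds in both cases, the Crawlee layer is the more likely place to look.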
Package version
3.12.2
Node.js version
node: v18.18.2, ts: 5.5.3
Operating system
MacOS
Apify platform
I have tested this on the next release
3.12.3-beta.17
Other context
No response