Playwright + Puppeteer crawlers fail to connect with proxy #2855

Open
1 task
jhamilton14 opened this issue Feb 20, 2025 · 3 comments
Labels: bug (Something isn't working.), t-tooling (Issues with this label are in the ownership of the tooling team.)


jhamilton14 commented Feb 20, 2025

Which package is this bug report for? If unsure which one to select, leave blank

@crawlee/playwright (PlaywrightCrawler)

Issue description

Any time I try to run my PlaywrightCrawler with proxy URLs, I get the following error:

ERROR PlaywrightCrawler: Request failed and reached maximum retries. Error: Detected a session error, rotating session... 
page.goto: net::ERR_TUNNEL_CONNECTION_FAILED at [url]
Call log:
  - navigating to "[url]", waiting until "load"


    at PlaywrightCrawler._throwIfProxyError
    at PlaywrightCrawler._handleNavigation
    at runNextTicks (node:internal/process/task_queues:60:5)
    at processImmediate (node:internal/timers:447:9)
    at process.topLevelDomainCallback (node:domain:161:15)
    at process.callbackTrampoline (node:internal/async_hooks:128:24)
    at async PlaywrightCrawler._runRequestHandler
    at async PlaywrightCrawler._runRequestHandler
    at async wrap

When I remove the proxyConfiguration from the PlaywrightCrawler options it works as expected.

This is happening both locally and in my deployed container. I haven't changed anything in my code since last time this was working.

Solution attempts

  • Tested without proxies and it works fine, no errors
  • Upgraded crawlee + playwright to latest versions
  • Tried a different proxy provider and it still didn't work
  • Tested with PuppeteerCrawler and same error occurs

Code sample

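// Imports assumed from the identifiers used below; the original post omitted them.
import { PlaywrightCrawler, ProxyConfiguration } from 'crawlee';
import { chromium } from 'playwright-extra';
import stealthPlugin from 'puppeteer-extra-plugin-stealth';

// Register the stealth plugin on the Chromium launcher.
chromium.use(stealthPlugin());

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls, // verified working proxies
});

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    launchContext: {
        launcher: chromium,
        launchOptions: {
            headless: true,
        },
    },
    requestHandler: async ({ page, request, log }) => {
        ...
    },
    failedRequestHandler({ request }) {
        console.error(`Request ${request.url} failed`);
    },
});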

Package version

3.12.2

Node.js version

node: v18.18.2, ts: 5.5.3

Operating system

macOS

Apify platform

  • Tick me if you encountered this issue on the Apify platform

I have tested this on the next release

3.12.3-beta.17

Other context

No response

jhamilton14 added the bug label Feb 20, 2025
github-actions bot added the t-tooling label Feb 20, 2025

barjin (Contributor) commented Feb 21, 2025

Hello and thank you for your interest in Crawlee.

net::ERR_TUNNEL_CONNECTION_FAILED is an error coming from Chromium when it cannot connect to the target server using a proxy.

My first idea would be to try the proxies directly in your OS / browser to confirm they work with regular web browsers as expected. Note that Playwright / Puppeteer support HTTP(S) and SOCKSv5 proxies only.

If you can confirm that the proxies work with your browsers without Crawlee, the next suspect is puppeteer-extra-plugin-stealth (or something similar?), which your snippet uses. This is a third-party library that Crawlee hasn't been tested with. Try disabling it and see whether the proxy-related behavior changes.
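
As a quick pre-check for the scheme restriction mentioned above, a proxy list can be filtered before it is handed to the crawler. This is only a minimal sketch: `checkProxyUrl` and `SUPPORTED_PROTOCOLS` are hypothetical names, not part of Crawlee or Playwright, and the supported set simply mirrors the HTTP(S) and SOCKSv5 statement in this comment.

```typescript
// Schemes listed as supported in the comment above: HTTP(S) and SOCKSv5.
const SUPPORTED_PROTOCOLS = new Set(['http:', 'https:', 'socks5:']);

interface ProxyCheck {
    url: string;
    ok: boolean;
    reason?: string;
}

// Hypothetical helper: flags proxy URLs that do not parse or use another scheme.
function checkProxyUrl(raw: string): ProxyCheck {
    let parsed: URL;
    try {
        parsed = new URL(raw);
    } catch {
        return { url: raw, ok: false, reason: 'not a valid URL' };
    }
    if (!SUPPORTED_PROTOCOLS.has(parsed.protocol)) {
        return { url: raw, ok: false, reason: `unsupported scheme "${parsed.protocol}"` };
    }
    return { url: raw, ok: true };
}
```

Running every entry of `proxyUrls` through such a check catches a missing `http://` prefix or a SOCKSv4 URL early. It says nothing about whether the proxy is reachable, so it does not replace testing the proxy in a real browser.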

jhamilton14 (Author) commented Feb 21, 2025

@barjin Thanks for getting back to me!

I tried the following to no avail:

  • Removed the launcher: chromium line
  • Removed the puppeteer-extra-plugin-stealth setup
  • Tested proxies directly in my OS / browser and those worked as expected

My proxies are HTTP.

However, I found the cause, and it's really weird. Going back through my code, the only thing I had added was Datadog tracing (dd-trace). When I remove it, everything works as expected.

// tracer.ts
import tracer from 'dd-trace';

tracer.init();

export default tracer;

// main.ts
import './tracer';

...

I haven't yet discovered why this causes an issue with Playwright browsers + proxies, but thought I'd put my findings here.

barjin (Contributor) commented Feb 24, 2025

That's interesting 👀 Could you please confirm whether this causes the proxies to fail even with bare Playwright / Puppeteer (without Crawlee)? I assume it should, as Crawlee essentially picks a proxy URL from a list and launches a browser with it. Anyway, could you open an issue in the respective projects if this is the case?

If the error only appears with Datadog + Crawlee (but not with Playwright + Datadog), we're of course ready to look into it more :)
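
A bare-Playwright reproduction of the kind requested here could look roughly like the sketch below. `toPlaywrightProxy`, `tryProxy`, and `BrowserLauncher` are hypothetical names introduced for illustration; the only Playwright feature relied on is the `proxy` launch option with its `server`, `username`, and `password` fields. The launcher is passed in as a parameter so the snippet has no hard dependency on a particular browser package.

```typescript
// Shape of Playwright's `proxy` launch option (server plus optional credentials).
interface PlaywrightProxy {
    server: string;
    username?: string;
    password?: string;
}

// Hypothetical helper: splits "http://user:pass@host:port" into the proxy option.
function toPlaywrightProxy(proxyUrl: string): PlaywrightProxy {
    const { protocol, host, username, password } = new URL(proxyUrl);
    const proxy: PlaywrightProxy = { server: `${protocol}//${host}` };
    // URL keeps credentials percent-encoded, so decode them before use.
    if (username) proxy.username = decodeURIComponent(username);
    if (password) proxy.password = decodeURIComponent(password);
    return proxy;
}

// Minimal structural type, so Playwright's `chromium` object can be passed in.
interface BrowserLauncher {
    launch(options: { proxy: PlaywrightProxy }): Promise<{
        newPage(): Promise<{ goto(url: string): Promise<unknown> }>;
        close(): Promise<void>;
    }>;
}

// Returns null if navigation through the proxy succeeds, else the error message.
async function tryProxy(
    launcher: BrowserLauncher,
    proxyUrl: string,
    targetUrl: string,
): Promise<string | null> {
    const browser = await launcher.launch({ proxy: toPlaywrightProxy(proxyUrl) });
    try {
        const page = await browser.newPage();
        await page.goto(targetUrl);
        return null;
    } catch (error) {
        return error instanceof Error ? error.message : String(error);
    } finally {
        await browser.close();
    }
}
```

With Playwright installed, calling `await tryProxy(chromium, proxyUrl, targetUrl)` (with `chromium` imported from 'playwright') once with and once without `import './tracer'` at the top of the entry file should show whether net::ERR_TUNNEL_CONNECTION_FAILED also occurs without Crawlee.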
