Move fullDomFetcher to Playwright #1144
base: main
@@ -94,10 +94,8 @@
"morgan": "^1.10.0",
"node-fetch": "^3.1.0",
"octokit": "2.0.2",
"patchright": "1.50.1",
Just flagging that this is an explicit pin at the moment due to Kaliiiiiiiiii-Vinyzu/patchright#58. Issue is closed but the fix is not yet part of the latest release. This version should be adjusted after review, prior to merging.
I noticed the tests are failing due to linting and commit/changelog issues. I'll fix these, but I'm happy to have a high-level review first to ensure this is useful and worth merging, and then fix everything at once afterwards :)
Thanks @LVerneyEC for this contribution! Fully agree with a first high-level overview before ironing out details :)
Not so much. I have another PR to come for the htmlOnlyFetcher, for which this widely increases coverage. Here, the main benefit is to move away from […]. Also, there are more high-level updates, such as supporting corporate proxies and offering the ability to run headful for debugging purposes.
Hi @LVerneyEC, I've conducted a series of benchmark tests to evaluate the potential benefits of switching from Puppeteer to Playwright. Below are the detailed results:
Observations:
Based on these benchmark results, I do not recommend switching to Playwright at this time. Even though it has faster execution times, its higher failure rates and blocking issues are a blocking point for me. Regarding the other points mentioned:
Have I missed any key points in my analysis, and do you still see reasons to switch to Playwright despite these results?
Would you have more details on the benchmark and the results? I am a bit surprised about the 403 and selector errors, since they do not really match my experience so far.
Hi,
Here is a proposal to rewrite the full DOM fetcher, moving it to Playwright instead of Puppeteer.
This edited browser also supports HTTP/HTTPS proxies (e.g. a corporate proxy), and its behavior can be adjusted with two environment variables:
PLAYWRIGHT_NO_SANDBOX to disable all the sandboxing in Chrome (required for running in Docker, depending on the Docker setup).
PLAYWRIGHT_NO_HEADLESS to run it in headful mode (sometimes useful for debugging purposes).
This uses the patchright wrapper around Playwright, which adds several patches for obvious Playwright detection mechanisms, similar to the previously used puppeteer-extra-plugin-stealth.
Best,
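For reviewers, here is a minimal sketch of how the two environment variables and the proxy support described above could map onto Playwright launch options. It is not the code from this PR: the helper name `buildLaunchOptions` is hypothetical, treating any non-empty value as "enabled" is an assumption, and reading the proxy from `HTTPS_PROXY` / `HTTP_PROXY` is also an assumption (the description does not say where the proxy setting comes from).

```javascript
// Hypothetical helper (not from the PR): derive Playwright launch options
// from the environment variables mentioned in the description.
function buildLaunchOptions(env = process.env) {
  // Assumption: any non-empty value enables the flag.
  const noSandbox = Boolean(env.PLAYWRIGHT_NO_SANDBOX);
  const noHeadless = Boolean(env.PLAYWRIGHT_NO_HEADLESS);

  const options = {
    headless: !noHeadless,        // headful mode when PLAYWRIGHT_NO_HEADLESS is set
    chromiumSandbox: !noSandbox,  // Playwright's own sandbox toggle
    args: [],
  };

  if (noSandbox) {
    // Standard Chromium switches that disable sandboxing (often needed in Docker).
    options.args.push('--no-sandbox', '--disable-setuid-sandbox');
  }

  // Assumption: the proxy URL comes from the conventional proxy variables.
  const proxyServer = env.HTTPS_PROXY || env.HTTP_PROXY;
  if (proxyServer) {
    options.proxy = { server: proxyServer };
  }

  return options;
}
```

The returned object could then be passed to `chromium.launch(options)`, assuming patchright exposes the same `chromium.launch` entry point as Playwright. Keeping the mapping in a pure function makes it easy to unit test without starting a browser.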