refactor!: Split `BrowserType` literal into two different literals based on context #1070
Conversation
The two similar but distinct contexts are `Playwright` and browser fingerprints.
This is a slightly breaking change, so it should probably wait until more breaking changes accumulate.
A new release is coming, so let's add this change now.
Looks good, could you please resolve conflicts?
Pull Request Overview
This PR refactors the `BrowserType` literal in the fingerprinting context, replacing Playwright-centric values (`chromium`, `webkit`) with explicit fingerprinting values (`chrome`, `safari`). It introduces a mapping function to translate Playwright browser types into the new literal set and propagates these changes through the header generator, adapter, crawler code, tests, and docs.
- Updated `SupportedBrowserType` and related constants to use `['chrome', 'firefox', 'safari', 'edge']`
- Added `fingerprint_browser_type_from_playwright_browser_type` and applied it throughout header and crawler code
- Adjusted tests and documentation to reference the new browser type literals
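The mapping described above could be sketched roughly as follows. This is an illustration only, not the code merged in the PR: the function name and the literal values come from the overview, while the alias name `PlaywrightBrowserType`, the lookup table, and the `ValueError` on unknown input are assumptions made for the example.

```python
from typing import Literal

# Playwright-side names, as listed in the PR description.
# The alias name `PlaywrightBrowserType` is hypothetical.
PlaywrightBrowserType = Literal['chromium', 'firefox', 'webkit']

# Fingerprint-side names, as listed in the PR overview.
SupportedBrowserType = Literal['chrome', 'firefox', 'safari', 'edge']

# Explicit name mapping between the two contexts (assumed shape).
_PLAYWRIGHT_TO_FINGERPRINT: dict[PlaywrightBrowserType, SupportedBrowserType] = {
    'chromium': 'chrome',
    'firefox': 'firefox',
    'webkit': 'safari',
}


def fingerprint_browser_type_from_playwright_browser_type(
    playwright_browser_type: PlaywrightBrowserType,
) -> SupportedBrowserType:
    """Map a Playwright browser type to its fingerprint-context counterpart."""
    try:
        return _PLAYWRIGHT_TO_FINGERPRINT[playwright_browser_type]
    except KeyError:
        # Assumption: unknown names are rejected rather than passed through.
        raise ValueError(
            f'Unsupported Playwright browser type: {playwright_browser_type!r}'
        ) from None
```

Note that `'edge'` has no Playwright counterpart in this sketch, so it can only be selected directly in the fingerprint context.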
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
src/crawlee/fingerprint_suite/_types.py | Changed SupportedBrowserType literal values |
src/crawlee/fingerprint_suite/_header_generator.py | Added mapping function and updated default browser_type |
src/crawlee/fingerprint_suite/_consts.py | Updated BROWSER_TYPE_HEADER_KEYWORD keys |
src/crawlee/fingerprint_suite/_browserforge_adapter.py | Updated adapter logic/comments to use `chrome`/`safari` |
src/crawlee/crawlers/_playwright/_playwright_crawler.py | Mapped Playwright browser_type to fingerprint context |
src/crawlee/browsers/_playwright_browser_controller.py | Mapped Playwright browser_type in header generator call |
tests/unit/fingerprint_suite/test_header_generator.py | Updated parameterized tests for new browser types |
tests/unit/fingerprint_suite/test_adapters.py | Added test for PatchedHeaderGenerator with various input types |
tests/unit/crawlers/_playwright/test_playwright_crawler.py | Adjusted tests to use mapping function and new browser types |
docs/upgrading/upgrading_to_v1.md | Documented breaking change in browser type literals |
docs/examples/code_examples/playwright_crawler_with_fingerprint_generator.py | Updated example to use 'chrome' |
Comments suppressed due to low confidence (3)
src/crawlee/fingerprint_suite/_browserforge_adapter.py:86
- The inline comment still refers to `chromium`; for consistency with the updated literal set, please change this to `chrome`.
# Increase max attempts as from `BrowserForge` header generator perspective even `chromium`
src/crawlee/fingerprint_suite/_header_generator.py:13
- [nitpick] Consider adding a short docstring for this helper to explain that it maps Playwright browser literals (`'chromium'`, `'firefox'`, `'webkit'`) into the fingerprinting context (`'chrome'`, `'firefox'`, `'safari'`).
def fingerprint_browser_type_from_playwright_browser_type(
src/crawlee/fingerprint_suite/_header_generator.py:13
- The new mapping function isn’t covered by a direct unit test. Consider adding tests for all three Playwright inputs (`'chromium'`, `'firefox'`, `'webkit'`) and their expected outputs (`'chrome'`, `'firefox'`, `'safari'`).
def fingerprint_browser_type_from_playwright_browser_type(
LGTM
Co-authored-by: Copilot <[email protected]>
Description
Split the `BrowserType` literal into two different literals based on context. This avoids some confusion and replaces implicit string manipulation with an explicit name mapping between the two literals.
In Playwright:
'chromium', 'firefox', 'webkit'
In the browser fingerprint context:
'chrome', 'firefox', 'safari', 'edge'
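As a rough illustration of why two separate literals help, the sketch below declares both value sets with `typing.Literal` and checks membership at runtime with `typing.get_args`. The alias names `PlaywrightBrowserType` and `FingerprintBrowserType` and the helper `is_fingerprint_browser_type` are hypothetical and chosen for this example; only the string values come from the description above.

```python
from typing import Literal, get_args

# Hypothetical alias names; the values are those listed in the description.
PlaywrightBrowserType = Literal['chromium', 'firefox', 'webkit']
FingerprintBrowserType = Literal['chrome', 'firefox', 'safari', 'edge']


def is_fingerprint_browser_type(value: str) -> bool:
    """Return True if `value` is a valid name in the fingerprint context."""
    return value in get_args(FingerprintBrowserType)


# Only 'firefox' is spelled the same in both contexts, which is why
# an explicit mapping is safer than implicit string manipulation.
shared_names = set(get_args(PlaywrightBrowserType)) & set(get_args(FingerprintBrowserType))
```

With distinct literals, a static type checker can also flag a Playwright name such as `'chromium'` being passed where a fingerprint name is expected.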