Add post-navigation hooks and browser lifecycle hooks #1741

@vdusek

Context

Crawlee JS provides navigation hooks and browser lifecycle hooks that Crawlee Python is missing. See the parity report for broader context.

Gaps

1. Post-navigation hooks (main gap)

In JS, both BrowserCrawler and HttpCrawler support postNavigationHooks. They run after page.goto() (or the HTTP request) completes but before the request handler is called, which makes them useful for CAPTCHA detection, response validation, and similar checks.

Python has only pre_navigation_hook; no post-navigation equivalent exists.
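A minimal, self-contained sketch of where post-navigation hooks could sit in the request lifecycle, assuming async hooks that receive a shared context object. All names here (`SketchCrawler`, `NavigationContext`, `post_navigation_hook`, `BlockedError`) are hypothetical and not part of Crawlee's current API. Navigation is faked with a dictionary lookup in place of page.goto() or an HTTP request. The point is the ordering: pre-navigation hooks, then navigation, then post-navigation hooks, then the request handler.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass, field


@dataclass
class NavigationContext:
    """Hypothetical per-request context shared by hooks and the handler."""
    url: str
    body: str = ""
    events: list[str] = field(default_factory=list)


Hook = Callable[[NavigationContext], Awaitable[None]]


class SketchCrawler:
    """Hypothetical crawler; only the hook ordering matters here."""

    def __init__(self, handler: Hook, pages: dict[str, str]) -> None:
        self._handler = handler
        self._pages = pages  # fake "web": URL -> response body
        self._pre_hooks: list[Hook] = []
        self._post_hooks: list[Hook] = []

    def pre_navigation_hook(self, hook: Hook) -> Hook:
        self._pre_hooks.append(hook)
        return hook

    def post_navigation_hook(self, hook: Hook) -> Hook:
        self._post_hooks.append(hook)
        return hook

    async def process(self, url: str) -> NavigationContext:
        ctx = NavigationContext(url=url)
        for hook in self._pre_hooks:
            await hook(ctx)
        # Stand-in for page.goto() / the HTTP request.
        ctx.body = self._pages[url]
        ctx.events.append("navigate")
        # Post-navigation hooks inspect the response before the handler;
        # an exception raised here means the handler never runs.
        for hook in self._post_hooks:
            await hook(ctx)
        await self._handler(ctx)
        return ctx


class BlockedError(Exception):
    """Hypothetical error a CAPTCHA-detecting hook might raise."""


async def handler(ctx: NavigationContext) -> None:
    ctx.events.append("handler")


crawler = SketchCrawler(handler, pages={
    "https://a.example": "<html>ok</html>",
    "https://b.example": "<html>please solve the captcha</html>",
})


@crawler.pre_navigation_hook
async def before(ctx: NavigationContext) -> None:
    ctx.events.append("pre")


@crawler.post_navigation_hook
async def detect_captcha(ctx: NavigationContext) -> None:
    ctx.events.append("post")
    if "captcha" in ctx.body:
        raise BlockedError(ctx.url)


if __name__ == "__main__":
    print(asyncio.run(crawler.process("https://a.example")).events)
    # prints ['pre', 'navigate', 'post', 'handler']
```

Raising from a post-navigation hook (here, on a page whose body contains "captcha") stops the request before the handler runs. That is one plausible way to surface blocking; how a real implementation would report or retry such requests remains an open design question.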

2. Browser lifecycle hooks (BrowserPool)

JS BrowserPool exposes six hook types, all of which are candidates for porting:

  • preLaunchHooks / postLaunchHooks — before/after browser launch
  • prePageCreateHooks / postPageCreateHooks — before/after new page creation
  • prePageCloseHooks / postPageCloseHooks — before/after page close

Python's BrowserPool has no lifecycle hooks.
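A similar sketch for the six BrowserPool hook types, assuming they would be ported as async callables registered per hook type. `SketchBrowserPool`, `FakePage`, `add_hook`, and the snake_case hook names are hypothetical, and the hook arguments are simplified rather than copied from the JS signatures. It only shows where each hook would fire: launch hooks once per browser, and page hooks around page creation and close.

```python
import asyncio
from collections import defaultdict
from collections.abc import Awaitable, Callable
from typing import Any

# Hypothetical snake_case names mirroring the six JS BrowserPool hook types.
HOOK_TYPES = (
    "pre_launch", "post_launch",
    "pre_page_create", "post_page_create",
    "pre_page_close", "post_page_close",
)

Hook = Callable[..., Awaitable[None]]


class FakePage:
    """Stand-in for a real browser page."""

    def __init__(self, page_id: int) -> None:
        self.page_id = page_id
        self.closed = False


class SketchBrowserPool:
    """Hypothetical pool; only the hook call sites matter here."""

    def __init__(self) -> None:
        self._hooks: dict[str, list[Hook]] = defaultdict(list)
        self._launched = False
        self._next_id = 0

    def add_hook(self, hook_type: str, hook: Hook) -> None:
        if hook_type not in HOOK_TYPES:
            raise ValueError(f"unknown hook type: {hook_type}")
        self._hooks[hook_type].append(hook)

    async def _run(self, hook_type: str, *args: Any) -> None:
        # Hooks of one type run sequentially, in registration order.
        for hook in self._hooks[hook_type]:
            await hook(*args)

    async def _ensure_browser(self) -> None:
        # Launch hooks fire only when a browser is actually started.
        if not self._launched:
            await self._run("pre_launch")
            self._launched = True  # stand-in for launching a browser
            await self._run("post_launch")

    async def new_page(self) -> FakePage:
        await self._ensure_browser()
        await self._run("pre_page_create")
        page = FakePage(self._next_id)
        self._next_id += 1
        await self._run("post_page_create", page)
        return page

    async def close_page(self, page: FakePage) -> None:
        await self._run("pre_page_close", page)
        page.closed = True  # stand-in for closing the page
        await self._run("post_page_close", page)


async def demo() -> list[str]:
    """Register a recorder for every hook type and open/close one page."""
    events: list[str] = []
    pool = SketchBrowserPool()
    for hook_type in HOOK_TYPES:
        # Default argument binds the current loop value.
        async def record(*args: Any, _name: str = hook_type) -> None:
            events.append(_name)
        pool.add_hook(hook_type, record)
    page = await pool.new_page()
    await pool.close_page(page)
    return events


if __name__ == "__main__":
    print(asyncio.run(demo()))
    # prints ['pre_launch', 'post_launch', 'pre_page_create',
    #         'post_page_create', 'pre_page_close', 'post_page_close']
```

Keeping launch hooks separate from page hooks matters because a pool typically reuses one browser for many pages, so per-page setup (for example, injecting scripts) and per-browser setup (for example, launch options) would need different hook points.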

Reference

  • JS BrowserCrawler hooks: packages/browser-crawler/src/internals/browser-crawler.ts
  • JS BrowserPool hooks: packages/browser-pool/src/browser-pool.ts
  • Python pre-nav hooks: src/crawlee/crawlers/_playwright/_playwright_crawler.py
  • Python BrowserPool: src/crawlee/browsers/_browser_pool.py

Metadata

Assignees

Labels

  • enhancement: New feature or request.
  • t-tooling: Issues with this label are in the ownership of the tooling team.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests