
failed_request_handler runs and logs but context.push_data(...) does not write to dataset (PlaywrightCrawler) #1532

@loic-bellinger

Description


When a request fails in PlaywrightCrawler, my @crawler.failed_request_handler runs, and its context.log.info(...) lines show up in the output. However, await context.push_data(...) inside the same handler does not create any dataset rows: logging works, but the dataset push appears to be ignored.

Minimal repro:

import asyncio

from crawlee.crawlers import (
    PlaywrightCrawler,
    PlaywrightCrawlingContext,
    BasicCrawlingContext,
)


async def main() -> None:
    crawler = PlaywrightCrawler(
        max_requests_per_crawl=10,
        max_request_retries=2,
        headless=True,
        browser_type='chromium',
    )

    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')

    @crawler.failed_request_handler
    async def failed_request_handler(context: BasicCrawlingContext, error: Exception) -> None:
        context.log.info(f'failed_request_handler: processing {context.request.url} ...')
        await context.push_data(
            dataset_name="failed_request_handler_errors",
            data={
                "failed_url": context.request.url,
                "label": context.request.label,
                "error_type": type(error).__name__,
                "error_message": str(error),
                "retry_count": context.request.retry_count,
                "status": "failed",
            },
        )

    await crawler.run(['https://www.info.gouv.fr/totalnonsense'])


if __name__ == '__main__':
    asyncio.run(main())

I'm using crawlee[playwright]>=1.0.4.

The relevant part of the documentation is https://crawlee.dev/python/docs/guides/request-router#failed-request-handler
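One possible explanation, which is an assumption and not confirmed against the crawlee source: `context.push_data` might only buffer writes for the current request, with the buffer committed to storage after `request_handler` succeeds, while `context.log` writes immediately. That would match the symptom (log lines appear, no rows). The sketch below is a hypothetical, stdlib-only toy model of that pattern. `RunResult`, `process`, `STORAGE` and `LOG` are made-up names for illustration, not crawlee APIs. In the model, a write buffered from the failed handler is never committed, while a write that goes straight to storage persists.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical toy model, NOT crawlee's implementation. It only illustrates how a
# buffered push_data could be dropped on the failure path while logging still shows.

STORAGE: dict[str, list[dict[str, Any]]] = {}  # stands in for the datasets on disk
LOG: list[str] = []  # stands in for context.log output


class RunResult:
    """Buffers push_data calls made while handling one request."""

    def __init__(self) -> None:
        self.pending: list[tuple[str, dict[str, Any]]] = []

    async def push_data(self, data: dict[str, Any], dataset_name: str = "default") -> None:
        # Only buffered here; nothing reaches STORAGE until commit().
        self.pending.append((dataset_name, data))

    def commit(self) -> None:
        # Flush every buffered write into STORAGE.
        for name, item in self.pending:
            STORAGE.setdefault(name, []).append(item)
        self.pending.clear()


Handler = Callable[[str, RunResult], Awaitable[None]]
FailedHandler = Callable[[str, Exception, RunResult], Awaitable[None]]


async def process(url: str, handler: Handler, failed_handler: FailedHandler) -> None:
    result = RunResult()
    try:
        await handler(url, result)
        result.commit()  # buffered writes are committed only on success
    except Exception as error:
        await failed_handler(url, error, result)
        # No commit on this path, so whatever the failed handler buffered is lost.


async def succeeding_handler(url: str, result: RunResult) -> None:
    await result.push_data({"url": url})


async def failing_handler(url: str, result: RunResult) -> None:
    raise RuntimeError(f"404 for {url}")


async def buffered_failed_handler(url: str, error: Exception, result: RunResult) -> None:
    LOG.append(f"failed_request_handler: processing {url} ...")  # logging is immediate
    await result.push_data(
        {"failed_url": url, "error_type": type(error).__name__},
        dataset_name="failed_request_handler_errors",
    )


async def direct_failed_handler(url: str, error: Exception, result: RunResult) -> None:
    LOG.append(f"failed_request_handler: processing {url} ...")
    # Workaround within the model: write straight to STORAGE, bypassing the buffer.
    STORAGE.setdefault("failed_request_handler_errors", []).append(
        {"failed_url": url, "error_type": type(error).__name__}
    )


async def main() -> None:
    url = "https://www.info.gouv.fr/totalnonsense"
    await process("https://example.com", succeeding_handler, buffered_failed_handler)
    await process(url, failing_handler, buffered_failed_handler)
    print(LOG, STORAGE)  # log line present, no failed_request_handler_errors entry
    await process(url, failing_handler, direct_failed_handler)
    print(STORAGE["failed_request_handler_errors"])  # now holds one row


if __name__ == "__main__":
    asyncio.run(main())
```

If that is the cause, one workaround to try in the real crawler is to bypass the context and write to the dataset directly from the failed handler: `dataset = await Dataset.open(name='failed_request_handler_errors')` followed by `await dataset.push_data({...})`, with `from crawlee.storages import Dataset`. This is untested against 1.0.4.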

Metadata


Assignees

Labels

t-tooling: Issues with this label are in the ownership of the tooling team.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
