@Mantisus
Collaborator

Description

  • This PR adds error handling for RobotsTxtFile.load. This prevents crawler failures caused by network errors, DNS errors for non-existent domains (e.g., https://placeholder.com/), or unexpected data formats returned by the /robots.txt endpoint (e.g., https://avatars.githubusercontent.com/robots.txt).
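
For illustration only, the general pattern described above can be sketched with the Python standard library. This is not the code from this PR and does not use `RobotsTxtFile`; the function name `load_robots_or_allow_all` and the injected `fetch` callable are hypothetical, and falling back to an allow-all policy on failure is an assumption about the desired behavior.

```python
from collections.abc import Callable
from urllib.robotparser import RobotFileParser


def load_robots_or_allow_all(url: str, fetch: Callable[[str], bytes]) -> RobotFileParser:
    """Fetch and parse robots.txt; fall back to allow-all on failure.

    `fetch` is a hypothetical callable that returns the raw response body.
    """
    parser = RobotFileParser(url)
    try:
        body = fetch(url)            # may raise OSError on network or DNS errors
        text = body.decode("utf-8")  # may raise if the body is not text (e.g. an image)
        parser.parse(text.splitlines())
    except (OSError, UnicodeDecodeError):
        # Assumption: a robots.txt that cannot be loaded imposes no restrictions.
        parser.parse([])
    return parser
```

With this shape, a DNS failure or a binary response yields a parser whose `can_fetch` allows every URL, so the crawl continues instead of raising.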

@Mantisus Mantisus requested review from janbuchar and vdusek October 30, 2025 17:14
@Mantisus Mantisus self-assigned this Oct 30, 2025
Collaborator

@vdusek vdusek left a comment

Could we cover this fix with a test? Otherwise LGTM.

@janbuchar janbuchar removed their request for review October 31, 2025 13:15
@vdusek vdusek merged commit 596a311 into apify:master Nov 3, 2025
19 checks passed
2 participants