feat: Add requests argument to EnqueueLinksFunction #1024

Open
wants to merge 5 commits into
base: master

Conversation

Pijukatel
Contributor

@Pijukatel Pijukatel commented Feb 25, 2025

Description

Add requests argument to EnqueueLinksFunction.
Split EnqueueLinksFunction implementations into extract_links and add_requests.
Add overload variants of EnqueueLinksFunction.
Raise an error in EnqueueLinksFunction implementations if called with mutually exclusive arguments.

Relates to: #906

@github-actions github-actions bot added this to the 109th sprint - Tooling team milestone Feb 25, 2025
@github-actions github-actions bot added the t-tooling Issues with this label are in the ownership of the tooling team. label Feb 25, 2025

It adds explicitly passed requests to the `RequestManager`.
Collaborator

I can imagine at least two instances of confusing behavior:

  1. you pass in `requests` and `selector`, the function enqueues the requests (even though their elements don't match the selector), and it extracts and enqueues some additional links from the current page
  2. you pass in `requests` and `transform_request_function`, but the function won't be called

I know that the docblock makes it pretty clear that this is what will happen, but I also know that there will be bug reports about this 😁

In my opinion, we should make two overloads and throw a runtime error if someone passes both requests and one or more of the other arguments.
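
A minimal sketch of that suggestion, assuming a simplified synchronous stand-in for the function. The name `enqueue_links`, the reduced parameter types, and the string return value are hypothetical and are not taken from the PR's diff; the sketch only illustrates two overloads plus a runtime check:

```python
from __future__ import annotations

from collections.abc import Callable, Sequence
from typing import Any, overload


@overload
def enqueue_links(
    *,
    selector: str | None = None,
    label: str | None = None,
    user_data: dict[str, Any] | None = None,
    transform_request_function: Callable[[dict[str, Any]], dict[str, Any]] | None = None,
) -> str: ...


@overload
def enqueue_links(*, requests: Sequence[str]) -> str: ...


def enqueue_links(
    *,
    selector: str | None = None,
    label: str | None = None,
    user_data: dict[str, Any] | None = None,
    transform_request_function: Callable[[dict[str, Any]], dict[str, Any]] | None = None,
    requests: Sequence[str] | None = None,
) -> str:
    """Hypothetical stand-in: reports which mode was selected instead of enqueuing."""
    extraction_args = {
        "selector": selector,
        "label": label,
        "user_data": user_data,
        "transform_request_function": transform_request_function,
    }
    # Any extraction-related argument that was explicitly passed conflicts with `requests`.
    conflicting = [name for name, value in extraction_args.items() if value is not None]
    if requests is not None and conflicting:
        raise ValueError(f"`requests` cannot be combined with: {', '.join(conflicting)}")
    return "explicit" if requests is not None else "extract"
```

With this shape, a type checker can flag the mixed call statically, and the runtime check covers callers who do not use one. Both confusing cases listed above would fail loudly instead of silently ignoring an argument.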

Contributor Author

> I can imagine at least two instances of confusing behavior:
>
> 1. you pass in `requests` and `selector`, the function enqueues the requests (even though their elements don't match the selector), and it extracts and enqueues some additional links from the current page
> 2. you pass in `requests` and `transform_request_function`, but the function won't be called
>
> I know that the docblock makes it pretty clear that this is what will happen, but I also know that there will be bug reports about this 😁
>
> In my opinion, we should make two overloads and throw a runtime error if someone passes both requests and one or more of the other arguments.

OK, so the function will basically become a two-in-one alias for:

  1. add_requests(requests)
  2. add_requests(extract_requests(...))
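
The two-in-one alias could be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the queue is a plain list, `extract_requests` collects every anchor `href` with the standard-library HTML parser instead of applying a CSS selector, and all signatures are hypothetical.

```python
from __future__ import annotations

from collections.abc import Sequence
from html.parser import HTMLParser


class _AnchorCollector(HTMLParser):
    """Collects href values of <a> tags (stand-in for selector-based extraction)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_requests(html: str) -> list[str]:
    """Hypothetical extraction step: return the links found on the page."""
    collector = _AnchorCollector()
    collector.feed(html)
    return collector.hrefs


def add_requests(queue: list[str], requests: Sequence[str]) -> None:
    """Hypothetical enqueue step: append the requests to the queue."""
    queue.extend(requests)


def enqueue_links(queue: list[str], html: str, *, requests: Sequence[str] | None = None) -> None:
    if requests is not None:
        # Variant 1: add_requests(requests)
        add_requests(queue, requests)
    else:
        # Variant 2: add_requests(extract_requests(...))
        add_requests(queue, extract_requests(html))
```

Keeping extraction and enqueuing as separate steps means the explicit-requests path never touches the page content at all.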

Collaborator

I agree with @janbuchar; I'd rather be more strict here.

@github-actions github-actions bot added the tested Temporary label used only programmatically for some analytics. label Feb 25, 2025
@Pijukatel Pijukatel requested a review from janbuchar February 25, 2025 14:45
@Pijukatel Pijukatel added the enhancement New feature or request. label Feb 25, 2025
@Pijukatel Pijukatel marked this pull request as ready for review February 25, 2025 14:48
label: str | None = None,
user_data: dict[str, Any] | None = None,
transform_request_function: Callable[[RequestOptions], RequestOptions | RequestTransformAction] | None = None,
requests: Sequence[str | Request] | None = None,
Collaborator

@janbuchar janbuchar Feb 28, 2025

I just noticed that the JS counterpart accepts just `urls` as an array of strings. We should either restrict this, or extend the JS version 🙂

If we choose to restrict this one, then most of the other parameters (barring `selector`) would actually start making sense in combination with `urls`.

Collaborator

I would prefer to keep it as it is for consistency, since we use `request: str | Request` everywhere else.
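
For illustration, accepting `str | Request` usually implies a small normalization step before enqueuing. The sketch below assumes a stand-in `Request` dataclass and a hypothetical helper `to_requests`; neither is the library's actual class or API.

```python
from __future__ import annotations

from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Stand-in for the crawler's request type (hypothetical, for illustration)."""

    url: str
    label: str | None = None


def to_requests(items: Sequence[str | Request]) -> list[Request]:
    """Wrap plain URL strings in Request objects; pass existing Request objects through."""
    return [item if isinstance(item, Request) else Request(url=item) for item in items]
```

A caller could then mix both forms, for example `to_requests(["https://example.com", Request("https://example.com/item", label="detail")])`. A `Request` object already carries its own options, whereas plain strings do not, which is the asymmetry the comment above points at.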

@vdusek vdusek mentioned this pull request Mar 4, 2025
Labels
enhancement New feature or request. t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programmatically for some analytics.