.actor().call() method to set the correct timeout, show the progress in status message, and stream logs #632

Open
mtrunkat opened this issue Jan 28, 2025 · 1 comment
Labels
t-tooling Issues with this label are in the ownership of the tooling team.

Comments

@mtrunkat
Member

I was trying out the https://apify.com/jakub.kopecky/llmstxt-generator Actor, and the experience was not great, for the following reasons:

Timeout

The Actor above was started with a timeout of 18,000 seconds, but it triggers the Website Content Crawler (WCC) with a default timeout of 360,000. So the original Actor may time out while the WCC run keeps going. IMHO, in this case, the timeout of the called run should be set to the time remaining for the original Actor.

There might be cases when this is not appropriate, so this behavior could be opt-in or opt-out.
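
A minimal sketch of the "remaining time" calculation, assuming the caller knows the absolute deadline of the original run. The helper name `remaining_timeout_secs` and the `min_secs` floor are hypothetical and not part of any existing SDK or client API; the 600 seconds of elapsed time is an illustrative value.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def remaining_timeout_secs(
    timeout_at: datetime,
    now: Optional[datetime] = None,
    min_secs: int = 1,
) -> int:
    """Return the whole seconds left until `timeout_at`, never less than `min_secs`."""
    now = now or datetime.now(timezone.utc)
    remaining = (timeout_at - now).total_seconds()
    return max(min_secs, int(remaining))


# The parent was started with an 18,000 s timeout and has already run for 600 s
# (illustrative numbers).
started_at = datetime(2025, 1, 23, 13, 56, 11, tzinfo=timezone.utc)
timeout_at = started_at + timedelta(seconds=18_000)
now = started_at + timedelta(seconds=600)

# Timeout to hand to the called Actor instead of its default.
child_timeout = remaining_timeout_secs(timeout_at, now)
```

The floor of one second is a design choice for the sketch: it avoids passing zero or a negative value when the parent is already at its deadline.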

Logs

The Actor calls WCC underneath, which may take a long time to finish for a large website. This means that the Actor seems to get stuck on the following log:

2025-01-23T13:56:06.535Z ACTOR: Pulling Docker image of build OQWIcf5rmeLt4icyd from repository.
2025-01-23T13:56:08.308Z ACTOR: Creating Docker container.
2025-01-23T13:56:08.850Z ACTOR: Starting Docker container.
2025-01-23T13:56:11.052Z [apify] INFO  Initializing Actor...
2025-01-23T13:56:11.054Z [apify] INFO  System info ({"apify_sdk_version": "2.1.0", "apify_client_version": "1.8.1", "crawlee_version": "0.4.5", "python_version": "3.12.8", "os": "linux"})
2025-01-23T13:56:11.119Z [apify] INFO  Starting the "apify/website-content-crawler" actor for URL: https://docs.apify.com/

So, I am thinking about improving the .actor().call() method in the SDK/client so that developers can optionally stream the log of the Actor started via .call() into the caller's log, to provide progress and context info.
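
A rough sketch of the relaying part, assuming the called Actor's log arrives as arbitrary text chunks. The function `relay_log_lines` and the prefix format are hypothetical; how the chunks are obtained from the platform is left out.

```python
from typing import Iterable, Iterator


def relay_log_lines(chunks: Iterable[str], prefix: str) -> Iterator[str]:
    """Reassemble streamed log chunks into full lines and prefix each one."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete line; keep the unfinished tail in the buffer.
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            yield f"{prefix} {line}"
    # Flush a trailing line that did not end with a newline.
    if buffer:
        yield f"{prefix} {buffer}"


# Chunks may split a line anywhere, so the second line arrives in two pieces.
chunks = ["INFO  Crawling started\nINFO  Proc", "essed 10 pages\n"]
for line in relay_log_lines(chunks, "[website-content-crawler]"):
    print(line)
```

Prefixing each line keeps the called Actor's output distinguishable from the caller's own log.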

Status message

Finally, the Actor displays a dummy status message that does not communicate progress. The call could automatically update the caller's status message. In this case, for example, it could show:

Running Website Content Crawler: processed 235/7876
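
For illustration, the message above could be built by a small helper like the one below. The name `format_progress_status` is hypothetical, and where the processed and total counts come from is not specified here.

```python
def format_progress_status(actor_title: str, processed: int, total: int) -> str:
    """Build a status message describing the progress of a called Actor."""
    return f"Running {actor_title}: processed {processed}/{total}"


message = format_progress_status("Website Content Crawler", 235, 7876)
print(message)
```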

@mtrunkat
Member Author

You can see @MQ37 improving this on the Actor side: apify/actor-llmstxt-generator#10

@B4nan B4nan added the t-tooling Issues with this label are in the ownership of the tooling team. label Feb 12, 2025