[Bug]: 500K-page crawl job of tv2.dk completed (or crashed) without harvesting 275K pages after trying to add filters for Facebook and Twitter #2473

Open
tuehlarsen opened this issue Mar 10, 2025 · 0 comments
Labels
bug Something isn't working

Comments

@tuehlarsen

Browsertrix Version

v1.13.2-a21b2ff

What did you expect to happen? What happened instead?

Before I tried to add two new filters, one for Twitter and one for Facebook, the crawl was running smoothly in 4 active crawling windows. While I was trying to add the two filters, and WITHOUT clicking the Save button, I got some strange GUI dialogs. The GUI was lagging, and suddenly it said DONE and started "uploading the WACZ files". Now it says the job is complete, but it has not crawled 275K of the 500K pages. So far it has crawled 235K pages and about 700 GB.

Here is a screen dump of the last GUI error page. Downloading the log fails somehow partway through the download; the log is 1.8 GB. I don't get an error message.

Image

Perhaps you could add some more checks before a crawl ends, and in the add-filter dialog?

Start Time
17.02.25, 11.20 CET
Finish Time
10.03.25, 07.36 CET
Elapsed Time
20 d, 20 h, 15 m and 58 s
Execution Time
59K minutes (979h 45m 59s)
Initiator
Manual start by Tue Larsen
Size
698 GB, 223,497 / 500,000 pages
Crawler Channel (Exact Crawler Version)
Latest (docker.io/webrecorder/browsertrix-crawler:latest)
Crawl ID
manual-20250217102015-3a4db63a-176

Here are some logs from our IT department:

log.txt
logs-from-op-in-browsertrix-cloud-backend-976bfb455-z2jv5.log

Reproduction instructions

See above.

Screenshots / Video

No response

Environment

No response

Additional details

No response

@tuehlarsen tuehlarsen added the bug Something isn't working label Mar 10, 2025
Projects
Status: Triage