HtmlCollectionScraper gives 500 error #1266

OmnipotentEntity · 2024-12-27T09:46:36Z

I get the following error:

{"message":"Uncaught PHP Exception ArgumentCountError: \"Too few arguments to function App\\Service\\Scraper\\HtmlScraper::extract(), 3 passed in /var/www/koillection/src/Service/Scraper/HtmlCollectionScraper.php on line 18 and exactly 4 expected\" at HtmlScraper.php line 48","context":{"exception":{"class":"ArgumentCountError","message":"Too few arguments to function App\\Service\\Scraper\\HtmlScraper::extract(), 3 passed in /var/www/koillection/src/Service/Scraper/HtmlCollectionScraper.php on line 18 and exactly 4 expected","code":0,"file":"/var/www/koillection/src/Service/Scraper/HtmlScraper.php:48"}},"level":500,"level_name":"CRITICAL","channel":"request","datetime":"2024-12-27T03:27:24.772326-06:00","extra":{}}

I traced it to commit 432f476, which seems to have added image scraping. That required an API change, but the change wasn't applied to HtmlCollectionScraper.php on line 18 or on line 22.

It seems that the $scraping variable is available in this context, so the fix might be as simple as passing it as the 4th argument in both locations. However, I'm not familiar enough with the project to feel confident creating a PR.

Thank you for your hard work!


OmnipotentEntity · 2024-12-27T09:56:59Z

I modified these files in place and restarted the service, and I now get a new error that seems to be related to the image not being scraped properly. That is probably because I only barely attempted to understand what's going on here, and a few other changes are likely needed to match the referenced commit.

The new error is:

{"message":"Warning: file_get_contents(): SSL operation failed with code 1. OpenSSL Error messages:\nerror:0A000086:SSL routines::certificate verify failed","context":{"exception":{"class":"ErrorException","message":"Warning: file_get_contents(): SSL operation failed with code 1. OpenSSL Error messages:\nerror:0A000086:SSL routines::certificate verify failed","code":0,"file":"/var/www/koillection/src/Service/Scraper/HtmlCollectionScraper.php:23"}},"level":400,"level_name":"ERROR","channel":"php","datetime":"2024-12-27T03:51:51.994482-06:00","extra":{}}
{"message":"Warning: file_get_contents(): Failed to enable crypto","context":{"exception":{"class":"ErrorException","message":"Warning: file_get_contents(): Failed to enable crypto","code":0,"file":"/var/www/koillection/src/Service/Scraper/HtmlCollectionScraper.php:23"}},"level":400,"level_name":"ERROR","channel":"php","datetime":"2024-12-27T03:51:51.994630-06:00","extra":{}}
{"message":"Warning: file_get_contents(https://s4.anilist.co/file/anilistcdn/media/manga/cover/large/bx30703-iRLjKRnSwCFP.jpg): Failed to open stream: operation failed","context":{"exception":{"class":"ErrorException","message":"Warning: file_get_contents(https://s4.anilist.co/file/anilistcdn/media/manga/cover/large/bx30703-iRLjKRnSwCFP.jpg): Failed to open stream: operation failed","code":0,"file":"/var/www/koillection/src/Service/Scraper/HtmlCollectionScraper.php:23"}},"level":400,"level_name":"ERROR","channel":"php","datetime":"2024-12-27T03:51:51.994706-06:00","extra":{}}
{"message":"Uncaught PHP Exception TypeError: \"base64_encode(): Argument #1 ($string) must be of type string, false given\" at HtmlCollectionScraper.php line 23","context":{"exception":{"class":"TypeError","message":"base64_encode(): Argument #1 ($string) must be of type string, false given","code":0,"file":"/var/www/koillection/src/Service/Scraper/HtmlCollectionScraper.php:23"}},"level":500,"level_name":"CRITICAL","channel":"request","datetime":"2024-12-27T03:51:51.994961-06:00","extra":{}}
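The last log entry follows from the earlier ones: file_get_contents() returns false when the download fails, and base64_encode() then rejects that false. As a minimal sketch of guarding against this, assuming a hypothetical helper named toBase64DataUri() (it is not part of Koillection, and it does not address the underlying SSL verification failure):

```php
<?php
/**
 * Hypothetical helper, not part of Koillection.
 * Returns a data URI for the file at $url, or null when it cannot be read.
 */
function toBase64DataUri(string $url): ?string
{
    // file_get_contents() returns false on failure and raises a warning;
    // the @ operator suppresses that warning so the failure can be handled here.
    $contents = @file_get_contents($url);

    if ($contents === false) {
        // Nothing was downloaded, so there is nothing to encode.
        return null;
    }

    // The image/png MIME type mirrors the string used in the original line 23.
    return 'data:image/png;base64,' . base64_encode($contents);
}
```

With a guard like this, a failed download would produce a null image instead of a 500 response.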

For completeness, here is my scraper:

Name: Anilist - Manga Series
Url Pattern: https://anilist.co/manga/
Name Path: #//div[@class="type"][text()="English"]/following-sibling::div/text()#
Image Path: #//img[@class="cover"]/@src#
Volume Count: (Text) #//div[@class="type"][text()="Volumes"]/following-sibling::div/text()#
Status: (Text) #//div[@class="type"][text()="Status"]/following-sibling::div/text()#
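One way to check paths like these outside the application is to run them against a saved copy of the page with PHP's DOM extension. The sketch below assumes that the surrounding # characters are delimiters and can be dropped, and it uses a made-up HTML fragment rather than the real Anilist markup; evaluateXPath() is a hypothetical helper name.

```php
<?php
/**
 * Hypothetical helper: evaluates an XPath expression against an HTML string
 * and returns the string value of every matching node.
 */
function evaluateXPath(string $html, string $expression): array
{
    // Collect libxml parse warnings internally instead of emitting them.
    libxml_use_internal_errors(true);

    $document = new DOMDocument();
    $document->loadHTML($html);

    $xpath = new DOMXPath($document);
    $nodes = $xpath->query($expression);

    $values = [];
    if ($nodes !== false) {
        foreach ($nodes as $node) {
            $values[] = $node->nodeValue;
        }
    }

    return $values;
}

// Made-up fragment for illustration only, not the real page markup.
$html = '<div><div class="type">Volumes</div><div class="value">12</div></div>';

$result = evaluateXPath(
    $html,
    '//div[@class="type"][text()="Volumes"]/following-sibling::div/text()'
);
```

If a path matches in a saved copy of the page but the scraper still returns nothing, the HTML fetched by the server may differ from what the browser shows.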

OmnipotentEntity · 2024-12-27T19:40:00Z

With this patch the scrape finishes successfully, but the thumbnail isn't scraped properly, so it's not a full solution yet.

--- HtmlCollectionScraper.php.old       2024-12-27 09:49:20.107123727 +0000
+++ HtmlCollectionScraper.php.new       2024-12-27 19:36:08.045680868 +0000
@@ -15,12 +15,12 @@
         $crawler = $this->getCrawler($scraping);
         $scraper = $scraping->getScraper();
 
-        $image = $scraping->getScrapImage() ? $this->extract($scraper->getImagePath(), DatumTypeEnum::TYPE_TEXT, $crawler) : null;
+        $image = $scraping->getScrapImage() ? $this->extract($scraper->getImagePath(), DatumTypeEnum::TYPE_TEXT, $crawler, $scraper) : null;
         $image = $this->guessHost($image, $scraping);
 
         return [
-            'name' => $scraping->getScrapName() ? $this->extract($scraper->getNamePath(), DatumTypeEnum::TYPE_TEXT, $crawler) : null,
-            'base64Image' => 'data:image/png;base64,' . base64_encode(file_get_contents($image)),
+            'name' => $scraping->getScrapName() ? $this->extract($scraper->getNamePath(), DatumTypeEnum::TYPE_TEXT, $crawler, $scraper) : null,
+            'image' => $image,
             'data' => $this->scrapData($scraping, $crawler, ScraperTypeEnum::TYPE_COLLECTION),
             'scrapedUrl' => $scraping->getUrl()
         ];

benjaminjonard · 2024-12-27T20:16:00Z

I had a quick look today and did a quick fix, but as you noticed, the image can't be properly scraped.

I'm looking into new ways to scrape URLs, such as the method suggested in #1263.
While it works better than the current implementation, I still can't make it work with your example. The website returns a blank page saying JavaScript is required.

I may have another solution but I'm having a hard time making it work with Docker (https://github.com/symfony/panther)

It's going to take some time, but I hope I can push a better implementation of the scraper in the next release.

OmnipotentEntity · 2024-12-27T21:03:43Z

That's interesting, because the same scraper seems to work as an Item scraper rather than a Collection scraper, unless something changed with the website overnight (which is possible).

TaylanTatli · 2025-02-25T16:23:12Z

I've only tried this with the Wish scraper, and it gives the same error. I tried the patch, but it didn't solve my problem.
