[FR] Jsonpath support for scraping #1260

Open
Renaud11232 opened this issue Dec 14, 2024 · 1 comment


Renaud11232 commented Dec 14, 2024

Feature request

Context

I've been using Koillection for the past year and it's been pretty good all around.
I've found the scraping feature very useful for avoiding the tedious task of manually typing every item detail.
However, recent changes to the website I've been scraping (Discogs) made it impossible to retrieve all the information I wanted using XPath alone.

Possible solution

One way to consistently retrieve such information would be to use the Discogs JSON API, but since XPath cannot work on JSON documents, it might be useful to allow other query languages.

Currently, Koillection only supports XPath queries (surrounded by # delimiters), but I would like it to also support JSONPath (this might also make it easier to add more query languages in the future, if needed). One possible way of doing so would be to add a string at the beginning of the selector to tell whether it's using XPath or JSONPath, similar to how changedetection.io does it, i.e.:

  • #xpath://h1/text()#, maybe keeping XPath as the default so that #//h1/text()# stays valid and existing scrapers don't break.
  • #jsonpath:$.data.example#
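
To make the prefix idea concrete, here is a minimal sketch in Python (not Koillection's own code or language). The `parse_selector` helper and the `Selector` type are hypothetical names for illustration only. It assumes a selector is a single `#...#` string and falls back to XPath when no prefix is present, so existing scrapers would keep working.

```python
import re
from typing import NamedTuple


class Selector(NamedTuple):
    language: str    # "xpath" or "jsonpath"
    expression: str  # the query with delimiters and prefix removed


# Hypothetical prefix syntax taken from the proposal: "xpath:" or "jsonpath:".
_PREFIX = re.compile(r"^(xpath|jsonpath):(.*)$", re.DOTALL)


def parse_selector(raw: str) -> Selector:
    """Split a '#...#' selector into its query language and expression."""
    if len(raw) < 2 or not (raw.startswith("#") and raw.endswith("#")):
        raise ValueError(f"selector must be wrapped in # delimiters: {raw!r}")
    body = raw[1:-1]
    match = _PREFIX.match(body)
    if match:
        return Selector(match.group(1), match.group(2))
    # No prefix: treat as XPath so existing selectors stay valid.
    return Selector("xpath", body)
```

With this sketch, `parse_selector("#//h1/text()#")` and `parse_selector("#xpath://h1/text()#")` both yield an XPath selector for `//h1/text()`, while `parse_selector("#jsonpath:$.data.example#")` yields a JSONPath selector for `$.data.example`.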

The library https://github.com/SoftCreatR/JSONPath seems to provide an easy-ish way of adding this.

If you are OK with this change, I can give it a try in my free time and submit a PR.

Thank you

@benjaminjonard
Owner

Hello, that would be a nice addition to Koillection.

I'm guessing a scraper would use either XPath or JSONPath, not both at the same time. So maybe we could just add a new property when creating a new scraper. It would be easier for users, as they wouldn't have to repeat (or forget) the prefix.
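
As a rough illustration of the per-scraper property, here is a short Python sketch (again, not Koillection's actual code or data model). `ScraperConfig`, `QueryLanguage`, and `plan_queries` are hypothetical names. The assumption is that the query language is stored once on the scraper and defaults to XPath, so selectors keep their plain `#...#` form.

```python
from dataclasses import dataclass, field
from enum import Enum


class QueryLanguage(Enum):
    XPATH = "xpath"
    JSONPATH = "jsonpath"


@dataclass
class ScraperConfig:
    """Hypothetical scraper definition with one query language for all selectors."""
    name: str
    # Defaulting to XPath keeps scrapers created before the change valid.
    query_language: QueryLanguage = QueryLanguage.XPATH
    selectors: dict[str, str] = field(default_factory=dict)


def plan_queries(config: ScraperConfig) -> list[tuple[str, str, str]]:
    """Return (field, language, expression) for each selector, without prefixes."""
    plan = []
    for field_name, raw in config.selectors.items():
        if len(raw) < 2 or not (raw.startswith("#") and raw.endswith("#")):
            raise ValueError(f"selector must be wrapped in # delimiters: {raw!r}")
        plan.append((field_name, config.query_language.value, raw[1:-1]))
    return plan
```

A scraper created with `query_language=QueryLanguage.JSONPATH` and the selector `#$.data.example#` would then be evaluated as JSONPath throughout, with no prefix for users to repeat or forget.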

There isn't much documentation on how to develop for this project, as I've worked alone on it for the most part. So, if you want to give it a try, don't hesitate to ask questions. I'll do my best to answer quickly.

Thanks!
