Unable to fetch filters #11
Hey @MarcSeebold - yes, they did. I'm (passively) trying to figure out a way to bypass this. The obvious next step, Selenium, doesn't work, and I also tried other JS libraries similar to Selenium, to no avail.
@MarcSeebold Ah, that is interesting - it works in Python as well. However, many of the critical attributes of a CL post (price, number of cylinders, car make, transmission, etc.) seem to be omitted, so I'm not sure how much useful information I could pull out of these entries.
I think Selenium is the right way to go. We just have to figure out how CL detects that it's not a "supported" browser. In my experience, changing the User-Agent sometimes helped.
Also, try it without --headless first. I've noticed that it sometimes behaves differently.
Ok - I could give the non
Maybe this will do the trick: https://antoinevastel.com/bot%20detection/2023/02/19/new-headless-chrome.html
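A minimal sketch of the suggestions above (swap the User-Agent, and either drop headless mode or use Chrome's newer headless mode from the linked article), assuming Selenium 4 with Chrome. The names build_chrome_args, make_driver and DEFAULT_UA are hypothetical, and the UA string is just an example desktop Chrome UA; whether any of this gets past Craigslist's check is untested.

```python
from typing import List, Optional

# Hypothetical example UA; any current desktop Chrome UA string could be used.
DEFAULT_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def build_chrome_args(headless: bool = False,
                      user_agent: Optional[str] = DEFAULT_UA) -> List[str]:
    """Return Chrome command-line switches for a crawling session (hypothetical helper)."""
    args: List[str] = []
    if headless:
        # "New" headless mode (Chrome 109+) runs the regular browser without a
        # window, instead of the older, separate headless implementation.
        args.append("--headless=new")
    if user_agent:
        # Override the default UA; old-style headless Chrome advertises
        # "HeadlessChrome" in its UA, which is an easy bot signal.
        args.append(f"--user-agent={user_agent}")
    return args


def make_driver(headless: bool = False, user_agent: Optional[str] = DEFAULT_UA):
    """Start a Selenium-controlled Chrome using the switches above."""
    # Imported lazily so build_chrome_args works even without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in build_chrome_args(headless, user_agent):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Usage would be something like driver = make_driver(headless=False), then driver.get(url) and driver.page_source, followed by driver.quit(). Starting with headless=False follows the advice to rule out headless-specific behavior first.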
query/filters.py:get_addl_filters is unable to crawl the page. The call search_html = next(sessions.yield_html(url)) returns a page containing:
window.cl.specialCurtainMessages = { unsupportedBrowser: [ "We've detected you are using a browser that is missing critical features.", "Please visit craigslist from a modern browser." ], unrecoverableError: [ "There was an error loading the page." ] };
I suspect Craigslist has put new anti-crawling measures in place.
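Until a bypass is found, it may help if get_addl_filters fails loudly instead of parsing the curtain page as if it were search results. A rough sketch, assuming the marker strings from the response quoted above only appear when Craigslist serves that curtain; if Craigslist embeds them on every page, the check would instead need to look for the absence of expected result markup. The names is_unsupported_browser_page, ensure_real_page and BlockedByCraigslist are hypothetical.

```python
# Marker strings copied from the Craigslist response quoted in this issue.
_CURTAIN_MARKERS = (
    "window.cl.specialCurtainMessages",
    "unsupportedBrowser",
)


class BlockedByCraigslist(RuntimeError):
    """Raised when Craigslist serves its 'unsupported browser' curtain (hypothetical)."""


def is_unsupported_browser_page(html: str) -> bool:
    """Heuristic: True if every curtain marker string occurs in the HTML."""
    return all(marker in html for marker in _CURTAIN_MARKERS)


def ensure_real_page(html: str, url: str) -> str:
    """Return html unchanged, or raise if it looks like the curtain page."""
    if is_unsupported_browser_page(html):
        raise BlockedByCraigslist(
            f"{url} returned the 'unsupported browser' curtain instead of results"
        )
    return html
```

Assuming sessions.yield_html yields HTML strings, the call site could become search_html = ensure_real_page(next(sessions.yield_html(url)), url).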