This repository has been archived by the owner on Dec 9, 2018. It is now read-only.
So, I have a website that I'm trying to scrape, and it requires login. Unfortunately it doesn't seem to use cookies for login, so opening multiple sessions won't work.
Anyway, it works like a kind of online file system, in that there are multiple layers to go through. I currently have 5 nested for loops (each one gets the href from the nodes returned by an xpath with multiple matches) to go through the files. Inside each loop I do some processing and access more URLs from the same session. The problem is that, let's say, after returning to layer 3 from the last layer, when it loops for the second time I get an error on `course.get_attr("href")` saying the node is no longer in the DOM.
The for statement is `for course in session.xpath("//div[@id='_26_1termCourses_noterm']/ul/li/a"):`
So I imagine it may be some sort of timeout bug, since a loop like that with no nested fors and no processing works normally and extracts all the links matching the xpath from the page.
Any ideas?
Thanks!
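One possible explanation, offered as a guess rather than a confirmed diagnosis: if the loop body loads other pages in the same session, the node objects returned by the outer `session.xpath(...)` call may belong to a page that is no longer loaded, which would match the "no longer in DOM" error better than a timeout would. A minimal sketch of a workaround is to read every href into a plain list of strings before navigating anywhere. It assumes the session has a `visit(url)` method for loading a page and that the hrefs can be passed to it directly; `collect_hrefs`, `walk`, and `handle_leaf` are hypothetical names made up for this illustration.

```python
def collect_hrefs(session, xpath):
    """Read the href of every matching node right away, as plain strings."""
    return [node.get_attr("href") for node in session.xpath(xpath)]


def walk(session, xpaths, url, handle_leaf):
    """Load url, then descend into the links selected by xpaths[0].

    xpaths holds one expression per remaining layer. When it is empty,
    url is treated as a leaf and handed to handle_leaf.
    """
    session.visit(url)  # assumption: this is how the session loads a page
    if not xpaths:
        handle_leaf(session, url)
        return
    # Snapshot the links before any further navigation, so no node
    # object from this page is touched after another page is loaded.
    hrefs = collect_hrefs(session, xpaths[0])
    for href in hrefs:
        walk(session, xpaths[1:], href, handle_leaf)
```

With five layers, `xpaths` would be a list of five expressions, the course xpath above being one of them. The same idea also works with the nested for loops kept as they are: build the list of hrefs at the top of each loop and iterate over the strings instead of the nodes.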