This repository has been archived by the owner on Dec 9, 2018. It is now read-only.

Requesting a page, then visiting another causes issues #53

Open
danrossi opened this issue Jun 24, 2016 · 1 comment


Sorry, this is a question. There seems to be a problem when I request a page to scrape a special link from it and then visit that link: the second page does not render or parse correctly. It seems I have to create a second session, but even then xpath does not parse the page correctly.

For example:

import time
import dryscrape

sess = dryscrape.Session(base_url='host')

# we don't need images
sess.set_attribute('auto_load_images', False)

# visit the first page and extract the special link
sess.visit('/path')

links = sess.xpath('//a[contains .. ]')
link = links[0]["href"]

time.sleep(10)

# second session, created because reusing the first one did not work
sess = dryscrape.Session(base_url='host')

sess.visit(link)

sess.xpath("//div[@class='searchitem']")

This is a problem: I have to parse the whole body first, like this:

tree = fromstring(sess.body())
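
A minimal sketch of that workaround, assuming `fromstring` is `lxml.html.fromstring` (the post does not say which library it comes from). `SAMPLE_BODY` and `find_search_items` are hypothetical names used only for illustration; the sample markup stands in for whatever `sess.body()` would return, so the snippet runs without dryscrape.

```python
from lxml.html import fromstring

# Made-up markup standing in for the string returned by sess.body().
SAMPLE_BODY = """
<html><body>
  <div class="searchitem"><a href="/item/1">First</a></div>
  <div class="searchitem"><a href="/item/2">Second</a></div>
  <div class="other">Ignored</div>
</body></html>
"""


def find_search_items(body):
    """Parse an HTML string and return the div elements with class 'searchitem'."""
    tree = fromstring(body)
    return tree.xpath("//div[@class='searchitem']")


items = find_search_items(SAMPLE_BODY)
# Collect the first link target inside each matching div.
hrefs = [item.xpath(".//a/@href")[0] for item in items]
print(hrefs)
```

With a live session, the same helper would be called as `find_search_items(sess.body())`; the XPath then runs against the parsed copy of the page rather than through the session itself.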

Unfortunately, clicking the link to visit it does not work; I have to load it with the visit method.

Is there a special way to reuse the session so that xpath works?


danrossi commented Jun 24, 2016

I can't explain it, but the same code that works on OS X does not work on Ubuntu. On Ubuntu, the newly visited link is not registered properly on the site, so the visit fails and the HTML parsing breaks.

The link can be extracted from the first page, but the second page has issues.

Any ideas?
