Can't get the IDs #23
Comments
I also have this problem: it used to work and now it doesn't. My suspicion is that the problem is with the CSS selector. Maybe Twitter recently changed the way they store tweet IDs in the CSS file? I also don't really know what I'm talking about because I'm pretty new to Python. If you figure it out, please let me know!
@shenyizy I think I've fixed it, but I'm not entirely confident this is free of logic errors. It's a bit messy, but the trick is to use a new and less effective CSS selector. I've noticed three problems so far that I've been able to work around:
New selector:
New loop:
New writetofile
Hope that works for you too!
@jaackland Thanks for sharing a solution. However, when I run this code, it never gets out of the "while len(found_tweets) >= increment:" loop. The problem comes from the "all_tweets" variable: nothing is ever added to it, so the loop never exits. Any alternative solution?
@Ahsancode Sorry, I should have made it clearer that this isn't a full script. Are you substituting it into the original Scrape.py?
@jaackland No worries, I am aware that this isn't the full script. The original version was also working for me until now. I just made the same adjustments as you did. It is the "tweet_selector" that is giving me problems at the moment.
@jaackland My mistake, I had an indent problem. The code is running fine now. There are still a couple of issues:
@Ahsancode Yes, unfortunately I think Twitter have managed to rate-limit Selenium now (the original post implies this wasn't always the case). If you increase the delay variable it will scrape more tweets (but take longer, obviously). I went up to 5 and got all the tweets I needed, but you might be able to get away with less than that. Glad it was just an indent problem because as far as I can tell that tweet_selector is universal (if a bit sloppy).
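For anyone unsure what raising the delay actually does, here is a minimal sketch of a scroll-and-wait loop. It is not the code from Scrape.py: `scroll_until_stable`, `count_tweets`, `scroll` and `max_idle` are hypothetical names, and the two callables are assumed to wrap whatever Selenium calls your script already makes. It also stops on its own once no new tweets appear, which avoids the never-ending loop mentioned above.

```python
import time


def scroll_until_stable(count_tweets, scroll, delay=5, max_idle=3):
    """Scroll until the tweet count stops growing for max_idle passes.

    count_tweets and scroll are hypothetical callables supplied by the
    caller (for example, thin wrappers around Selenium calls).
    """
    idle = 0
    seen = count_tweets()
    while idle < max_idle:
        scroll()
        # A longer delay gives the page more time to load new tweets.
        time.sleep(delay)
        now = count_tweets()
        if now > seen:
            seen, idle = now, 0
        else:
            idle += 1
    return seen
```

With a delay of 5 this is slow, but each pass has more time to load tweets before the loop decides there is nothing left.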
@jaackland Thanks so much for sharing the code. However, when I tried to substitute your code for the original, I got a syntax error in the "New writetofile" part, as shown below.
SyntaxError: invalid syntax
I am also pretty new to Python, so sorry for the stupid question.
I found a selector which seems to select only the time-posted link, which links to the full tweet's page.
Hi, could you please tell me how you got this CSS selector for tweet_selector? Thanks
Twitter changed its CSS styling, so in the current code you need to change id_selector and tweet_selector to:
id_selector = "div > div > :nth-child(2) > :nth-child(2) > :nth-child(1) > div > div > :nth-child(1) > a"
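Once a selector matches the permalink of a tweet, the ID still has to be pulled out of its href. A minimal sketch, assuming the href looks like https://twitter.com/<user>/status/<id> (that URL shape is an assumption, and `tweet_id_from_href` is a hypothetical helper, not something from the original script):

```python
from urllib.parse import urlparse


def tweet_id_from_href(href):
    """Return the numeric tweet ID from a status permalink, or None."""
    parts = urlparse(href).path.strip("/").split("/")
    # Assumed path shape: <user>/status/<id>
    if len(parts) >= 3 and parts[1] == "status" and parts[2].isdigit():
        return parts[2]
    return None
```

Returning None for anything that is not a status link makes it easy to skip profile links or other anchors that a loose selector happens to pick up.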
I would combine the article part with the rest of the selector, for code neatness. Your selector seems to grab a few more tweets for whatever reason.
I used Chrome DevTools to generate a selector and stripped out class names, etc.
I've tried the suggested changes to id_selector and tweet_selector, but I'm not getting the ID with them. I've changed the line collecting the ID (line 65) to this:
This gives me some IDs, but not even close to the number of tweets I'm finding. Any suggestions on what the problem might be?
I've started writing some scripts for a project I'm working on. Currently tweets_between.py generates a text file but I'll see if I can generate a json so that |
I'm back with a new selector! I have also run into a new Twitter search page which will not work with this selector. Simple fix: just restart the script. I think they're A/B testing it.
@rougetimelord that selector worked... kinda
I was able to get a list of IDs... the thing is that now I can't turn it into a JSON file. I tried converting it into a dict or a tuple, but it didn't work. My try block now looks like this:
These are my selectors:
I finally got it to work! But I hit the Twitter limit every time... I started at 2 seconds of sleep and scraped about 4 months. I will try to upload my version.
Hi! Were you able to fix this error? I was trying what you recommended and it also shows the 'unhashable type: list' error. I'd appreciate any help you could give me with this. I'm trying to retrieve tweets of specific accounts from November and December 2019.
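One common cause of 'unhashable type: list' is calling set() on a list of lists, for example when every scroll pass appends its own list of IDs. I can't tell whether that is what happens here, so treat the following as a sketch under that assumption; `flatten_unique` and `write_ids` are hypothetical names, and the output is assumed to be a plain JSON array of ID strings.

```python
import json


def flatten_unique(batches):
    """Flatten a list of ID lists, dropping duplicates but keeping order."""
    seen = set()
    ids = []
    for batch in batches:
        for tweet_id in batch:
            # Individual ID strings are hashable; the inner lists are not.
            if tweet_id not in seen:
                seen.add(tweet_id)
                ids.append(tweet_id)
    return ids


def write_ids(ids, path):
    """Write the IDs to a JSON file as a plain array."""
    with open(path, "w") as f:
        json.dump(ids, f)
```

Calling `write_ids(flatten_unique(all_batches), "ids.json")` would then produce a single de-duplicated list, where `all_batches` stands in for whatever your script has collected.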
I submitted a pull request with my working version.
Hi, I used the updated .py file and it still doesn't work.
@rougetimelord Just wanted to let you know that your CSS selector can be condensed into something way smaller
I have fixed this in my own version. Since PRs are not being reviewed here, I am not sure whether I should open one to fix it.
Bro, I tried your version and I'm getting results like this: 0 tweets found, 0 total
Why don't I get all the tweets from a user? For example, for @lulaoficial I only scrape 8k out of 22k.
The twitter webpage was modified years after this code was written. The modifications needed are as follows:
When I run the scrape.py, the final JSON created is blank without any ids in it. It was working last week. Does anyone know how to solve it?