Skip to content

bescks/twitter-web-scraping

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 

Repository files navigation

twitter-web-scraping

Program Function

This is a web-scraping program to retrieve tweets (filtered by date and geography) from Twitter.

Twitter doesn't provide a API to help developer to get tweets filtered both by history dates (a week before) and geography.

But Twitter provide search techniques like this:

The objective of this program is to build HTTP requests and simulate user actions like scrolling down to get all tweets.

Parameter setting

You should find the following snippet in the bottom of file. You can change these parameters manually.

if __name__ == '__main__':
    # search tweets posted in latitude = 40.730610, longitude = -73.935242, radius = 5 
    set_geo(40.730610, -73.935242, 5)
    # search tweets posted between 2016-5-29 and 2016-5-31
    set_date(2016, 5, 29, 2016, 5, 31)
    # start scraping tweets
    start_scraping()

After execution, tweets will be grouped by date and stored in json files.

Results

like this:

{"timestamp": "16:59 - 29 mag 2016", "timestamp_ms": "1464566370000", "fullname": "\nNorthern Bell", "username": "@NorthernBellNY", "content": "\nDuClaw Sour Me This now available on tap. http://brmn.us/1Is6JcT @duclawbrewing #BeerMenus\n"}
{"timestamp": "16:58 - 29 mag 2016", "timestamp_ms": "1464566327000", "fullname": "\nDon Valdez", "username": "@Don_Valdez", "content": "\nWedding tux. Shoutout to Jon for the reco- Thanks to Jeremy for the wonderful time! (at @EnzoCustom Clothiers)https://www.swarmapp.com/c/lEgUm7pOC3k\n"}
{"timestamp": "16:58 - 29 mag 2016", "timestamp_ms": "1464566289000", "fullname": "\njdorf", "username": "@jdorf", "content": "\ntonight's entertainment... @ The Public Theater https://www.instagram.com/p/BGAqvywszr-/\n"}
{"timestamp": "16:57 - 29 mag 2016", "timestamp_ms": "1464566234000", "fullname": "\nCory Bonfiglio", "username": "@CoryBonfiglio", "content": "\nDrinking a Frambuesa I Chocolate by @hoftendormaal at @beerstreetny http://untp.beer/s/c317578872\n"}
{"timestamp": "16:57 - 29 mag 2016", "timestamp_ms": "1464566231000", "fullname": "\nJohny ", "username": "@tranceboy_johny", "content": "\nI'm at Zenon Taverna in Astoria, NY https://www.swarmapp.com/c/84nsiZ1EcSK\n"}

Cautions

If the number of tweets you scraping is relatively large, your ip will be blocked by Twitter, which means you can still use normal functions but advanced search techniques is impossible. Don't worry, block can be removed in a few days. My suggestion is to take turns to use proxies to build requests or connect a new WIFI after block.  

About

A web scraping program to scrape tweets from Twitter

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages