The goal of this project was to collect news headlines over one month and perform an exploratory analysis of how different English-language news sources cover the subject of Artificial Intelligence. The results can be viewed in the PDF file in this repository.
To set up the environment for this project, I activated a GCP account, which came with $300 in free-trial credits. I opened a GCP shell, uploaded the Data_Collection and Sentiment_Analysis scripts, and created a Python virtual environment. Then I installed the GCP version of Apache Beam, along with the openai and newsapi libraries. To check the performance of the code and to verify the number of workers, I enabled GCP’s Dataflow API, which provided a dashboard for viewing relevant information and allowed the code to run on multiple workers in the Eastern US.
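As a rough illustration of the Dataflow setup described above, the sketch below builds the command-line flags that a Beam pipeline typically needs to run on Dataflow. It is not taken from the project's scripts: the function name `dataflow_args`, the project ID, the bucket name, the `us-east1` region (one possible reading of "Eastern US"), and the worker cap are all assumptions chosen for the example.

```python
# Minimal sketch, standard library only. All concrete values are hypothetical.
# Assumed install step inside the virtual environment (package names are my
# assumption about what "GCP version of Apache Beam" and "newsapi" refer to):
#   pip install "apache-beam[gcp]" openai newsapi-python


def dataflow_args(project, bucket, region="us-east1", max_workers=4):
    """Return Beam pipeline flags for running on the Dataflow runner."""
    return [
        "--runner=DataflowRunner",             # run on Dataflow, not locally
        f"--project={project}",                # GCP project ID (hypothetical)
        f"--region={region}",                  # assumed region in the Eastern US
        f"--temp_location=gs://{bucket}/tmp",  # staging bucket (hypothetical)
        f"--max_num_workers={max_workers}",    # upper bound on workers
    ]


if __name__ == "__main__":
    args = dataflow_args("my-gcp-project", "my-bucket")
    print(" ".join(args))
    # With apache-beam[gcp] installed, the list could be handed to Beam:
    #   from apache_beam.options.pipeline_options import PipelineOptions
    #   options = PipelineOptions(args)
```

Passing a list like this to `PipelineOptions` is one common way to point a pipeline at Dataflow; the same flags can also be given directly on the command line when launching a script.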