Coronavirus (COVID-19) Academic Preprint Topic Modelling

This script runs topic modelling on the latest academic pre-prints on coronavirus to see whether any unusual patterns emerge. It's a total experiment, and I have written an article summarising the things I thought were interesting.

I go into more detail about the process, method and application in my blog post: 9 Coronavirus Research Trends using LDA and Topic Modelling

Data Source

The coronavirus data was collected from the results section of each pre-print listed in the Elsevier Novel Coronavirus Information Center, accessed on March 1st 2020: https://www.elsevier.com/connect/coronavirus-information-center

Stopwords Library

We will need the stopwords from NLTK and spacy’s en model for text pre-processing. Later, we will be using the spacy model for lemmatization.

Run in python console

import nltk; nltk.download('stopwords')

Run in terminal or command prompt (on spaCy 3 and later the en shortcut no longer exists; download en_core_web_sm instead)

python3 -m spacy download en
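
To make the pre-processing step concrete, here is a minimal sketch of stopword removal followed by lemmatization. It is not the code from the script: the function name `preprocess`, the tiny `DEMO_STOP_WORDS` set and the pass-through lemmatizer are all stand-ins I made up so the example runs without any downloads. The comments show where the NLTK stopwords and the spaCy lemmatizer would plug in.

```python
import re
from typing import Callable, Iterable, List

# Tiny stand-in stopword list so the sketch runs without downloads (assumption).
# With NLTK installed, the real list would come from:
#   from nltk.corpus import stopwords
#   stop_words = set(stopwords.words('english'))
DEMO_STOP_WORDS = {"the", "of", "in", "and", "a", "to", "is", "were", "was"}


def preprocess(text: str,
               stop_words: Iterable[str] = DEMO_STOP_WORDS,
               lemmatize: Callable[[str], str] = lambda token: token) -> List[str]:
    """Lowercase, tokenize, drop stopwords, then lemmatize each remaining token."""
    stop_set = set(stop_words)
    # Keep runs of letters only; punctuation and digits are discarded.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [lemmatize(tok) for tok in tokens if tok not in stop_set]


if __name__ == "__main__":
    sample = "The patients were tested in the hospital"
    print(preprocess(sample))
```

With spaCy installed, a real lemmatizer could be passed in, for example `lemmatize=lambda tok: nlp(tok)[0].lemma_` after loading the model with `spacy.load`. Lemmatizing one token at a time is slow and loses context, so treat that only as a way to see the moving parts.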

NLP Libraries

This script requires several NLP libraries in addition to NLTK and spaCy. Check the imports at the top of lda_engine.py and install anything that is missing with pip.
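
If you want a quick way to see what is missing before running the script, something like the sketch below may help. The library list is an assumption based on the tutorial credited under Inspiration, not a verified list of what lda_engine.py imports, and `missing_libraries` is a helper name invented for this example.

```python
import importlib.util
from typing import Iterable, List

# Assumed import names (not confirmed against lda_engine.py); adjust as needed.
ASSUMED_LIBRARIES = ["nltk", "spacy", "gensim", "pandas", "matplotlib", "pyLDAvis"]


def missing_libraries(names: Iterable[str] = ASSUMED_LIBRARIES) -> List[str]:
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_libraries()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All assumed libraries are available.")
```

The check only looks for top-level packages and does not verify versions, so a library can be present and still be too new or too old for the script.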

Mallet Download

You will also need to download Mallet, unzip it, and point the Python script at the unzipped folder.

Download File: http://mallet.cs.umass.edu/dist/mallet-2.0.8.zip

mallet_path = '../mallet-2.0.8/bin/mallet' # update this path in the python file
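
A wrong Mallet path tends to fail late and with an unhelpful error, so it can be worth checking it up front. The sketch below does that; `resolve_mallet_path` is a hypothetical helper, not something defined in the script.

```python
import os


def resolve_mallet_path(mallet_path: str) -> str:
    """Return the absolute path to the Mallet launcher, or raise if it is missing."""
    absolute = os.path.abspath(os.path.expanduser(mallet_path))
    if not os.path.isfile(absolute):
        raise FileNotFoundError(
            f"Mallet launcher not found at {absolute}; "
            "unzip mallet-2.0.8.zip and update mallet_path."
        )
    return absolute


if __name__ == "__main__":
    mallet_path = '../mallet-2.0.8/bin/mallet'  # relative path taken from this README
    try:
        print(resolve_mallet_path(mallet_path))
    except FileNotFoundError as err:
        print(err)
```

The tutorial credited under Inspiration passes this path to `gensim.models.wrappers.LdaMallet`, which exists in gensim 3.x and was removed in gensim 4.0. Whether lda_engine.py does exactly the same is an assumption on my part.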

Inspiration

This script was heavily inspired by this tutorial - https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/#2prerequisitesdownloadnltkstopwordsandspacymodelforlemmatization

