Thursday, May 16th, 9:00am - 4:00pm, CDIL.
- Critical understanding of text as data.
- Explore text analysis, machine learning, and visualization.
Follow along with the day's activities using the Text as Data workshop site.
- Overview of text as data
- Finding and preparing text for analysis
- Text repositories and OCR
- Text Analysis
- Word frequency and concordance with Voyant Tools
- Visualizing context with WordTree
- Break
- Machine Learning
- Natural Language Processing: Natural Language Understanding Demo
- Sentiment analysis with books: Book Visualizations Sandbox
- Topic Modeling in-browser with jsLDA
- Exploring big data
- Google Books Ngram Viewer
- HathiTrust Research Center Portal
- Text generators
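For anyone who wants to see what the word frequency and concordance activity computes, here is a minimal Python sketch. It is an illustration only, not how Voyant Tools is implemented; the function names (`tokenize`, `word_frequencies`, `concordance`) and the sample sentence are made up for this example, and the tokenizer is deliberately crude.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens, top_n=5):
    """Return the top_n most frequent tokens as (word, count) pairs."""
    return Counter(tokens).most_common(top_n)


def concordance(tokens, keyword, window=3):
    """Return keyword-in-context lines: up to `window` tokens on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Hypothetical sample text, invented for this sketch.
sample = (
    "The whale surfaced. The crew watched the whale, "
    "and the whale watched the crew."
)
tokens = tokenize(sample)

print(word_frequencies(tokens, top_n=2))
for line in concordance(tokens, "whale", window=2):
    print(line)
```

Common words such as "the" dominate the raw counts, which is why text analysis tools typically offer a stopword list to filter them out before comparing frequencies.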
Readings
- Matt Jockers, "The LDA Buffet is Now Open; or, Latent Dirichlet Allocation for English Majors". A topic modeling "fable".
- Ted Underwood, "Seven Ways Humanists are Using Computers to Understand Text" (2015). An introductory overview of types of computational analysis.
- Ted Underwood and Jordan Sellers, "How Quickly Do Literary Standards Change?"
- The article explicitly explains the research process, from collecting and selecting data to analysis. It was published as a draft with more content, questions, and illustrations than a traditional article allows. It was also published traditionally as Jordan Sellers and Ted Underwood, "The Longue Durée of Literary Prestige", MLQ 77:3 (2016). Additionally, a GitHub repository, paceofchange (2015), shares the code necessary to reproduce the analysis.
- Extra: Cultural Analytics Now, ed. Dan Sinykin, post45 (2019).
- A collection of articles critically evaluating the current state of quantitative methods in DH, particularly in literary studies. The first article is a review of Underwood's most recent book, Distant Horizons.