For example, this article is split across two pages: https://stanforddailyarchive.com/cgi-bin/stanford?a=d&d=stanford20140106-01.2.5&e=-------en-20--1--txt-txIN-------#

<img width="1130" alt="Screen Shot 2019-05-22 at 8 49 29 PM" src="https://user-images.githubusercontent.com/4939421/58224561-1db73380-7cd3-11e9-8fba-62e3e79e8d08.png">

But https://github.com/TheStanfordDaily/archives-text/blob/3e24b7ee6c55dac8fcff552e02119b502afd6f42/2014/01/06/MODSMD_ARTICLE4.article.txt contains only the part of the article that appears on the first page:

https://github.com/TheStanfordDaily/archives-text/blob/3e24b7ee6c55dac8fcff552e02119b502afd6f42/2014/01/06/MODSMD_ARTICLE4.article.txt#L1-L53