This is an honest-to-goodness attempt to read the MTA turnstile data published weekly.
wget http://www.mta.info/developers/data/nyct/turnstile/turnstile_130803.txt
The goal would be to specialize ajschumacher's "Calendar view" visualization of MTA usage to each train station (e.g. 125 St & Lexington Ave) and each line (e.g. the D train).
Weekly MetroCard swipe data for the city is also available from 2010 onward.
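The Heisenberg uncertainty principle says we can't say too much about where a particle is and where it is going at the same time. Turnstile counts have a similar blind spot: they tell us where riders enter and exit, but not which train they take or where they are headed. In The New Yorker, turnstile data was combined with census data to build a visualization addressing income disparity.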
Both data sets lend themselves to mixture models for making educated guesses:
- Among the people who enter or exit Times Square, what fraction is taking the S train?
- Among the people who enter or exit Grand Central Station, what fraction is taking the 6 train?
- At 125 St & Lexington Ave, how many people are heading uptown versus downtown?
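To make the mixture-model idea concrete, here is a minimal sketch. It assumes (my assumption, not something the MTA files provide) that we already know a time-of-day profile for each service at a station, e.g. how S shuttle riders spread over the six 4-hour bins versus everyone else, and uses expectation-maximization (EM) to estimate the mixing fraction from the station's total counts. The profiles and counts are synthetic, and `estimate_mixing_fraction` is a hypothetical helper name.

```python
# Minimal EM sketch: estimate the fraction pi of a station's riders who use one
# service, modelling 4-hour entry counts as a two-component mixture
#   p(bin) = pi * profile_a[bin] + (1 - pi) * profile_b[bin]
# ASSUMPTION: both profiles are known in advance. The numbers below are
# invented for illustration and do not come from MTA data.


def estimate_mixing_fraction(counts, profile_a, profile_b, tol=1e-12, max_iter=10_000):
    """Return the EM estimate of pi, keeping both component profiles fixed."""
    total = sum(counts)
    pi = 0.5  # starting guess
    for _ in range(max_iter):
        # E-step: expected share of each bin's riders that belong to component A
        resp = [pi * a / (pi * a + (1 - pi) * b) for a, b in zip(profile_a, profile_b)]
        # M-step: new weight = expected riders from A divided by all riders
        new_pi = sum(n * r for n, r in zip(counts, resp)) / total
        if abs(new_pi - pi) < tol:
            return new_pi
        pi = new_pi
    return pi


# Hypothetical profiles over six 4-hour bins; each sums to 1.
shuttle_profile = [0.05, 0.05, 0.30, 0.20, 0.30, 0.10]
other_profile = [0.10, 0.05, 0.15, 0.25, 0.20, 0.25]

# Synthetic station totals built with a true shuttle fraction of 0.3.
counts = [1000 * (0.3 * a + 0.7 * b) for a, b in zip(shuttle_profile, other_profile)]
print(round(estimate_mixing_fraction(counts, shuttle_profile, other_profile), 4))  # prints 0.3
```

With real data the hard part is obtaining believable profiles; the EM loop only shows how a fraction could be inferred once they exist.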
In theory, the MTA data offers an unprecedented level of granularity:
- Every 4 hours
- Every station - or even station entrance - in NYC
- Weekdays, Saturdays, and Sundays
- Every week since June 2010
To read the MTA's data set, you need to understand lines like
R151,G009,STILLWELL AVE,DFNQ,BMT
and
A002,R051,02-00-00,
07-27-13,00:00:00,REGULAR,004209603,001443585,
07-27-13,04:00:00,REGULAR,004209643,001443593,
07-27-13,08:00:00,REGULAR,004209663,001443616,
07-27-13,12:00:00,REGULAR,004209741,001443687,
07-27-13,16:00:00,REGULAR,004210004,001443740,
07-27-13,20:00:00,REGULAR,004210276,001443777,
07-28-13,00:00:00,REGULAR,004210432,001443801,
07-28-13,04:00:00,REGULAR,004210472,001443805
Hopefully we recognize the 3rd, 4th, and 5th items in R151,G009,STILLWELL AVE,DFNQ,BMT. They say that
- The station name is Stillwell Ave
- The D,F,N,Q trains run through this station
- The station belongs to the BMT division
We New Yorkers can point to this spot on a map! What about R151,G009? These are the Remote and the Booth (and STILLWELL AVE is the Station).
These two pieces of MTA jargon are hard to decipher. They may roughly correspond to station entrances. Consider the rows for Times Square:
R032,R145,42 ST-TIMES SQ,1237ACENQRS,IRT
R032,A021,42 ST-TIMES SQ,1237ACENQRS,BMT
R032,R143,42 ST-TIMES SQ,ACENQRS1237,IRT
R032,R146,42 ST-TIMES SQ,1237ACENQRS,IRT
R033,R151,42 ST-TIMES SQ,1237ACENQRS,IRT
R033,R148,42 ST-TIMES SQ,1237ACENQRS,IRT
R033,R150,42 ST-TIMES SQ,1237ACENQRS,IRT
R033,R153,42 ST-TIMES SQ,1237ACENQRS,IRT
R033,R147,42 ST-TIMES SQ,1237ACENQRS,IRT
The 11 lines 1-2-3-7-A-C-E-N-Q-R-S pass through here. There are two remotes, R032 with 4 booths and R033 with 5. This makes sense, since Times Square is a very large station.
R034,R174,125 ST,1,IRT
125 St (on the 1 line), one of the busiest stations in the MTA system, gets only 1 remote and 1 booth.
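Counting booths per remote is easy to automate. A short sketch, assuming the rows follow the Remote, Booth, Station, Lines, Division order shown above; `booths_by_remote` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Rows from the remote-booth-station listing above, in the order
# Remote, Booth, Station, Lines, Division.
ROWS = """\
R032,R145,42 ST-TIMES SQ,1237ACENQRS,IRT
R032,A021,42 ST-TIMES SQ,1237ACENQRS,BMT
R032,R143,42 ST-TIMES SQ,ACENQRS1237,IRT
R032,R146,42 ST-TIMES SQ,1237ACENQRS,IRT
R033,R151,42 ST-TIMES SQ,1237ACENQRS,IRT
R033,R148,42 ST-TIMES SQ,1237ACENQRS,IRT
R033,R150,42 ST-TIMES SQ,1237ACENQRS,IRT
R033,R153,42 ST-TIMES SQ,1237ACENQRS,IRT
R033,R147,42 ST-TIMES SQ,1237ACENQRS,IRT
R034,R174,125 ST,1,IRT
"""


def booths_by_remote(text):
    """Group booths under their (station, remote) pair."""
    groups = defaultdict(list)
    for remote, booth, station, _lines, _division in csv.reader(io.StringIO(text)):
        groups[(station, remote)].append(booth)
    return dict(groups)


for (station, remote), booths in sorted(booths_by_remote(ROWS).items()):
    print(f"{station:<15} {remote}: {len(booths)} booth(s) {booths}")
```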
The MTA's turnstile data set is comma-separated: 3 items identifying the station and turnstile, followed by repeating groups of five (date, time, status, entries, exits). A typical line has 3 + 5×8 = 43 items; here is one from the 59 St & Lexington Ave station, starting July 27, 2013:
A002,R051,02-00-00,
07-27-13,00:00:00,REGULAR,004209603,001443585,
07-27-13,04:00:00,REGULAR,004209643,001443593,
07-27-13,08:00:00,REGULAR,004209663,001443616,
07-27-13,12:00:00,REGULAR,004209741,001443687,
07-27-13,16:00:00,REGULAR,004210004,001443740,
07-27-13,20:00:00,REGULAR,004210276,001443777,
07-28-13,00:00:00,REGULAR,004210432,001443801,
07-28-13,04:00:00,REGULAR,004210472,001443805
- within each group of five, the first item is the date (MM-DD-YY)
- the second is the time stamp; readings usually come every 4 hours
- the third is the status, here REGULAR
- the fourth and fifth are cumulative counters: entries read 004209603 and exits read 001443585
- so between 8am and 12pm (noon) on July 27, 4209741 - 4209663 = 78 passengers entered the station (and 1443687 - 1443616 = 71 exited) through this turnstile
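The parsing and subtraction described above can be sketched in Python. This assumes one record per turnstile laid out as shown (3 identifying items, then groups of five) and that the entry/exit counters are cumulative; `parse_record` and `interval_counts` are hypothetical names.

```python
# The sample record above, rejoined into one comma-separated line.
LINE = (
    "A002,R051,02-00-00,"
    "07-27-13,00:00:00,REGULAR,004209603,001443585,"
    "07-27-13,04:00:00,REGULAR,004209643,001443593,"
    "07-27-13,08:00:00,REGULAR,004209663,001443616,"
    "07-27-13,12:00:00,REGULAR,004209741,001443687,"
    "07-27-13,16:00:00,REGULAR,004210004,001443740,"
    "07-27-13,20:00:00,REGULAR,004210276,001443777,"
    "07-28-13,00:00:00,REGULAR,004210432,001443801,"
    "07-28-13,04:00:00,REGULAR,004210472,001443805"
)


def parse_record(line):
    """Split a record into its 3 identifying items and a list of 5-item readings."""
    fields = [f.strip() for f in line.strip().rstrip(",").split(",")]
    if len(fields) < 3 or (len(fields) - 3) % 5 != 0:
        raise ValueError(f"expected 3 + 5k fields, got {len(fields)}")
    booth, remote, turnstile = fields[:3]
    readings = []
    for i in range(3, len(fields), 5):
        date, time, status, entries, exits = fields[i:i + 5]
        # Counters are zero-padded strings like "004209603"; int() drops the padding.
        readings.append((date, time, status, int(entries), int(exits)))
    return (booth, remote, turnstile), readings


def interval_counts(readings):
    """Difference consecutive cumulative counters.

    Returns (start date, start time, end time, entered, exited) per interval.
    """
    return [
        (prev[0], prev[1], cur[1], cur[3] - prev[3], cur[4] - prev[4])
        for prev, cur in zip(readings, readings[1:])
    ]


turnstile_id, readings = parse_record(LINE)
for row in interval_counts(readings):
    print(row)
```

The third printed row is the 8am-to-noon interval worked out above: 78 entries and 71 exits.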
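Together, the three items A002,R051,02-00-00 specify a single turnstile in the entire MTA system. Note that the turnstile file lists the booth (A002) before the remote (R051), the reverse of the order used in remote-booth-station.csv. Looking up R051 in remote-booth-station.csv, we find that booth A002 belongs to a group of 3 booths under remote R051, which is itself one of two remotes (R050 and R051) at this station.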
R050,R244,59 ST,456NQR,IRT
R050,R244A,59 ST,456NQR,IRT
R050,A004,LEXINGTON AVE,456NQR,BMT
R051,R245,59 ST,456NQR,IRT
R051,R245A,59 ST,456NQR,IRT
R051,A002,LEXINGTON AVE,456NQR,BMT
This station has two names, LEXINGTON AVE and 59 ST. It is a very busy station, carrying passengers between Queens, the Upper East Side, Times Square, and Midtown East.
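Going from a turnstile to its station is then a dictionary lookup. A minimal sketch using the six 59 ST / LEXINGTON AVE rows listed above, assuming remote-booth-station.csv has no header row and that (remote, booth) is a unique key, which I have not verified; `build_lookup` and `station_of` are hypothetical helpers.

```python
import csv
import io

# The 59 ST / LEXINGTON AVE rows from remote-booth-station.csv shown above
# (Remote, Booth, Station, Lines, Division).
STATION_ROWS = """\
R050,R244,59 ST,456NQR,IRT
R050,R244A,59 ST,456NQR,IRT
R050,A004,LEXINGTON AVE,456NQR,BMT
R051,R245,59 ST,456NQR,IRT
R051,R245A,59 ST,456NQR,IRT
R051,A002,LEXINGTON AVE,456NQR,BMT
"""


def build_lookup(text):
    """Map (remote, booth) -> (station, lines, division)."""
    return {
        (remote, booth): (station, lines, division)
        for remote, booth, station, lines, division in csv.reader(io.StringIO(text))
    }


def station_of(turnstile_line, lookup):
    """Find the station for a turnstile record; the turnstile file puts the booth first."""
    booth, remote = [f.strip() for f in turnstile_line.split(",")[:2]]
    return lookup[(remote, booth)]


lookup = build_lookup(STATION_ROWS)
print(station_of("A002,R051,02-00-00,", lookup))  # ('LEXINGTON AVE', '456NQR', 'BMT')
# Other booths sharing remote R051 (dicts keep insertion order)
print([booth for (remote, booth) in lookup if remote == "R051"])  # ['R245', 'R245A', 'A002']
```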
You are now ready to read the MTA turnstile data. I suggest you stop reading and get your hands dirty; after 30 minutes to an hour, this section will make a lot more sense.
By now, we have rough interpretations of Remote, Booth, and Station:
- Remote roughly corresponds to a station entrance
- Booth probably has to do with subway booths
- Station is, well, the station
REGULAR is not the only status code, and turnstiles don't necessarily report every 4 hours. I've seen statuses such as DOOR OPEN, RECOVR AUD, and LOGON, and time stamps such as 08:40:29. These irregularities tie your attempts at visualizing MTA's turnstile data in knots.
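One conservative way to cope, sketched under assumptions of my own rather than any MTA guidance: keep only REGULAR readings, divide each counter difference by the actual elapsed time instead of assuming 4 hours, and skip intervals where a counter goes backwards. The readings below are made up to mimic the DOOR OPEN and 08:40:29 cases, and `hourly_rates` is a hypothetical helper. Dropping non-REGULAR statuses such as RECOVR AUD may throw away usable readings; that trade-off is a judgment call.

```python
from datetime import datetime

FMT = "%m-%d-%y %H:%M:%S"  # dates look like 07-27-13, times like 08:40:29


def hourly_rates(readings):
    """readings: (date, time, status, entries, exits) with cumulative counters.

    Keeps only REGULAR readings and returns (start, end, entries/hour, exits/hour)
    using the real elapsed time, so irregular time stamps are handled.
    """
    regular = [r for r in readings if r[2] == "REGULAR"]
    rates = []
    for prev, cur in zip(regular, regular[1:]):
        start = datetime.strptime(f"{prev[0]} {prev[1]}", FMT)
        end = datetime.strptime(f"{cur[0]} {cur[1]}", FMT)
        hours = (end - start).total_seconds() / 3600
        d_in, d_out = cur[3] - prev[3], cur[4] - prev[4]
        # ASSUMPTION: a backwards counter or non-positive gap is a reset or bad
        # reading; skip the interval rather than guess.
        if hours <= 0 or d_in < 0 or d_out < 0:
            continue
        rates.append((start, end, d_in / hours, d_out / hours))
    return rates


# Hypothetical readings (not real MTA numbers) mimicking the irregular cases.
readings = [
    ("07-27-13", "00:00:00", "REGULAR", 100, 50),
    ("07-27-13", "04:00:00", "REGULAR", 140, 58),
    ("07-27-13", "06:12:07", "DOOR OPEN", 150, 60),
    ("07-27-13", "08:40:29", "REGULAR", 180, 70),
    ("07-27-13", "12:00:00", "REGULAR", 5, 3),  # counter went backwards
]
for start, end, ins, outs in hourly_rates(readings):
    print(start.time(), "->", end.time(), f"{ins:.1f} in/h, {outs:.1f} out/h")
```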