I'm taking the bus today, so I got to thinking about bus schedules. I use Google Calendar a little bit (out of habit and convenience more than any particular love), and I was thinking "why doesn't my calendar just know the times of transit routes I use?"
I thought maybe there'd be, say, iCal (CalDAV? What is actually the thing?) data somewhere for a given RTD schedule, or failing that, maybe JSON or TSV or something. A cursory search doesn't turn up much, but I did find these:
I grabbed that last one.
brennen@desiderata 16:16:43 /home/brennen ★ mkdir rtd && mv google_transit_Jan15_Runboard.zip rtd
brennen@desiderata 16:16:51 /home/brennen ★ cd rtd
brennen@desiderata 16:16:53 /home/brennen/rtd ★ unzip google_transit_Jan15_Runboard.zip
Archive: google_transit_Jan15_Runboard.zip
inflating: calendar.txt
inflating: calendar_dates.txt
inflating: agency.txt
inflating: shapes.txt
inflating: stop_times.txt
inflating: trips.txt
inflating: stops.txt
inflating: routes.txt
Ok, so this is pretty minimalist CSV stuff from the look of most of it.
brennen@desiderata 16:22:12 /home/brennen/rtd ★ grep Lyons stops.txt
20921,Lyons PnR,Vehicles Travelling East, 40.223979,-105.270174,,,0
So it looks like stops have an individual id?
brennen@desiderata 16:24:41 /home/brennen/rtd ★ grep '20921' ./*.txt | wc -l
87
A lot of this is noise, but:
brennen@desiderata 16:26:23 /home/brennen/rtd ★ grep 20921 ./stop_times.txt
8711507,12:52:00,12:52:00,20921,43,,1,0,
8711508,11:32:00,11:32:00,20921,43,,1,0,
8711509,07:55:00,07:55:00,20921,43,,1,0,
8711512,16:41:00,16:41:00,20921,43,,1,0,
8711519,05:37:00,05:37:00,20921,3,,0,1,
8711517,16:47:00,16:47:00,20921,1,,0,1,
8711511,17:58:00,17:58:00,20921,43,,1,0,
8711514,13:02:00,13:02:00,20921,1,,0,1,
8711516,07:59:00,07:59:00,20921,1,,0,1,
8711515,11:42:00,11:42:00,20921,1,,0,1,
8711510,19:10:00,19:10:00,20921,43,,1,0,
8711513,18:05:00,18:05:00,20921,1,,0,1,
8711518,06:47:00,06:47:00,20921,1,,0,1,
brennen@desiderata 16:26:57 /home/brennen/rtd ★ head -1 stop_times.txt
trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign,pickup_type,drop_off_type,shape_dist_traveled
So:
brennen@desiderata 16:41:47 /home/brennen/code/rtd-tools (master) ★ grep ',20921,' ./stop_times.txt | cut -d, -f1,3 | sort -n
8711507,12:52:00
8711508,11:32:00
8711509,07:55:00
8711510,19:10:00
8711511,17:58:00
8711512,16:41:00
8711513,18:05:00
8711514,13:02:00
8711515,11:42:00
8711516,07:59:00
8711517,16:47:00
8711518,06:47:00
8711519,05:37:00
That first number is a trip_id
, the second one departure time. Trips
are provided in trips.txt
:
brennen@desiderata 16:54:56 /home/brennen/code/rtd-tools (master) ★ head -2 trips.txt
route_id,service_id,trip_id,trip_headsign,direction_id,block_id,shape_id
0,SA,8690507,Union Station,0, 0 2,793219
I don't usually use join
very much, but this seems like a logical place for
it. It turns out that join
wants its input sorted on the join field, so I do
this:
brennen@desiderata 16:54:38 /home/brennen/code/rtd-tools (master) ★ sort -t, -k1 stop_times.txt > stop_times.sorted.txt
brennen@desiderata 16:54:38 /home/brennen/code/rtd-tools (master) ★ sort -t, -k3 trips.txt > trips.sorted.txt
And then:
brennen@desiderata 16:51:07 /home/brennen/code/rtd-tools (master) ★ join -t, -1 1 -2 3 ./stop_times.sorted.txt ./trips.sorted.txt | grep 20921
,Y,WK,Lyons PnR,0, Y 16,79481043,,1,0,
,Y,WK,Lyons PnR,0, Y 16,79481043,,1,0,
,Y,WK,Lyons PnR,0, Y 15,79481043,,1,0,
,Y,WK,Lyons PnR,0, Y 41,79480943,,1,0,
,Y,WK,Lyons PnR,0, Y 41,79481043,,1,0,
,Y,WK,Lyons PnR,0, Y 41,79481043,,1,0,
,Y,WK,Boulder Transit Center,1, Y 41,794814
,Y,WK,Boulder Transit Center,1, Y 16,794812
,Y,WK,Boulder Transit Center,1, Y 16,794814
,Y,WK,Boulder Transit Center,1, Y 15,794812
,Y,WK,Boulder Transit Center,1, Y 41,794813
,Y,WK,Boulder Transit Center,1, Y 15,794813
,Y,WK,Boulder Transit Center,1, 206 1,794816
Ok, waitasec. What the fuck is going on here? The string 20921
appears
nowhere in these lines. It takes me too long to figure out that the
text files have CRLF line-endings and this is messing with something in
the chain (probably just output from grep
, since it's obviously
finding the string). So:
brennen@desiderata 16:59:35 /home/brennen/code/rtd-tools (master) ★ dos2unix *.sorted.txt
dos2unix: converting file stop_times.sorted.txt to Unix format ...
dos2unix: converting file trips.sorted.txt to Unix format ...
Why does dos2unix
operate in-place on files instead of printing to STDOUT?
It beats me, but I sure am glad I didn't run it on anything especially
breakable. It does do what you'd expect when piped to, anyway, which is
probably what I should have done.
So this seems to work:
brennen@desiderata 17:04:45 /home/brennen/code/rtd-tools (master) ★ join -t, -1 1 -2 3 ./stop_times.sorted.txt ./trips.sorted.txt | grep 20921
8711507,12:52:00,12:52:00,20921,43,,1,0,,Y,WK,Lyons PnR,0, Y 16,794810
8711508,11:32:00,11:32:00,20921,43,,1,0,,Y,WK,Lyons PnR,0, Y 16,794810
8711509,07:55:00,07:55:00,20921,43,,1,0,,Y,WK,Lyons PnR,0, Y 15,794810
8711510,19:10:00,19:10:00,20921,43,,1,0,,Y,WK,Lyons PnR,0, Y 41,794809
8711511,17:58:00,17:58:00,20921,43,,1,0,,Y,WK,Lyons PnR,0, Y 41,794810
8711512,16:41:00,16:41:00,20921,43,,1,0,,Y,WK,Lyons PnR,0, Y 41,794810
8711513,18:05:00,18:05:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1, Y 41,794814
8711514,13:02:00,13:02:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1, Y 16,794812
8711515,11:42:00,11:42:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1, Y 16,794814
8711516,07:59:00,07:59:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1, Y 15,794812
8711517,16:47:00,16:47:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1, Y 41,794813
8711518,06:47:00,06:47:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1, Y 15,794813
8711519,05:37:00,05:37:00,20921,3,,0,1,,Y,WK,Boulder Transit Center,1, 206 1,794816
Which seems kind of right for the South &
Northbound schedules, but they're weirdly intermingled. I think
this pulls departure time and a direction_id
field:
brennen@desiderata 17:15:12 /home/brennen/code/rtd-tools (master) ★ join -t, -1 1 -2 3 ./stop_times.sorted.txt ./trips.sorted.txt | grep 20921 | cut -d, -f3,13 | sort -n
05:37:00,1
06:47:00,1
07:55:00,0
07:59:00,1
11:32:00,0
11:42:00,1
12:52:00,0
13:02:00,1
16:41:00,0
16:47:00,1
17:58:00,0
18:05:00,1
19:10:00,0
So southbound, I guess:
brennen@desiderata 17:15:59 /home/brennen/code/rtd-tools (master) ★ join -t, -1 1 -2 3 ./stop_times.sorted.txt ./trips.sorted.txt | grep 20921 | cut -d, -f3,13 | grep ',1' | sort -n
05:37:00,1
06:47:00,1
07:59:00,1
11:42:00,1
13:02:00,1
16:47:00,1
18:05:00,1
This should probably be where I think oh, right, this is a Google spec - maybe there's already some tooling. Failing that, slurping them into SQLite or something would be a lot less painful. Or at least using csvkit.