Tuesday, January 13
===================

rtd / bus schedules / transit data
----------------------------------

I'm taking the bus today, so I got to thinking about bus schedules. I use Google Calendar a little bit (out of habit and convenience more than any particular love), and I was thinking "why doesn't my calendar just know the times of transit routes I use?"

I thought maybe there'd be, say, iCal (CalDAV? What is actually the thing?) data somewhere for a given RTD schedule, or failing that, maybe JSON or TSV or something. A cursory search doesn't turn up much, but I did find these:

* http://www.rtd-denver.com/Developer.shtml
* https://developers.google.com/transit/gtfs/reference?csw=1
* http://www.rtd-denver.com/GoogleFeeder/
* http://www.rtd-denver.com/GoogleFeeder/google_transit_Jan15_Runboard.zip

I grabbed that last one.

    brennen@desiderata 16:16:43 /home/brennen ★ mkdir rtd && mv google_transit_Jan15_Runboard.zip rtd
    brennen@desiderata 16:16:51 /home/brennen ★ cd rtd
    brennen@desiderata 16:16:53 /home/brennen/rtd ★ unzip google_transit_Jan15_Runboard.zip
    Archive: google_transit_Jan15_Runboard.zip
      inflating: calendar.txt
      inflating: calendar_dates.txt
      inflating: agency.txt
      inflating: shapes.txt
      inflating: stop_times.txt
      inflating: trips.txt
      inflating: stops.txt
      inflating: routes.txt

Ok, so this is pretty minimalist CSV stuff from the look of most of it.

    brennen@desiderata 16:22:12 /home/brennen/rtd ★ grep Lyons stops.txt
    20921,Lyons PnR,Vehicles Travelling East, 40.223979,-105.270174,,,0

So it looks like stops have an individual id?

    brennen@desiderata 16:24:41 /home/brennen/rtd ★ grep '20921' ./*.txt | wc -l
    87

A lot of this is noise, but:

    brennen@desiderata 16:26:23 /home/brennen/rtd ★ grep 20921 ./stop_times.txt
    8711507,12:52:00,12:52:00,20921,43,,1,0,
    8711508,11:32:00,11:32:00,20921,43,,1,0,
    8711509,07:55:00,07:55:00,20921,43,,1,0,
    8711512,16:41:00,16:41:00,20921,43,,1,0,
    8711519,05:37:00,05:37:00,20921,3,,0,1,
    8711517,16:47:00,16:47:00,20921,1,,0,1,
    8711511,17:58:00,17:58:00,20921,43,,1,0,
    8711514,13:02:00,13:02:00,20921,1,,0,1,
    8711516,07:59:00,07:59:00,20921,1,,0,1,
    8711515,11:42:00,11:42:00,20921,1,,0,1,
    8711510,19:10:00,19:10:00,20921,43,,1,0,
    8711513,18:05:00,18:05:00,20921,1,,0,1,
    8711518,06:47:00,06:47:00,20921,1,,0,1,

    brennen@desiderata 16:26:57 /home/brennen/rtd ★ head -1 stop_times.txt
    trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign,pickup_type,drop_off_type,shape_dist_traveled

So:

    brennen@desiderata 16:41:47 /home/brennen/code/rtd-tools (master) ★ grep ',20921,' ./stop_times.txt | cut -d, -f1,3 | sort -n
    8711507,12:52:00
    8711508,11:32:00
    8711509,07:55:00
    8711510,19:10:00
    8711511,17:58:00
    8711512,16:41:00
    8711513,18:05:00
    8711514,13:02:00
    8711515,11:42:00
    8711516,07:59:00
    8711517,16:47:00
    8711518,06:47:00
    8711519,05:37:00

That first number is a `trip_id`, the second one departure time. Trips are provided in `trips.txt`:

    brennen@desiderata 16:54:56 /home/brennen/code/rtd-tools (master) ★ head -2 trips.txt
    route_id,service_id,trip_id,trip_headsign,direction_id,block_id,shape_id
    0,SA,8690507,Union Station,0, 0 2,793219

I don't usually use `join` very much, but this seems like a logical place for it.
It turns out that `join` wants its input sorted on the join field, so I do this:

    brennen@desiderata 16:54:38 /home/brennen/code/rtd-tools (master) ★ sort -t, -k1 stop_times.txt > stop_times.sorted.txt
    brennen@desiderata 16:54:38 /home/brennen/code/rtd-tools (master) ★ sort -t, -k3 trips.txt > trips.sorted.txt

And then:

    brennen@desiderata 16:51:07 /home/brennen/code/rtd-tools (master) ★ join -t, -1 1 -2 3 ./stop_times.sorted.txt ./trips.sorted.txt | grep 20921
    ,Y,WK,Lyons PnR,0, Y 16,79481043,,1,0,
    ,Y,WK,Lyons PnR,0, Y 16,79481043,,1,0,
    ,Y,WK,Lyons PnR,0, Y 15,79481043,,1,0,
    ,Y,WK,Lyons PnR,0, Y 41,79480943,,1,0,
    ,Y,WK,Lyons PnR,0, Y 41,79481043,,1,0,
    ,Y,WK,Lyons PnR,0, Y 41,79481043,,1,0,
    ,Y,WK,Boulder Transit Center,1, Y 41,794814
    ,Y,WK,Boulder Transit Center,1, Y 16,794812
    ,Y,WK,Boulder Transit Center,1, Y 16,794814
    ,Y,WK,Boulder Transit Center,1, Y 15,794812
    ,Y,WK,Boulder Transit Center,1, Y 41,794813
    ,Y,WK,Boulder Transit Center,1, Y 15,794813
    ,Y,WK,Boulder Transit Center,1, 206 1,794816

Ok, waitasec. What the fuck is going on here? The string `20921` appears nowhere in these lines. It takes me too long to figure out that the text files have CRLF line-endings and this is messing with something in the chain (probably just output from `grep`, since it's obviously finding the string). So:

    brennen@desiderata 16:59:35 /home/brennen/code/rtd-tools (master) ★ dos2unix *.sorted.txt
    dos2unix: converting file stop_times.sorted.txt to Unix format ...
    dos2unix: converting file trips.sorted.txt to Unix format ...

Why does `dos2unix` operate in-place on files instead of printing to STDOUT? It beats me, but I sure am glad I didn't run it on anything especially breakable. It _does_ do what you'd expect when piped to, anyway, which is probably what I should have done.

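For what it's worth, here is a minimal sketch of the piped, non-destructive version of that fix. It swaps in `tr` for `dos2unix` (so this is not the command used above), and `strip_cr` is a made-up helper name. The sample row is one of the `stop_times.txt` lines shown earlier, with a CRLF ending added by hand.

```shell
# Hypothetical helper: strip carriage returns from stdin.
# Nothing is modified in place; the cleaned text goes to stdout.
strip_cr() {
  tr -d '\r'
}

# Feed one CRLF-terminated row through the filter.
printf '8711507,12:52:00,12:52:00,20921,43,,1,0,\r\n' | strip_cr
```

Something like `strip_cr < stop_times.txt | sort -t, -k1 > stop_times.sorted.txt` would then presumably produce a sorted copy without ever touching the original file.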
So this seems to work:

    brennen@desiderata 17:04:45 /home/brennen/code/rtd-tools (master) ★ join -t, -1 1 -2 3 ./stop_times.sorted.txt ./trips.sorted.txt | grep 20921
    8711507,12:52:00,12:52:00,20921,43,,1,0,,Y,WK,Lyons PnR,0, Y 16,794810
    8711508,11:32:00,11:32:00,20921,43,,1,0,,Y,WK,Lyons PnR,0, Y 16,794810
    8711509,07:55:00,07:55:00,20921,43,,1,0,,Y,WK,Lyons PnR,0, Y 15,794810
    8711510,19:10:00,19:10:00,20921,43,,1,0,,Y,WK,Lyons PnR,0, Y 41,794809
    8711511,17:58:00,17:58:00,20921,43,,1,0,,Y,WK,Lyons PnR,0, Y 41,794810
    8711512,16:41:00,16:41:00,20921,43,,1,0,,Y,WK,Lyons PnR,0, Y 41,794810
    8711513,18:05:00,18:05:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1, Y 41,794814
    8711514,13:02:00,13:02:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1, Y 16,794812
    8711515,11:42:00,11:42:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1, Y 16,794814
    8711516,07:59:00,07:59:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1, Y 15,794812
    8711517,16:47:00,16:47:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1, Y 41,794813
    8711518,06:47:00,06:47:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1, Y 15,794813
    8711519,05:37:00,05:37:00,20921,3,,0,1,,Y,WK,Boulder Transit Center,1, 206 1,794816

Which seems kind of right for the [South][southbound] & [Northbound][northbound] schedules, but they're weirdly intermingled.

I think this pulls departure time and a `direction_id` field:

    brennen@desiderata 17:15:12 /home/brennen/code/rtd-tools (master) ★ join -t, -1 1 -2 3 ./stop_times.sorted.txt ./trips.sorted.txt | grep 20921 | cut -d, -f3,13 | sort -n
    05:37:00,1
    06:47:00,1
    07:55:00,0
    07:59:00,1
    11:32:00,0
    11:42:00,1
    12:52:00,0
    13:02:00,1
    16:41:00,0
    16:47:00,1
    17:58:00,0
    18:05:00,1
    19:10:00,0

So southbound, I guess:

    brennen@desiderata 17:15:59 /home/brennen/code/rtd-tools (master) ★ join -t, -1 1 -2 3 ./stop_times.sorted.txt ./trips.sorted.txt | grep 20921 | cut -d, -f3,13 | grep ',1' | sort -n
    05:37:00,1
    06:47:00,1
    07:59:00,1
    11:42:00,1
    13:02:00,1
    16:47:00,1
    18:05:00,1

This should probably be where I think oh, right, this is a Google spec - maybe there's [already some tooling](https://github.com/google/transitfeed). Failing that, slurping them into SQLite or something would be a lot less painful. Or at least using csvkit.

[southbound]: http://www3.rtd-denver.com/schedules/getSchedule.action?runboardId=151&routeId=Y&routeType=12&direction=S-Bound&serviceType=3
[northbound]: http://www3.rtd-denver.com/schedules/getSchedule.action?runboardId=151&routeId=Y&routeType=12&direction=N-Bound&serviceType=3
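
Short of reaching for real tooling, the sort-then-`join` dance above can also be sketched as a single `awk` pass. This is an illustration and not what was run in this entry. `stop_departures` is a hypothetical function name, and the column positions are assumed from the two header rows shown earlier (`trip_id`, `trip_headsign` and `direction_id` in fields 3, 4 and 5 of `trips.txt`; `trip_id`, `departure_time` and `stop_id` in fields 1, 3 and 4 of `stop_times.txt`). It splits naively on commas, so it would break on quoted fields that contain commas.

```shell
# Hypothetical helper: print "departure_time,direction_id,trip_headsign"
# for every trip that serves one stop, sorted by departure time.
# Usage: stop_departures STOP_ID TRIPS_FILE STOP_TIMES_FILE
stop_departures() {
  awk -F, -v stop="$1" '
    { sub(/\r$/, "") }            # tolerate CRLF line endings
    FNR == 1 { next }             # skip the header row of each file
    NR == FNR {                   # first file (trips): index by trip_id
      dir[$3] = $5
      head[$3] = $4
      next
    }
    # second file (stop_times): keep rows for the requested stop
    $4 == stop && ($1 in dir) { print $3 "," dir[$1] "," head[$1] }
  ' "$2" "$3" | sort
}
```

Called as `stop_departures 20921 trips.txt stop_times.txt`, it should list both directions at the Lyons stop. Comparing the `stop_id` field exactly, instead of grepping for the string anywhere in the line, also avoids matching `20921` where it happens to turn up in some other column.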