|
|
- Tuesday, January 13
- ===================
-
- rtd / bus schedules / transit data
- ----------------------------------
-
- I'm taking the bus today, so I got to thinking about bus schedules. I use
- Google Calendar a little bit (out of habit and convenience more than any
- particular love), and I was thinking "why doesn't my calendar just know the
- times of transit routes I use?"
-
- I thought maybe there'd be, say, iCal (CalDAV? What is actually the thing?)
- data somewhere for a given RTD schedule, or failing that, maybe JSON or TSV or
- something. A cursory search doesn't turn up much, but I did find these:
-
- * http://www.rtd-denver.com/Developer.shtml
- * https://developers.google.com/transit/gtfs/reference?csw=1
- * http://www.rtd-denver.com/GoogleFeeder/
- * http://www.rtd-denver.com/GoogleFeeder/google_transit_Jan15_Runboard.zip
-
- I grabbed that last one.
-
- brennen@desiderata 16:16:43 /home/brennen ★ mkdir rtd && mv google_transit_Jan15_Runboard.zip rtd
- brennen@desiderata 16:16:51 /home/brennen ★ cd rtd
- brennen@desiderata 16:16:53 /home/brennen/rtd ★ unzip google_transit_Jan15_Runboard.zip
- Archive: google_transit_Jan15_Runboard.zip
- inflating: calendar.txt
- inflating: calendar_dates.txt
- inflating: agency.txt
- inflating: shapes.txt
- inflating: stop_times.txt
- inflating: trips.txt
- inflating: stops.txt
- inflating: routes.txt
-
- Ok, so this is pretty minimalist CSV stuff from the look of most of it.
-
- brennen@desiderata 16:22:12 /home/brennen/rtd ★ grep Lyons stops.txt
- 20921,Lyons PnR,Vehicles Travelling East, 40.223979,-105.270174,,,0
-
- So it looks like stops have an individual id?
-
- brennen@desiderata 16:24:41 /home/brennen/rtd ★ grep '20921' ./*.txt | wc -l
- 87
-
- A lot of this is noise, but:
-
- brennen@desiderata 16:26:23 /home/brennen/rtd ★ grep 20921 ./stop_times.txt
- 8711507,12:52:00,12:52:00,20921,43,,1,0,
- 8711508,11:32:00,11:32:00,20921,43,,1,0,
- 8711509,07:55:00,07:55:00,20921,43,,1,0,
- 8711512,16:41:00,16:41:00,20921,43,,1,0,
- 8711519,05:37:00,05:37:00,20921,3,,0,1,
- 8711517,16:47:00,16:47:00,20921,1,,0,1,
- 8711511,17:58:00,17:58:00,20921,43,,1,0,
- 8711514,13:02:00,13:02:00,20921,1,,0,1,
- 8711516,07:59:00,07:59:00,20921,1,,0,1,
- 8711515,11:42:00,11:42:00,20921,1,,0,1,
- 8711510,19:10:00,19:10:00,20921,43,,1,0,
- 8711513,18:05:00,18:05:00,20921,1,,0,1,
- 8711518,06:47:00,06:47:00,20921,1,,0,1,
- brennen@desiderata 16:26:57 /home/brennen/rtd ★ head -1 stop_times.txt
- trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign,pickup_type,drop_off_type,shape_dist_traveled
-
- So:
-
- brennen@desiderata 16:41:47 /home/brennen/code/rtd-tools (master) ★ grep ',20921,' ./stop_times.txt | cut -d, -f1,3 | sort -n
- 8711507,12:52:00
- 8711508,11:32:00
- 8711509,07:55:00
- 8711510,19:10:00
- 8711511,17:58:00
- 8711512,16:41:00
- 8711513,18:05:00
- 8711514,13:02:00
- 8711515,11:42:00
- 8711516,07:59:00
- 8711517,16:47:00
- 8711518,06:47:00
- 8711519,05:37:00
-
- That first number is a `trip_id`, the second one departure time. Trips
- are provided in `trips.txt`:
-
- brennen@desiderata 16:54:56 /home/brennen/code/rtd-tools (master) ★ head -2 trips.txt
- route_id,service_id,trip_id,trip_headsign,direction_id,block_id,shape_id
- 0,SA,8690507,Union Station,0, 0 2,793219
-
- I don't usually use `join` very much, but this seems like a logical place for
- it. It turns out that `join` wants its input sorted on the join field, so I do
- this:
-
- brennen@desiderata 16:54:38 /home/brennen/code/rtd-tools (master) ★ sort -t, -k1 stop_times.txt > stop_times.sorted.txt
- brennen@desiderata 16:54:38 /home/brennen/code/rtd-tools (master) ★ sort -t, -k3 trips.txt > trips.sorted.txt
-
- And then:
-
- brennen@desiderata 16:51:07 /home/brennen/code/rtd-tools (master) ★ join -t, -1 1 -2 3 ./stop_times.sorted.txt ./trips.sorted.txt | grep 20921
- ,Y,WK,Lyons PnR,0, Y 16,79481043,,1,0,
- ,Y,WK,Lyons PnR,0, Y 16,79481043,,1,0,
- ,Y,WK,Lyons PnR,0, Y 15,79481043,,1,0,
- ,Y,WK,Lyons PnR,0, Y 41,79480943,,1,0,
- ,Y,WK,Lyons PnR,0, Y 41,79481043,,1,0,
- ,Y,WK,Lyons PnR,0, Y 41,79481043,,1,0,
- ,Y,WK,Boulder Transit Center,1, Y 41,794814
- ,Y,WK,Boulder Transit Center,1, Y 16,794812
- ,Y,WK,Boulder Transit Center,1, Y 16,794814
- ,Y,WK,Boulder Transit Center,1, Y 15,794812
- ,Y,WK,Boulder Transit Center,1, Y 41,794813
- ,Y,WK,Boulder Transit Center,1, Y 15,794813
- ,Y,WK,Boulder Transit Center,1, 206 1,794816
-
- Ok, waitasec. What the fuck is going on here? The string `20921` appears
- nowhere in these lines. It takes me too long to figure out that the
- text files have CRLF line-endings and this is messing with something in
- the chain (probably just output from `grep`, since it's obviously
- finding the string). So:
-
- brennen@desiderata 16:59:35 /home/brennen/code/rtd-tools (master) ★ dos2unix *.sorted.txt
- dos2unix: converting file stop_times.sorted.txt to Unix format ...
- dos2unix: converting file trips.sorted.txt to Unix format ...
-
- Why does `dos2unix` operate in-place on files instead of printing to STDOUT?
- It beats me, but I sure am glad I didn't run it on anything especially
- breakable. It _does_ do what you'd expect when piped to, anyway, which is
- probably what I should have done.
-
- So this seems to work:
-
- brennen@desiderata 17:04:45 /home/brennen/code/rtd-tools (master) ★ join -t, -1 1 -2 3 ./stop_times.sorted.txt ./trips.sorted.txt | grep 20921
- 8711507,12:52:00,12:52:00,20921,43,,1,0,,Y,WK,Lyons PnR,0, Y 16,794810
- 8711508,11:32:00,11:32:00,20921,43,,1,0,,Y,WK,Lyons PnR,0, Y 16,794810
- 8711509,07:55:00,07:55:00,20921,43,,1,0,,Y,WK,Lyons PnR,0, Y 15,794810
- 8711510,19:10:00,19:10:00,20921,43,,1,0,,Y,WK,Lyons PnR,0, Y 41,794809
- 8711511,17:58:00,17:58:00,20921,43,,1,0,,Y,WK,Lyons PnR,0, Y 41,794810
- 8711512,16:41:00,16:41:00,20921,43,,1,0,,Y,WK,Lyons PnR,0, Y 41,794810
- 8711513,18:05:00,18:05:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1, Y 41,794814
- 8711514,13:02:00,13:02:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1, Y 16,794812
- 8711515,11:42:00,11:42:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1, Y 16,794814
- 8711516,07:59:00,07:59:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1, Y 15,794812
- 8711517,16:47:00,16:47:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1, Y 41,794813
- 8711518,06:47:00,06:47:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1, Y 15,794813
- 8711519,05:37:00,05:37:00,20921,3,,0,1,,Y,WK,Boulder Transit Center,1, 206 1,794816
-
- Which seems kind of right for the [South][southbound] &
- [Northbound][northbound] schedules, but they're weirdly intermingled. I think
- this pulls departure time and a `direction_id` field:
-
- brennen@desiderata 17:15:12 /home/brennen/code/rtd-tools (master) ★ join -t, -1 1 -2 3 ./stop_times.sorted.txt ./trips.sorted.txt | grep 20921 | cut -d, -f3,13 | sort -n
- 05:37:00,1
- 06:47:00,1
- 07:55:00,0
- 07:59:00,1
- 11:32:00,0
- 11:42:00,1
- 12:52:00,0
- 13:02:00,1
- 16:41:00,0
- 16:47:00,1
- 17:58:00,0
- 18:05:00,1
- 19:10:00,0
-
- So southbound, I guess:
-
- brennen@desiderata 17:15:59 /home/brennen/code/rtd-tools (master) ★ join -t, -1 1 -2 3 ./stop_times.sorted.txt ./trips.sorted.txt | grep 20921 | cut -d, -f3,13 | grep ',1' | sort -n
- 05:37:00,1
- 06:47:00,1
- 07:59:00,1
- 11:42:00,1
- 13:02:00,1
- 16:47:00,1
- 18:05:00,1
-
- This should probably be where I think oh, right, this is a Google spec - maybe
- there's [already some tooling](https://github.com/google/transitfeed). Failing
- that, slurping them into SQLite or something would be a lot less painful. Or
- at least using csvkit.
-
- [southbound]: http://www3.rtd-denver.com/schedules/getSchedule.action?runboardId=151&routeId=Y&routeType=12&direction=S-Bound&serviceType=3
- [northbound]: http://www3.rtd-denver.com/schedules/getSchedule.action?runboardId=151&routeId=Y&routeType=12&direction=N-Bound&serviceType=3
|