a technical notebook
Tuesday, January 13
===================

rtd / bus schedules / transit data
----------------------------------

I'm taking the bus today, so I got to thinking about bus schedules. I use
Google Calendar a little bit (out of habit and convenience more than any
particular love), and I was thinking "why doesn't my calendar just know the
times of transit routes I use?"

I thought maybe there'd be, say, iCal (CalDAV? What is actually the thing?)
data somewhere for a given RTD schedule, or failing that, maybe JSON or TSV or
something. A cursory search doesn't turn up much, but I did find these:

* http://www.rtd-denver.com/Developer.shtml
* https://developers.google.com/transit/gtfs/reference?csw=1
* http://www.rtd-denver.com/GoogleFeeder/
* http://www.rtd-denver.com/GoogleFeeder/google_transit_Jan15_Runboard.zip

I grabbed that last one.

    brennen@desiderata 16:16:43 /home/brennen ★ mkdir rtd && mv google_transit_Jan15_Runboard.zip rtd
    brennen@desiderata 16:16:51 /home/brennen ★ cd rtd
    brennen@desiderata 16:16:53 /home/brennen/rtd ★ unzip google_transit_Jan15_Runboard.zip
    Archive: google_transit_Jan15_Runboard.zip
    inflating: calendar.txt
    inflating: calendar_dates.txt
    inflating: agency.txt
    inflating: shapes.txt
    inflating: stop_times.txt
    inflating: trips.txt
    inflating: stops.txt
    inflating: routes.txt

Ok, so this is pretty minimalist CSV stuff from the look of most of it.

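Since each file is plain CSV with a header row, one quick way to get oriented is to print every file's column names. This is only a rough sketch: it runs against a throwaway directory holding two header lines copied from later in this entry rather than the real feed, and `show_headers` is a made-up helper name.

```shell
# Throwaway stand-in for the unzipped feed: just two header rows,
# copied from the transcript further down.
feed=$(mktemp -d)
printf '%s\n' 'trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign,pickup_type,drop_off_type,shape_dist_traveled' > "$feed/stop_times.txt"
printf '%s\n' 'route_id,service_id,trip_id,trip_headsign,direction_id,block_id,shape_id' > "$feed/trips.txt"

# Hypothetical helper: print "file: header" for every .txt file in a directory.
show_headers() {
    for f in "$1"/*.txt; do
        printf '%s: %s\n' "$(basename "$f")" "$(head -n 1 "$f")"
    done
}

show_headers "$feed"
```

Pointed at the real `rtd` directory, the same loop would list the columns of all eight files.
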

    brennen@desiderata 16:22:12 /home/brennen/rtd ★ grep Lyons stops.txt
    20921,Lyons PnR,Vehicles Travelling East, 40.223979,-105.270174,,,0

So it looks like stops have an individual id?

    brennen@desiderata 16:24:41 /home/brennen/rtd ★ grep '20921' ./*.txt | wc -l
    87

A lot of this is noise, but:

    brennen@desiderata 16:26:23 /home/brennen/rtd ★ grep 20921 ./stop_times.txt
    8711507,12:52:00,12:52:00,20921,43,,1,0,
    8711508,11:32:00,11:32:00,20921,43,,1,0,
    8711509,07:55:00,07:55:00,20921,43,,1,0,
    8711512,16:41:00,16:41:00,20921,43,,1,0,
    8711519,05:37:00,05:37:00,20921,3,,0,1,
    8711517,16:47:00,16:47:00,20921,1,,0,1,
    8711511,17:58:00,17:58:00,20921,43,,1,0,
    8711514,13:02:00,13:02:00,20921,1,,0,1,
    8711516,07:59:00,07:59:00,20921,1,,0,1,
    8711515,11:42:00,11:42:00,20921,1,,0,1,
    8711510,19:10:00,19:10:00,20921,43,,1,0,
    8711513,18:05:00,18:05:00,20921,1,,0,1,
    8711518,06:47:00,06:47:00,20921,1,,0,1,
    brennen@desiderata 16:26:57 /home/brennen/rtd ★ head -1 stop_times.txt
    trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign,pickup_type,drop_off_type,shape_dist_traveled

So:

    brennen@desiderata 16:41:47 /home/brennen/code/rtd-tools (master) ★ grep ',20921,' ./stop_times.txt | cut -d, -f1,3 | sort -n
    8711507,12:52:00
    8711508,11:32:00
    8711509,07:55:00
    8711510,19:10:00
    8711511,17:58:00
    8711512,16:41:00
    8711513,18:05:00
    8711514,13:02:00
    8711515,11:42:00
    8711516,07:59:00
    8711517,16:47:00
    8711518,06:47:00
    8711519,05:37:00

The first number is a `trip_id` and the second is the departure time. Trips
are listed in `trips.txt`:

    brennen@desiderata 16:54:56 /home/brennen/code/rtd-tools (master) ★ head -2 trips.txt
    route_id,service_id,trip_id,trip_headsign,direction_id,block_id,shape_id
    0,SA,8690507,Union Station,0, 0 2,793219

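The `grep ',20921,'` lookup above matches the id anywhere between two commas, and the plain `grep 20921` before it matches any substring at all. A stricter version compares the `stop_id` column exactly. The following is a minimal sketch of that idea with `awk`; `departures_for_stop` is a hypothetical helper name, and the sample file holds three rows from the transcript plus one made-up decoy row whose `trip_id` merely contains `20921`.

```shell
# Sample data: the header and three rows from the transcript, plus one
# MADE-UP decoy row (trip 9920921 at stop 11111) that a plain grep would match.
stop_times=$(mktemp)
cat > "$stop_times" <<'EOF'
trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign,pickup_type,drop_off_type,shape_dist_traveled
8711507,12:52:00,12:52:00,20921,43,,1,0,
8711519,05:37:00,05:37:00,20921,3,,0,1,
8711509,07:55:00,07:55:00,20921,43,,1,0,
9920921,10:00:00,10:00:00,11111,2,,0,0,
EOF

# Hypothetical helper: print "departure_time,trip_id" for one stop, earliest first.
# $1 = stop_id, $2 = path to stop_times.txt
departures_for_stop() {
    # Skip the header (NR > 1) and compare field 4, the stop_id column, exactly.
    awk -F, -v stop="$1" 'NR > 1 && $4 == stop { print $3 "," $1 }' "$2" | sort
}

departures_for_stop 20921 "$stop_times"
```

The decoy row is dropped because only the fourth field is compared, and the zero-padded times sort correctly as plain strings.
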
I don't usually use `join` very much, but this seems like a logical place for
it. It turns out that `join` wants its input sorted on the join field, so I do
this:

    brennen@desiderata 16:54:38 /home/brennen/code/rtd-tools (master) ★ sort -t, -k1 stop_times.txt > stop_times.sorted.txt
    brennen@desiderata 16:54:38 /home/brennen/code/rtd-tools (master) ★ sort -t, -k3 trips.txt > trips.sorted.txt

And then:

    brennen@desiderata 16:51:07 /home/brennen/code/rtd-tools (master) ★ join -t, -1 1 -2 3 ./stop_times.sorted.txt ./trips.sorted.txt | grep 20921
    ,Y,WK,Lyons PnR,0, Y 16,79481043,,1,0,
    ,Y,WK,Lyons PnR,0, Y 16,79481043,,1,0,
    ,Y,WK,Lyons PnR,0, Y 15,79481043,,1,0,
    ,Y,WK,Lyons PnR,0, Y 41,79480943,,1,0,
    ,Y,WK,Lyons PnR,0, Y 41,79481043,,1,0,
    ,Y,WK,Lyons PnR,0, Y 41,79481043,,1,0,
    ,Y,WK,Boulder Transit Center,1, Y 41,794814
    ,Y,WK,Boulder Transit Center,1, Y 16,794812
    ,Y,WK,Boulder Transit Center,1, Y 16,794814
    ,Y,WK,Boulder Transit Center,1, Y 15,794812
    ,Y,WK,Boulder Transit Center,1, Y 41,794813
    ,Y,WK,Boulder Transit Center,1, Y 15,794813
    ,Y,WK,Boulder Transit Center,1, 206 1,794816

Ok, waitasec. What the fuck is going on here? The string `20921` appears
nowhere in these lines. It takes me too long to figure out that the
text files have CRLF line-endings and this is messing with something in
the chain (probably just output from `grep`, since it's obviously
finding the string). So:

    brennen@desiderata 16:59:35 /home/brennen/code/rtd-tools (master) ★ dos2unix *.sorted.txt
    dos2unix: converting file stop_times.sorted.txt to Unix format ...
    dos2unix: converting file trips.sorted.txt to Unix format ...

Why does `dos2unix` operate in-place on files instead of printing to STDOUT?
It beats me, but I sure am glad I didn't run it on anything especially
breakable. It _does_ do what you'd expect when piped to, anyway, which is
probably what I should have done.

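The garbled lines are consistent with a carriage return sitting in the middle of each joined line: the terminal jumps back to column one at the `\r` and the `trips.txt` half overwrites the start of the `stop_times.txt` half, leaving only its tail (`43,,1,0,`) visible. A small sketch of how to check for CRLF endings and strip them without touching the original file, using `grep -c` and `tr -d '\r'`. The demo file is made up for the purpose (a shortened header and row, written with CRLF endings), and `count_crlf` and `strip_cr` are hypothetical names.

```shell
# MADE-UP demo file: a shortened header and one row, written with CRLF endings.
crlf_file=$(mktemp)
printf 'trip_id,arrival_time,departure_time,stop_id\r\n8711507,12:52:00,12:52:00,20921\r\n' > "$crlf_file"

cr=$(printf '\r')   # a literal carriage return to use in the grep pattern

# Hypothetical helper: count lines ending in a carriage return (0 = Unix endings).
count_crlf() {
    grep -c "${cr}\$" "$1"
}

# Hypothetical helper: strip every carriage return, writing to stdout only.
strip_cr() {
    tr -d '\r' < "$1"
}

count_crlf "$crlf_file"
clean_file=$(mktemp)
strip_cr "$crlf_file" > "$clean_file"
count_crlf "$clean_file" || true   # grep exits non-zero when the count is 0
```

Because `strip_cr` writes to stdout, it can sit in the middle of a pipeline ahead of `sort`, which avoids the in-place surprise.
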
So this seems to work:

    brennen@desiderata 17:04:45 /home/brennen/code/rtd-tools (master) ★ join -t, -1 1 -2 3 ./stop_times.sorted.txt ./trips.sorted.txt | grep 20921
    8711507,12:52:00,12:52:00,20921,43,,1,0,,Y,WK,Lyons PnR,0, Y 16,794810
    8711508,11:32:00,11:32:00,20921,43,,1,0,,Y,WK,Lyons PnR,0, Y 16,794810
    8711509,07:55:00,07:55:00,20921,43,,1,0,,Y,WK,Lyons PnR,0, Y 15,794810
    8711510,19:10:00,19:10:00,20921,43,,1,0,,Y,WK,Lyons PnR,0, Y 41,794809
    8711511,17:58:00,17:58:00,20921,43,,1,0,,Y,WK,Lyons PnR,0, Y 41,794810
    8711512,16:41:00,16:41:00,20921,43,,1,0,,Y,WK,Lyons PnR,0, Y 41,794810
    8711513,18:05:00,18:05:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1, Y 41,794814
    8711514,13:02:00,13:02:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1, Y 16,794812
    8711515,11:42:00,11:42:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1, Y 16,794814
    8711516,07:59:00,07:59:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1, Y 15,794812
    8711517,16:47:00,16:47:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1, Y 41,794813
    8711518,06:47:00,06:47:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1, Y 15,794813
    8711519,05:37:00,05:37:00,20921,3,,0,1,,Y,WK,Boulder Transit Center,1, 206 1,794816

Which seems kind of right for the [Southbound][southbound] &
[Northbound][northbound] schedules, but they're weirdly intermingled. I think
this pulls departure time and a `direction_id` field:

    brennen@desiderata 17:15:12 /home/brennen/code/rtd-tools (master) ★ join -t, -1 1 -2 3 ./stop_times.sorted.txt ./trips.sorted.txt | grep 20921 | cut -d, -f3,13 | sort -n
    05:37:00,1
    06:47:00,1
    07:55:00,0
    07:59:00,1
    11:32:00,0
    11:42:00,1
    12:52:00,0
    13:02:00,1
    16:41:00,0
    16:47:00,1
    17:58:00,0
    18:05:00,1
    19:10:00,0

So southbound, I guess:

    brennen@desiderata 17:15:59 /home/brennen/code/rtd-tools (master) ★ join -t, -1 1 -2 3 ./stop_times.sorted.txt ./trips.sorted.txt | grep 20921 | cut -d, -f3,13 | grep ',1' | sort -n
    05:37:00,1
    06:47:00,1
    07:59:00,1
    11:42:00,1
    13:02:00,1
    16:47:00,1
    18:05:00,1

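Pulling the steps above into one place, here is a hedged sketch of the whole pipeline as a single function. It differs from the transcript in a few ways that are my own choices rather than anything the feed requires: carriage returns are stripped with `tr` before sorting, the sort keys are limited to the join field (`-k1,1` and `-k3,3`, since a bare `-k1` keys on everything from field 1 to the end of the line), `LC_ALL=C` keeps `sort` and `join` agreeing on collation, and `awk` filters on exact column values instead of `grep`. The helper name `departures` is hypothetical, and the `trips.txt` rows are reconstructed from the joined output above.

```shell
export LC_ALL=C   # sort and join must agree on collation

work=$(mktemp -d)

# Sample data with CRLF endings: headers and a few rows from the transcript.
# The trips.txt rows are reconstructed from the joined output shown above.
printf '%s\r\n' \
    'trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign,pickup_type,drop_off_type,shape_dist_traveled' \
    '8711507,12:52:00,12:52:00,20921,43,,1,0,' \
    '8711519,05:37:00,05:37:00,20921,3,,0,1,' \
    '8711518,06:47:00,06:47:00,20921,1,,0,1,' \
    > "$work/stop_times.txt"
printf '%s\r\n' \
    'route_id,service_id,trip_id,trip_headsign,direction_id,block_id,shape_id' \
    'Y,WK,8711518,Boulder Transit Center,1, Y 15,794813' \
    'Y,WK,8711507,Lyons PnR,0, Y 16,794810' \
    'Y,WK,8711519,Boulder Transit Center,1, 206 1,794816' \
    > "$work/trips.txt"

# Hypothetical helper: "departure_time,trip_headsign" for one stop and direction.
# $1 = stop_id, $2 = direction_id, $3 = feed directory
departures() {
    # Drop headers, strip CRs, and sort each file on its trip_id column only.
    tail -n +2 "$3/stop_times.txt" | tr -d '\r' | sort -t, -k1,1 > "$work/st.sorted"
    tail -n +2 "$3/trips.txt"      | tr -d '\r' | sort -t, -k3,3 > "$work/tr.sorted"
    # After the join: field 3 = departure_time, 4 = stop_id,
    # 12 = trip_headsign, 13 = direction_id.
    join -t, -1 1 -2 3 "$work/st.sorted" "$work/tr.sorted" |
        awk -F, -v stop="$1" -v dir="$2" '$4 == stop && $13 == dir { print $3 "," $12 }' |
        sort
}

departures 20921 1 "$work"
```

On the sample data this prints the two Boulder-bound departures in time order; calling `departures 20921 0 "$work"` gives the other direction.
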
This should probably be where I think "oh, right, this is a Google spec, maybe
there's [already some tooling](https://github.com/google/transitfeed)." Failing
that, slurping the files into SQLite or something would be a lot less painful.
Or at least using csvkit.

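Short of reaching for those tools, a single `awk` pass can do the lookup without any sorting or intermediate files: load `trips.txt` into arrays keyed by `trip_id`, then stream `stop_times.txt` past them. This is a sketch under the same assumptions as the rest of this entry (naive comma splitting, so it would break on quoted fields containing commas, which is one reason a real CSV tool is the safer choice). `stop_schedule` is a hypothetical helper name, and the `trips.txt` sample rows are reconstructed from the joined output above.

```shell
feed=$(mktemp -d)

# Sample data with CRLF endings, rows taken or reconstructed from the transcript.
printf '%s\r\n' \
    'trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign,pickup_type,drop_off_type,shape_dist_traveled' \
    '8711507,12:52:00,12:52:00,20921,43,,1,0,' \
    '8711519,05:37:00,05:37:00,20921,3,,0,1,' \
    '8711518,06:47:00,06:47:00,20921,1,,0,1,' \
    > "$feed/stop_times.txt"
printf '%s\r\n' \
    'route_id,service_id,trip_id,trip_headsign,direction_id,block_id,shape_id' \
    'Y,WK,8711518,Boulder Transit Center,1, Y 15,794813' \
    'Y,WK,8711507,Lyons PnR,0, Y 16,794810' \
    'Y,WK,8711519,Boulder Transit Center,1, 206 1,794816' \
    > "$feed/trips.txt"

# Hypothetical helper: "departure_time,direction_id,trip_headsign" for one stop.
# $1 = stop_id, $2 = feed directory
stop_schedule() {
    awk -F, -v stop="$1" '
        { sub(/\r$/, "") }                                    # drop a trailing CR, if any
        FNR == 1  { next }                                    # skip each header row
        NR == FNR { headsign[$3] = $4; dir[$3] = $5; next }   # first file: trips.txt
        $4 == stop { print $3 "," dir[$1] "," headsign[$1] }  # second file: stop_times.txt
    ' "$2/trips.txt" "$2/stop_times.txt" | sort
}

stop_schedule 20921 "$feed"
```

The trade-off is memory: every trip is held in an array, which is fine for a feed this size but is the point where SQLite starts to look attractive.
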
[southbound]: http://www3.rtd-denver.com/schedules/getSchedule.action?runboardId=151&routeId=Y&routeType=12&direction=S-Bound&serviceType=3
[northbound]: http://www3.rtd-denver.com/schedules/getSchedule.action?runboardId=151&routeId=Y&routeType=12&direction=N-Bound&serviceType=3