A book about the command line for humans.

6. one of these things is not like the others
=============================================

If you're the sort of person who took a few detours into the history of
religion in college, you might be familiar with some of the ways people used to
do textual comparison. When pen, paper, and typesetting were what scholars had
to work with, they did some fairly sophisticated things in order to expose the
relationships between multiple pieces of text.

-> <img src="images/throckmorton_small.jpg" height=320 width=470> <-

Here's a book I got in college: _Gospel Parallels: A Comparison of the
Synoptic Gospels_, Burton H. Throckmorton, Jr., Ed. It breaks up three books
from the New Testament by the stories and themes that they contain, and shows
the overlapping sections of each book that contain parallel texts. You can
work your way through and see what parts only show up in one book, or in two
but not the other, or in all three. Pages are arranged like so:

<pre>
§ JESUS DOES SOME STUFF
 ________________________________________________
| MAT             | MAR                | LUK     |
|-----------------+--------------------+---------|
| Stuff           |                    |         |
|                 | Stuff              |         |
|                 | Stuff              | Stuff   |
|                 | Stuff              |         |
|                 | Stuff              |         |
|                 |                    |         |
</pre>

The way I understand it, a book like this one only scratches the surface of the
field. Tools like this support a lot of theory about which books copied each
other and how, and what other sources they might have copied that we've since
lost.

This is some _incredibly_ dry material, even if you kind of dig thinking about
the questions it addresses. It takes a special temperament to actually sit
poring over fragmentary texts in ancient languages and do these painstaking
comparisons. Even if you're a writer or editor and work with a lot of
revisions of a text, there's a good chance you rarely do this kind of
comparison on your own work, because that shit is _tedious_.

diff
----

It turns out that academics aren't the only people who need tools for comparing
different versions of a text. Working programmers, in fact, need to do this
_constantly_. Programmers are also happiest when putting off the _actual_ task
at hand to solve some incidental problem that cropped up along the way, so by
now there are a lot of ways to say "here's how this file is different from this
file", or "here's how this file is different from itself a year ago".

Let's look at a couple of shell scripts from an earlier chapter:

<!-- exec -->

    $ cat ../script/okpoems
    #!/bin/bash

    # find all the marker files and get the name of
    # the directory containing each
    find ~/p1k3/archives -name 'meta-ok-poem' | xargs -n1 dirname

    exit 0

<!-- end -->

<!-- exec -->

    $ cat ../script/findprop
    #!/bin/bash

    if [ ! $1 ]
    then
      echo "usage: findprop <property>"
      exit
    fi

    # find all the marker files and get the name of
    # the directory containing each
    find ~/p1k3/archives -name $1 | xargs -n1 dirname

    exit 0

<!-- end -->

It's pretty obvious these are similar files, but do we know what _exactly_
changed between them at a glance? It wouldn't be hard to figure out, once. If
you wanted to be really certain about it, you could print them out, set them
side by side, and go over them with a highlighter.

Now imagine doing that for a bunch of files, some of them hundreds or thousands
of lines long. I've actually done that before, colored markers and all, but I
didn't feel smart while I was doing it. This is a job for software.

<!-- exec -->

    $ diff ../script/okpoems ../script/findprop
    2a3,8
    > if [ ! $1 ]
    > then
    >   echo "usage: findprop <property>"
    >   exit
    > fi
    >
    5c11
    < find ~/p1k3/archives -name 'meta-ok-poem' | xargs -n1 dirname
    ---
    > find ~/p1k3/archives -name $1 | xargs -n1 dirname

<!-- end -->

That's not the most human-friendly output, but it's a little simpler than it
seems at first glance. It's basically just a way of describing the changes
needed to turn `okpoems` into `findprop`. The string `2a3,8` can be read as
"after line 2 of the first file, add lines 3 through 8 of the second". Lines
with a `>` in front of them are the ones added. `5c11` can be read as "line 5
in the original file becomes line 11 in the new file", and the `<` line is
replaced with the `>` line. If you wanted, you could take a copy of the
original file and apply these instructions by hand in your text editor, and
you'd wind up with the new file.
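
The same notation shows up for any pair of files. Here's a small sketch using
two throwaway files (the names `old.txt` and `new.txt` and their contents are
made up for illustration) that produces one `a` command and one `c` command.
It then hands the result to `patch`, assuming that tool happens to be
installed, instead of applying the edits by hand:

```shell
#!/bin/sh
# Two throwaway files in a temporary directory; the names and
# contents are invented for this illustration.
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\ndelta\n' > "$workdir/old.txt"
printf 'alpha\nnew line\nbeta\ngamma\nDELTA\n' > "$workdir/new.txt"

# diff exits with status 1 when the files differ, so "|| true" keeps
# a script running under "set -e" from stopping here.
normal_diff=$(diff "$workdir/old.txt" "$workdir/new.txt" || true)
printf '%s\n' "$normal_diff"
# Prints the following (the arrows are annotations, not output):
# 1a2         <- after line 1 of old.txt, add line 2 of new.txt
# > new line
# 4c5         <- line 4 of old.txt becomes line 5 of new.txt
# < delta
# ---
# > DELTA

# If patch is available, let it replay those instructions on a copy
# and confirm that the copy now matches new.txt.
if command -v patch > /dev/null 2>&1
then
    printf '%s\n' "$normal_diff" > "$workdir/changes.diff"
    patch -o "$workdir/rebuilt.txt" "$workdir/old.txt" \
        < "$workdir/changes.diff" > /dev/null
    diff -s "$workdir/rebuilt.txt" "$workdir/new.txt"
fi
```

Nothing here touches the original files: `patch -o` writes its result to a
separate file, which is a comfortable way to experiment.
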

A lot of people (me included) prefer what's known as a "unified" diff, because
it's easier to read and offers context for the changed lines. We can ask for
one of these with `diff -u`:

<!-- exec -->

    $ diff -u ../script/okpoems ../script/findprop
    --- ../script/okpoems 2014-04-19 00:08:03.321230818 -0600
    +++ ../script/findprop 2014-04-21 21:51:29.360846449 -0600
    @@ -1,7 +1,13 @@
     #!/bin/bash
     
    +if [ ! $1 ]
    +then
    +  echo "usage: findprop <property>"
    +  exit
    +fi
    +
     # find all the marker files and get the name of
     # the directory containing each
    -find ~/p1k3/archives -name 'meta-ok-poem' | xargs -n1 dirname
    +find ~/p1k3/archives -name $1 | xargs -n1 dirname
     
     exit 0

<!-- end -->

That's a little longer, and has some metadata we might not always care about,
but if you look for lines starting with `+` and `-`, it's easy to read as
"added these, took away these". This diff tells us at a glance that we added
some lines to complain if we didn't get a command line argument, and replaced
`'meta-ok-poem'` in the `find` command with that argument. Since it shows us
some context, we have a pretty good idea where those lines are in the file
and what they're for.
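
The `@@ -1,7 +1,13 @@` line is the one piece of that metadata worth learning
to read: it says the hunk covers 7 lines starting at line 1 of the first file,
and 13 lines starting at line 1 of the second. Here's a rough sketch, using
invented files called `old.txt` and `new.txt`, that pulls out a hunk header and
counts the added and removed lines:

```shell
#!/bin/sh
# Throwaway files; names and contents are made up for this example.
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\ndelta\n' > "$workdir/old.txt"
printf 'alpha\nnew line\nbeta\ngamma\nDELTA\n' > "$workdir/new.txt"

# "|| true" because diff exits with status 1 when the files differ.
unified_diff=$(diff -u "$workdir/old.txt" "$workdir/new.txt" || true)

# Lines 1 and 2 name the two files; line 3 is the first hunk header.
hunk_header=$(printf '%s\n' "$unified_diff" | sed -n '3p')

# Skip the two file-name lines, then count lines by their first character.
added=$(printf '%s\n' "$unified_diff" | tail -n +3 | grep -c '^+')
removed=$(printf '%s\n' "$unified_diff" | tail -n +3 | grep -c '^-')

echo "$hunk_header"
echo "added: $added, removed: $removed"
# @@ -1,4 +1,5 @@
# added: 2, removed: 1
```

Skipping the first two lines matters, since the `---` and `+++` file names
would otherwise be counted as a removal and an addition.
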

What if we don't care exactly _how_ the files differ, but only whether they
do?

<!-- exec -->

    $ diff -q ../script/okpoems ../script/findprop
    Files ../script/okpoems and ../script/findprop differ

<!-- end -->

I use `diff` a lot in the course of my day job, because I spend a lot of time
needing to know just how two programs differ. Just as importantly, I often
need to know how (or whether!) the _output_ of programs differs. As a concrete
example, I want to make sure that `findprop meta-ok-poem` is really a suitable
replacement for `okpoems`. Since I expect their output to be identical, I can
do this:

<!-- exec -->

    $ ../script/okpoems > okpoem_output

<!-- end -->

<!-- exec -->

    $ ../script/findprop meta-ok-poem > findprop_output

<!-- end -->

<!-- exec -->

    $ diff -s okpoem_output findprop_output
    Files okpoem_output and findprop_output are identical

<!-- end -->

The `-s` just means that `diff` should explicitly tell us if files are the
**s**ame. Otherwise, it'd output nothing at all, because there aren't any
differences.
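
`diff` also reports its verdict through its exit status: 0 when the files are
the same, 1 when they differ, and 2 when something went wrong (a missing file,
say). That makes this kind of check easy to script. A minimal sketch, where
`compare_files` is a hypothetical helper name and the file names are
placeholders:

```shell
#!/bin/sh
# Placeholder files standing in for the saved output of three programs.
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/first_output"
printf 'a\nb\n' > "$workdir/second_output"
printf 'a\nc\n' > "$workdir/third_output"

# compare_files is a made-up helper: it prints one word describing
# what diff's exit status says about the two files it is given.
compare_files() {
    status=0
    diff -q "$1" "$2" > /dev/null 2>&1 || status=$?
    case "$status" in
        0) echo "identical" ;;
        1) echo "different" ;;
        *) echo "error" ;;
    esac
}

compare_files "$workdir/first_output" "$workdir/second_output"
compare_files "$workdir/first_output" "$workdir/third_output"
# identical
# different
```

Capturing the status with `|| status=$?` instead of testing `$?` on the next
line keeps the helper usable in scripts that run under `set -e`.
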

As with many other tools, `diff` doesn't much care whether it's looking at
shell scripts or a list of filenames or what-have-you. If you read the man
page, you'll find some features geared towards people writing C-like
programming languages, but its real specialty is just text files with lines
made out of characters. That works well for lots of code, but it could
certainly be applied to English prose.

Since I have a couple of versions ready to hand, let's apply this to a text
with some well-known variations and a bit of a literary legacy. Here's the
first day of the Genesis creation narrative in a couple of English
translations:

<!-- exec -->

    $ cat genesis_nkj
    In the beginning God created the heavens and the earth. The earth was without
    form, and void; and darkness was on the face of the deep. And the Spirit of
    God was hovering over the face of the waters. Then God said, "Let there be
    light"; and there was light. And God saw the light, that it was good; and God
    divided the light from the darkness. God called the light Day, and the darkness
    He called Night. So the evening and the morning were the first day.

<!-- end -->

<!-- exec -->

    $ cat genesis_nrsv
    In the beginning when God created the heavens and the earth, the earth was a
    formless void and darkness covered the face of the deep, while a wind from
    God swept over the face of the waters. Then God said, "Let there be light";
    and there was light. And God saw that the light was good; and God separated
    the light from the darkness. God called the light Day, and the darkness he
    called Night. And there was evening and there was morning, the first day.

<!-- end -->

What happens if we diff them?

<!-- exec -->

    $ diff -u genesis_nkj genesis_nrsv
    --- genesis_nkj 2014-05-11 16:28:29.692508461 -0600
    +++ genesis_nrsv 2014-05-11 16:28:29.744508459 -0600
    @@ -1,6 +1,6 @@
    -In the beginning God created the heavens and the earth. The earth was without
    -form, and void; and darkness was on the face of the deep. And the Spirit of
    -God was hovering over the face of the waters. Then God said, "Let there be
    -light"; and there was light. And God saw the light, that it was good; and God
    -divided the light from the darkness. God called the light Day, and the darkness
    -He called Night. So the evening and the morning were the first day.
    +In the beginning when God created the heavens and the earth, the earth was a
    +formless void and darkness covered the face of the deep, while a wind from
    +God swept over the face of the waters. Then God said, "Let there be light";
    +and there was light. And God saw that the light was good; and God separated
    +the light from the darkness. God called the light Day, and the darkness he
    +called Night. And there was evening and there was morning, the first day.

<!-- end -->

Kind of useless, right? If a given line differs by so much as a character,
it's not the same line. This highlights the limitations of `diff` for comparing
things that

- aren't logically grouped by line
- aren't easily thought of as versions of the same text with some lines changed

We could edit the files into a more logically defined structure, like
one-line-per-verse, and try again:

<!-- exec -->

    $ diff -u genesis_nkj_by_verse genesis_nrsv_by_verse
    --- genesis_nkj_by_verse 2014-05-11 16:51:14.312457198 -0600
    +++ genesis_nrsv_by_verse 2014-05-11 16:53:02.484453134 -0600
    @@ -1,5 +1,5 @@
    -In the beginning God created the heavens and the earth.
    -The earth was without form, and void; and darkness was on the face of the deep. And the Spirit of God was hovering over the face of the waters.
    +In the beginning when God created the heavens and the earth,
    +the earth was a formless void and darkness covered the face of the deep, while a wind from God swept over the face of the waters.
     Then God said, "Let there be light"; and there was light.
    -And God saw the light, that it was good; and God divided the light from the darkness.
    -God called the light Day, and the darkness He called Night. So the evening and the morning were the first day.
    +And God saw that the light was good; and God separated the light from the darkness.
    +God called the light Day, and the darkness he called Night. And there was evening and there was morning, the first day.

<!-- end -->

It might be a little more descriptive, but editing all that text just for a
quick comparison felt suspiciously like work, and anyway the output still
doesn't seem very useful.
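
One way to dodge the hand-editing, sketched here under the assumption that
line breaks and spacing don't matter, is to have `tr` put every word on its
own line before handing the text to `diff`. The two sample sentences are
lifted from the translations above, deliberately wrapped at different points;
the file names are made up:

```shell
#!/bin/sh
# Sample sentences with different line breaks; file names are invented.
workdir=$(mktemp -d)
printf 'God divided the light\nfrom the darkness.\n' > "$workdir/nkj_sample"
printf 'God separated the light from\nthe darkness.\n' > "$workdir/nrsv_sample"

# Turn every run of whitespace into a single newline: one word per line.
tr -s '[:space:]' '\n' < "$workdir/nkj_sample" > "$workdir/nkj_words"
tr -s '[:space:]' '\n' < "$workdir/nrsv_sample" > "$workdir/nrsv_words"

# "|| true" because diff exits with status 1 when the files differ.
word_diff=$(diff "$workdir/nkj_words" "$workdir/nrsv_words" || true)
printf '%s\n' "$word_diff"
# 2c2
# < divided
# ---
# > separated
```

The differing line breaks vanish and only the changed word is reported. The
price is that the numbers now count words rather than lines, and the output no
longer reads anything like prose.
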

wdiff
-----

For cases like this, I'm fond of a tool called `wdiff`:

<!-- exec -->

    $ wdiff genesis_nkj genesis_nrsv
    In the beginning {+when+} God created the heavens and the [-earth. The-] {+earth, the+} earth was [-without
    form, and void;-] {+a
    formless void+} and darkness [-was on-] {+covered+} the face of the [-deep. And the Spirit of-] {+deep, while a wind from+}
    God [-was hovering-] {+swept+} over the face of the waters. Then God said, "Let there be light";
    and there was light. And God saw [-the light,-] that [-it-] {+the light+} was good; and God
    [-divided-] {+separated+}
    the light from the darkness. God called the light Day, and the darkness
    [-He-] {+he+}
    called Night. [-So the-] {+And there was+} evening and [-the morning were-] {+there was morning,+} the first day.

<!-- end -->

Deleted words are surrounded by `[- -]` and inserted ones by `{+ +}`. You can
even ask it to spit out HTML tags for insertion and deletion...

    $ wdiff -w '<del>' -x '</del>' -y '<ins>' -z '</ins>' genesis_nkj genesis_nrsv

...and come up with something your browser will render like this:

<blockquote>
<p>In the beginning <ins>when</ins> God created the heavens and the <del>earth. The</del> <ins>earth, the</ins> earth was <del>without
form, and void;</del> <ins>a
formless void</ins> and darkness <del>was on</del> <ins>covered</ins> the face of the <del>deep. And the Spirit of</del> <ins>deep, while a wind from</ins>
God <del>was hovering</del> <ins>swept</ins> over the face of the waters. Then God said, "Let there be light";
and there was light. And God saw <del>the light,</del> that <del>it</del> <ins>the light</ins> was good; and God
<del>divided</del> <ins>separated</ins>
the light from the darkness. God called the light Day, and the darkness
<del>He</del> <ins>he</ins>
called Night. <del>So the</del> <ins>And there was</ins> evening and <del>the morning were</del> <ins>there was morning,</ins> the first day.</p>
</blockquote>

Burton H. Throckmorton, Jr. this ain't. Still, it has its uses.