A book about the command line for humans.
  1. <html lang=en>
  2. <head>
  3. <meta charset="utf-8">
  4. <title>userland: a book about the command line for humans</title>
  5. <link rel=stylesheet href="userland.css" />
  6. <script src="js/jquery.js" type="text/javascript"></script>
  7. </head>
  8. <body>
  9. <h1 class=bigtitle>userland</h1>
  10. <hr />
  11. <h1><a name=a-book-about-the-command-line-for-humans href=#a-book-about-the-command-line-for-humans>#</a> a book about the command line for humans</h1>
  12. <p>Late last year, <a href="//p1k3.com/2013/8/4">a side trip</a> into text utilities got me
  13. thinking about how much my writing habits depend on the Linux command line.
  14. This struck me as a good hook for talking about the tools I use every day
  15. with an audience of mixed technical background.</p>
  16. <p>So now I&rsquo;m writing a (short, haphazard) book. This isn&rsquo;t a book about system
  17. administration, or writing big software systems, or becoming a wizard. I am
  18. not a wizard, and I don&rsquo;t subscribe to the idea that wizardry is a requirement
  19. for using these tools. In fact I barely know what I&rsquo;m doing most of the time,
  20. but I still get some stuff done.</p>
  21. <p>My hope herein is to convey something useful to people who use computers every
  22. day, but for whom the command line environment seems mystifying, obscure, or
  23. generally uninviting. I intend to gloss over many complexities in favor of
  24. demonstrating a rough-and-ready toolset.</p>
  25. <p>This is a work in progress, and some sections may be unfinished or riddled with
  26. error. Incomplete sections will be marked with {notes in curly braces}.
  27. <a href="//p1k3.com/userland-book.git">p1k3.com/userland-book.git</a> should be considered
  28. the canonical git repo, but I&rsquo;m pushing everything to a <a href="https://github.com/brennen/userland-book">GitHub
  29. mirror</a>, and welcome feedback there.</p>
  30. <p>&ndash; bpb / <a href="//p1k3.com">p1k3</a> / <a href="https://twitter.com/brennen">@brennen</a></p>
  31. <div class=details>
  32. <h2 class=clicker><a name=copying href=#copying>#</a> copying</h2>
  33. <div class=full>
  34. <p>I may eventually dedicate this thing to the public domain, but for the time
  35. being please feel free to use it under the terms of Creative Commons BY-SA
  36. (Attribution / Share-Alike), whatever the latest version is. I promise I will
  37. not license it under more restrictive terms than that.</p>
  38. </div>
  39. </div>
  40. <div class=details>
  41. <h2 class=clicker><a name=contents href=#contents>#</a> contents</h2>
  42. <div class=full>
  43. <div class=contents><ul>
  44. <li><a href="#a-book-about-the-command-line-for-humans">a book about the command line for humans</a>
  45. <ul>
  46. <li><a href="#copying">copying</a></li>
  47. <li><a href="#contents">contents</a></li>
  48. </ul>
  49. </li>
  50. <li><a href="#the-command-line-as-literary-environment">1. the command line as literary environment</a>
  51. <ul>
  52. <li><a href="#terms-and-definitions">terms and definitions</a></li>
  53. <li><a href="#get-you-a-shell">get you a shell</a></li>
  54. <li><a href="#twisty-little-passages">twisty little passages</a></li>
  55. <li><a href="#cat">cat</a></li>
  56. <li><a href="#wildcards">wildcards</a></li>
  57. <li><a href="#sort">sort</a></li>
  58. <li><a href="#options">options</a></li>
  59. <li><a href="#uniq">uniq</a></li>
  60. <li><a href="#standard-IO">standard IO</a></li>
  61. <li><a href="#code-help-code-and-man-pages"><code>--help</code> and man pages</a></li>
  62. <li><a href="#wc">wc</a></li>
  63. <li><a href="#head-tail-and-cut">head, tail, and cut</a></li>
  64. <li><a href="#tab-separated-values">tab separated values</a></li>
  65. <li><a href="#finding-text-grep">finding text: grep</a></li>
  66. <li><a href="#now-you-have-n-problems-regex-and-rabbit-holes">now you have n problems: regex and rabbit holes</a></li>
  67. </ul>
  68. </li>
  69. <li><a href="#a-literary-problem">2. a literary problem</a></li>
  70. <li><a href="#programmerthink">3. programmerthink</a></li>
  71. <li><a href="#script">4. script</a>
  72. <ul>
  73. <li><a href="#learn-you-an-editor">learn you an editor</a></li>
  74. <li><a href="#d-i-y-utilities">d.i.y. utilities</a></li>
  75. <li><a href="#heavy-lifting">heavy lifting</a></li>
  76. <li><a href="#generality">generality</a></li>
  77. </ul>
  78. </li>
  79. <li><a href="#general-purpose-programmering">5. general purpose programmering</a></li>
  80. <li><a href="#one-of-these-things-is-not-like-the-others">6. one of these things is not like the others</a>
  81. <ul>
  82. <li><a href="#diff">diff</a></li>
  83. <li><a href="#wdiff">wdiff</a></li>
  84. </ul>
  85. </li>
  86. <li><a href="#the-internet-for-humans-and-how-the-command-line-can-help">7. the internet for humans, and how the command line can help</a>
  87. <ul>
  88. <li><a href="#reading-the-web">reading the web</a></li>
  89. <li><a href="#writing-the-web">writing the web</a></li>
  90. </ul>
  91. </li>
  92. <li><a href="#further-reading">8. further reading</a></li>
  93. </ul>
  94. </div>
  95. </div>
  96. </div>
  97. <hr />
  98. <h1><a name=the-command-line-as-literary-environment href=#the-command-line-as-literary-environment>#</a> 1. the command line as literary environment</h1>
  99. <p>There are a lot of ways to structure an introduction to the command line. I&rsquo;m
  100. going to start with writing as a point of departure because, aside from web
  101. development, it&rsquo;s what I use a computer for most. I want to shine a light on
  102. the humane potential of ideas that are usually understood as nerd trivia.
  103. Computers have utterly transformed the practice of writing within the space of
  104. my lifetime, but it seems to me that writers as a class miss out on many of the
  105. software tools and patterns taken as a given in more &ldquo;technical&rdquo; fields.</p>
  106. <p>Writing, particularly writing of any real scope or complexity, is very much a
  107. technical task. It makes demands, both physical and psychological, of its
  108. practitioners. As with woodworkers, graphic artists, and farmers, writers
  109. exhibit strong preferences in their tools, materials, and environment, and they
  110. do so because they&rsquo;re engaged in a physically and cognitively challenging task.</p>
  111. <p>My thesis is that the modern Linux command line is a pretty good environment
  112. for working with English prose and prosody, and that maybe this will illuminate
  113. the ways it could be useful in your own work with a computer, whatever that
  114. work happens to be.</p>
  115. <h2><a name=terms-and-definitions href=#terms-and-definitions>#</a> terms and definitions</h2>
  116. <p>What software are we actually talking about when we say &ldquo;the command line&rdquo;?</p>
  117. <p>For the purposes of this discussion, we&rsquo;re talking about an environment built
  118. on a very old paradigm called Unix.</p>
  119. <p style="text-align:center;"> <img src="images/jp_unix.jpg" height=320 width=470></p>
  120. <p>&hellip;except what classical Unix really looks like is this:</p>
  121. <p style="text-align:center;"> <img src="images/blinking.gif" width=470></p>
  122. <p>The Unix-like environment we&rsquo;re going to use isn&rsquo;t very classical, really.
  123. It&rsquo;s an operating system kernel called Linux, combined with a bunch of things
  124. written by other people (people in the GNU and Debian projects, and many
  125. others). Purists will tell you that this isn&rsquo;t properly Unix at all. In
  126. strict historical terms they&rsquo;re right, or at least a certain kind of right, but
  127. for the purposes of my cultural agenda I&rsquo;m going to ignore them right now.</p>
  128. <p style="text-align:center;"> <img src="images/debian.png"></p>
  129. <p>This is what&rsquo;s called a shell. There are many different shells, but they
  130. pretty much all operate on the same idea: You navigate a filesystem and run
  131. programs by typing commands. Commands can be combined in various ways to make
  132. programs of their own, and in fact the way you use the computer is often just
  133. to write little programs that invoke other programs, turtles-all-the-way-down
  134. style.</p>
  135. <p>The standard shell these days is something called Bash, so we&rsquo;ll use Bash.
  136. It&rsquo;s what you&rsquo;ll most often see in the wild. Like most shells, Bash is ugly
  137. and stupid in more ways than it is possible to easily summarize. It&rsquo;s also an
  138. incredibly powerful and expressive piece of software.</p>
  139. <h2><a name=get-you-a-shell href=#get-you-a-shell>#</a> get you a shell</h2>
  140. <p>{TODO: Make this section useful.}</p>
  141. <h2><a name=twisty-little-passages href=#twisty-little-passages>#</a> twisty little passages</h2>
  142. <p>Have you ever played a text-based adventure game or MUD, of the kind that
  143. describes a setting and takes commands for movement and so on? Readers of a
  144. certain age and temperament might recognize the opening of Crowther &amp; Woods'
  145. <em>Adventure</em>, the great-granddaddy of text adventure games:</p>
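  146. <pre><code>YOU ARE STANDING AT THE END OF A ROAD BEFORE A SMALL BRICK BUILDING.
  147. AROUND YOU IS A FOREST. A SMALL STREAM FLOWS OUT OF THE BUILDING AND
  148. DOWN A GULLY.
  149. &gt; GO EAST
  150. YOU ARE INSIDE A BUILDING, A WELL HOUSE FOR A LARGE SPRING.
  155. </code></pre>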
  156. <p>In much the same way, you can think of the shell as a kind of environment you
  157. inhabit, the same way your character might inhabit an adventure game. Or as a
  158. sort of vehicle for getting around inside of computers. The difference is that
  159. instead of navigating around virtual rooms and hallways with commands like
  160. <code>LOOK</code> and <code>EAST</code>, you navigate between directories by typing commands like
  161. <code>ls</code> and <code>cd notes</code>:</p>
  162. <pre><code>$ ls
  163. code Downloads notes p1k3 photos scraps userland-book
  164. $ cd notes
  165. $ ls
  166. notes.txt sparkfun TODO.txt
  167. </code></pre>
  168. <p><code>ls</code> lists files. Some files are directories, which means they can contain
  169. other files, and you can step inside of them by typing <code>cd</code> (for <strong>c</strong>hange
  170. <strong>d</strong>irectory).</p>
  171. <p>In the Macintosh and Windows world, directories have been called
  172. &ldquo;folders&rdquo; for a long time now. This isn&rsquo;t the <em>worst</em> metaphor for what&rsquo;s
  173. going on, and it&rsquo;s so pervasive by now that it&rsquo;s not worth fighting about.
  174. It&rsquo;s also not exactly a <em>great</em> metaphor, since computer filesystems aren&rsquo;t
  175. built very much like the filing cabinets of yore. A directory acts a lot like
  176. a container of some sort, but it&rsquo;s an infinitely expandable one which may
  177. contain nested sub-spaces much larger than itself. Directories are frequently
  178. like the TARDIS: Bigger on the inside.</p>
  179. <h2><a name=cat href=#cat>#</a> cat</h2>
  180. <p>When you&rsquo;re in the shell, you have many tools at your disposal - programs that
  181. can be used on many different files, or chained together with other programs.
  182. They tend to have weird, cryptic names, but a lot of them do very simple
  183. things. Tasks that might be a menu item in a big program like Word, like
  184. counting the number of words in a document or finding a particular phrase, are
  185. often programs unto themselves. We&rsquo;ll start with something even more basic
  186. than that.</p>
  187. <p>Suppose you have some files, and you&rsquo;re curious what&rsquo;s in them. For example,
  188. suppose you&rsquo;ve got a list of authors you&rsquo;re planning to reference, and you just
  189. want to check its contents real quick-like. This is where our friend <code>cat</code>
  190. comes in:</p>
  191. <!-- exec -->
  192. <pre><code>$ cat authors_sff
  193. Ursula K. Le Guin
  194. Jo Walton
  195. Pat Cadigan
  196. John Ronald Reuel Tolkien
  197. Vanessa Veselka
  198. James Tiptree, Jr.
  199. John Brunner
  200. </code></pre>
  201. <!-- end -->
  202. <p>&ldquo;Why,&rdquo; you might be asking, &ldquo;is the command to dump out the contents of a file
  203. to a screen called <code>cat</code>? What do felines have to do with anything?&rdquo;</p>
  204. <p>It turns out that <code>cat</code> is actually short for &ldquo;concatenate&rdquo;, which is a long
  205. word basically meaning &ldquo;stick things together&rdquo;. In programming, we usually
  206. refer to sticking two bits of text together as &ldquo;string concatenation&rdquo;, probably
  207. because programmers like to feel like they&rsquo;re being very precise about very
  208. simple actions.</p>
  209. <p>Suppose you wanted to see the contents of a <em>set</em> of author lists:</p>
  210. <!-- exec -->
  211. <pre><code>$ cat authors_sff authors_contemporary_fic authors_nat_hist
  212. Ursula K. Le Guin
  213. Jo Walton
  214. Pat Cadigan
  215. John Ronald Reuel Tolkien
  216. Vanessa Veselka
  217. James Tiptree, Jr.
  218. John Brunner
  219. Eden Robinson
  220. Vanessa Veselka
  221. Miriam Toews
  222. Gwendolyn L. Waring
  223. </code></pre>
  224. <!-- end -->
  225. <h2><a name=wildcards href=#wildcards>#</a> wildcards</h2>
  226. <p>We&rsquo;re working with three filenames: <code>authors_sff</code>, <code>authors_contemporary_fic</code>,
  227. and <code>authors_nat_hist</code>. That&rsquo;s an awful lot of typing every time we want to do
  228. something to all three files. Fortunately, our shell offers a shorthand for
  229. &ldquo;all the files that start with <code>authors_</code>&rdquo;:</p>
  230. <!-- exec -->
  231. <pre><code>$ cat authors_*
  232. Eden Robinson
  233. Vanessa Veselka
  234. Miriam Toews
  235. Gwendolyn L. Waring
  236. Ursula K. Le Guin
  237. Jo Walton
  238. Pat Cadigan
  239. John Ronald Reuel Tolkien
  240. Vanessa Veselka
  241. James Tiptree, Jr.
  242. John Brunner
  243. </code></pre>
  244. <!-- end -->
  245. <p>In Bash-land, <code>*</code> basically means &ldquo;anything&rdquo;, and is known in the vernacular,
  246. somewhat poetically, as a &ldquo;wildcard&rdquo;. You should always be careful with
  247. wildcards, especially if you&rsquo;re doing anything destructive. They can and will
  248. surprise the unwary. Still, once you&rsquo;re used to the idea, they will save you a
  249. lot of RSI.</p>
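<p>One habit that takes some of the fear out of wildcards: look at what a pattern
expands to before you hand it to anything destructive. Here&rsquo;s a minimal sketch of
that habit. It assumes a throwaway scratch directory and some made-up files
(<code>notes.txt</code> is only there for illustration), and it uses <code>echo</code> to preview the
expansion before <code>rm</code> gets the same pattern:</p>

```shell
# Work in a throwaway directory so nothing real is at risk.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Hypothetical files: three author lists and one unrelated file.
touch authors_sff authors_contemporary_fic authors_nat_hist notes.txt

# Preview what the shell will substitute for the wildcard...
preview=$(echo authors_*)
echo "$preview"

# ...and only then run the destructive command with the same pattern.
rm authors_*

# notes.txt doesn't match the pattern, so it is still here.
remaining=$(ls)
echo "$remaining"
```

<p>The shell expands the pattern before the command ever runs, so <code>echo</code> and <code>rm</code>
see exactly the same list of names.</p>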
  250. <h2><a name=sort href=#sort>#</a> sort</h2>
  251. <p>There&rsquo;s a problem here. Our author list is out of order, and thus confusing to
  252. reference. Fortunately, since one of the most basic things you can do to a
  253. list is to sort it, someone else has already solved this problem for us.
  254. Here&rsquo;s a command that will give us some organization:</p>
  255. <!-- exec -->
  256. <pre><code>$ sort authors_*
  257. Eden Robinson
  258. Gwendolyn L. Waring
  259. James Tiptree, Jr.
  260. John Brunner
  261. John Ronald Reuel Tolkien
  262. Jo Walton
  263. Miriam Toews
  264. Pat Cadigan
  265. Ursula K. Le Guin
  266. Vanessa Veselka
  267. Vanessa Veselka
  268. </code></pre>
  269. <!-- end -->
  270. <p>Does it bother you that they aren&rsquo;t sorted by last name? Me too. As a partial
  271. solution, we can ask <code>sort</code> to use the second &ldquo;field&rdquo; in each line as its sort
  272. <strong>k</strong>ey (by default, sort treats whitespace as a division between fields):</p>
  273. <!-- exec -->
  274. <pre><code>$ sort -k2 authors_*
  275. John Brunner
  276. Pat Cadigan
  277. Ursula K. Le Guin
  278. Gwendolyn L. Waring
  279. Eden Robinson
  280. John Ronald Reuel Tolkien
  281. James Tiptree, Jr.
  282. Miriam Toews
  283. Vanessa Veselka
  284. Vanessa Veselka
  285. Jo Walton
  286. </code></pre>
  287. <!-- end -->
  288. <p>That&rsquo;s closer, right? It sorted on &ldquo;Cadigan&rdquo; and &ldquo;Veselka&rdquo; instead of &ldquo;Pat&rdquo;
  289. and &ldquo;Vanessa&rdquo;. (Of course, it&rsquo;s still far from perfect, because the
  290. second field in each line isn&rsquo;t necessarily the person&rsquo;s last name.)</p>
  291. <h2><a name=options href=#options>#</a> options</h2>
  292. <p>Above, when we wanted to ask <code>sort</code> to behave differently, we gave it what is
  293. known as an option. Most programs with command-line interfaces will allow
  294. their behavior to be changed by adding various options. Options usually
  295. (but not always!) look like <code>-o</code> or <code>--option</code>.</p>
  296. <p>For example, if we wanted to see just the unique lines, irrespective of case,
  297. for a file called colors:</p>
  298. <!-- exec -->
  299. <pre><code>$ cat colors
  300. RED
  301. blue
  302. red
  303. BLUE
  304. Green
  305. green
  306. GREEN
  307. </code></pre>
  308. <!-- end -->
  309. <p>We could write this:</p>
  310. <!-- exec -->
  311. <pre><code>$ sort -uf colors
  312. blue
  313. Green
  314. RED
  315. </code></pre>
  316. <!-- end -->
  317. <p>Here <code>-u</code> stands for <strong>u</strong>nique and <code>-f</code> stands for <strong>f</strong>old case, which means
  318. to treat upper- and lower-case letters as the same for comparison purposes. You&rsquo;ll
  319. often see a group of short options following the <code>-</code> like this.</p>
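<p>To make the grouping concrete, here is a rough sketch showing the same request
spelled three ways. It recreates the <code>colors</code> file in a scratch directory, and
it assumes a <code>sort</code> that understands the long spellings <code>--unique</code> and
<code>--ignore-case</code> (GNU sort does; other versions might not):</p>

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
printf 'RED\nblue\nred\nBLUE\nGreen\ngreen\nGREEN\n' > colors

# Short options grouped behind a single dash...
grouped=$(sort -uf colors)

# ...the same options given one at a time...
separate=$(sort -u -f colors)

# ...and the long spellings (an assumption: this needs GNU-style long options).
long=$(sort --unique --ignore-case colors)

# All three should print the same lines.
echo "$grouped"
[ "$grouped" = "$separate" ] && [ "$grouped" = "$long" ] && echo "same output"
```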
  320. <h2><a name=uniq href=#uniq>#</a> uniq</h2>
  321. <p>Did you notice how Vanessa Veselka shows up twice in our list of authors?
  322. That&rsquo;s useful if we want to remember that she&rsquo;s in more than one category, but
  323. it&rsquo;s redundant if we&rsquo;re just worried about membership in the overall set of
  324. authors. We can make sure our list doesn&rsquo;t contain repeating lines by using
  325. <code>sort</code>, just like with that list of colors:</p>
  326. <!-- exec -->
  327. <pre><code>$ sort -u -k2 authors_*
  328. John Brunner
  329. Pat Cadigan
  330. Ursula K. Le Guin
  331. Gwendolyn L. Waring
  332. Eden Robinson
  333. John Ronald Reuel Tolkien
  334. James Tiptree, Jr.
  335. Miriam Toews
  336. Vanessa Veselka
  337. Jo Walton
  338. </code></pre>
  339. <!-- end -->
  340. <p>But there&rsquo;s another approach to this &ndash; <code>sort</code> is good at only displaying a line
  341. once, but suppose we wanted to see a count of how many different lists an
  342. author shows up on? <code>sort</code> doesn&rsquo;t do that, but a command called <code>uniq</code> does,
  343. if you give it the option <code>-c</code> for <strong>c</strong>ount.</p>
  344. <p><code>uniq</code> moves through the lines in its input, and if it sees a line more than
  345. once in sequence, it will only print that line once. If you have a bunch of
  346. files and you just want to see the unique lines across all of those files, you
  347. probably need to run them through <code>sort</code> first. How do you do that?</p>
  348. <!-- exec -->
  349. <pre><code>$ sort authors_* | uniq -c
  350. 1 Eden Robinson
  351. 1 Gwendolyn L. Waring
  352. 1 James Tiptree, Jr.
  353. 1 John Brunner
  354. 1 John Ronald Reuel Tolkien
  355. 1 Jo Walton
  356. 1 Miriam Toews
  357. 1 Pat Cadigan
  358. 1 Ursula K. Le Guin
  359. 2 Vanessa Veselka
  360. </code></pre>
  361. <!-- end -->
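<p>If you&rsquo;re wondering whether the <code>sort</code> really matters, here&rsquo;s a small sketch
with a made-up list of names where the repeated one is deliberately not on
neighboring lines. It counts the lines <code>uniq</code> lets through with and without
sorting first:</p>

```shell
# A made-up list where the repeated name is NOT on adjacent lines.
names=$(printf 'Vanessa\nJohn\nVanessa\n')

# uniq alone only collapses neighbors, so both Vanessas survive: 3 lines.
unsorted_count=$(echo "$names" | uniq | wc -l)

# Sorting first puts the duplicates side by side, where uniq can see them: 2 lines.
sorted_count=$(echo "$names" | sort | uniq | wc -l)

echo "without sort: $unsorted_count lines"
echo "with sort:    $sorted_count lines"
```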
  362. <h2><a name=standard-IO href=#standard-IO>#</a> standard IO</h2>
  363. <p>The <code>|</code> is called a &ldquo;pipe&rdquo;. In the command above, it tells your shell that
  364. instead of printing the output of <code>sort authors_*</code> right to your terminal, it
  365. should send it to <code>uniq -c</code>.</p>
  366. <p style="text-align:center;"> <img src="images/pipe.gif"></p>
  367. <p>Pipes are some of the most important magic in the shell. When the people who
  368. built Unix in the first place give interviews about the stuff they remember
  369. from the early days, a lot of them reminisce about the invention of pipes and
  370. all of the new stuff it immediately made possible.</p>
  371. <p>Pipes help you control a thing called &ldquo;standard IO&rdquo;. In the world of the
  372. command line, programs take <strong>i</strong>nput and produce <strong>o</strong>utput. A pipe is a way
  373. to hook the output from one program to the input of another.</p>
  374. <p>Unlike a lot of the weirdly named things you&rsquo;ll encounter in software, the
  375. metaphor here is obvious and makes pretty good sense. It even kind of looks
  376. like a physical pipe.</p>
  377. <p>What if, instead of sending the output of one program to the input of another,
  378. you&rsquo;d like to store it in a file for later use?</p>
  379. <p>Check it out:</p>
  380. <!-- exec -->
  381. <pre><code>$ sort authors_* | uniq &gt; ./all_authors
  382. </code></pre>
  383. <!-- end -->
  384. <!-- exec -->
  385. <pre><code>$ cat all_authors
  386. Eden Robinson
  387. Gwendolyn L. Waring
  388. James Tiptree, Jr.
  389. John Brunner
  390. John Ronald Reuel Tolkien
  391. Jo Walton
  392. Miriam Toews
  393. Pat Cadigan
  394. Ursula K. Le Guin
  395. Vanessa Veselka
  396. </code></pre>
  397. <!-- end -->
  398. <p>I like to think of the <code>&gt;</code> as looking like a little funnel. It can be
  399. dangerous &ndash; you should always make sure that you&rsquo;re not going to clobber
  400. an existing file you actually want to keep.</p>
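<p>If clobbering worries you, the shell can be asked to refuse. What follows is a
sketch, not a recommendation for every setup: it uses the shell&rsquo;s <code>noclobber</code>
setting and the <code>&gt;|</code> override, neither of which is discussed elsewhere in this
chapter, and a made-up file called <code>draft</code> in a scratch directory:</p>

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

echo 'precious words' > draft

# Ask the shell to refuse to overwrite existing files with >.
set -o noclobber

# This redirect now fails with an error instead of replacing draft.
echo 'oops' > draft || echo "refused to overwrite draft"
after_refusal=$(cat draft)      # still 'precious words'

# If you really mean it, >| overrides the safety for one redirect.
echo 'second draft' >| draft
after_override=$(cat draft)     # now 'second draft'

echo "$after_refusal"
echo "$after_override"

# Put things back the way they were.
set +o noclobber
```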
  401. <p>If you want to tack more stuff on to the end of an existing file, you can use
  402. <code>&gt;&gt;</code> instead. To test that, let&rsquo;s use <code>echo</code>, which prints out whatever string
  403. you give it on a line by itself:</p>
  404. <!-- exec -->
  405. <pre><code>$ echo 'hello' &gt; hello_world
  406. </code></pre>
  407. <!-- end -->
  408. <!-- exec -->
  409. <pre><code>$ echo 'world' &gt;&gt; hello_world
  410. </code></pre>
  411. <!-- end -->
  412. <!-- exec -->
  413. <pre><code>$ cat hello_world
  414. hello
  415. world
  416. </code></pre>
  417. <!-- end -->
  418. <p>You can also take a file and pull it directly back into the input of a given
  419. program, which is a bit like a funnel going the other direction:</p>
  420. <!-- exec -->
  421. <pre><code>$ nl &lt; all_authors
  422. 1 Eden Robinson
  423. 2 Gwendolyn L. Waring
  424. 3 James Tiptree, Jr.
  425. 4 John Brunner
  426. 5 John Ronald Reuel Tolkien
  427. 6 Jo Walton
  428. 7 Miriam Toews
  429. 8 Pat Cadigan
  430. 9 Ursula K. Le Guin
  431. 10 Vanessa Veselka
  432. </code></pre>
  433. <!-- end -->
  434. <p><code>nl</code> is just a way to <strong>n</strong>umber <strong>l</strong>ines. This command accomplishes pretty much
  435. the same thing as <code>cat all_authors | nl</code>, or <code>nl all_authors</code>. You won&rsquo;t see
  436. it used as often as <code>|</code> and <code>&gt;</code>, since most utilities can read files on their
  437. own, but it can save you from typing <code>cat</code> quite so often.</p>
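<p>As a quick check that these three spellings really do the same job, here is a
sketch that runs each one against a short, made-up <code>all_authors</code> in a scratch
directory and compares what comes out:</p>

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
printf 'Eden Robinson\nJo Walton\nPat Cadigan\n' > all_authors

# The shell opens the file and feeds it to nl's standard input.
from_redirect=$(nl < all_authors)

# cat reads the file, and the pipe carries it to nl.
from_pipe=$(cat all_authors | nl)

# nl opens the file itself.
from_argument=$(nl all_authors)

echo "$from_redirect"
[ "$from_redirect" = "$from_pipe" ] && [ "$from_pipe" = "$from_argument" ] && echo "all three match"
```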
  438. <p>We&rsquo;ll use these features liberally from here on out.</p>
  439. <h2><a name=code-help-code-and-man-pages href=#code-help-code-and-man-pages>#</a> <code>--help</code> and man pages</h2>
  440. <p>You can change the behavior of most tools by giving them different options.
  441. This is all well and good if you already know what options are available,
  442. but what if you don&rsquo;t?</p>
  443. <p>Often, you can ask the tool itself:</p>
  444. <pre><code>$ sort --help
  445. Usage: sort [OPTION]... [FILE]...
  446. or: sort [OPTION]... --files0-from=F
  447. Write sorted concatenation of all FILE(s) to standard output.
  448. Mandatory arguments to long options are mandatory for short options too.
  449. Ordering options:
  450. -b, --ignore-leading-blanks ignore leading blanks
  451. -d, --dictionary-order consider only blanks and alphanumeric characters
  452. -f, --ignore-case fold lower case to upper case characters
  453. -g, --general-numeric-sort compare according to general numerical value
  454. -i, --ignore-nonprinting consider only printable characters
  455. -M, --month-sort compare (unknown) &lt; 'JAN' &lt; ... &lt; 'DEC'
  456. -h, --human-numeric-sort compare human readable numbers (e.g., 2K 1G)
  457. -n, --numeric-sort compare according to string numerical value
  458. -R, --random-sort sort by random hash of keys
  459. --random-source=FILE get random bytes from FILE
  460. -r, --reverse reverse the result of comparisons
  461. </code></pre>
  462. <p>&hellip;and so on. (It goes on for a while in this vein.)</p>
  463. <p>If that doesn&rsquo;t work, or doesn&rsquo;t provide enough info, the next thing to try is
  464. called a man page. (&ldquo;man&rdquo; is short for &ldquo;manual&rdquo;. It&rsquo;s sort of an unfortunate
  465. abbreviation.)</p>
  466. <pre><code>$ man sort
  467. SORT(1)                          User Commands                         SORT(1)
  468. NAME
  469.        sort - sort lines of text files
  470. SYNOPSIS
  471.        sort [OPTION]... [FILE]...
  472.        sort [OPTION]... --files0-from=F
  473. DESCRIPTION
  474.        Write sorted concatenation of all FILE(s) to standard output.
  475. </code></pre>
  476. <p>&hellip;and so on. Manual pages vary in quality, and it can take a while to get
  477. used to reading them, but they&rsquo;re very often the best place to look for help.</p>
  478. <p>If you&rsquo;re not sure what <em>program</em> you want to use to solve a given problem, you
  479. might try searching all the man pages on the system for a keyword. <code>man</code>
  480. itself has an option to let you do this - <code>man -k keyword</code> - but most systems
  481. also have a shortcut called <code>apropos</code>, which I like to use because it&rsquo;s easy to
  482. remember if you imagine yourself saying &ldquo;apropos of [some problem I have]&hellip;&rdquo;</p>
  483. <!-- exec -->
  484. <pre><code>$ apropos -s1 sort
  485. apt-sortpkgs (1) - Utility to sort package index files
  486. bunzip2 (1) - a block-sorting file compressor, v1.0.6
  487. bzip2 (1) - a block-sorting file compressor, v1.0.6
  488. comm (1) - compare two sorted files line by line
  489. sort (1) - sort lines of text files
  490. tsort (1) - perform topological sort
  491. </code></pre>
  492. <!-- end -->
  493. <p>It&rsquo;s useful to know that the manual represented by <code>man</code> has numbered sections
  494. for different kinds of manual pages. Most of what the average user needs to
  495. know about lives in section 1, &ldquo;User Commands&rdquo;, so you&rsquo;ll often see the names
  496. of different tools written like <code>sort(1)</code> or <code>cat(1)</code>. This can be a good way
  497. to make it clear in writing that you&rsquo;re talking about a specific piece of
  498. software rather than a verb or a small carnivorous mammal. (I specified <code>-s1</code>
  499. for section 1 above just to cut down on clutter, though in practice I usually
  500. don&rsquo;t bother.)</p>
  501. <p>Like other literary traditions, Unix is littered with this sort of convention.
  502. This one just happens to date from a time when the manual was still a physical
  503. book.</p>
  504. <h2><a name=wc href=#wc>#</a> wc</h2>
  505. <p><code>wc</code> stands for <strong>w</strong>ord <strong>c</strong>ount. It does about what you&rsquo;d expect - it
  506. counts the number of words in its input.</p>
  507. <pre><code>$ wc index.md
  508. 736 4117 24944 index.md
  509. </code></pre>
  510. <p>736 is the number of lines, 4117 the number of words, and 24944 the number of
  511. bytes (near enough to characters, for plain English text) in the file I&rsquo;m writing right now. I use this constantly. Most
  512. obviously, it&rsquo;s a good way to get an idea of how much you&rsquo;ve written. <code>wc</code> is
  513. the tool I used to track my progress the last time I tried National Novel
  514. Writing Month:</p>
  515. <pre><code>$ find ~/p1k3/archives/2010/11 -regextype egrep -regex '.*([0-9]+|index)' -type f | xargs wc -w | grep total
  516. 6585 total
  517. </code></pre>
  518. <!-- exec -->
  519. <pre><code>$ cowsay 'embarrassing.'
  520.  _______________
  521. &lt; embarrassing. &gt;
  522.  ---------------
  523.         \   ^__^
  524.          \  (oo)\_______
  525.             (__)\       )\/\
  526.                 ||----w |
  527.                 ||     ||
  528. </code></pre>
  529. <!-- end -->
  530. <p>Anyway. The less obvious thing about <code>wc</code> is that you can use it to count the
  531. output of other commands. Want to know <em>how many</em> unique authors we have?</p>
  532. <!-- exec -->
  533. <pre><code>$ sort authors_* | uniq | wc -l
  534. 10
  535. </code></pre>
  536. <!-- end -->
  537. <p>This kind of thing is trivial, but it comes in handy more often than you might
  538. think.</p>
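<p>If you only want one of those numbers, <code>wc</code> has an option for each. Here&rsquo;s a
small sketch using a made-up two-line sample rather than a real file: <code>-l</code>
counts lines, <code>-w</code> counts words, and <code>-c</code> counts bytes:</p>

```shell
# A made-up two-line sample, saved in a scratch file.
sample=$(mktemp)
printf 'the quick brown fox\njumps over\n' > "$sample"

lines=$(wc -l < "$sample")   # 2 lines
words=$(wc -w < "$sample")   # 6 words
bytes=$(wc -c < "$sample")   # 31 bytes, counting both newlines

echo "lines: $lines"
echo "words: $words"
echo "bytes: $bytes"
```

<p>Reading the file with <code>&lt;</code> keeps the filename out of the output, so each
variable holds just the number (some versions of <code>wc</code> pad it with spaces).</p>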
  539. <h2><a name=head-tail-and-cut href=#head-tail-and-cut>#</a> head, tail, and cut</h2>
  540. <p>Remember our old pal <code>cat</code>, which just splats everything it&rsquo;s given back to
  541. standard output?</p>
  542. <p>Sometimes you&rsquo;ve got a piece of output that&rsquo;s more than you actually want to
  543. deal with at once. Maybe you just want to glance at the first few lines in a
  544. file:</p>
  545. <!-- exec -->
  546. <pre><code>$ head -3 colors
  547. RED
  548. blue
  549. red
  550. </code></pre>
  551. <!-- end -->
  552. <p>&hellip;or maybe you want to see the last thing in a list:</p>
  553. <!-- exec -->
  554. <pre><code>$ sort colors | uniq -i | tail -1
  555. red
  556. </code></pre>
  557. <!-- end -->
  558. <p>&hellip;or maybe you&rsquo;re only interested in the first &ldquo;field&rdquo; in some list. You might
  559. use <code>cut</code> here, asking it to treat spaces as delimiters between fields and
  560. return only the first field for each line of its input:</p>
  561. <!-- exec -->
  562. <pre><code>$ cut -d' ' -f1 ./authors_*
  563. Eden
  564. Vanessa
  565. Miriam
  566. Gwendolyn
  567. Ursula
  568. Jo
  569. Pat
  570. John
  571. Vanessa
  572. James
  573. John
  574. </code></pre>
  575. <!-- end -->
  576. <p>Suppose we&rsquo;re curious which first names occur most often on our author
  577. list. Here&rsquo;s an approach, silly but effective, that combines a lot
  578. of what we&rsquo;ve discussed so far and looks like plenty of one-liners I wind up
  579. writing in real life:</p>
  580. <!-- exec -->
  581. <pre><code>$ cut -d' ' -f1 ./authors_* | sort | uniq -ci | sort -n | tail -3
  582. 1 Ursula
  583. 2 John
  584. 2 Vanessa
  585. </code></pre>
  586. <!-- end -->
  587. <p>Let&rsquo;s walk through this one step by step:</p>
  588. <p>First, we have <code>cut</code> extract the first field of each line in our author lists.</p>
  589. <pre><code>cut -d' ' -f1 ./authors_*
  590. </code></pre>
  591. <p>Then we sort these results</p>
  592. <pre><code>| sort
  593. </code></pre>
  594. <p>and pass them to <code>uniq</code>, asking it for a case-insensitive count of each
  595. repeated line</p>
  596. <pre><code>| uniq -ci
  597. </code></pre>
  598. <p>then sort again, numerically,</p>
  599. <pre><code>| sort -n
  600. </code></pre>
  601. <p>and finally, we chop off everything but the last three lines:</p>
  602. <pre><code>| tail -3
  603. </code></pre>
  604. <p>If you wanted to make sure to count an individual author&rsquo;s first name
  605. only once, even if that author appears more than once in the files,
  606. you could instead do:</p>
  607. <!-- exec -->
  608. <pre><code>$ sort -u ./authors_* | cut -d' ' -f1 | uniq -ci | sort -n | tail -3
  609. 1 Ursula
  610. 1 Vanessa
  611. 2 John
  612. </code></pre>
  613. <!-- end -->
  614. <h2><a name=tab-separated-values href=#tab-separated-values>#</a> tab separated values</h2>
  615. <p>Notice above how we had to tell <code>cut</code> that &ldquo;fields&rdquo; in <code>authors_*</code> are
  616. delimited by spaces? It turns out that if you don&rsquo;t use <code>-d</code>, <code>cut</code> defaults
  617. to using tab characters for a delimiter.</p>
  618. <p>Tab characters are sort of weird little animals. You can&rsquo;t usually <em>see</em> them
  619. directly &ndash; they&rsquo;re like a space character that takes up more than one space
  620. when displayed. By convention, one tab is usually rendered as 8 spaces, but
  621. it&rsquo;s up to the software that&rsquo;s displaying the character what it wants to do.</p>
  622. <p>(In fact, it&rsquo;s more complicated than that: Tabs are often rendered as marking
  623. <em>tab stops</em>, which is a concept I remember from 7th grade typing classes, but
  624. haven&rsquo;t actually thought about in my day-to-day life for nearly 20 years.)</p>
  625. <p>Here&rsquo;s a version of our <code>all_authors</code> that&rsquo;s been rearranged so that the first
  626. field is the author&rsquo;s last name, the second is their first name, the third is
  627. their middle name or initial (if we know it) and the fourth is any suffix.
  628. Fields are separated by a single tab character:</p>
  629. <!-- exec -->
  630. <pre><code>$ cat all_authors.tsv
  631. Robinson        Eden
  632. Waring  Gwendolyn       L.
  633. Tiptree James           Jr.
  634. Brunner John
  635. Tolkien John    Ronald Reuel
  636. Walton  Jo
  637. Toews   Miriam
  638. Cadigan Pat
  639. Le Guin Ursula  K.
  640. Veselka Vanessa
  641. </code></pre>
  642. <!-- end -->
  643. <p>That looks kind of garbled, right? In order to make it a little more obvious
  644. what&rsquo;s happening, let&rsquo;s use <code>cat -T</code>, which displays tab characters as <code>^I</code>:</p>
  645. <!-- exec -->
  646. <pre><code>$ cat -T all_authors.tsv
  647. Robinson^IEden
  648. Waring^IGwendolyn^IL.
  649. Tiptree^IJames^I^IJr.
  650. Brunner^IJohn
  651. Tolkien^IJohn^IRonald Reuel
  652. Walton^IJo
  653. Toews^IMiriam
  654. Cadigan^IPat
  655. Le Guin^IUrsula^IK.
  656. Veselka^IVanessa
  657. </code></pre>
  658. <!-- end -->
  659. <p>It looks odd when displayed because some names are at or nearly at 8 characters long.
  660. &ldquo;Robinson&rdquo;, at 8 characters, overshoots the first tab stop, so &ldquo;Eden&rdquo; gets indented
  661. further than other first names, and so on.</p>
  662. <p>Fortunately, in order to make this more human-readable, we can pass it through
  663. <code>expand</code>, which replaces each tab with enough spaces to reach the next tab stop (every 8 columns by default):</p>
  664. <!-- exec -->
  665. <pre><code>$ expand -t14 all_authors.tsv
  666. Robinson      Eden
  667. Waring        Gwendolyn     L.
  668. Tiptree       James                       Jr.
  669. Brunner       John
  670. Tolkien       John          Ronald Reuel
  671. Walton        Jo
  672. Toews         Miriam
  673. Cadigan       Pat
  674. Le Guin       Ursula        K.
  675. Veselka       Vanessa
  676. </code></pre>
  677. <!-- end -->
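<p>To watch tab stops at work on a smaller input, here&rsquo;s a minimal sketch. The sample line is made up for illustration, and the brackets in the output are only there to make the spacing visible:</p>

```shell
# Hypothetical two-field line; printf turns \t into a real tab character.
line=$(printf 'Robinson\tEden')

# Default tab stops fall every 8 columns, so the 8-character "Robinson"
# already sits on the first stop and the tab jumps all the way to column 16.
default_stops=$(printf '%s\n' "$line" | expand)

# With stops every 14 columns, the tab only has to cover 6 columns.
wide_stops=$(printf '%s\n' "$line" | expand -t14)

printf '[%s]\n[%s]\n' "$default_stops" "$wide_stops"
```

<p>The first line should come out with 8 spaces between the names and the second with 6, which is the same effect that makes the <code>-t14</code> output above line up.</p>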
  678. <p>Now it&rsquo;s easy to sort by last name:</p>
  679. <!-- exec -->
  680. <pre><code>$ sort -k1 all_authors.tsv | expand -t14
  681. Brunner       John
  682. Cadigan       Pat
  683. Le Guin       Ursula        K.
  684. Robinson      Eden
  685. Tiptree       James                       Jr.
  686. Toews         Miriam
  687. Tolkien       John          Ronald Reuel
  688. Veselka       Vanessa
  689. Walton        Jo
  690. Waring        Gwendolyn     L.
  691. </code></pre>
  692. <!-- end -->
  693. <p>Or just extract middle names and initials:</p>
  694. <!-- exec -->
  695. <pre><code>$ cut -f3 all_authors.tsv | grep .
  696. L.
  697. Ronald Reuel
  698. K.
  699. </code></pre>
  700. <!-- end -->
  701. <p>It probably won&rsquo;t surprise you to learn that there&rsquo;s a corresponding <code>paste</code>
  702. command, which takes two or more files and stitches them together with tab
  703. characters. Let&rsquo;s extract a couple of things from our author list and put them
  704. back together in a different order:</p>
  705. <!-- exec -->
  706. <pre><code>$ cut -f1 all_authors.tsv &gt; lastnames
  707. </code></pre>
  708. <!-- end -->
  709. <!-- exec -->
  710. <pre><code>$ cut -f2 all_authors.tsv &gt; firstnames
  711. </code></pre>
  712. <!-- end -->
  713. <!-- exec -->
  714. <pre><code>$ paste firstnames lastnames | sort -k2 | expand -t12
  715. John        Brunner
  716. Pat         Cadigan
  717. Ursula      Le Guin
  718. Eden        Robinson
  719. James       Tiptree
  720. Miriam      Toews
  721. John        Tolkien
  722. Vanessa     Veselka
  723. Jo          Walton
  724. Gwendolyn   Waring
  725. </code></pre>
  726. <!-- end -->
  727. <p>As these examples show, TSV is something very like a primitive spreadsheet: A
  728. way to represent information in columns and rows. In fact, it&rsquo;s a close cousin
  729. of CSV, which is often used as a lowest-common-denominator format for
  730. transferring spreadsheets, and which represents data something like this:</p>
  731. <pre><code>last,first,middle,suffix
  732. Tolkien,John,Ronald Reuel,
  733. Tiptree,James,,Jr.
  734. </code></pre>
  735. <p>The advantage of tabs is that they&rsquo;re supported by a bunch of the standard
  736. tools. A disadvantage is that they&rsquo;re kind of ugly and can be weird to deal
  737. with, but they&rsquo;re useful anyway, and character-delimited rows are often a
  738. good-enough way to hack your way through problems that call for basic
  739. structure.</p>
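<p>If you&rsquo;re handed a simple CSV file and would rather work with tabs, something like the following sketch may be enough. It assumes that no field contains a comma or a quoted string, which real-world CSV often does, and the temporary file and its contents are invented for the example:</p>

```shell
# Hypothetical sample data in the simple CSV shape shown above.
csv=$(mktemp)
cat > "$csv" <<'EOF'
last,first,middle,suffix
Tolkien,John,Ronald Reuel,
Tiptree,James,,Jr.
EOF

# Swap every comma for a tab. This is only safe when no field contains
# a comma of its own (no quoting, no escaping).
tsv=$(tr ',' '\t' < "$csv")

# Skip the header row, then pull the first-name column with cut,
# which splits on tabs by default.
first_names=$(printf '%s\n' "$tsv" | tail -n +2 | cut -f2)

printf '%s\n' "$first_names"
rm -f "$csv"
```

<p>For anything with quoted fields, a tool that actually understands CSV is the safer bet.</p>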
  740. <h2><a name=finding-text-grep href=#finding-text-grep>#</a> finding text: grep</h2>
  741. <p>After all those contortions, what if you actually just want to see <em>which lists</em>
  742. an individual author appears on?</p>
  743. <!-- exec -->
  744. <pre><code>$ grep 'Vanessa' ./authors_*
  745. ./authors_contemporary_fic:Vanessa Veselka
  746. ./authors_sff:Vanessa Veselka
  747. </code></pre>
  748. <!-- end -->
  749. <p><code>grep</code> takes a string to search for and, optionally, a list of files to search
  750. in. If you don&rsquo;t specify files, it&rsquo;ll look through standard input instead:</p>
  751. <!-- exec -->
  752. <pre><code>$ cat ./authors_* | grep 'Vanessa'
  753. Vanessa Veselka
  754. Vanessa Veselka
  755. </code></pre>
  756. <!-- end -->
  757. <p>Most of the time, piping the output of <code>cat</code> to <code>grep</code> is considered silly,
  758. because <code>grep</code> knows how to find things in files on its own. Many thousands of
  759. words have been written on this topic by leading lights of the nerd community.</p>
  760. <p>You&rsquo;ve probably noticed that this result doesn&rsquo;t contain filenames (and thus
  761. isn&rsquo;t very useful to us). That&rsquo;s because all <code>grep</code> saw was the lines in the
  762. files, not the names of the files themselves.</p>
  763. <h2><a name=now-you-have-n-problems-regex-and-rabbit-holes href=#now-you-have-n-problems-regex-and-rabbit-holes>#</a> now you have n problems: regex and rabbit holes</h2>
  764. <p>To close out this introductory chapter, let&rsquo;s spend a little time on a topic
  765. that will likely vex, confound, and (occasionally) delight you for as long as
  766. you are acquainted with the command line.</p>
  767. <p>When I was talking about <code>grep</code> a moment ago, I fudged the details more than a
  768. little by saying that it expects a string to search for. What <code>grep</code>
  769. <em>actually</em> expects is a <em>pattern</em>. Moreover, it expects a specific kind of
  770. pattern, what&rsquo;s known as a <em>regular expression</em>, a cumbersome phrase frequently
  771. shortened to regex.</p>
  772. <p>There&rsquo;s a lot of theory about what makes up a regular expression. Fortunately,
  773. very little of it matters to the short version that will let you get useful
  774. stuff done. The short version is that a regex is like using wildcards in the
  775. shell to match groups of files, but for text in general and with more magic.</p>
  776. <!-- exec -->
  777. <pre><code>$ grep 'Jo.*' ./authors_*
  778. ./authors_sff:Jo Walton
  779. ./authors_sff:John Ronald Reuel Tolkien
  780. ./authors_sff:John Brunner
  781. </code></pre>
  782. <!-- end -->
  783. <p>The pattern <code>Jo.*</code> says that we&rsquo;re looking for lines which contain a literal
  784. <code>Jo</code>, followed by any quantity (including none) of any character. In a regex,
  785. <code>.</code> means &ldquo;anything&rdquo; and <code>*</code> means &ldquo;any amount of the preceding thing&rdquo;.</p>
  786. <p><code>.</code> and <code>*</code> are magical. In the particular dialect of regexen understood
  787. by <code>grep</code>, other magical things include:</p>
  788. <table>
  789. <tr><td><code>^</code> </td> <td>start of a line </td></tr>
  790. <tr><td><code>$</code> </td> <td>end of a line </td></tr>
  791. <tr><td><code>[abc]</code></td> <td>one of a, b, or c </td></tr>
  792. <tr><td><code>[a-z]</code></td> <td>a character in the range a through z</td></tr>
  793. <tr><td><code>[0-9]</code></td> <td>a character in the range 0 through 9</td></tr>
  794. <tr><td><code>+</code> </td> <td>one or more of the preceding thing </td></tr>
  795. <tr><td><code>?</code> </td> <td>0 or 1 of the preceding thing </td></tr>
  796. <tr><td><code>*</code> </td> <td>any number of the preceding thing </td></tr>
  797. <tr><td><code>(foo|bar)</code></td> <td>"foo" or "bar"</td></tr>
  798. <tr><td><code>(foo)?</code></td> <td>optional "foo"</td></tr>
  799. </table>
  800. <p>It&rsquo;s actually a little more complicated than that: By default, if you want to
  801. use a lot of the magical characters, you have to prefix them with <code>\</code>. This is
  802. both ugly and confusing, so unless you&rsquo;re writing a very simple pattern, it&rsquo;s
  803. often easiest to call <code>grep -E</code>, for <strong>E</strong>xtended regular expressions, which
  804. means that lots of characters will have special meanings.</p>
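<p>Here&rsquo;s a small sketch of the difference, using a made-up list of names rather than the author files. The same &ldquo;exactly three letters, then a space&rdquo; pattern is written three ways:</p>

```shell
# Hypothetical sample names, one per line.
names=$(printf 'Jo Walton\nJohn Brunner\nPat Cadigan\n')

# Basic syntax: braces only count as a repetition when backslashed.
basic=$(printf '%s\n' "$names" | grep '^[A-Za-z]\{3\} ')

# Extended syntax: the same pattern with no backslashes.
extended=$(printf '%s\n' "$names" | grep -E '^[A-Za-z]{3} ')

# Without -E, unescaped braces are literal characters, so nothing matches.
# "|| true" keeps the no-match exit status from stopping a script run with set -e.
literal=$(printf '%s\n' "$names" | grep '^[A-Za-z]{3} ' || true)

printf 'basic: %s\nextended: %s\nliteral: [%s]\n' "$basic" "$extended" "$literal"
```

<p>The first two should both find the one three-letter first name in the list, and the third should find nothing at all.</p>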
  805. <p>Authors with 4-letter first names:</p>
  806. <!-- exec -->
  807. <pre><code>$ grep -iE '^[a-z]{4} ' ./authors_*
  808. ./authors_contemporary_fic:Eden Robinson
  809. ./authors_sff:John Ronald Reuel Tolkien
  810. ./authors_sff:John Brunner
  811. </code></pre>
  812. <!-- end -->
  813. <p>A count of authors named John:</p>
  814. <!-- exec -->
  815. <pre><code>$ grep -c '^John ' ./all_authors
  816. 2
  817. </code></pre>
  818. <!-- end -->
  819. <p>Lines in this file matching the words &ldquo;magic&rdquo; or &ldquo;magical&rdquo;:</p>
  820. <pre><code>$ grep -iE 'magic(al)?' ./index.md
  821. Pipes are some of the most important magic in the shell. When the people who
  822. shell to match groups of files, but with more magic.
  823. `.` and `*` are magical. In the particular dialect of regexen understood
  824. by `grep`, other magical things include:
  825. use a lot of the magical characters, you have to prefix them with `\`. This is
  826. Lines in this file matching the words "magic" or "magical":
  827. $ grep -iE 'magic(al)?' ./index.md
  828. </code></pre>
  829. <p>Find some &ldquo;-agic&rdquo; words in a big list of words:</p>
  830. <!-- exec -->
  831. <pre><code>$ grep -iE '(m|tr|pel)agic' /usr/share/dict/words
  832. magic
  833. magic's
  834. magical
  835. magically
  836. magician
  837. magician's
  838. magicians
  839. pelagic
  840. tragic
  841. tragically
  842. tragicomedies
  843. tragicomedy
  844. tragicomedy's
  845. </code></pre>
  846. <!-- end -->
  847. <p><code>grep</code> isn&rsquo;t the only - or even the most important - tool that makes use of
  848. regular expressions, but it&rsquo;s a good place to start because it&rsquo;s one of the
  849. fundamental building blocks for so many other operations. Filtering lists of
  850. things, matching patterns within collections, and writing concise descriptions
  851. of how text should be transformed are at the heart of a practical approach to
  852. Unix-like systems. Regexen turn out to be a seductively powerful way to do
  853. these things - so much so that they&rsquo;ve crept their way into text editors,
  854. databases, and full-featured programming languages.</p>
  855. <p>There&rsquo;s a dark side to all of this, for the truth about regular expressions is
  856. that they are ugly, inconsistent, brittle, and <em>incredibly</em> difficult to think
  857. clearly about. They take years to master and reward the wielder with great
  858. power, but they are also a trap: a temptation towards the path of cleverness
  859. masquerading as wisdom.</p>
  860. <p style="text-align:center;"></p>
  861. <p>I&rsquo;ll be returning to this theme, but for the time being let&rsquo;s move on. Now
  862. that we&rsquo;ve established, however haphazardly, some of the basics, let&rsquo;s consider
  863. their application to a real-world task.</p>
  864. <hr />
  865. <h1><a name=a-literary-problem href=#a-literary-problem>#</a> 2. a literary problem</h1>
  866. <p>The <a href="../literary_environment">previous chapter</a> introduced a bunch of tools
  867. using contrived examples. Now we&rsquo;ll look at a real problem, and work through a
  868. solution by building on tools we&rsquo;ve already covered.</p>
  869. <p>So on to the problem: I write poetry.</p>
  870. <p>{rimshot dot wav}</p>
  871. <p>Most of the poems I have written are not very good, but lately I&rsquo;ve been
  872. thinking that I&rsquo;d like to comb through the last ten years' worth and pull
  873. the least-embarrassing stuff into a single collection.</p>
  874. <p>I&rsquo;ve hinted at how the contents of my blog are stored as files, but let&rsquo;s take
  875. a look at the whole thing:</p>
  876. <pre><code>$ ls -F ~/p1k3/archives/
  877. 1997/ 2003/ 2009/ bones/ meta/
  878. 1998/ 2004/ 2010/ chapbook/ winfield/
  879. 1999/ 2005/ 2011/ cli/ wip/
  880. 2000/ 2006/ 2012/ colophon/
  881. 2001/ 2007/ 2013/ europe/
  882. 2002/ 2008/ 2014/ hack/
  883. </code></pre>
  884. <p>(<code>ls</code>, again, just lists files. <code>-F</code> tells it to append a character that shows
  885. us what type of file we&rsquo;re looking at, such as a trailing / for directories.
  886. <code>~</code> is a shorthand that means &ldquo;my home directory&rdquo;, which in this case is
  887. <code>/home/brennen</code>.)</p>
  888. <p>Each of the directories here holds other directories. The ones for each year
  889. have sub-directories for the months of the year, which in turn contain files
  890. for the days. The files are just little pieces of HTML and Markdown and some
  891. other stuff. Many years ago, before I really knew how to program, I wrote a
  892. script to glue them all together into a web page and serve them up to visitors.
  893. This sounds complicated, but all it really means is that if I want to write a
  894. blog entry, I just open a file and type some stuff. Here&rsquo;s an example for
  895. March 1st:</p>
  896. <!-- exec -->
  897. <pre><code>$ cat ~/p1k3/archives/2014/3/1
  898. &lt;h1&gt;Saturday, March 1&lt;/h1&gt;
  899. &lt;markdown&gt;
  900. Sometimes I'm going along on a Saturday morning, still a little dazed from the
  901. night before, and I think something like "I should just go write a detailed
  902. analysis of hooded sweatshirts". Mostly these thoughts don't survive contact
  903. with an actual keyboard. It's almost certainly for the best.
  904. &lt;/markdown&gt;
  905. </code></pre>
  906. <!-- end -->
  907. <p>And here&rsquo;s an older one that contains a short poem:</p>
  908. <!-- exec -->
  909. <pre><code>$ cat ~/p1k3/archives/2012/10/9
  910. &lt;h1&gt;tuesday, october 9&lt;/h1&gt;
  911. &lt;freeverse&gt;i am a stateful machine
  912. i exist in a manifold of consequence
  913. a clattering miscellany of impure functions
  914. and side effects&lt;/freeverse&gt;
  915. </code></pre>
  916. <!-- end -->
  917. <p>Notice that <code>&lt;freeverse&gt;</code> bit? It kind of looks like an HTML tag, but it&rsquo;s
  918. not. What it actually does is tell my blog script that it should format the
  919. text it contains like a poem. The specifics don&rsquo;t matter for our purposes
  920. (yet), but this convention is going to come in handy, because the first thing I
  921. want to do is get a list of all the entries that contain poems.</p>
  922. <p>Remember <code>grep</code>?</p>
  923. <pre><code>$ grep -ri '&lt;freeverse&gt;' ~/p1k3/archives &gt; ~/possible_poems
  924. </code></pre>
  925. <p>Let&rsquo;s step through this bit by bit:</p>
  926. <p>First, I&rsquo;m asking <code>grep</code> to search <strong>r</strong>ecursively, <strong>i</strong>gnoring case.
  927. &ldquo;Recursively&rdquo; just means that every time the program finds a directory, it
  928. should descend into that directory and search in any files there as well.</p>
  929. <pre><code>grep -ri
  930. </code></pre>
  931. <p>Next comes a pattern to search for. It&rsquo;s in single quotes because the
  932. characters <code>&lt;</code> and <code>&gt;</code> have a special meaning to the shell, and here we need
  933. the shell to understand that it should treat them as literal angle brackets
  934. instead.</p>
  935. <pre><code>'&lt;freeverse&gt;'
  936. </code></pre>
  937. <p>This is the path I want to search:</p>
  938. <pre><code>~/p1k3/archives
  939. </code></pre>
  940. <p>Finally, because there are so many entries to search, I know the process will
  941. be slow and produce a large list, so I tell the shell to redirect it to a file
  942. called <code>possible_poems</code> in my home directory:</p>
  943. <pre><code>&gt; ~/possible_poems
  944. </code></pre>
  945. <p>This is quite a few instances&hellip;</p>
  946. <pre><code>$ wc -l ~/possible_poems
  947. 679 /home/brennen/possible_poems
  948. </code></pre>
  949. <p>&hellip;and it&rsquo;s also not super-pretty to look at:</p>
  950. <pre><code>$ head -5 ~/possible_poems
  951. /home/brennen/p1k3/archives/2011/10/14:&lt;freeverse&gt;i've got this friend has a real knack
  952. /home/brennen/p1k3/archives/2011/4/25:&lt;freeverse&gt;i can't claim to strive for it
  953. /home/brennen/p1k3/archives/2011/8/10:&lt;freeverse&gt;one diminishes or becomes greater
  954. /home/brennen/p1k3/archives/2011/8/12:&lt;freeverse&gt;
  955. /home/brennen/p1k3/archives/2011/1/1:&lt;freeverse&gt;six years on
  956. </code></pre>
  957. <p>Still, it&rsquo;s a decent start. I can see paths to the files I have to check, and
  958. usually a first line. Since I use a fancy text editor, I can just go down the
  959. list opening each file in a new window and copying the stuff I&rsquo;m interested in
  960. to a new file.</p>
  961. <p>This is good enough for government work, but what if instead of jumping around
  962. between hundreds of files, I&rsquo;d rather read everything in one file and just weed
  963. out the bad ones as I go?</p>
  964. <pre><code>$ cat `grep -ril '&lt;freeverse&gt;' ~/p1k3/archives` &gt; ~/possible_poems_full
  965. </code></pre>
  966. <p>This probably bears some explaining. <code>grep</code> is still doing all the real work
  967. here. The main difference from before is that <code>-l</code> tells grep to just list any
  968. files it finds which contain a match.</p>
  969. <pre><code>`grep -ril '&lt;freeverse&gt;' ~/p1k3/archives`
  970. </code></pre>
  971. <p>Notice those backticks around the grep command? This part is a little
  972. trippier. It turns out that if you put backticks around something in a
  973. command, it&rsquo;ll get executed and replaced with its output, which in turn gets
  974. used as part of the larger command. So what we&rsquo;re really saying is
  975. something like:</p>
  976. <pre><code>$ cat [all of the files in the blog directory with &lt;freeverse&gt; in them]
  977. </code></pre>
  978. <p>Did you catch that? I just wrote a command that rewrote itself as a
  979. <em>different</em>, more specific command. And it appears to have worked on the
  980. first try:</p>
  981. <pre><code>$ wc ~/possible_poems_full
  982. 17628 80980 528699 /home/brennen/possible_poems_full
  983. </code></pre>
  984. <p>Welcome to wizard school.</p>
  985. <hr />
  986. <h1><a name=programmerthink href=#programmerthink>#</a> 3. programmerthink</h1>
  987. <p>In the <a href="#a-literary-problem">preceding chapter</a>, I worked through accumulating
  988. a big piece of text from some other, smaller texts. I started with a bunch of
  989. files and wound up with one big file called <code>possible_poems_full</code>.</p>
  990. <p>Let&rsquo;s talk for a minute about how programmers approach problems like this one.
  991. What I&rsquo;ve just done is sort of an old-school humanities take on things:
  992. Metaphorically speaking, I took a book off the shelf and hauled it down to the
  993. copy machine to xerox a bunch of pages, and now I&rsquo;m going to start in on them
  994. with a highlighter and some Post-Its or something. A process like this will
  995. often trigger a cascade of questions in the programmer-mind:</p>
  996. <ul>
  997. <li>What if, halfway through the project, I realize my selection criteria were all
  998. wrong and have to backtrack?</li>
  999. <li>What if I discover corrections that also need to be made in the source documents?</li>
  1000. <li>What if I want to access metadata, like the original location of a file?</li>
  1001. <li>What if I want to quickly re-order the poems according to some new criteria?</li>
  1002. <li>Why am I storing the same text in two different places?</li>
  1003. </ul>
  1004. <p>A unifying theme of these questions is that they could all be answered by
  1005. involving a little more abstraction.</p>
  1006. <p style="text-align:center;"></p>
  1007. <p>Some kinds of abstraction are so common in the physical world that we can
  1008. forget they&rsquo;re part of a sophisticated technology. For example, a good deal of
  1009. bicycle maintenance can be accomplished with a cheap multi-tool containing a
  1010. few different sizes of hex wrench and a couple of screwdrivers.</p>
  1011. <p>A hex wrench or screwdriver doesn&rsquo;t really know anything about bicycles. All
  1012. it <em>really</em> knows about is fitting into a space and allowing torque to be
  1013. applied. Standardized fasteners and adjustment mechanisms on a bicycle ensure
  1014. that the work can be done anywhere, by anyone with a certain set of tools.
  1015. Standard tools mean that if you can work on a particular bike, you can work on
  1016. <em>most</em> bikes, and even on things that aren&rsquo;t bikes at all, but were designed by
  1017. people with the same abstractions in mind.</p>
  1018. <p>The relationship between a wrench, a bolt, and the purpose of a bolt is a lot
  1019. like something we call <em>indirection</em> in software. Programs like <code>grep</code> or
  1020. <code>cat</code> don&rsquo;t really know anything about poetry. All they <em>really</em> know about is
  1021. finding lines of text in input, or sticking inputs together. Files, lines, and
  1022. text are like standardized fasteners that allow a user who can work on one kind
  1023. of data (be it poetry, a list of authors, the source code of a program) to use
  1024. the same tools for other problems and other data.</p>
  1025. <p style="text-align:center;"></p>
  1026. <p>When I first started writing stuff on the web, I edited a page &ndash; a single HTML
  1027. file &ndash; by hand. When the entries on my nascent blog got old, I manually
  1028. cut-and-pasted them to archive files with names like <code>old_main97.html</code>, which
  1029. held all of the stuff I&rsquo;d written in 1997.</p>
  1030. <p>I&rsquo;m not holding this up as an example of youthful folly. In fact, it worked
  1031. fine, and just having a single, static file that you can open in any text
  1032. editor has turned out to be a <em>lot</em> more future-proof than the sophisticated
  1033. blogging software people were starting to write at the time.</p>
  1034. <p>And yet. Something about this habit nagged at my developing programmer mind
  1035. after a few years. It was just a little bit too manual and repetitive, a
  1036. little bit silly to have to write things like a table of contents by hand, or
  1037. move entries around by copy-and-pasting them to different files. Since I knew
  1038. the date for each entry, and wanted to make them navigable on that basis, why
  1039. not define a directory structure for the years and months, and then write a
  1040. file to hold each day? That way, all I&rsquo;d have to do is concatenate the files
  1041. in one directory to display any given month:</p>
  1042. <pre><code>$ cat ~/p1k3/archives/2014/1/* | head -10
  1043. &lt;h1&gt;Sunday, January 12&lt;/h1&gt;
  1044. &lt;h2&gt;the one casey is waiting for&lt;/h2&gt;
  1045. &lt;freeverse&gt;
  1046. after a while
  1047. the thing about drinking
  1048. is that it just feeds
  1049. what you drink to kill
  1050. and kills
  1051. </code></pre>
  1052. <p>I ultimately wound up writing a few thousand lines of Perl to do the actual
  1053. work, but the essential idea of the thing is still little more than invoking
  1054. <code>cat</code> on some stuff.</p>
  1055. <p>I didn&rsquo;t know the word for it at the time, but what I was reaching for was a
  1056. kind of indirection. By putting blog posts in a specific directory layout, I
  1057. was creating a simple model of the temporal structure that I considered their
  1058. most important property. Now, if I want to write commands that ask questions
  1059. about my blog posts or re-combine them in certain ways, I can address my
  1060. concerns to this model. Maybe, for example, I want a rough idea how many words
  1061. I&rsquo;ve written in blog posts so far in 2014:</p>
  1062. <pre><code>$ find ~/p1k3/archives/2014/ -type f | xargs cat | wc -w
  1063. 6677
  1064. </code></pre>
  1065. <p><code>xargs</code> is not the most intuitive command, but it&rsquo;s useful and common enough to
  1066. explain here. At the end of last chapter, when I said:</p>
  1067. <pre><code>$ cat `grep -ril '&lt;freeverse&gt;' ~/p1k3/archives` &gt; ~/possible_poems_full
  1068. </code></pre>
  1069. <p>I could also have written this as:</p>
  1070. <pre><code>$ grep -ril '&lt;freeverse&gt;' ~/p1k3/archives | xargs cat &gt; ~/possible_poems_full
  1071. </code></pre>
  1072. <p>What this does is take its input, which starts like:</p>
  1073. <pre><code>/home/brennen/p1k3/archives/2002/10/16
  1074. /home/brennen/p1k3/archives/2002/10/27
  1075. /home/brennen/p1k3/archives/2002/10/10
  1076. </code></pre>
  1077. <p>&hellip;and run <code>cat</code> on all the things in it:</p>
  1078. <pre><code>cat /home/brennen/p1k3/archives/2002/10/16 /home/brennen/p1k3/archives/2002/10/27 /home/brennen/p1k3/archives/2002/10/10 ...
  1079. </code></pre>
  1080. <p>It can be a better idea to use <code>xargs</code>, because while backticks are
  1081. incredibly useful, they have some limitations. If you&rsquo;re dealing with a very
  1082. large list of files, for example, you might exceed the maximum allowed length
  1083. for arguments to a command on your system. <code>xargs</code> is smart enough to know
  1084. that limit and run <code>cat</code> more than once if needed.</p>
  1085. <p><code>xargs</code> is actually sort of a pain to think about, and will make you jump
  1086. through some irritating hoops if you have spaces or other weirdness in your
  1087. filenames, but I wind up using it quite a bit.</p>
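<p>One common way through the spaces-in-filenames hoop is sketched below. It assumes your <code>find</code> and <code>xargs</code> support <code>-print0</code> and <code>-0</code>, as the GNU and BSD versions do, and it uses a throwaway directory with invented files rather than my real archives:</p>

```shell
# Hypothetical scratch directory with an awkward filename in it.
dir=$(mktemp -d)
printf 'one two\n' > "$dir/plain"
printf 'three four five\n' > "$dir/name with spaces"

# -print0 ends each filename with a NUL byte instead of a newline, and
# xargs -0 splits its input on NUL, so spaces in names survive intact.
# tr strips the padding that some versions of wc put around the number.
words=$(find "$dir" -type f -print0 | xargs -0 cat | wc -w | tr -d ' ')

printf '%s\n' "$words"
rm -rf "$dir"
```

<p>Without the NUL-separated pair, <code>xargs</code> would split <code>name with spaces</code> into three separate arguments and <code>cat</code> would complain about files that don&rsquo;t exist.</p>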
  1088. <p>Maybe I want to see a table of contents:</p>
  1089. <!-- exec -->
  1090. <pre><code>$ find ~/p1k3/archives/2014/ -type d | xargs ls -v | head -10
  1091. /home/brennen/p1k3/archives/2014/:
  1092. 1
  1093. 2
  1094. 3
  1095. 4
  1096. 5
  1097. /home/brennen/p1k3/archives/2014/1:
  1098. 5
  1099. 12
  1100. </code></pre>
  1101. <!-- end -->
  1102. <p>Or find the subtitles I used in 2012:</p>
  1103. <!-- exec -->
  1104. <pre><code>$ find ~/p1k3/archives/2012/ -type f | xargs perl -ne 'print "$1\n" if m{&lt;h2&gt;(.*?)&lt;/h2&gt;}'
  1105. pursuit
  1106. fragment
  1107. this poem again
  1108. i'll do better next time
  1109. timebinding animals
  1110. more observations on gear nerdery &amp;amp; utility fetishism
  1111. thrift
  1112. A miracle, in fact, means work
  1113. &lt;em&gt;technical notes for late october&lt;/em&gt;, or &lt;em&gt;it gets dork out earlier these days&lt;/em&gt;
  1114. radio
  1115. light enough to travel
  1116. 12:06am
  1117. "figures like Heinlein and Gingrich"
  1118. </code></pre>
  1119. <!-- end -->
  1120. <p>The crucial thing about this is that the filesystem <em>itself</em> is just like <code>cat</code>
  1121. and <code>grep</code>: It doesn&rsquo;t know anything about blogs (or poetry), and it&rsquo;s
  1122. basically indifferent to the actual <em>structure</em> of a file like
  1123. <code>~/p1k3/archives/2014/1/12</code>. What the filesystem knows is that there are files
  1124. with certain names in certain places. It need not know anything about the
  1125. <em>meaning</em> of those names in order to be useful; in fact, it&rsquo;s best if it stays
  1126. agnostic about the question, for this enables us to assign our own meaning to a
  1127. structure and manipulate that structure with standard tools.</p>
  1128. <p style="text-align:center;"></p>
  1129. <p>Back to the problem at hand: I have this collection of files, and I know how
  1130. to extract the ones that contain poems. My goal is to see all the poems and
  1131. collect the subset of them that I still find worthwhile. Just knowing how to
  1132. grep and then edit a big file solves my problem, in a basic sort of way. And
  1133. yet: Something about this nags at my mind. I find that, just as I can already
  1134. use standard tools and the filesystem to ask questions about all of my blog
  1135. posts in a given year or month, I would like to be able to ask questions about
  1136. the set of interesting poems.</p>
  1137. <p>If I want the freedom to execute many different sorts of commands against this
  1138. set of poems, it begins to seem that I need a model.</p>
  1139. <p>When programmers talk about models, they often mean something that people in
  1140. the sciences would recognize: We find ways to represent the arrangement of
  1141. facts so that we can think about them. A structured representation of things
  1142. often means that we can <em>change</em> those things, or at least derive new
  1143. understanding of them.</p>
  1144. <p style="text-align:center;"></p>
  1145. <p>At this point in the narrative, I could pretend that my next step is
  1146. immediately obvious, but in fact it&rsquo;s not. I spend a couple of days thinking
  1147. off and on about how to proceed, scribbling notes during bus rides and while
  1148. drinking beers at the pizza joint down the street. I assess and discard ideas
  1149. which fall into a handful of broad approaches:</p>
  1150. <ul>
  1151. <li>Store blog entries in a relational database system which would allow me to
  1152. associate them with data like &ldquo;this entry is in a collection called &lsquo;ok
  1153. poems&rsquo;&rdquo;.</li>
  1154. <li>Selectively build up a file containing the list of files with ok poems, and use
  1155. it to do other tasks.</li>
  1156. <li>Define a format for metadata that lives within entry files.</li>
  1157. <li>Turn each interesting file into a directory of its own which contains a file
  1158. with the original text and another file with metadata.</li>
  1159. </ul>
  1160. <p>I discard the relational database idea immediately: I like working with files,
  1161. and I don&rsquo;t feel like abandoning a model that&rsquo;s served me well for my entire
  1162. adult life.</p>
  1163. <p>Building up an index file to point at the other files I&rsquo;m working with has a
  1164. certain appeal. I&rsquo;m already most of the way there with the <code>grep</code> output in
  1165. <code>potential_poems</code>. It would be easy to write shell commands to add, remove,
  1166. sort, and search entries. Still, it doesn&rsquo;t feel like a very satisfying
  1167. solution unto itself. I&rsquo;d like to know that an entry is part of the collection
  1168. just by looking at the entry, without having to cross-reference it to a list
  1169. somewhere else.</p>
<p>What about putting some meaningful text in the file itself? I think about
a bunch of different ways to do this, some of them really complicated, and
eventually arrive at this:</p>
  1173. <pre><code>&lt;!-- collection: ok-poems --&gt;
  1174. </code></pre>
  1175. <p>The <code>&lt;!-- --&gt;</code> bits are how you define a comment in HTML, which means that
  1176. neither my blog code nor web browsers nor my text editor have to know anything
  1177. about the format, but I can easily find files with certain values. Check it:</p>
<pre><code>$ find ~/p1k3/archives -type f | xargs perl -ne 'print "$ARGV: $1 -&gt; $2\n" if m{&lt;!-- ([a-z]+): (.*?) --&gt;};'
  1179. /home/brennen/p1k3/archives/2014/2/9: collection -&gt; ok-poems
  1180. </code></pre>
  1181. <p>That&rsquo;s an ugly one-liner, and I haven&rsquo;t explained half of what it does, but the
  1182. comment format actually seems pretty workable for this. It&rsquo;s a little tacky to
  1183. look at, but it&rsquo;s simple and searchable.</p>
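<p>If the Perl is more than you want to squint at, plain <code>grep</code> can answer the narrower question of which files belong to a given collection. What follows is a rough sketch rather than anything taken from the blog software: <code>find_collection</code> and the throwaway demo directory are names invented for the example.</p>

```shell
#!/bin/bash
# find_collection is a hypothetical helper, invented for this example.
# Usage: find_collection DIRECTORY COLLECTION
# Prints every file under DIRECTORY that contains a comment of the
# form <!-- collection: COLLECTION -->.
find_collection() {
  local dir="$1" name="$2"
  # -r recurses into directories, -l lists matching file names only,
  # -F treats the pattern as a fixed string instead of a regex.
  grep -rlF -- "<!-- collection: $name -->" "$dir"
}

# Try it against a throwaway directory:
demo=$(mktemp -d)
printf '%s\n' '<!-- collection: ok-poems -->' 'a poem' > "$demo/9"
printf '%s\n' 'not a poem' > "$demo/10"
find_collection "$demo" ok-poems
```

<p>Against the real archive, this would be called as something like <code>find_collection ~/p1k3/archives ok-poems</code>.</p>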
  1184. <p>Before we settle, though, let&rsquo;s turn to the notion of making each entry into a
  1185. directory that can contain some structured metadata in a separate file.
  1186. Imagine something like:</p>
  1187. <pre><code>$ ls ~/p1k3/archives/2013/2/9
  1188. index Meta
  1189. </code></pre>
  1190. <p>Here I use the name &ldquo;index&rdquo; for the main part of the entry because it&rsquo;s a
  1191. convention of web sites for the top-level page in a directory to be called
  1192. something like <code>index.html</code>. As it happens, my blog software already supports
  1193. this kind of file layout for entries which contain multiple parts, image files,
  1194. and so forth.</p>
  1195. <pre><code>$ head ~/p1k3/archives/2013/2/9/index
  1196. &lt;h1&gt;saturday, february 9&lt;/h1&gt;
  1197. &lt;freeverse&gt;
  1198. midwinter midafternoon; depressed as hell
  1199. sitting in a huge cabin in the rich-people mountains
  1200. writing a sprawl, pages, of melancholic midlife bullshit
  1201. outside the snow gives way to broken clouds and the
  1202. clear unyielding light of the high country sun fills
  1203. $ cat ~/p1k3/archives/2013/2/9/Meta
  1204. collection: ok-poems
  1205. </code></pre>
  1206. <p>It would then be easy to <code>find</code> files called <code>Meta</code> and grep them for
  1207. <code>collection: ok-poems</code>.</p>
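<p>As a sketch of what that search might look like: the function name <code>find_meta_collection</code> is made up, and the <code>Meta</code> layout is the imagined one from the listing above, not something that exists in the archive yet.</p>

```shell
#!/bin/bash
# find_meta_collection is a hypothetical helper: it lists the entry
# directories whose Meta file contains the line "collection: NAME".
# Usage: find_meta_collection ARCHIVE_ROOT NAME
find_meta_collection() {
  local root="$1" name="$2"
  # find locates every file named Meta; grep -l prints the ones that
  # contain the exact line (-x) as a fixed string (-F); the loop then
  # strips the trailing /Meta to leave the entry directory.
  find "$root" -type f -name Meta \
    -exec grep -lxF -- "collection: $name" {} + |
    while IFS= read -r meta; do
      dirname "$meta"
    done
}

# Try it against a throwaway archive:
demo=$(mktemp -d)
mkdir -p "$demo/2013/2/9" "$demo/2013/2/10"
echo 'collection: ok-poems' > "$demo/2013/2/9/Meta"
echo 'collection: drafts' > "$demo/2013/2/10/Meta"
find_meta_collection "$demo" ok-poems
```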
  1208. <p>What if I put metadata right in the filename itself, and dispense with the grep
  1209. altogether?</p>
  1210. <pre><code>$ ls ~/p1k3/archives/2013/2/9
  1211. index meta-ok-poem
  1212. $ find ~/p1k3/archives -name 'meta-ok-poem'
/home/brennen/p1k3/archives/2013/2/9/meta-ok-poem
  1214. </code></pre>
  1215. <p>There&rsquo;s a lot to like about this. For one thing, it&rsquo;s immediately visible in a
  1216. directory listing. For another, it doesn&rsquo;t require searching through thousands
  1217. of lines of text to extract a specific string. If a directory has a
  1218. <code>meta-ok-poem</code> in it, I can be pretty sure that it will contain an interesting
  1219. <code>index</code>.</p>
  1220. <p>What are the downsides? Well, it requires transforming lots of text files into
  1221. directories-containing-files. I might automate that process, but it&rsquo;s still a
  1222. little tedious and it makes the layout of the entry archive more complicated
  1223. overall. There&rsquo;s a cost to doing things this way. It lets me extend my
  1224. existing model of a blog entry to include arbitrary metadata, but it also adds
  1225. steps to writing or finding blog entries.</p>
  1226. <p>Abstractions usually cost you something. Is this one worth the hassle?
  1227. Sometimes the best way to answer that question is to start writing code that
  1228. handles a given abstraction.</p>
  1229. <hr />
  1230. <h1><a name=script href=#script>#</a> 4. script</h1>
  1231. <p>Back in chapter 1, I said that &ldquo;the way you use the computer is often just to write
  1232. little programs that invoke other programs&rdquo;. In fact, we&rsquo;ve already gone over a
  1233. bunch of these. Grepping through the text of a previous chapter should pull
  1234. up some good examples:</p>
  1235. <!-- exec -->
  1236. <pre><code>$ grep -E '[a-z]+.*\| ' ../literary_environment/index.md
  1237. $ sort authors_* | uniq -c
  1238. $ sort authors_* | uniq &gt; ./all_authors
  1239. the same thing as `cat all_authors | nl`, or `nl all_authors`. You won't see
  1240. $ find ~/p1k3/archives/2010/11 -regextype egrep -regex '.*([0-9]+|index)' -type f | xargs wc -w | grep total
  1241. $ sort authors_* | uniq | wc -l
  1242. $ sort colors | uniq -i | tail -1
  1243. $ cut -d' ' -f1 ./authors_* | sort | uniq -ci | sort -n | tail -3
  1244. $ sort -u ./authors_* | cut -d' ' -f1 | uniq -ci | sort -n | tail -3
  1245. $ sort -k1 all_authors.tsv | expand -t14
  1246. $ cut -f3 all_authors.tsv | grep .
  1247. $ paste firstnames lastnames | sort -k2 | expand -t12
  1248. $ cat ./authors_* | grep 'Vanessa'
  1249. </code></pre>
  1250. <!-- end -->
  1251. <p>None of these one-liners do all that much, but they all take input of one sort
  1252. or another and apply one or more transformations to it. They&rsquo;re little formal
  1253. sentences describing how to make one thing into another, which is as good a
  1254. definition of programming as most. Or at least this is a good way to describe
  1255. programming-in-the-small. (A lot of the programs we use day-to-day are more
  1256. like essays, novels, or interminable Fantasy series where every character you
  1257. like dies horribly than they are like individual sentences.)</p>
  1258. <p>One-liners like these are all well and good when you&rsquo;re staring at a terminal,
  1259. trying to figure something out - but what about when you&rsquo;ve already figured it out and
  1260. you want to repeat it in the future?</p>
  1261. <p>It turns out that Bash has you covered. Since shell commands are just text,
  1262. they can live in a text file as easily as they can be typed.</p>
  1263. <h2><a name=learn-you-an-editor href=#learn-you-an-editor>#</a> learn you an editor</h2>
  1264. <p>We&rsquo;ve skirted the topic so far, but now that we&rsquo;re talking about writing out
  1265. text files in earnest, you&rsquo;re going to want a text editor.</p>
  1266. <p>My editor is where I spend most of my time that isn&rsquo;t in a web browser, because
  1267. it&rsquo;s where I write both code and prose. It turns out that the features which
  1268. make a good code editor overlap a lot with the ones that make a good editor of
  1269. English sentences.</p>
  1270. <p>So what should you use? Well, there have been other contenders in recent
  1271. years, but in truth nothing comes close to dethroning the Great Old Ones of
  1272. text editing. Emacs is a creature both primal and sophisticated, like an
  1273. avatar of some interstellar civilization that evolved long before multicellular
  1274. life existed on earth and seeded the galaxy with incomprehensible artefacts and
  1275. colossal engineering projects. Vim is like a lovable chainsaw-studded robot
  1276. with the most elegant keyboard interface in history secretly emblazoned on its
  1277. shining diamond heart.</p>
  1278. <p>It&rsquo;s worth the time it takes to learn one of the serious editors, but there are
  1279. easier places to start. Nano, for example, is easy to pick up, and should be
  1280. available on most systems. To start it, just say:</p>
  1281. <pre><code>$ nano file
  1282. </code></pre>
  1283. <p>You should see something like this:</p>
  1284. <p style="text-align:center;"> <img src="images/nano.png" alt="nano" /></p>
  1285. <p>Arrow keys will move your cursor around, and typing stuff will make it appear
  1286. in the file. This is pretty much like every other editor you&rsquo;ve ever used. If
  1287. you haven&rsquo;t used Nano before, that stuff along the bottom of the terminal is a
  1288. reference to the most commonly used commands. <code>^</code> is a convention for &ldquo;Ctrl&rdquo;,
  1289. so <code>^O</code> means Ctrl-o (the case of the letter doesn&rsquo;t actually matter), which
  1290. will save the file you&rsquo;re working on. Ctrl-x will quit, which is probably the
  1291. first important thing to know about any given editor.</p>
  1292. <h2><a name=d-i-y-utilities href=#d-i-y-utilities>#</a> d.i.y. utilities</h2>
  1293. <p>So back to putting commands in text files. Here&rsquo;s a file I just created in
  1294. my editor:</p>
  1295. <!-- exec -->
  1296. <pre><code>$ cat okpoems
  1297. #!/bin/bash
  1298. # find all the marker files and get the name of
  1299. # the directory containing each
  1300. find ~/p1k3/archives -name 'meta-ok-poem' | xargs -n1 dirname
  1301. exit 0
  1302. </code></pre>
  1303. <!-- end -->
  1304. <p>This is known as a script. There are a handful of things to notice here.
  1305. First, there&rsquo;s this fragment:</p>
  1306. <pre><code>#!/bin/bash
  1307. </code></pre>
  1308. <p>The <code>#!</code> right at the beginning, followed by the path to a program, is a
  1309. special sequence that lets the kernel know what program should be used to
  1310. interpret the contents of the file. <code>/bin/bash</code> is the path on the filesystem
  1311. where Bash itself lives. You might see this referred to as a shebang or a hash
  1312. bang.</p>
  1313. <p>Lines that start with a <code>#</code> are comments, used to describe the code to a human
  1314. reader. The <code>exit 0</code> tells Bash that the currently running script should exit
  1315. with a status of 0, which basically means &ldquo;nothing went wrong&rdquo;.</p>
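<p>If you want to watch exit statuses go by, the shell keeps the status of the most recent command in a variable called <code>$?</code>. Here&rsquo;s a small sketch; <code>report_status</code> is a name I&rsquo;ve invented for the demonstration, not a standard command.</p>

```shell
#!/bin/bash
# report_status is a hypothetical helper: it runs whatever command it
# is given and then prints that command's exit status.
report_status() {
  local status=0
  # $? holds the exit status of the command that just ran.
  "$@" || status=$?
  echo "exit status: $status"
}

report_status true               # true always succeeds
report_status false              # false always fails
report_status sh -c 'exit 64'    # a script choosing its own status
```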
  1316. <p>If you examine the directory listing for <code>okpoems</code>, you&rsquo;ll see something
  1317. important:</p>
  1318. <!-- exec -->
  1319. <pre><code>$ ls -l okpoems
  1320. -rwxrwxr-x 1 brennen brennen 163 Apr 19 00:08 okpoems
  1321. </code></pre>
  1322. <!-- end -->
  1323. <p>That looks pretty cryptic. For the moment, just remember that those little
  1324. <code>x</code>s in the first bit mean that the file has been marked e<strong>x</strong>ecutable. We
  1325. accomplish this by saying something like:</p>
  1326. <pre><code>$ chmod +x ./okpoems
  1327. </code></pre>
<p>Once that&rsquo;s done, the executable bit and the shebang line together mean that typing
<code>./okpoems</code> will have the same effect as typing <code>bash okpoems</code>:</p>
  1330. <!-- exec -->
  1331. <pre><code>$ ./okpoems
  1332. /home/brennen/p1k3/archives/2013/2/9
  1333. /home/brennen/p1k3/archives/2012/3/17
  1334. /home/brennen/p1k3/archives/2012/3/26
  1335. </code></pre>
  1336. <!-- end -->
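<p>To try the whole cycle without touching a real archive, something like the following should do. The script name <code>hello</code> and the temporary directory are assumptions made for the sake of the demonstration, and it takes for granted that Bash lives at <code>/bin/bash</code> on your system.</p>

```shell
#!/bin/bash
# Work in a throwaway directory so nothing important gets clobbered.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Write a two-line script to a file called hello (a made-up name):
printf '%s\n' '#!/bin/bash' 'echo "hello from a script"' > hello

# Before chmod, the file can still be run by handing it to bash:
bash hello

# Mark it executable. After this, the kernel reads the shebang line
# and starts bash on our behalf:
chmod +x hello
./hello
```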
  1337. <h2><a name=heavy-lifting href=#heavy-lifting>#</a> heavy lifting</h2>
  1338. <p><code>okpoems</code> demonstrates the basics, but it doesn&rsquo;t do very much. Here&rsquo;s
  1339. a script with a little more substance to it:</p>
  1340. <!-- exec -->
  1341. <pre><code>$ cat markpoem
#!/bin/bash

# $1 is the first parameter to our script
POEM=$1

# Complain and exit if we weren't given a path:
if [ -z "$POEM" ]; then
  echo 'usage: markpoem &lt;path&gt;'

  # Confusingly, an exit status of 0 means to the shell that everything went
  # fine, while any other number means that something went wrong.
  exit 64
fi

if [ ! -e "$POEM" ]; then
  echo "$POEM not found"
  exit 66
fi

echo "marking $POEM an ok poem"

POEM_BASENAME=$(basename "$POEM")

# If the target is a plain file instead of a directory, make it into
# a directory and move the content into $POEM/index:
if [ -f "$POEM" ]; then
  echo "making $POEM into a directory, moving content to"
  echo "  $POEM/index"
  TEMPFILE="/tmp/$POEM_BASENAME.$(date +%s.%N)"
  mv "$POEM" "$TEMPFILE"
  mkdir "$POEM"
  mv "$TEMPFILE" "$POEM/index"
fi

if [ -d "$POEM" ]; then
  # touch(1) will either create the file or update its timestamp:
  touch "$POEM/meta-ok-poem"
else
  echo "something broke - why isn't $POEM a directory?"
  file "$POEM"
fi

# Signal that all is copacetic:
echo kthxbai
exit 0
  1378. </code></pre>
  1379. <!-- end -->
  1380. <p>Both of these scripts are imperfect, but they were quick to write, they&rsquo;re made
  1381. out of standard commands, and I don&rsquo;t yet hate myself for them: All signs that
  1382. I&rsquo;m not totally on the wrong track with the <code>meta-ok-poem</code> abstraction, and
  1383. could live with it as part of an ongoing writing project. <code>okpoems</code> and
  1384. <code>markpoem</code> would also be easy to use with custom keybindings in my editor. In
  1385. a few more lines of code, I can build a system to wade through the list of
  1386. candidate files and quickly mark the interesting ones.</p>
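<p>Here&rsquo;s roughly what I have in mind, as a sketch under a couple of assumptions: the candidates live in a file with one path per line, and the marking command is passed in by name. <code>review_poems</code> is a hypothetical name, and I haven&rsquo;t wired this into anything.</p>

```shell
#!/bin/bash
# review_poems is a hypothetical helper. It reads candidate paths, one
# per line, from the file named by $1, shows the top of each, and runs
# the marking command named by $2 on the ones we approve.
# Usage: review_poems CANDIDATE_LIST MARK_COMMAND
review_poems() {
  local list="$1" marker="$2" path answer
  # The candidate list is read on file descriptor 3, which leaves
  # standard input free for answering the prompt.
  while IFS= read -r path <&3; do
    [ -f "$path" ] && head -n 5 "$path"
    printf 'mark %s? [y/N] ' "$path"
    read -r answer
    if [ "$answer" = y ]; then
      "$marker" "$path"
    fi
  done 3< "$list"
}
```

<p>Assuming a list file called <code>candidates</code>, this would be run as <code>review_poems candidates ./markpoem</code>, answering <code>y</code> for the keepers and anything else for the rest.</p>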
  1387. <h2><a name=generality href=#generality>#</a> generality</h2>
  1388. <p>So what&rsquo;s lacking here? Well, probably a bunch of things, feature-wise. I can
  1389. imagine writing a script to unmark a poem, for example. That said, there&rsquo;s one
  1390. really glaring problem. &ldquo;Ok poem&rdquo; is only one kind of property a blog entry
  1391. might possess. Suppose I wanted a way to express that a poem is terrible?</p>
  1392. <p>It turns out I already know how to add properties to an entry. If I generalize
  1393. just a little, the tools become much more flexible.</p>
  1394. <!-- exec -->
  1395. <pre><code>$ ./addprop /home/brennen/p1k3/archives/2012/3/26 meta-terrible-poem
  1396. marking /home/brennen/p1k3/archives/2012/3/26 with meta-terrible-poem
  1397. kthxbai
  1398. </code></pre>
  1399. <!-- end -->
  1400. <!-- exec -->
  1401. <pre><code>$ ./findprop meta-terrible-poem
  1402. /home/brennen/p1k3/archives/2012/3/26
  1403. </code></pre>
  1404. <!-- end -->
  1405. <p><code>addprop</code> is only a little different from <code>markpoem</code>. It takes two parameters
  1406. instead of one - the target entry and a property to add.</p>
  1407. <!-- exec -->
  1408. <pre><code>$ cat addprop
#!/bin/bash

ENTRY=$1
PROPERTY=$2

# Complain and exit if we weren't given a path and a property:
if [[ ! $ENTRY || ! $PROPERTY ]]; then
  echo "usage: addprop &lt;path&gt; &lt;property&gt;"
  exit 64
fi

if [ ! -e "$ENTRY" ]; then
  echo "$ENTRY not found"
  exit 66
fi

echo "marking $ENTRY with $PROPERTY"

# If the target is a plain file instead of a directory, make it into
# a directory and move the content into $ENTRY/index:
if [ -f "$ENTRY" ]; then
  echo "making $ENTRY into a directory, moving content to"
  echo "  $ENTRY/index"

  # Get a safe temporary file:
  TEMPFILE=$(mktemp)

  mv "$ENTRY" "$TEMPFILE"
  mkdir "$ENTRY"
  mv "$TEMPFILE" "$ENTRY/index"
fi

if [ -d "$ENTRY" ]; then
  touch "$ENTRY/$PROPERTY"
else
  echo "something broke - why isn't $ENTRY a directory?"
  file "$ENTRY"
fi

echo kthxbai
exit 0
  1441. </code></pre>
  1442. <!-- end -->
  1443. <p>Meanwhile, <code>findprop</code> is more or less <code>okpoems</code>, but with a parameter for the
  1444. property to find:</p>
  1445. <!-- exec -->
  1446. <pre><code>$ cat findprop
  1447. #!/bin/bash
  1448. if [ ! $1 ]
  1449. then
  1450. echo "usage: findprop &lt;property&gt;"
  1451. exit
  1452. fi
  1453. # find all the marker files and get the name of
  1454. # the directory containing each
  1455. find ~/p1k3/archives -name $1 | xargs -n1 dirname
  1456. exit 0
  1457. </code></pre>
  1458. <!-- end -->
  1459. <p>These scripts aren&rsquo;t much more complicated than their poem-specific
  1460. counterparts, but now they can be used to solve problems I haven&rsquo;t even thought
  1461. of yet, and included in other scripts that need their functionality.</p>
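<p>The unmarking script I imagined earlier falls out of the same idea. This is a sketch only: <code>delprop</code> is a hypothetical name, it&rsquo;s written as a function so it can be tried in a running shell, and it leaves the entry as a directory instead of trying to turn it back into a plain file.</p>

```shell
#!/bin/bash
# delprop is a hypothetical counterpart to addprop: it removes a
# marker file from an entry directory.
# Usage: delprop <path> <property>
delprop() {
  local entry="$1" property="$2"

  # Complain if we weren't given a path and a property:
  if [ -z "$entry" ] || [ -z "$property" ]; then
    echo "usage: delprop <path> <property>"
    return 64
  fi

  if [ ! -e "$entry/$property" ]; then
    echo "$entry has no $property"
    return 66
  fi

  echo "removing $property from $entry"
  rm -- "$entry/$property"
}
```

<p>Saved as a standalone script, the <code>return</code> lines would become <code>exit</code>, with the same status numbers <code>addprop</code> uses.</p>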
  1462. <hr />
  1463. <h1><a name=general-purpose-programmering href=#general-purpose-programmering>#</a> 5. general purpose programmering</h1>
  1464. <p>I didn&rsquo;t set out to write a book about programming, <em>as such</em>, but because
  1465. programming and the command line are so inextricably linked, this text
  1466. draws near the subject almost of its own accord.</p>
  1467. <p>If you&rsquo;re not terribly interested in programming, this chapter can easily
  1468. enough be skipped. It&rsquo;s more in the way of philosophical rambling than
  1469. concrete instruction, and will be of most use to those with an existing
  1470. background in writing code.</p>
  1471. <p style="text-align:center;"> *</p>
  1472. <p>If you&rsquo;ve used computers for more than a few years, you&rsquo;re probably viscerally
  1473. aware that most software is fragile and most systems decay. In the time since
  1474. I took my first tentative steps into the little world of a computer (a friend&rsquo;s
  1475. dad&rsquo;s unidentifiable gaming machine, my own father&rsquo;s blue monochrome Zenith
  1476. laptop, the Apple II) the churn has been overwhelming. By now I&rsquo;ve learned my
  1477. way around vastly more software &ndash; operating systems, programming languages and
  1478. development environments, games, editors, chat clients, mail systems &ndash; than I
  1479. presently could use if I wanted to. Most of it has gone the way of some
  1480. ancient civilization, surviving (if at all) only in faint, half-understood
  1481. cultural echoes and occasional museum-piece displays. Every user of technology
  1482. becomes, in time, a refugee from an irretrievably recent past.</p>
  1483. <p>And yet, despite all this, the shell endures. Most of the ideas in this book
  1484. are older than I am. Most of them could have been applied in 1994 or
  1485. thereabouts, when I first logged on to multiuser systems running AT&amp;T Unix.
  1486. Since the early 1990s, systems built on a fundamental substrate of Unix-like
  1487. behavior and abstractions have proliferated wildly, becoming foundational at
  1488. once to the modern web, the ecosystem of free and open software, and the
  1489. technological dominance ca. 2014 of companies like Apple, Google, and Facebook.</p>
  1490. <p>Why is this, exactly?</p>
  1491. <p style="text-align:center;"> *</p>
  1492. <p>As I&rsquo;ve said (and hopefully shown), the commands you write in your shell
  1493. are essentially little programs. Like other programs, they can be stored
  1494. for later use and recombined with other commands, creating new uses for
  1495. your ideas.</p>
  1496. <p>It would be hard to say that there&rsquo;s any <em>one</em> reason command line environments
  1497. remain so vital after decades of evolution and hard-won refinement in computer
  1498. interfaces, but it seems like this combinatory nature is somewhere near the
  1499. heart of it. The command line often lacks the polish of other interfaces we
  1500. depend on, but in exchange it offers a richness and freedom of expression
  1501. rarely seen elsewhere, and invites its users to build upon its basic
  1502. facilities.</p>
  1503. <p>What is it that makes last chapter&rsquo;s <code>addprop</code> preferable to the more specific
  1504. <code>markpoem</code>? Let&rsquo;s look at an alternative implementation of <code>markpoem</code>:</p>
  1505. <!-- exec -->
  1506. <pre><code>$ cat simple_markpoem
  1507. #!/bin/bash
addprop "$1" meta-ok-poem
  1509. </code></pre>
  1510. <!-- end -->
  1511. <p>Is this script trivial? Absolutely. It&rsquo;s so trivial that it barely seems to
  1512. exist, because I already wrote <code>addprop</code> to do all the heavy lifting and play
  1513. well with others, freeing us to imagine new uses for its central idea without
  1514. worrying about the implementation details.</p>
  1515. <p>Unlike <code>markpoem</code>, <code>addprop</code> doesn&rsquo;t know anything about poetry. All it knows
  1516. about, in fact, is putting a file (or three) in a particular place. And this
  1517. is in keeping with a basic insight of Unix: Pieces of software that do one
  1518. very simple thing generalize well. Good command line tools are like a hex
  1519. wrench, a hammer, a utility knife: They embody knowledge of turning, of
  1520. striking, of cutting &ndash; and with this kind of knowledge at hand, the user can
  1521. change the world even though no individual tool is made with complete knowledge
  1522. of the world as a whole. There&rsquo;s a lot of power in the accumulation of small
  1523. competencies.</p>
  1524. <p>Of course, if your code is only good at one thing, to be of any use, it has to
  1525. talk to code that&rsquo;s good at other things. There&rsquo;s another basic insight in the
  1526. Unix tradition: Tools should be composable. All those little programs have to
  1527. share some assumptions, have to speak some kind of trade language, in order to
  1528. combine usefully. Which is how we&rsquo;ve arrived at standard IO, pipelines,
filesystems, and text as a lowest-common-denominator medium of exchange. If
  1530. you think about most of these things, they have some very rough edges, but they
  1531. give otherwise simple tools ways to communicate without becoming
  1532. super-complicated along the way.</p>
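<p>As a small illustration of that trade language at work, here is a sketch that answers a question neither property script was written for: which entries carry two properties at once? The <code>findprop</code> below is a stand-in written as a function so the example runs on its own (it takes the archive root as an extra argument and uses <code>find -exec</code>, neither of which the real script does), and <code>both_props</code> is a made-up name.</p>

```shell
#!/bin/bash
# A stand-in for the findprop script: print the directory containing
# each marker file named $2 somewhere under the root directory $1.
findprop() {
  find "$1" -name "$2" -exec dirname {} \;
}

# both_props is a hypothetical helper: print the entries that carry
# both properties. sort puts identical lines next to each other, and
# uniq -d keeps only the lines that appear more than once.
# Usage: both_props ARCHIVE_ROOT PROPERTY_A PROPERTY_B
both_props() {
  local root="$1"
  { findprop "$root" "$2"; findprop "$root" "$3"; } | sort | uniq -d
}

# Try it against a throwaway archive:
demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b"
touch "$demo/a/meta-ok-poem" "$demo/a/meta-short" "$demo/b/meta-ok-poem"
both_props "$demo" meta-ok-poem meta-short
```

<p>Nothing here knows what the properties mean. The pieces only agree on one path per line of text, and that turns out to be enough.</p>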
  1533. <p style="text-align:center;"> *</p>
  1534. <p>What is the command line?</p>
  1535. <p>The command line is an environment of tool use.</p>
  1536. <p>So are kitchens, workshops, libraries, and programming languages.</p>
  1537. <p style="text-align:center;"> *</p>
  1538. <p>Here&rsquo;s a confession: I don&rsquo;t like writing shell scripts very much, and I
  1539. can&rsquo;t blame anyone else for feeling the same way.</p>
  1540. <p>That doesn&rsquo;t mean you shouldn&rsquo;t <em>know</em> about them, or that you shouldn&rsquo;t
  1541. <em>write</em> them. I write little ones all the time, and the ability to puzzle
  1542. through other people&rsquo;s scripts comes in handy. Oftentimes, the best, most
  1543. tasteful way to automate something is to build a script out of the commonly
  1544. available commands. The standard tools are already there on millions of
  1545. machines. Many of them have been pretty well understood for a generation, and
  1546. most will probably be around for a generation or three to come. They do neat
  1547. stuff. Scripts let you build on ideas you&rsquo;ve already worked out, and give
  1548. repeatable operations a memorable, user-friendly name. They encourage reuse of
  1549. existing programs, and help express your ideas to people who&rsquo;ll come after you.</p>
  1550. <p>One of the reliable markers of powerful software is that it can be scripted: It
  1551. extends to its users some of the same power that its authors used in creating
  1552. it. Scriptable software is to some extent <em>living</em> software. It&rsquo;s a book that
  1553. you, the reader, get to help write.</p>
  1554. <p>In all these ways, shell scripts are wonderful, a little bit magical, and
  1555. quietly indispensable to the machinery of modern civilization.</p>
  1556. <p>Unfortunately, in all the ways that a shell like Bash is weird, finicky, and
  1557. covered in 40 years of incidental cruft, long-form Bash scripts are even worse.
  1558. Bash is a useful glue language, particularly if you&rsquo;re already comfortable
  1559. wiring commands together. Syntactic and conceptual innovations like pipes are
  1560. beautiful and necessary. What Bash is <em>not</em>, despite its power, is a very good
  1561. general purpose programming language. It&rsquo;s just not especially good at things
  1562. like math, or complex data structures, or not looking like a punctuation-heavy
  1563. variety of alphabet soup.</p>
  1564. <p>It turns out that there&rsquo;s a threshold of complexity beyond which life becomes
  1565. easier if you switch from shell scripting to a more robust language. Just
  1566. where this threshold is located varies a lot between users and problems, but I
  1567. often think about switching languages before a script gets bigger than I can
  1568. view on my screen all at once. <code>addprop</code> is a good example:</p>
  1569. <!-- exec -->
  1570. <pre><code>$ wc -l ../script/addprop
  1571. 41 ../script/addprop
  1572. </code></pre>
  1573. <!-- end -->
  1574. <p>41 lines is a touch over what fits on one screen in the editor I usually use.
  1575. If I were going to add much in the way of features, I&rsquo;d think pretty hard about
  1576. porting it to another language first.</p>
  1577. <p>What&rsquo;s cool is that if you know a language like C, Python, Perl, Ruby, PHP, or
  1578. JavaScript, your code can participate in the shell environment as a first class
  1579. citizen simply by respecting the conventions of standard IO, files, and command
  1580. line arguments. Often, in order to create a useful utility, it&rsquo;s only
  1581. necessary to deal with <code>STDIN</code>, or operate on a particular sort of file, and
  1582. most languages offer simple conventions for doing these things.</p>
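<p>The conventions themselves are small. Here they are sketched in shell, since that&rsquo;s the language of this book, though the same shape carries over elsewhere: a made-up filter called <code>shout</code> reads any files named on the command line, or standard input when there are none, and writes its result to standard output.</p>

```shell
#!/bin/bash
# shout is a hypothetical filter that upper-cases its input.
# With file arguments it reads those files; with none it reads STDIN,
# because cat with no file arguments falls back to standard input.
shout() {
  cat -- "$@" | tr '[:lower:]' '[:upper:]'
}

# Reading from a pipe:
echo hello | shout

# Reading from a file named as an argument:
demo=$(mktemp)
echo 'from a file' > "$demo"
shout "$demo"
```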
  1583. <p style="text-align:center;"> *</p>
  1584. <p>I think the shell can be taught and understood as a humane environment, despite
  1585. all of its ugliness and complication, because it offers the materials of its
  1586. own construction to its users, whatever their concerns. The writer, the
  1587. philosopher, the scientist, the programmer: Files and text and pipes know
  1588. little enough about these things, but in their very indifference to the
  1589. specifics of any one complex purpose, they&rsquo;re adaptable to the basic needs of
  1590. many. Simple utilities which enact simple kinds of knowledge survive and
  1591. recombine because there is a wisdom to be found in small things.</p>
  1592. <p>Files and text know nothing about poetry, nothing in particular of the human
  1593. soul. Neither do pen and ink, printing presses or codex books, but somehow we
  1594. got Shakespeare and Montaigne.</p>
  1595. <hr />
  1596. <h1><a name=one-of-these-things-is-not-like-the-others href=#one-of-these-things-is-not-like-the-others>#</a> 6. one of these things is not like the others</h1>
  1597. <p>If you&rsquo;re the sort of person who took a few detours into the history of
  1598. religion in college, you might be familiar with some of the ways people used to
  1599. do textual comparison. When pen, paper, and typesetting were what scholars had
  1600. to work with, they did some surprisingly sophisticated things in order to
  1601. expose the relationships between multiple pieces of text.</p>
  1602. <p>{photo: some textual criticism tools}</p>
  1603. <p>Here&rsquo;s a book I got in college: <em>Gospel Parallels: A Comparison of the
  1604. Synoptic Gospels</em>, by Burton H. Throckmorton, Jr. It breaks up three books
  1605. from the Bible by the stories and themes that they contain, and shows the
  1606. overlapping sections of each book that contain parallel texts. You can work
  1607. your way through and see what parts only show up in one book, or in two but not
  1608. the other, or in all three. These kinds of tools support all sorts of
  1609. theoretical stuff about which books copied each other and how, and what other
  1610. sources they might have copied that we&rsquo;ve since lost.</p>
  1611. <p>This is some <em>incredibly</em> dry material, even if you kind of dig thinking about
  1612. questions like how and when an important religious book was written and
  1613. compiled. It takes a special temperament to actually sit poring over
  1614. fragmentary texts in ancient languages and do these painstaking comparisons.
  1615. Even if you&rsquo;re a writer or editor and work with a lot of revisions of a text,
  1616. there&rsquo;s a good chance you rarely do this kind of comparison on your own work,
  1617. because that shit is <em>tedious</em>.</p>
  1618. <h2><a name=diff href=#diff>#</a> diff</h2>
  1619. <p>It turns out that academics aren&rsquo;t the only people who need tools for comparing
  1620. different versions of a text. Working programmers, in fact, need to do this
  1621. <em>constantly</em>. Programmers are also happiest when putting off the <em>actual</em> task
  1622. at hand to solve some incidental problem that cropped up along the way, so by
  1623. now there are a lot of ways to say &ldquo;here&rsquo;s how this file is different from this
  1624. file&rdquo;, or &ldquo;here&rsquo;s how this file is different from itself a year ago&rdquo;.</p>
  1625. <p>Let&rsquo;s look at a couple of shell scripts from an earlier chapter:</p>
  1626. <!-- exec -->
  1627. <pre><code>$ cat ../script/okpoems
  1628. #!/bin/bash
  1629. # find all the marker files and get the name of
  1630. # the directory containing each
  1631. find ~/p1k3/archives -name 'meta-ok-poem' | xargs -n1 dirname
  1632. exit 0
  1633. </code></pre>
  1634. <!-- end -->
  1635. <!-- exec -->
  1636. <pre><code>$ cat ../script/findprop
  1637. #!/bin/bash
  1638. if [ ! $1 ]
  1639. then
  1640. echo "usage: findprop &lt;property&gt;"
  1641. exit
  1642. fi
  1643. # find all the marker files and get the name of
  1644. # the directory containing each
  1645. find ~/p1k3/archives -name $1 | xargs -n1 dirname
  1646. exit 0
  1647. </code></pre>
  1648. <!-- end -->
  1649. <p>It&rsquo;s pretty obvious these are similar files, but do we know what <em>exactly</em>
  1650. changed between them at a glance? It wouldn&rsquo;t be hard to figure out, once. If
  1651. you wanted to be really certain about it, you could print them out, set them
  1652. side by side, and go over them with a highlighter.</p>
  1653. <p>Now imagine doing that for a bunch of files, some of them hundreds or even
  1654. thousands of lines long. I&rsquo;ve actually done that before, but I didn&rsquo;t feel
  1655. smart while I was doing it. This is a job for software.</p>
  1656. <!-- exec -->
  1657. <pre><code>$ diff ../script/okpoems ../script/findprop
  1658. 2a3,8
  1659. &gt; if [ ! $1 ]
  1660. &gt; then
  1661. &gt; echo "usage: findprop &lt;property&gt;"
  1662. &gt; exit
  1663. &gt; fi
  1664. &gt;
  1665. 5c11
  1666. &lt; find ~/p1k3/archives -name 'meta-ok-poem' | xargs -n1 dirname
  1667. ---
  1668. &gt; find ~/p1k3/archives -name $1 | xargs -n1 dirname
  1669. </code></pre>
  1670. <!-- end -->
  1671. <p>That&rsquo;s not the most human-friendly output, but it&rsquo;s a little simpler than it
  1672. seems at first glance. It&rsquo;s basically just a way of describing the changes
needed to turn <code>okpoems</code> into <code>findprop</code>. The string <code>2a3,8</code> can be read as
&ldquo;after line 2 of the original file, add lines 3 through 8 of the new file&rdquo;. Lines with a
<code>&gt;</code> in front of them are added. <code>5c11</code> can be read as &ldquo;line 5 in the original file
changes into line 11 in the new file&rdquo;, and the <code>&lt;</code> line is replaced with the <code>&gt;</code> line. If you wanted,
  1677. you could take a copy of the original file and apply these instructions by hand
  1678. in your text editor, and you&rsquo;d wind up with the new file.</p>
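<p>If applying changes by hand sounds tedious, the <code>patch</code> utility can do the same job. What follows is only a sketch, assuming <code>patch</code> is installed. The file names <code>original</code>, <code>changed</code>, and <code>copy</code> are made up for the example and aren&rsquo;t part of the scripts above.</p>

```shell
#!/bin/sh
# Sketch: let patch apply diff's instructions instead of doing it by hand.
# All file names here are invented for the example.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'one\ntwo\nthree\n' > original
printf 'one\n2\nthree\nfour\n' > changed

# Save the instructions. diff exits with status 1 when the files differ,
# which is what we expect here.
diff original changed > changes.diff

# Work on a copy so the original stays untouched.
cp original copy
patch copy < changes.diff

# cmp -s prints nothing and exits 0 only if the two files are identical.
if cmp -s copy changed
then
    patch_result="copy now matches changed"
else
    patch_result="copy does not match changed"
fi
echo "$patch_result"
```

<p>The saved <code>changes.diff</code> file is the same kind of output shown above, just captured in a file so that another program can read it.</p>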
  1679. <p>A lot of people (me included) prefer what&rsquo;s known as a &ldquo;unified&rdquo; diff, because
  1680. it&rsquo;s easier to read and offers context for the changed lines. We can ask for
  1681. one of these with <code>diff -u</code>:</p>
  1682. <!-- exec -->
<pre><code>$ diff -u ../script/okpoems ../script/findprop
--- ../script/okpoems 2014-04-19 00:08:03.321230818 -0600
+++ ../script/findprop 2014-04-21 21:51:29.360846449 -0600
@@ -1,7 +1,13 @@
 #!/bin/bash
 
+if [ ! $1 ]
+then
+ echo "usage: findprop &lt;property&gt;"
+ exit
+fi
+
 # find all the marker files and get the name of
 # the directory containing each
-find ~/p1k3/archives -name 'meta-ok-poem' | xargs -n1 dirname
+find ~/p1k3/archives -name $1 | xargs -n1 dirname
 
 exit 0
</code></pre>
  1700. <!-- end -->
  1701. <p>That&rsquo;s a little longer, and has some metadata we might not always care about,
  1702. but if you look for lines starting with <code>+</code> and <code>-</code>, it&rsquo;s easy to read as
  1703. &ldquo;added these, took away these&rdquo;. This diff tells us at a glance that we added
  1704. some lines to complain if we didn&rsquo;t get a command line argument, and replaced
  1705. <code>'meta-ok-poem'</code> in the <code>find</code> command with that argument. Since it shows us
  1706. some context, we have a pretty good idea where those lines are in the file
  1707. and what they&rsquo;re for.</p>
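<p>One piece of that metadata is worth a closer look. The line between the <code>@@</code> markers gives, for each file, the line where the chunk starts and how many lines it covers, so <code>@@ -1,7 +1,13 @@</code> means &ldquo;7 lines starting at line 1 of the old file, 13 lines starting at line 1 of the new one&rdquo;. Here is a minimal sketch with two throwaway files (the names <code>old</code> and <code>new</code> are invented for the example), using <code>tail</code> to skip the two header lines with filenames and timestamps:</p>

```shell
#!/bin/sh
# Sketch: a tiny unified diff, to make the @@ line easier to read.
# The files "old" and "new" are invented for the example.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'a\nb\nc\n' > old
printf 'a\nB\nc\nd\n' > new

# tail -n +3 starts printing at the third line, dropping the --- and +++ headers.
hunk=$(diff -u old new | tail -n +3)
echo "$hunk"

# Keep just the first line, which is the @@ header.
hunk_header=$(echo "$hunk" | head -n 1)
```

<p>The header here reads <code>@@ -1,3 +1,4 @@</code>: three lines of <code>old</code> and four lines of <code>new</code>, both starting at line 1.</p>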
  1708. <p>What if we don&rsquo;t care exactly <em>how</em> the files differ, but only whether they
  1709. do?</p>
  1710. <!-- exec -->
  1711. <pre><code>$ diff -q ../script/okpoems ../script/findprop
  1712. Files ../script/okpoems and ../script/findprop differ
  1713. </code></pre>
  1714. <!-- end -->
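<p>In a script, you may not even want the message. <code>diff</code> also reports its answer through its exit status: 0 when the files match, 1 when they differ, and 2 if something went wrong. Here is a small sketch, using invented files named <code>first</code>, <code>second</code>, and <code>third</code>:</p>

```shell
#!/bin/sh
# Sketch: use diff's exit status instead of reading its output.
# The three files are invented for the example.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'same\n' > first
printf 'same\n' > second
printf 'different\n' > third

# Exit status 0 means no differences, so the "then" branch runs.
if diff -q first second > /dev/null
then
    echo "first and second match"
fi

# Exit status 1 means the files differ.
diff -q first third > /dev/null
status=$?
echo "diff exited with status $status"
```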
  1715. <p>I use <code>diff</code> a lot in the course of my day job, because I spend a lot of time
  1716. needing to know just how two programs differ. Just as importantly, I often
  1717. need to know how (or whether!) the <em>output</em> of programs differs. As a concrete
  1718. example, I want to make sure that <code>findprop meta-ok-poem</code> is really a suitable
  1719. replacement for <code>okpoems</code>. Since I expect their output to be identical, I can
  1720. do this:</p>
  1721. <!-- exec -->
  1722. <pre><code>$ ../script/okpoems &gt; okpoem_output
  1723. </code></pre>
  1724. <!-- end -->
  1725. <!-- exec -->
  1726. <pre><code>$ ../script/findprop meta-ok-poem &gt; findprop_output
  1727. </code></pre>
  1728. <!-- end -->
  1729. <!-- exec -->
  1730. <pre><code>$ diff -s okpoem_output findprop_output
  1731. Files okpoem_output and findprop_output are identical
  1732. </code></pre>
  1733. <!-- end -->
  1734. <p>The <code>-s</code> just means that <code>diff</code> should explicitly tell us if files are the
  1735. <strong>s</strong>ame. Otherwise, it&rsquo;d output nothing at all, because there aren&rsquo;t any
  1736. differences.</p>
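<p>Only one of those two intermediate files is really needed. Given <code>-</code> in place of a filename, <code>diff</code> reads that side of the comparison from standard input, so one program&rsquo;s output can be piped straight in. This is a sketch along those lines, where <code>old_version</code> and <code>new_version</code> are hypothetical stand-ins for <code>okpoems</code> and <code>findprop meta-ok-poem</code>:</p>

```shell
#!/bin/sh
# Hypothetical stand-ins for the two scripts being compared.
old_version() { printf 'archives/2013/poem-a\narchives/2014/poem-b\n'; }
new_version() { printf 'archives/2013/poem-a\narchives/2014/poem-b\n'; }

# Save the old output once.
expected=$(mktemp)
old_version > "$expected"

# "-" tells diff to read the second file from standard input.
# The exit status of the pipeline is diff's exit status.
if new_version | diff "$expected" - > /dev/null
then
    comparison="output is identical"
else
    comparison="output differs"
fi
echo "$comparison"

rm -f "$expected"
```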
<p>As with many other tools, <code>diff</code> doesn&rsquo;t much care whether it&rsquo;s looking at
shell scripts or a list of filenames or what-have-you. If you read the man
page, you&rsquo;ll find some features geared towards people writing code in C-like
programming languages, but its real specialty is just text files with lines
made out of characters. That works well for lots of code, but it could
certainly be applied to English prose too.</p>
  1743. <p>Since I have a couple of versions ready to hand, let&rsquo;s apply this to a text
  1744. with some well-known variations and a bit of a literary legacy. Here&rsquo;s the
  1745. first day of the Genesis creation narrative in a couple of English
  1746. translations:</p>
  1747. <!-- exec -->
  1748. <pre><code>$ cat genesis_nkj
  1749. In the beginning God created the heavens and the earth. The earth was without
  1750. form, and void; and darkness was on the face of the deep. And the Spirit of
  1751. God was hovering over the face of the waters. Then God said, "Let there be
  1752. light"; and there was light. And God saw the light, that it was good; and God
  1753. divided the light from the darkness. God called the light Day, and the darkness
  1754. He called Night. So the evening and the morning were the first day.
  1755. </code></pre>
  1756. <!-- end -->
  1757. <!-- exec -->
  1758. <pre><code>$ cat genesis_nrsv
  1759. In the beginning when God created the heavens and the earth, the earth was a
  1760. formless void and darkness covered the face of the deep, while a wind from
  1761. God swept over the face of the waters. Then God said, "Let there be light";
  1762. and there was light. And God saw that the light was good; and God separated
  1763. the light from the darkness. God called the light Day, and the darkness he
  1764. called Night. And there was evening and there was morning, the first day.
  1765. </code></pre>
  1766. <!-- end -->
  1767. <p>What happens if we diff them?</p>
  1768. <!-- exec -->
  1769. <pre><code>$ diff -u genesis_nkj genesis_nrsv
  1770. --- genesis_nkj 2014-05-11 16:28:29.692508461 -0600
  1771. +++ genesis_nrsv 2014-05-11 16:28:29.744508459 -0600
  1772. @@ -1,6 +1,6 @@
  1773. -In the beginning God created the heavens and the earth. The earth was without
  1774. -form, and void; and darkness was on the face of the deep. And the Spirit of
  1775. -God was hovering over the face of the waters. Then God said, "Let there be
  1776. -light"; and there was light. And God saw the light, that it was good; and God
  1777. -divided the light from the darkness. God called the light Day, and the darkness
  1778. -He called Night. So the evening and the morning were the first day.
  1779. +In the beginning when God created the heavens and the earth, the earth was a
  1780. +formless void and darkness covered the face of the deep, while a wind from
  1781. +God swept over the face of the waters. Then God said, "Let there be light";
  1782. +and there was light. And God saw that the light was good; and God separated
  1783. +the light from the darkness. God called the light Day, and the darkness he
  1784. +called Night. And there was evening and there was morning, the first day.
  1785. </code></pre>
  1786. <!-- end -->
  1787. <p>Kind of useless, right? If a given line differs by so much as a character,
  1788. it&rsquo;s not the same line. This highlights the limitations of <code>diff</code> for comparing
  1789. things that</p>
  1790. <ul>
  1791. <li>aren&rsquo;t logically grouped by line</li>
  1792. <li>aren&rsquo;t easily thought of as versions of the same text with some lines changed</li>
  1793. </ul>
  1794. <p>We could edit the files into a more logically defined structure, like
  1795. one-line-per-verse, and try again:</p>
  1796. <!-- exec -->
<pre><code>$ diff -u genesis_nkj_by_verse genesis_nrsv_by_verse
--- genesis_nkj_by_verse 2014-05-11 16:51:14.312457198 -0600
+++ genesis_nrsv_by_verse 2014-05-11 16:53:02.484453134 -0600
@@ -1,5 +1,5 @@
-In the beginning God created the heavens and the earth.
-The earth was without form, and void; and darkness was on the face of the deep. And the Spirit of God was hovering over the face of the waters.
+In the beginning when God created the heavens and the earth,
+the earth was a formless void and darkness covered the face of the deep, while a wind from God swept over the face of the waters.
 Then God said, "Let there be light"; and there was light.
-And God saw the light, that it was good; and God divided the light from the darkness.
-God called the light Day, and the darkness He called Night. So the evening and the morning were the first day.
+And God saw that the light was good; and God separated the light from the darkness.
+God called the light Day, and the darkness he called Night. And there was evening and there was morning, the first day.
</code></pre>
  1811. <!-- end -->
  1812. <p>It might be a little more descriptive, but editing all that text just for a
  1813. quick comparison felt suspiciously like work, and anyway the output still
  1814. doesn&rsquo;t seem very useful.</p>
  1815. <h2><a name=wdiff href=#wdiff>#</a> wdiff</h2>
  1816. <p>For cases like this, I&rsquo;m fond of a tool called <code>wdiff</code>:</p>
  1817. <!-- exec -->
  1818. <pre><code>$ wdiff genesis_nkj genesis_nrsv
  1819. In the beginning {+when+} God created the heavens and the [-earth. The-] {+earth, the+} earth was [-without
  1820. form, and void;-] {+a
  1821. formless void+} and darkness [-was on-] {+covered+} the face of the [-deep. And the Spirit of-] {+deep, while a wind from+}
  1822. God [-was hovering-] {+swept+} over the face of the waters. Then God said, "Let there be light";
  1823. and there was light. And God saw [-the light,-] that [-it-] {+the light+} was good; and God
  1824. [-divided-] {+separated+}
  1825. the light from the darkness. God called the light Day, and the darkness
  1826. [-He-] {+he+}
  1827. called Night. [-So the-] {+And there was+} evening and [-the morning were-] {+there was morning,+} the first day.
  1828. </code></pre>
  1829. <!-- end -->
  1830. <p>Deleted words are surrounded by <code>[- -]</code> and inserted ones by <code>{+ +}</code>. You can
  1831. even ask it to spit out HTML tags for insertion and deletion&hellip;</p>
  1832. <pre><code>$ wdiff -w '&lt;del&gt;' -x '&lt;/del&gt;' -y '&lt;ins&gt;' -z '&lt;/ins&gt;' genesis_nkj genesis_nrsv
  1833. </code></pre>
  1834. <p>&hellip;and come up with something your browser will render like this:</p>
  1835. <blockquote>
  1836. <p>In the beginning <ins>when</ins> God created the heavens and the <del>earth. The</del> <ins>earth, the</ins> earth was <del>without
  1837. form, and void;</del> <ins>a
  1838. formless void</ins> and darkness <del>was on</del> <ins>covered</ins> the face of the <del>deep. And the Spirit of</del> <ins>deep, while a wind from</ins>
  1839. God <del>was hovering</del> <ins>swept</ins> over the face of the waters. Then God said, "Let there be light";
  1840. and there was light. And God saw <del>the light,</del> that <del>it</del> <ins>the light</ins> was good; and God
  1841. <del>divided</del> <ins>separated</ins>
  1842. the light from the darkness. God called the light Day, and the darkness
  1843. <del>He</del> <ins>he</ins>
  1844. called Night. <del>So the</del> <ins>And there was</ins> evening and <del>the morning were</del> <ins>there was morning,</ins> the first day.</p>
  1845. </blockquote>
  1846. <p>Burton H. Throckmorton, Jr. this ain&rsquo;t. Still, it has its uses.</p>
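<p>To get a feel for what a word-by-word comparison involves, it&rsquo;s possible to fake a crude one with standard tools: put each word on its own line with <code>tr</code>, then hand the results to plain <code>diff</code>. This is a rough sketch and not a substitute for <code>wdiff</code>. The two sample files are short excerpts typed in for the example, with line breaks in different places on purpose.</p>

```shell
#!/bin/sh
# Sketch: a crude word-level comparison using tr and diff.
# The sample files are short excerpts invented for the example.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'God divided the light\nfrom the darkness.\n' > nkj_sample
printf 'God separated the light from\nthe darkness.\n' > nrsv_sample

# Turn every space or newline into a newline, and squeeze runs of them
# into one, so each word lands on its own line.
tr -s '[:space:]' '\n' < nkj_sample > nkj_words
tr -s '[:space:]' '\n' < nrsv_sample > nrsv_words

# With one word per line, diff's line-by-line comparison is now
# effectively a word-by-word comparison.
word_diff=$(diff nkj_words nrsv_words)
echo "$word_diff"
```

<p>Even though the line breaks fall in different places, the only change reported is the second word: <code>divided</code> became <code>separated</code>.</p>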
  1847. <hr />
  1848. <h1><a name=the-internet-for-humans-and-how-the-command-line-can-help href=#the-internet-for-humans-and-how-the-command-line-can-help>#</a> 7. the internet for humans, and how the command line can help</h1>
  1849. <p>Web browsers are really complicated these days. They&rsquo;re full of rendering
  1850. engines, audio and video players, programming languages, development tools,
  1851. databases &ndash; you name it, and there&rsquo;s a fair chance it&rsquo;s in there somewhere.
  1852. The modern web browser is kitchen sink software, and to make matters worse, it
  1853. is <em>totally surrounded</em> by technobabble. It can take <em>years</em> to come to terms
  1854. with the ocean of words about web stuff and sort out the meaningful ones from
  1855. the snake oil and bureaucratic mysticism.</p>
  1856. <p>All of which can make the web itself seem like a really complicated landscape,
  1857. and obscure the simplicity of its basic design, which is this:</p>
  1858. <p>Some programs pass text files around to one another.</p>
  1859. <p>It&rsquo;s more complicated than that, of course, but the gist of it is that the web
  1860. is made out of URLs, &ldquo;Uniform Resource Locators&rdquo;, which are paths to things.
  1861. If you squint, these look kind of like paths to files on your filesystem.</p>
  1862. <p>Let&rsquo;s illustrate this. I&rsquo;ve written a really simple web page that lives at
  1863. <code>http://p1k3.com/hello_world.html</code>.</p>
  1864. <h2><a name=reading-the-web href=#reading-the-web>#</a> reading the web</h2>
  1865. <p>{to come}</p>
  1866. <h2><a name=writing-the-web href=#writing-the-web>#</a> writing the web</h2>
  1867. <p>{to come}</p>
  1868. <hr />
  1869. <h1><a name=further-reading href=#further-reading>#</a> 8. further reading</h1>
  1870. <ul>
  1871. <li><em>The Unix Programming Environment</em> - Brian W. Kernighan, Rob Pike</li>
  1872. <li><a href="https://www.youtube.com/watch?v=tc4ROCJYbm0">AT&amp;T Archives: The UNIX Operating System</a> (YouTube)</li>
  1873. </ul>
  1874. <hr />
<script>
$(document).ready(function () {
  // ☜ ☝ ☞ ☟ ☆ ✠ ✡ ✢ ✣ ✤ ✥ ✦ ✧ ✩ ✪
  var closed_sigil = 'show';
  var open_sigil = 'hide';

  // flip a button's label between "show" and "hide"
  var togglesigil = function (elem) {
    var sigil = $(elem).html();
    if (sigil === closed_sigil) {
      $(elem).html(open_sigil);
    } else {
      $(elem).html(closed_sigil);
    }
  };

  // give each collapsible block a button that toggles its full text
  $(".details").each(function () {
    var $this = $(this);
    var $button = $('<button class=clicker-button>' + closed_sigil + '</button>');
    var $details_full = $this.find('.full');

    $button.click(function (e) {
      e.preventDefault();
      $details_full.toggle({
        duration: 550
      });
      togglesigil(this);
    });

    $this.find('.clicker').append($button);
    $button.show();
  });

  // start with every full block collapsed
  $('.details .full').hide();
});
</script>
  1905. </body>
  1906. </html>