<!DOCTYPE html>
<html lang=en>
<meta charset="utf-8">
<title>userland: a book about the command line for humans</title>
<link rel=stylesheet href="userland.css" />
<link rel="alternate" type="application/atom+xml" title="changes" href="//p1k3.com/userland-book/feed.xml" />
<script src="js/jquery.js" type="text/javascript"></script>
<h1 class=bigtitle>userland</h1>
<hr />
<h1><a name=a-book-about-the-command-line-for-humans href=#a-book-about-the-command-line-for-humans>#</a> a book about the command line for humans</h1>
<p>In the fall of 2013, <a href="//p1k3.com/2013/8/4">thinking about</a> text utilities got
me thinking in turn about how my writing habits depend on the Linux command
line. This seems like a good hook for explaining some tools I use every day,
so now I&rsquo;m writing a short, haphazard book.</p>
<p>This isn&rsquo;t a book about system administration, writing complex software, or
becoming a wizard. I am not a wizard, and I don&rsquo;t subscribe to the idea that
wizardry is required to use these tools. In fact, I barely know what I&rsquo;m doing
most of the time. I still get some stuff done.</p>
<p>This is a work in progress. It probably gets some stuff wrong.</p>
<p>&ndash; bpb / <a href="https://p1k3.com">p1k3</a> / <a href="https://twitter.com/brennen">@brennen</a></p>
<div class=details>
<h2 class=clicker><a name=contents href=#contents>#</a> contents</h2>
<div class=full>
<div class=contents><ul>
<li><a href="#a-book-about-the-command-line-for-humans">a book about the command line for humans</a>
<ul>
<li><a href="#contents">contents</a></li>
</ul></li>
<li><a href="#get-you-a-shell">0. get you a shell</a>
<ul>
<li><a href="#get-an-account-on-a-social-unix-server">get an account on a social unix server</a></li>
<li><a href="#use-a-raspberry-pi-or-beaglebone">use a raspberry pi or beaglebone</a></li>
<li><a href="#use-a-virtual-machine">use a virtual machine</a></li>
</ul></li>
<li><a href="#the-command-line-as-literary-environment">1. the command line as literary environment</a>
<ul>
<li><a href="#terms-and-definitions">terms and definitions</a></li>
<li><a href="#twisty-little-passages">twisty little passages</a></li>
<li><a href="#cat">cat</a></li>
<li><a href="#wildcards">wildcards</a></li>
<li><a href="#sort">sort</a></li>
<li><a href="#options">options</a></li>
<li><a href="#uniq">uniq</a></li>
<li><a href="#standard-IO">standard IO</a></li>
<li><a href="#code-help-code-and-man-pages"><code>--help</code> and man pages</a></li>
<li><a href="#wc">wc</a></li>
<li><a href="#head-tail-and-cut">head, tail, and cut</a></li>
<li><a href="#tab-separated-values">tab separated values</a></li>
<li><a href="#finding-text-grep">finding text: grep</a></li>
<li><a href="#now-you-have-n-problems">now you have n problems</a></li>
</ul></li>
<li><a href="#a-literary-problem">2. a literary problem</a></li>
<li><a href="#programmerthink">3. programmerthink</a></li>
<li><a href="#script">4. script</a>
<ul>
<li><a href="#learn-you-an-editor">learn you an editor</a></li>
<li><a href="#d-i-y-utilities">d.i.y. utilities</a></li>
<li><a href="#heavy-lifting">heavy lifting</a></li>
<li><a href="#generality">generality</a></li>
</ul></li>
<li><a href="#general-purpose-programmering">5. general purpose programmering</a></li>
<li><a href="#one-of-these-things-is-not-like-the-others">6. one of these things is not like the others</a>
<ul>
<li><a href="#diff">diff</a></li>
<li><a href="#wdiff">wdiff</a></li>
</ul></li>
<li><a href="#the-command-line-as-as-a-shared-world">7. the command line as a shared world</a></li>
<li><a href="#the-command-line-and-the-web">8. the command line and the web</a></li>
<li><a href="#a-miscellany-of-tools-and-techniques">9. a miscellany of tools and techniques</a>
<ul>
<li><a href="#dict">dict</a></li>
<li><a href="#aspell">aspell</a></li>
<li><a href="#mostcommon">mostcommon</a></li>
<li><a href="#cal-and-ncal">cal and ncal</a></li>
<li><a href="#seq">seq</a></li>
<li><a href="#shuf">shuf</a></li>
<li><a href="#ptx">ptx</a></li>
<li><a href="#figlet">figlet</a></li>
<li><a href="#cowsay">cowsay</a></li>
</ul></li>
<li><a href="#endmatter">endmatter</a>
<ul>
<li><a href="#further-reading">further reading</a></li>
<li><a href="#code">code</a></li>
<li><a href="#copying">copying</a></li>
</ul></li>
</ul></div>
</div>
</div>
<hr />
<h1><a name=get-you-a-shell href=#get-you-a-shell>#</a> 0. get you a shell</h1>
<p>You don&rsquo;t have to have a shell at hand to get something out of this book.
Still, as with most practical subjects, you&rsquo;ll learn more if you try things out
as you go. You shouldn&rsquo;t feel guilty about skipping this section. It will
always be here later if you need it.</p>
<p>Not so long ago, it was common for schools and ISPs to hand out shell accounts
on big shared systems. People learned the command line as a side effect of
reading their e-mail.</p>
<p>That doesn&rsquo;t happen as often now, but in the meanwhile computers have become
relatively cheap and free software is abundant. If you&rsquo;re reading this on the
web, you can probably get access to a shell. Some options follow.</p>
<h2><a name=get-an-account-on-a-social-unix-server href=#get-an-account-on-a-social-unix-server>#</a> get an account on a social unix server</h2>
<p>Check out <a href="https://tilde.town/">tilde.town</a>:</p>
<blockquote><p>tilde.town is an intentional digital community for making art, socializing, and
learning. Unlike many online spaces, users interact with tilde.town through a
direct connection instead of a web site. This means using a tool called ssh and
other text based tools.</p></blockquote>
<h2><a name=use-a-raspberry-pi-or-beaglebone href=#use-a-raspberry-pi-or-beaglebone>#</a> use a raspberry pi or beaglebone</h2>
<p>Do you have a single-board computer lying around? Perfect. If you already
run the standard Raspbian, Debian on a BeagleBone, or a similar-enough Linux,
you don&rsquo;t need much else. I wrote most of this text on a Raspberry Pi, and the
example commands should all work there.</p>
<h2><a name=use-a-virtual-machine href=#use-a-virtual-machine>#</a> use a virtual machine</h2>
<p>A few options:</p>
<ul>
<li><a href="https://docs.vagrantup.com/v2/getting-started/index.html">Use Vagrant to spin up a machine in Virtualbox</a></li>
<li><a href="https://www.digitalocean.com/community/tutorials/how-to-create-your-first-digitalocean-droplet-virtual-server">Use DigitalOcean to create a remotely-hosted VM running Linux</a></li>
</ul>
<hr />
<h1><a name=the-command-line-as-literary-environment href=#the-command-line-as-literary-environment>#</a> 1. the command line as literary environment</h1>
<p>There are a lot of ways to structure an introduction to the command line. I&rsquo;m
going to start with writing as a point of departure because, aside from web
development, it&rsquo;s what I use a computer for most. I want to shine a light on
the humane potential of ideas that are usually understood as nerd trivia.
Computers have utterly transformed the practice of writing within the space of
my lifetime, but it seems to me that writers as a class miss out on many of the
software tools and patterns taken as a given in more &ldquo;technical&rdquo; fields.</p>
<p>Writing, particularly writing of any real scope or complexity, is very much a
technical task. It makes demands, both physical and psychological, of its
practitioners. As with woodworkers, graphic artists, and farmers, writers
exhibit strong preferences in their tools, materials, and environment, and they
do so because they&rsquo;re engaged in a physically and cognitively challenging task.</p>
<p>My thesis is that the modern Linux command line is a pretty good environment
for working with English prose and prosody, and that maybe this will illuminate
the ways it could be useful in your own work with a computer, whatever that
work happens to be.</p>
<h2><a name=terms-and-definitions href=#terms-and-definitions>#</a> terms and definitions</h2>
<p>What software are we actually talking about when we say &ldquo;the command line&rdquo;?</p>
<p>For the purposes of this discussion, we&rsquo;re talking about an environment built
on a very old paradigm called Unix.</p>
<p style="text-align:center;"> <img src="images/jp_unix.jpg" height=320 width=470></p>
<p>&hellip;except what classical Unix really looks like is this:</p>
<p style="text-align:center;"> <img src="images/blinking.gif" width=470></p>
<p>The Unix-like environment we&rsquo;re going to use isn&rsquo;t very classical, really.
It&rsquo;s an operating system kernel called Linux, combined with a bunch of things
written by other people (people in the GNU and Debian projects, and many
others). Purists will tell you that this isn&rsquo;t properly Unix at all. In
strict historical terms they&rsquo;re right, or at least a certain kind of right, but
for the purposes of my cultural agenda I&rsquo;m going to ignore them right now.</p>
<p style="text-align:center;"> <img src="images/debian.png"></p>
<p>This is what&rsquo;s called a shell. There are many different shells, but they
pretty much all operate on the same idea: You navigate a filesystem and run
programs by typing commands. Commands can be combined in various ways to make
programs of their own, and in fact the way you use the computer is often just
to write little programs that invoke other programs, turtles-all-the-way-down.</p>
<p>The standard shell these days is something called Bash, so we&rsquo;ll use Bash.
It&rsquo;s what you&rsquo;ll most often see in the wild. Like most shells, Bash is ugly
and stupid in more ways than it is possible to easily summarize. It&rsquo;s also an
incredibly powerful and expressive piece of software.</p>
<h2><a name=twisty-little-passages href=#twisty-little-passages>#</a> twisty little passages</h2>
<p>Have you ever played a text-based adventure game or MUD, of the kind that
describes a setting and takes commands for movement and so on? Readers of a
certain age and temperament might recognize the opening of Crowther &amp; Woods'
<em>Adventure</em>, the great-granddaddy of text adventure games:</p>
<pre><code>&gt; GO EAST</code></pre>
<p>You can think of the shell as a kind of environment you inhabit, in much the
way your character inhabits an adventure game. The difference is that instead
of navigating around virtual rooms and hallways with commands like <code>LOOK</code> and
<code>EAST</code>, you navigate between directories by typing commands like <code>ls</code> and <code>cd</code>:</p>
<pre><code>$ ls
code Downloads notes p1k3 photos scraps userland-book
$ cd notes
$ ls
notes.txt sparkfun TODO.txt</code></pre>
<p><code>ls</code> lists files. Some files are directories, which means they can contain
other files, and you can step inside of them by typing <code>cd</code> (for <strong>c</strong>hange
<strong>d</strong>irectory).</p>
<p>In the Macintosh and Windows world, directories have been called
&ldquo;folders&rdquo; for a long time now. This isn&rsquo;t the <em>worst</em> metaphor for what&rsquo;s
going on, and it&rsquo;s so pervasive by now that it&rsquo;s not worth fighting about.
It&rsquo;s also not exactly a <em>great</em> metaphor, since computer filesystems aren&rsquo;t
built very much like the filing cabinets of yore. A directory acts a lot like
a container of some sort, but it&rsquo;s an infinitely expandable one which may
contain nested sub-spaces much larger than itself. Directories are frequently
like the TARDIS: Bigger on the inside.</p>
<h2><a name=cat href=#cat>#</a> cat</h2>
<p>When you&rsquo;re in the shell, you have many tools at your disposal - programs that
can be used on many different files, or chained together with other programs.
They tend to have weird, cryptic names, but a lot of them do very simple
things. Tasks that might be a menu item in a big program like Word, like
counting the number of words in a document or finding a particular phrase, are
often programs unto themselves. We&rsquo;ll start with something even more basic
than that.</p>
<p>Suppose you have some files, and you&rsquo;re curious what&rsquo;s in them. For example,
suppose you&rsquo;ve got a list of authors you&rsquo;re planning to reference, and you just
want to check its contents real quick-like. This is where our friend <code>cat</code>
comes in:</p>
<!-- exec -->
<pre><code>$ cat authors_sff
Ursula K. Le Guin
Jo Walton
Pat Cadigan
John Ronald Reuel Tolkien
Vanessa Veselka
James Tiptree, Jr.
John Brunner</code></pre>
<!-- end -->
<p>&ldquo;Why,&rdquo; you might be asking, &ldquo;is the command to dump out the contents of a file
to a screen called <code>cat</code>? What do felines have to do with anything?&rdquo;</p>
<p>It turns out that <code>cat</code> is actually short for &ldquo;catenate&rdquo;, which is a long
word basically meaning &ldquo;stick things together&rdquo;. In programming, we usually
refer to sticking two bits of text together as &ldquo;string concatenation&rdquo;, probably
because programmers like to feel like they&rsquo;re being very precise about very
simple actions.</p>
<p>Suppose you wanted to see the contents of a <em>set</em> of author lists:</p>
<!-- exec -->
<pre><code>$ cat authors_sff authors_contemporary_fic authors_nat_hist
Ursula K. Le Guin
Jo Walton
Pat Cadigan
John Ronald Reuel Tolkien
Vanessa Veselka
James Tiptree, Jr.
John Brunner
Eden Robinson
Vanessa Veselka
Miriam Toews
Gwendolyn L. Waring</code></pre>
<!-- end -->
<h2><a name=wildcards href=#wildcards>#</a> wildcards</h2>
<p>We&rsquo;re working with three filenames: <code>authors_sff</code>, <code>authors_contemporary_fic</code>,
and <code>authors_nat_hist</code>. That&rsquo;s an awful lot of typing every time we want to do
something to all three files. Fortunately, our shell offers a shorthand for
&ldquo;all the files that start with <code>authors_</code>&rdquo;:</p>
<!-- exec -->
<pre><code>$ cat authors_*
Eden Robinson
Vanessa Veselka
Miriam Toews
Gwendolyn L. Waring
Ursula K. Le Guin
Jo Walton
Pat Cadigan
John Ronald Reuel Tolkien
Vanessa Veselka
James Tiptree, Jr.
John Brunner</code></pre>
<!-- end -->
<p>In Bash-land, <code>*</code> basically means &ldquo;anything&rdquo;, and is known in the vernacular,
somewhat poetically, as a &ldquo;wildcard&rdquo;. You should always be careful with
wildcards, especially if you&rsquo;re doing anything destructive. They can and will
surprise the unwary. Still, once you&rsquo;re used to the idea, they will save you a
lot of RSI.</p>
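<p>One cheap habit that takes some of the surprise out of wildcards is to hand the
pattern to <code>echo</code> first and read what comes back. What follows is only a sketch:
the scratch directory and the extra <code>notes.txt</code> file are assumptions made up for
the example.</p>

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch authors_sff authors_contemporary_fic authors_nat_hist notes.txt

# The shell expands the pattern before echo ever runs, so this shows
# exactly which names a later command would receive.
matched=$(echo authors_*)
echo "$matched"
```

<p>If the list that comes back isn&rsquo;t the list you expected, you&rsquo;ve found out before
anything got deleted.</p>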
<h2><a name=sort href=#sort>#</a> sort</h2>
<p>There&rsquo;s a problem here. Our author list is out of order, and thus confusing to
reference. Fortunately, since one of the most basic things you can do to a
list is to sort it, someone else has already solved this problem for us.
Here&rsquo;s a command that will give us some organization:</p>
<!-- exec -->
<pre><code>$ sort authors_*
Eden Robinson
Gwendolyn L. Waring
James Tiptree, Jr.
John Brunner
John Ronald Reuel Tolkien
Jo Walton
Miriam Toews
Pat Cadigan
Ursula K. Le Guin
Vanessa Veselka
Vanessa Veselka</code></pre>
<!-- end -->
<p>Does it bother you that they aren&rsquo;t sorted by last name? Me too. As a partial
solution, we can ask <code>sort</code> to use the second &ldquo;field&rdquo; in each line as its sort
<strong>k</strong>ey (by default, sort treats whitespace as a division between fields):</p>
<!-- exec -->
<pre><code>$ sort -k2 authors_*
John Brunner
Pat Cadigan
Ursula K. Le Guin
Gwendolyn L. Waring
Eden Robinson
John Ronald Reuel Tolkien
James Tiptree, Jr.
Miriam Toews
Vanessa Veselka
Vanessa Veselka
Jo Walton</code></pre>
<!-- end -->
<p>That&rsquo;s closer, right? It sorted on &ldquo;Cadigan&rdquo; and &ldquo;Veselka&rdquo; instead of &ldquo;Pat&rdquo;
and &ldquo;Vanessa&rdquo;. (Of course, it&rsquo;s still far from perfect, because the
second field in each line isn&rsquo;t necessarily the person&rsquo;s last name.)</p>
<h2><a name=options href=#options>#</a> options</h2>
<p>Above, when we wanted to ask <code>sort</code> to behave differently, we gave it what is
known as an option. Most programs with command-line interfaces will allow
their behavior to be changed by adding various options. Options usually
(but not always!) look like <code>-o</code> or <code>--option</code>.</p>
<p>For example, if we wanted to see just the unique lines, irrespective of case,
for a file called colors:</p>
<!-- exec -->
<pre><code>$ cat colors</code></pre>
<!-- end -->
<p>We could write this:</p>
<!-- exec -->
<pre><code>$ sort -uf colors</code></pre>
<!-- end -->
<p>Here <code>-u</code> stands for <strong>u</strong>nique and <code>-f</code> stands for <strong>f</strong>old case, which means
to treat upper- and lower-case letters as the same for comparison purposes. You&rsquo;ll
often see a group of short options following the <code>-</code> like this.</p>
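<p>To make the grouping concrete, here&rsquo;s a rough sketch that asks for the same thing
three ways. The contents of the file are made up for the example, and the long
option names assume the GNU version of <code>sort</code>.</p>

```shell
# A made-up colors file; the contents are an assumption for illustration.
workdir=$(mktemp -d)
printf '%s\n' RED blue red BLUE green > "$workdir/colors"

# Three spellings of the same request (the long names assume GNU sort).
combined=$(sort -uf "$workdir/colors")
separate=$(sort -u -f "$workdir/colors")
long_form=$(sort --unique --ignore-case "$workdir/colors")

if [ "$combined" = "$separate" ] && [ "$combined" = "$long_form" ]; then
    echo "all three agree"
fi

# Five lines went in; three distinct colors come out.
line_count=$(printf '%s\n' "$combined" | wc -l)
echo "$line_count"
```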
<h2><a name=uniq href=#uniq>#</a> uniq</h2>
<p>Did you notice how Vanessa Veselka shows up twice in our list of authors?
That&rsquo;s useful if we want to remember that she&rsquo;s in more than one category, but
it&rsquo;s redundant if we&rsquo;re just worried about membership in the overall set of
authors. We can make sure our list doesn&rsquo;t contain repeating lines by using
<code>sort</code>, just like with that list of colors:</p>
<!-- exec -->
<pre><code>$ sort -u -k2 authors_*
John Brunner
Pat Cadigan
Ursula K. Le Guin
Gwendolyn L. Waring
Eden Robinson
John Ronald Reuel Tolkien
James Tiptree, Jr.
Miriam Toews
Vanessa Veselka
Jo Walton</code></pre>
<!-- end -->
<p>But there&rsquo;s another approach to this &mdash; <code>sort</code> is good at only displaying a line
once, but suppose we wanted to see a count of how many different lists an
author shows up on? <code>sort</code> doesn&rsquo;t do that, but a command called <code>uniq</code> does,
if you give it the option <code>-c</code> for <strong>c</strong>ount.</p>
<p><code>uniq</code> moves through the lines in its input, and if it sees a line more than
once in sequence, it will only print that line once. If you have a bunch of
files and you just want to see the unique lines across all of those files, you
probably need to run them through <code>sort</code> first. How do you do that?</p>
<!-- exec -->
<pre><code>$ sort authors_* | uniq -c
1 Eden Robinson
1 Gwendolyn L. Waring
1 James Tiptree, Jr.
1 John Brunner
1 John Ronald Reuel Tolkien
1 Jo Walton
1 Miriam Toews
1 Pat Cadigan
1 Ursula K. Le Guin
2 Vanessa Veselka</code></pre>
<!-- end -->
<h2><a name=standard-IO href=#standard-IO>#</a> standard IO</h2>
<p>The <code>|</code> is called a &ldquo;pipe&rdquo;. In the command above, it tells your shell that
instead of printing the output of <code>sort authors_*</code> right to your terminal, it
should send it to <code>uniq -c</code>.</p>
<p style="text-align:center;"> <img src="images/pipe.gif"></p>
<p>Pipes are some of the most important magic in the shell. When the people who
built Unix in the first place give interviews about the stuff they remember
from the early days, a lot of them reminisce about the invention of pipes and
all of the new stuff it immediately made possible.</p>
<p>Pipes help you control a thing called &ldquo;standard IO&rdquo;. In the world of the
command line, programs take <strong>i</strong>nput and produce <strong>o</strong>utput. A pipe is a way
to hook the output from one program to the input of another.</p>
<p>Unlike a lot of the weirdly named things you&rsquo;ll encounter in software, the
metaphor here is obvious and makes pretty good sense. It even kind of looks
like a physical pipe.</p>
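<p>Here&rsquo;s a small sketch of a pipe earning its keep. It shows why <code>sort</code> so often
sits upstream of <code>uniq</code>: <code>uniq</code> only collapses repeats that are right next to each
other. The three-line file is an assumption invented for the example.</p>

```shell
workdir=$(mktemp -d)
printf '%s\n' 'Vanessa Veselka' 'Jo Walton' 'Vanessa Veselka' > "$workdir/names"

# uniq alone: the two Vanessa lines are not adjacent, so both survive.
unsorted_count=$(uniq "$workdir/names" | wc -l)

# sort first: duplicates land next to each other and collapse into one.
sorted_count=$(sort "$workdir/names" | uniq | wc -l)

echo "uniq alone: $unsorted_count lines, sort then uniq: $sorted_count lines"
```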
<p>What if, instead of sending the output of one program to the input of another,
you&rsquo;d like to store it in a file for later use?</p>
<p>Check it out:</p>
<!-- exec -->
<pre><code>$ sort authors_* | uniq &gt; ./all_authors</code></pre>
<!-- end -->
<!-- exec -->
<pre><code>$ cat all_authors
Eden Robinson
Gwendolyn L. Waring
James Tiptree, Jr.
John Brunner
John Ronald Reuel Tolkien
Jo Walton
Miriam Toews
Pat Cadigan
Ursula K. Le Guin
Vanessa Veselka</code></pre>
<!-- end -->
<p>I like to think of the <code>&gt;</code> as looking like a little funnel. It can be
dangerous &mdash; you should always make sure that you&rsquo;re not going to clobber
an existing file you actually want to keep.</p>
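<p>If you&rsquo;d rather have the shell watch your back, Bash and other POSIX-style shells
have a <code>noclobber</code> setting that makes a plain <code>&gt;</code> refuse to overwrite a file
that already exists. A minimal sketch, using a scratch file made up for the
example:</p>

```shell
workdir=$(mktemp -d)
echo 'first draft' > "$workdir/all_authors"

# With noclobber on, a plain > refuses to overwrite a file that exists.
set -o noclobber
if { echo 'oops' > "$workdir/all_authors"; } 2>/dev/null; then
    overwrite=allowed
else
    overwrite=refused
fi
set +o noclobber   # back to the default behavior

contents=$(cat "$workdir/all_authors")
echo "overwrite $overwrite; file still says: $contents"
```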
<p>If you want to tack more stuff on to the end of an existing file, you can use
<code>&gt;&gt;</code> instead. To test that, let&rsquo;s use <code>echo</code>, which prints out whatever string
you give it on a line by itself:</p>
<!-- exec -->
<pre><code>$ echo 'hello' &gt; hello_world</code></pre>
<!-- end -->
<!-- exec -->
<pre><code>$ echo 'world' &gt;&gt; hello_world</code></pre>
<!-- end -->
<!-- exec -->
<pre><code>$ cat hello_world
hello
world</code></pre>
<!-- end -->
<p>You can also take a file and pull it directly back into the input of a given
program, which is a bit like a funnel going the other direction:</p>
<!-- exec -->
<pre><code>$ nl &lt; all_authors
1 Eden Robinson
2 Gwendolyn L. Waring
3 James Tiptree, Jr.
4 John Brunner
5 John Ronald Reuel Tolkien
6 Jo Walton
7 Miriam Toews
8 Pat Cadigan
9 Ursula K. Le Guin
10 Vanessa Veselka</code></pre>
<!-- end -->
<p><code>nl</code> is just a way to <strong>n</strong>umber <strong>l</strong>ines. This command accomplishes pretty much
the same thing as <code>cat all_authors | nl</code>, or <code>nl all_authors</code>. You won&rsquo;t see
<code>&lt;</code> used as often as <code>|</code> and <code>&gt;</code>, since most utilities can read files on their
own, but it can save you from typing <code>cat</code> quite so often.</p>
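<p>As a quick check on that claim, this sketch feeds the same file to <code>nl</code> all three
ways and compares the results. The short author file is an assumption for the
example.</p>

```shell
workdir=$(mktemp -d)
printf '%s\n' 'Eden Robinson' 'Jo Walton' 'Pat Cadigan' > "$workdir/all_authors"

via_redirect=$(nl < "$workdir/all_authors")
via_pipe=$(cat "$workdir/all_authors" | nl)
via_argument=$(nl "$workdir/all_authors")

if [ "$via_redirect" = "$via_pipe" ] && [ "$via_pipe" = "$via_argument" ]; then
    echo "same numbered list all three ways"
fi
```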
<p>We&rsquo;ll use these features liberally from here on out.</p>
<h2><a name=code-help-code-and-man-pages href=#code-help-code-and-man-pages>#</a> <code>--help</code> and man pages</h2>
<p>You can change the behavior of most tools by giving them different options.
This is all well and good if you already know what options are available,
but what if you don&rsquo;t?</p>
<p>Often, you can ask the tool itself:</p>
<pre><code>$ sort --help
Usage: sort [OPTION]... [FILE]...
or: sort [OPTION]... --files0-from=F
Write sorted concatenation of all FILE(s) to standard output.
Mandatory arguments to long options are mandatory for short options too.
Ordering options:
-b, --ignore-leading-blanks ignore leading blanks
-d, --dictionary-order consider only blanks and alphanumeric characters
-f, --ignore-case fold lower case to upper case characters
-g, --general-numeric-sort compare according to general numerical value
-i, --ignore-nonprinting consider only printable characters
-M, --month-sort compare (unknown) &lt; 'JAN' &lt; ... &lt; 'DEC'
-h, --human-numeric-sort compare human readable numbers (e.g., 2K 1G)
-n, --numeric-sort compare according to string numerical value
-R, --random-sort sort by random hash of keys
--random-source=FILE get random bytes from FILE
-r, --reverse reverse the result of comparisons</code></pre>
<p>&hellip;and so on. (It goes on for a while in this vein.)</p>
<p>If that doesn&rsquo;t work, or doesn&rsquo;t provide enough info, the next thing to try is
called a man page. (&ldquo;man&rdquo; is short for &ldquo;manual&rdquo;. It&rsquo;s sort of an unfortunate
abbreviation.)</p>
<pre><code>$ man sort
SORT(1) User Commands SORT(1)
sort - sort lines of text files
sort [OPTION]... [FILE]...
sort [OPTION]... --files0-from=F
Write sorted concatenation of all FILE(s) to standard output.</code></pre>
<p>&hellip;and so on. Manual pages vary in quality, and it can take a while to get
used to reading them, but they&rsquo;re very often the best place to look for help.</p>
<p>If you&rsquo;re not sure what <em>program</em> you want to use to solve a given problem, you
might try searching all the man pages on the system for a keyword. <code>man</code>
itself has an option to let you do this - <code>man -k keyword</code> - but most systems
also have a shortcut called <code>apropos</code>, which I like to use because it&rsquo;s easy to
remember if you imagine yourself saying &ldquo;apropos of [some problem I have]&hellip;&rdquo;</p>
<!-- exec -->
<pre><code>$ apropos -s1 sort
apt-sortpkgs (1) - Utility to sort package index files
bunzip2 (1) - a block-sorting file compressor, v1.0.6
bzip2 (1) - a block-sorting file compressor, v1.0.6
comm (1) - compare two sorted files line by line
sort (1) - sort lines of text files
tsort (1) - perform topological sort</code></pre>
<!-- end -->
<p>It&rsquo;s useful to know that the manual represented by <code>man</code> has numbered sections
for different kinds of manual pages. Most of what the average user needs to
know about lives in section 1, &ldquo;User Commands&rdquo;, so you&rsquo;ll often see the names
of different tools written like <code>sort(1)</code> or <code>cat(1)</code>. This can be a good way
to make it clear in writing that you&rsquo;re talking about a specific piece of
software rather than a verb or a small carnivorous mammal. (I specified <code>-s1</code>
for section 1 above just to cut down on clutter, though in practice I usually
don&rsquo;t bother.)</p>
<p>Like other literary traditions, Unix is littered with this sort of convention.
This one just happens to date from a time when the manual was still a physical
book.</p>
<h2><a name=wc href=#wc>#</a> wc</h2>
<p><code>wc</code> stands for <strong>w</strong>ord <strong>c</strong>ount. It does about what you&rsquo;d expect - it
counts the words in its input, along with the lines and bytes.</p>
<pre><code>$ wc index.md
736 4117 24944 index.md</code></pre>
<p>736 is the number of lines, 4117 the number of words, and 24944 the number of
bytes (for plain ASCII text, the same as the number of characters) in the file
I&rsquo;m writing right now. I use this constantly. Most
obviously, it&rsquo;s a good way to get an idea of how much you&rsquo;ve written. <code>wc</code> is
the tool I used to track my progress the last time I tried National Novel
Writing Month:</p>
<pre><code>$ find ~/p1k3/archives/2010/11 -regextype egrep -regex '.*([0-9]+|index)' -type f | xargs wc -w | tail -1
6585 total</code></pre>
<!-- exec -->
<pre><code>$ cowsay 'embarrassing.'
 _______________
&lt; embarrassing. &gt;
 ---------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||</code></pre>
<!-- end -->
<p>Anyway. The less obvious thing about <code>wc</code> is that you can use it to count the
output of other commands. Want to know <em>how many</em> unique authors we have?</p>
<!-- exec -->
<pre><code>$ sort authors_* | uniq | wc -l
10</code></pre>
<!-- end -->
<p>This kind of thing is trivial, but it comes in handy more often than you might
think.</p>
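<p>One wrinkle worth knowing about: hand <code>wc</code> a filename and it prints the name
next to the count, but feed it standard input and you get the bare number,
which is usually what you want inside a pipeline. A sketch, with a demo file
invented for the purpose:</p>

```shell
workdir=$(mktemp -d)
printf '%s\n' 'Jo Walton' 'Pat Cadigan' 'Jo Walton' > "$workdir/authors_demo"

# Given a filename, wc prints that name after the count...
with_name=$(wc -l "$workdir/authors_demo")
# ...but reading standard input, there is no name to print.
count_only=$(wc -l < "$workdir/authors_demo")
# The same goes for counting the output of a pipeline.
unique_count=$(sort "$workdir/authors_demo" | uniq | wc -l)

echo "$with_name"
echo "$count_only lines, $unique_count unique"
```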
<h2><a name=head-tail-and-cut href=#head-tail-and-cut>#</a> head, tail, and cut</h2>
<p>Remember our old pal <code>cat</code>, which just splats everything it&rsquo;s given back to
standard output?</p>
<p>Sometimes you&rsquo;ve got a piece of output that&rsquo;s more than you actually want to
deal with at once. Maybe you just want to glance at the first few lines in a
file:</p>
<!-- exec -->
<pre><code>$ head -3 colors</code></pre>
<!-- end -->
<p>&hellip;or maybe you want to see the last thing in a list:</p>
<!-- exec -->
<pre><code>$ sort colors | uniq -i | tail -1</code></pre>
<!-- end -->
<p>&hellip;or maybe you&rsquo;re only interested in the first &ldquo;field&rdquo; in some list. You might
use <code>cut</code> here, asking it to treat spaces as delimiters between fields and
return only the first field for each line of its input:</p>
<!-- exec -->
<pre><code>$ cut -d' ' -f1 ./authors_*
Eden
Vanessa
Miriam
Gwendolyn
Ursula
Jo
Pat
John
Vanessa
James
John</code></pre>
<!-- end -->
<p>Suppose we&rsquo;re curious what the few most commonly occurring first names on our
author list are? Here&rsquo;s an approach, silly but effective, that combines a lot
of what we&rsquo;ve discussed so far and looks like plenty of one-liners I wind up
writing in real life:</p>
<!-- exec -->
<pre><code>$ cut -d' ' -f1 ./authors_* | sort | uniq -ci | sort -n | tail -3
1 Ursula
2 John
2 Vanessa</code></pre>
<!-- end -->
<p>Let&rsquo;s walk through this one step by step:</p>
<p>First, we have <code>cut</code> extract the first field of each line in our author lists.</p>
<pre><code>cut -d' ' -f1 ./authors_*</code></pre>
<p>Then we sort these results</p>
<pre><code>| sort</code></pre>
<p>and pass them to <code>uniq</code>, asking it for a case-insensitive count of each
repeated line</p>
<pre><code>| uniq -ci</code></pre>
<p>then sort again, numerically,</p>
<pre><code>| sort -n</code></pre>
<p>and finally, we chop off everything but the last three lines:</p>
<pre><code>| tail -3</code></pre>
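<p>The <code>-n</code> on that second <code>sort</code> is what makes it compare numbers instead of
text. Here&rsquo;s a small sketch of the difference, using three made-up counts.
(<code>paste -s</code> is only there to squash the results onto one line, and <code>LC_ALL=C</code>
is an assumption added to keep the ordering predictable across systems.)</p>

```shell
# Plain sort compares character by character, so "10" lands before "2".
as_text=$(printf '%s\n' 10 9 2 | LC_ALL=C sort | paste -s -d ' ' -)
# -n compares the lines as numbers instead.
as_numbers=$(printf '%s\n' 10 9 2 | LC_ALL=C sort -n | paste -s -d ' ' -)

echo "text order:    $as_text"
echo "numeric order: $as_numbers"
```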
<p>If you wanted to make sure to count an individual author&rsquo;s first name
only once, even if that author appears more than once in the files,
you could instead do:</p>
<!-- exec -->
<pre><code>$ sort -u ./authors_* | cut -d' ' -f1 | uniq -ci | sort -n | tail -3
1 Ursula
1 Vanessa
2 John</code></pre>
<!-- end -->
<h2><a name=tab-separated-values href=#tab-separated-values>#</a> tab separated values</h2>
<p>Notice above how we had to tell <code>cut</code> that &ldquo;fields&rdquo; in <code>authors_*</code> are
delimited by spaces? It turns out that if you don&rsquo;t use <code>-d</code>, <code>cut</code> defaults
to using tab characters for a delimiter.</p>
<p>Tab characters are sort of weird little animals. You can&rsquo;t usually <em>see</em> them
directly &mdash; they&rsquo;re like a space character that takes up more than one space
when displayed. By convention, one tab is usually rendered as 8 spaces, but
it&rsquo;s up to the software that&rsquo;s displaying the character what it wants to do.</p>
<p>(In fact, it&rsquo;s more complicated than that: Tabs are often rendered as marking
<em>tab stops</em>, which is a concept I remember from 7th grade typing classes, but
haven&rsquo;t actually thought about in my day-to-day life for nearly 20 years.)</p>
<p>Here&rsquo;s a version of our <code>all_authors</code> that&rsquo;s been rearranged so that the first
field is the author&rsquo;s last name, the second is their first name, the third is
their middle name or initial (if we know it) and the fourth is any suffix.
Fields are separated by a single tab character:</p>
<!-- exec -->
<pre><code>$ cat all_authors.tsv
Robinson Eden
Waring Gwendolyn L.
Tiptree James Jr.
Brunner John
Tolkien John Ronald Reuel
Walton Jo
Toews Miriam
Cadigan Pat
Le Guin Ursula K.
Veselka Vanessa</code></pre>
<!-- end -->
<p>That looks kind of garbled, right? In order to make it a little more obvious
what&rsquo;s happening, let&rsquo;s use <code>cat -T</code>, which displays tab characters as <code>^I</code>:</p>
<!-- exec -->
<pre><code>$ cat -T all_authors.tsv
Tolkien^IJohn^IRonald Reuel
Le Guin^IUrsula^IK.</code></pre>
<!-- end -->
<p>It looks odd when displayed because some names are at or nearly at 8 characters long.
&ldquo;Robinson&rdquo;, at 8 characters, overshoots the first tab stop, so &ldquo;Eden&rdquo; gets indented
further than other first names, and so on.</p>
<p>Fortunately, in order to make this more human-readable, we can pass it through
<code>expand</code>, which replaces each tab with enough spaces to reach the next tab stop
(every 8 columns by default; <code>-t14</code> below sets a stop every 14 columns):</p>
<!-- exec -->
<pre><code>$ expand -t14 all_authors.tsv
Robinson Eden
Waring Gwendolyn L.
Tiptree James Jr.
Brunner John
Tolkien John Ronald Reuel
Walton Jo
Toews Miriam
Cadigan Pat
Le Guin Ursula K.
Veselka Vanessa</code></pre>
<!-- end -->
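<p>To watch tab stops do their thing in isolation, here&rsquo;s a sketch that runs two
names through <code>expand</code> at its default of 8 columns. The square brackets in the
output are just there to make the edges of each line visible.</p>

```shell
# "Waring" is 6 characters, so its tab needs 2 spaces to reach column 8.
short_name=$(printf 'Waring\tGwendolyn\n' | expand)
# "Robinson" already ends at column 8, so its tab runs on to column 16.
long_name=$(printf 'Robinson\tEden\n' | expand)

printf '[%s]\n[%s]\n' "$short_name" "$long_name"
```

<p>One tab character, two very different amounts of whitespace.</p>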
<p>Now it&rsquo;s easy to sort by last name:</p>
<!-- exec -->
<pre><code>$ sort -k1 all_authors.tsv | expand -t14
Brunner John
Cadigan Pat
Le Guin Ursula K.
Robinson Eden
Tiptree James Jr.
Toews Miriam
Tolkien John Ronald Reuel
Veselka Vanessa
Walton Jo
Waring Gwendolyn L.</code></pre>
<!-- end -->
<p>Or just extract middle names and initials:</p>
<!-- exec -->
<pre><code>$ cut -f3 all_authors.tsv
Ronald Reuel</code></pre>
<!-- end -->
<p>It probably won&rsquo;t surprise you to learn that there&rsquo;s a corresponding <code>paste</code>
command, which takes two or more files and stitches them together with tab
characters. Let&rsquo;s extract a couple of things from our author list and put them
back together in a different order:</p>
<!-- exec -->
<pre><code>$ cut -f1 all_authors.tsv &gt; lastnames</code></pre>
<!-- end -->
<!-- exec -->
<pre><code>$ cut -f2 all_authors.tsv &gt; firstnames</code></pre>
<!-- end -->
<!-- exec -->
<pre><code>$ paste firstnames lastnames | sort -k2 | expand -t12
John Brunner
Pat Cadigan
Ursula Le Guin
Eden Robinson
James Tiptree
Miriam Toews
John Tolkien
Vanessa Veselka
Jo Walton
Gwendolyn Waring</code></pre>
<!-- end -->
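<p>Stripped down to two tiny files, the round trip looks something like this. It&rsquo;s
a sketch with made-up scratch files, not the real author lists:</p>

```shell
workdir=$(mktemp -d)
printf '%s\n' Pat Jo > "$workdir/firstnames"
printf '%s\n' Cadigan Walton > "$workdir/lastnames"

# paste joins line 1 with line 1, line 2 with line 2, and so on,
# putting a tab between the pieces.
pasted=$(paste "$workdir/firstnames" "$workdir/lastnames")

# cut -f2 pulls the second column back out of the result.
second_column=$(printf '%s\n' "$pasted" | cut -f2)
echo "$second_column"
```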
<p>As these examples show, TSV is something very like a primitive spreadsheet: A
way to represent information in columns and rows. In fact, it&rsquo;s a close cousin
of CSV, which is often used as a lowest-common-denominator format for
transferring spreadsheets, and which represents data something like this:</p>
<pre><code>Tolkien,John,Ronald Reuel,
</code></pre>
<p>The advantage of tabs is that they&rsquo;re supported by a bunch of the standard
tools. A disadvantage is that they&rsquo;re kind of ugly and can be weird to deal
with, but they&rsquo;re useful anyway, and character-delimited rows are often a
good-enough way to hack your way through problems that call for basic
structure.</p>
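<p>Since a tab and a comma are both just single-character delimiters, a rough conversion between the two is a one-liner. This is a sketch with a made-up sample, and it assumes no field contains a comma or a quote; real CSV handles those with quoting rules that <code>tr</code> and <code>cut</code> know nothing about:</p>

```shell
# Hypothetical sample rows: last name, first name, middle name
tmp=$(mktemp -d)
printf 'Tolkien\tJohn\tRonald Reuel\nWalton\tJo\t\n' > "$tmp/sample.tsv"

# tr swaps every tab for a comma
csv=$(tr '\t' ',' < "$tmp/sample.tsv")
printf '%s\n' "$csv"

# cut can split on commas as readily as on tabs if told the delimiter
first_names=$(printf '%s\n' "$csv" | cut -d, -f2)
printf '%s\n' "$first_names"
rm -r "$tmp"
```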
<h2><a name=finding-text-grep href=#finding-text-grep>#</a> finding text: grep</h2>
<p>After all those contortions, what if you actually just want to see <em>which lists</em>
an individual author appears on?</p>
<!-- exec -->
<pre><code>$ grep 'Vanessa' ./authors_*
./authors_contemporary_fic:Vanessa Veselka
./authors_sff:Vanessa Veselka
</code></pre>
<!-- end -->
<p><code>grep</code> takes a string to search for and, optionally, a list of files to search
in. If you don&rsquo;t specify files, it&rsquo;ll look through standard input instead:</p>
<!-- exec -->
<pre><code>$ cat ./authors_* | grep 'Vanessa'
Vanessa Veselka
Vanessa Veselka
</code></pre>
<!-- end -->
<p>Most of the time, piping the output of <code>cat</code> to <code>grep</code> is considered silly,
because <code>grep</code> knows how to find things in files on its own. Many thousands of
words have been written on this topic by leading lights of the nerd community.</p>
<p>You&rsquo;ve probably noticed that this result doesn&rsquo;t contain filenames (and thus
isn&rsquo;t very useful to us). That&rsquo;s because all <code>grep</code> saw was the lines in the
files, not the names of the files themselves.</p>
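<p>If the filenames are what you&rsquo;re after, letting <code>grep</code> open the files itself gives you some options. A small sketch with stand-in copies of the author lists (<code>authors_other</code> is invented here to show a file with no match):</p>

```shell
tmp=$(mktemp -d)
printf 'Vanessa Veselka\nMiriam Toews\n' > "$tmp/authors_contemporary_fic"
printf 'Vanessa Veselka\nJo Walton\n' > "$tmp/authors_sff"
printf 'Pat Cadigan\n' > "$tmp/authors_other"   # hypothetical list with no match

# -l: print only the names of files that contain a match
lists=$(cd "$tmp" && grep -l 'Vanessa' ./authors_*)

# -c: print a count of matching lines for each file, matches or not
counts=$(cd "$tmp" && grep -c 'Vanessa' ./authors_*)

printf '%s\n\n%s\n' "$lists" "$counts"
rm -r "$tmp"
```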
<h2><a name=now-you-have-n-problems href=#now-you-have-n-problems>#</a> now you have n problems</h2>
<p>To close out this introductory chapter, let&rsquo;s spend a little time on a topic
that will likely vex, confound, and (occasionally) delight you for as long as
you are acquainted with the command line.</p>
<p>When I was talking about <code>grep</code> a moment ago, I fudged the details more than a
little by saying that it expects a string to search for. What <code>grep</code>
<em>actually</em> expects is a <em>pattern</em>. Moreover, it expects a specific kind of
pattern, what&rsquo;s known as a <em>regular expression</em>, a cumbersome phrase frequently
shortened to regex.</p>
<p>There&rsquo;s a lot of theory about what makes up a regular expression. Fortunately,
very little of it matters to the short version that will let you get useful
stuff done. The short version is that a regex is like using wildcards in the
shell to match groups of files, but for text in general and with more magic.</p>
<!-- exec -->
<pre><code>$ grep 'Jo.*' ./authors_*
./authors_sff:Jo Walton
./authors_sff:John Ronald Reuel Tolkien
./authors_sff:John Brunner
</code></pre>
<!-- end -->
<p>The pattern <code>Jo.*</code> says that we&rsquo;re looking for lines which contain a literal
<code>Jo</code>, followed by any quantity (including none) of any character. In a regex,
<code>.</code> means &ldquo;any single character&rdquo; and <code>*</code> means &ldquo;any amount of the preceding thing&rdquo;.</p>
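<p>One consequence worth noticing: because <code>grep</code> prints whole lines, a trailing <code>.*</code> doesn&rsquo;t change which lines come back. A sketch using a made-up <code>authors</code> file:</p>

```shell
tmp=$(mktemp -d)
printf 'Jo Walton\nJohn Brunner\nPat Cadigan\nUrsula K. Le Guin\n' > "$tmp/authors"

# These two select the same lines: ".*" is allowed to match nothing at all
with_star=$(grep 'Jo.*' "$tmp/authors")
without_star=$(grep 'Jo' "$tmp/authors")

# Here the dot has to stand for exactly one character between "Jo" and "n"
dot_match=$(grep 'Jo.n' "$tmp/authors")

printf '%s\n\n%s\n' "$with_star" "$dot_match"
rm -r "$tmp"
```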
<p><code>.</code> and <code>*</code> are magical. In the particular dialect of regexen understood
by <code>grep</code>, other magical things include:</p>
<table>
<tr><td><code>^</code> </td> <td>start of a line </td></tr>
<tr><td><code>$</code> </td> <td>end of a line </td></tr>
<tr><td><code>[abc]</code></td> <td>one of a, b, or c </td></tr>
<tr><td><code>[a-z]</code></td> <td>a character in the range a through z</td></tr>
<tr><td><code>[0-9]</code></td> <td>a character in the range 0 through 9</td></tr>
<tr><td><code>+</code> </td> <td>one or more of the preceding thing </td></tr>
<tr><td><code>?</code> </td> <td>0 or 1 of the preceding thing </td></tr>
<tr><td><code>*</code> </td> <td>any number of the preceding thing </td></tr>
<tr><td><code>(foo|bar)</code></td> <td>"foo" or "bar"</td></tr>
<tr><td><code>(foo)?</code></td> <td>optional "foo"</td></tr>
</table>
<p>It&rsquo;s actually a little more complicated than that: By default, if you want to
use a lot of the magical characters, you have to prefix them with <code>\</code>. This is
both ugly and confusing, so unless you&rsquo;re writing a very simple pattern, it&rsquo;s
often easiest to call <code>grep -E</code>, for <strong>E</strong>xtended regular expressions, which
means that lots of characters will have special meanings.</p>
<p>Authors with 4-letter first names:</p>
<!-- exec -->
<pre><code>$ grep -iE '^[a-z]{4} ' ./authors_*
./authors_contemporary_fic:Eden Robinson
./authors_sff:John Ronald Reuel Tolkien
./authors_sff:John Brunner
</code></pre>
<!-- end -->
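<p>For comparison, here&rsquo;s roughly what the backslash-heavy version looks like. <code>{4}</code> means &ldquo;exactly four of the preceding thing&rdquo;; without <code>-E</code>, the braces need escaping. The <code>authors</code> file below is a made-up sample:</p>

```shell
tmp=$(mktemp -d)
printf 'Eden Robinson\nJo Walton\nJohn Brunner\nMiriam Toews\n' > "$tmp/authors"

# Basic regular expressions: braces are literal unless escaped
basic=$(grep -i '^[a-z]\{4\} ' "$tmp/authors")

# Extended regular expressions: braces are special as written
extended=$(grep -iE '^[a-z]{4} ' "$tmp/authors")

printf '%s\n' "$extended"
rm -r "$tmp"
```

<p>Both forms should pick out the same two lines; the extended one is just easier on the eyes.</p>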
<p>A count of authors named John:</p>
<!-- exec -->
<pre><code>$ grep -c '^John ' ./all_authors
</code></pre>
<!-- end -->
<p>Lines in this file matching the words &ldquo;magic&rdquo; or &ldquo;magical&rdquo;:</p>
<pre><code>$ grep -iE 'magic(al)?' ./index.md
Pipes are some of the most important magic in the shell. When the people who
shell to match groups of files, but with more magic.
`.` and `*` are magical. In the particular dialect of regexen understood
by `grep`, other magical things include:
use a lot of the magical characters, you have to prefix them with `\`. This is
Lines in this file matching the words "magic" or "magical":
$ grep -iE 'magic(al)?' ./index.md
</code></pre>
<p>Find some &ldquo;-agic&rdquo; words in a big list of words:</p>
<!-- exec -->
<pre><code>$ grep -iE '(m|tr|pel)agic' /usr/share/dict/words
</code></pre>
<!-- end -->
<p><code>grep</code> isn&rsquo;t the only - or even the most important - tool that makes use of
regular expressions, but it&rsquo;s a good place to start because it&rsquo;s one of the
fundamental building blocks for so many other operations. Filtering lists of
things, matching patterns within collections, and writing concise descriptions
of how text should be transformed are at the heart of a practical approach to
Unix-like systems. Regexen turn out to be a seductively powerful way to do
these things - so much so that they&rsquo;ve crept their way into text editors,
databases, and full-featured programming languages.</p>
<p>There&rsquo;s a dark side to all of this, for the truth about regular expressions is
that they are ugly, inconsistent, brittle, and <em>incredibly</em> difficult to think
clearly about. They take years to master and reward the wielder with great
power, but they are also a trap: a temptation towards the path of cleverness
masquerading as wisdom.</p>
<p style="text-align:center;"></p>
<p>I&rsquo;ll be returning to this theme, but for the time being let&rsquo;s move on. Now
that we&rsquo;ve established, however haphazardly, some of the basics, let&rsquo;s consider
their application to a real-world task.</p>
<hr />
<h1><a name=a-literary-problem href=#a-literary-problem>#</a> 2. a literary problem</h1>
<p>The <a href="../literary_environment">previous chapter</a> introduced a bunch of tools
using contrived examples. Now we&rsquo;ll look at a real problem, and work through a
solution by building on tools we&rsquo;ve already covered.</p>
<p>So on to the problem: I write poetry.</p>
<p>{rimshot dot wav}</p>
<p>Most of the poems I have written are not very good, but lately I&rsquo;ve been
thinking that I&rsquo;d like to comb through the last ten years' worth and pull
the least-embarrassing stuff into a single collection.</p>
<p>I&rsquo;ve hinted at how the contents of my blog are stored as files, but let&rsquo;s take
a look at the whole thing:</p>
<pre><code>$ ls -F ~/p1k3/archives/
1997/ 2003/ 2009/ bones/ meta/
1998/ 2004/ 2010/ chapbook/ winfield/
1999/ 2005/ 2011/ cli/ wip/
2000/ 2006/ 2012/ colophon/
2001/ 2007/ 2013/ europe/
2002/ 2008/ 2014/ hack/
</code></pre>
<p>(<code>ls</code>, again, just lists files. <code>-F</code> tells it to append a character that shows
us what type of file we&rsquo;re looking at, such as a trailing / for directories.
<code>~</code> is a shorthand that means &ldquo;my home directory&rdquo;, which in this case is
<code>/home/brennen</code>.)</p>
<p>Each of the directories here holds other directories. The ones for each year
have sub-directories for the months of the year, which in turn contain files
for the days. The files are just little pieces of HTML and Markdown and some
other stuff. Many years ago, before I had much of an idea how to program, I
wrote a script to glue them all together into a web page and serve them up to
visitors. This all sounds complicated, but all it really means is that if I
want to write a blog entry, I just open a file and type some stuff. Here&rsquo;s an
example for March 1st:</p>
<!-- exec -->
<pre><code>$ cat ~/p1k3/archives/2014/3/1
&lt;h1&gt;Saturday, March 1&lt;/h1&gt;
Sometimes I'm going along on a Saturday morning, still a little dazed from the
night before, and I think something like "I should just go write a detailed
analysis of hooded sweatshirts". Mostly these thoughts don't survive contact
with an actual keyboard. It's almost certainly for the best.
</code></pre>
<!-- end -->
<p>And here&rsquo;s an older one that contains a short poem:</p>
<!-- took this one out of exec block 'cause later i
made a dir out of it... -->
<pre><code>$ cat ~/p1k3/archives/2012/10/9
&lt;h1&gt;tuesday, october 9&lt;/h1&gt;
&lt;freeverse&gt;i am a stateful machine
i exist in a manifold of consequence
a clattering miscellany of impure functions
and side effects&lt;/freeverse&gt;
</code></pre>
<p>Notice that <code>&lt;freeverse&gt;</code> bit? It kind of looks like an HTML tag, but it&rsquo;s
not. What it actually does is tell my blog script that it should format the
text it contains like a poem. The specifics don&rsquo;t matter for our purposes
(yet), but this convention is going to come in handy, because the first thing I
want to do is get a list of all the entries that contain poems.</p>
<p>Remember <code>grep</code>?</p>
<pre><code>$ grep -ri '&lt;freeverse&gt;' ~/p1k3/archives &gt; ~/possible_poems
</code></pre>
<p>Let&rsquo;s step through this bit by bit:</p>
<p>First, I&rsquo;m asking <code>grep</code> to search <strong>r</strong>ecursively, <strong>i</strong>gnoring case.
&ldquo;Recursively&rdquo; just means that every time the program finds a directory, it
should descend into that directory and search in any files there as well.</p>
<pre><code>grep -ri
</code></pre>
<p>Next comes a pattern to search for. It&rsquo;s in single quotes because the
characters <code>&lt;</code> and <code>&gt;</code> have a special meaning to the shell, and here we need
the shell to understand that it should treat them as literal angle brackets:</p>
<pre><code>'&lt;freeverse&gt;'
</code></pre>
<p>This is the path I want to search:</p>
<pre><code>~/p1k3/archives
</code></pre>
<p>Finally, because there are so many entries to search, I know the process will
be slow and produce a large list, so I tell the shell to redirect the output to
a file called <code>possible_poems</code> in my home directory:</p>
<pre><code>&gt; ~/possible_poems
</code></pre>
<p>This is quite a few instances&hellip;</p>
<pre><code>$ wc -l ~/possible_poems
679 /home/brennen/possible_poems
</code></pre>
<p>&hellip;and it&rsquo;s also not super-pretty to look at:</p>
<pre><code>$ head -5 ~/possible_poems
/home/brennen/p1k3/archives/2011/10/14:&lt;freeverse&gt;i've got this friend has a real knack
/home/brennen/p1k3/archives/2011/4/25:&lt;freeverse&gt;i can't claim to strive for it
/home/brennen/p1k3/archives/2011/8/10:&lt;freeverse&gt;one diminishes or becomes greater
/home/brennen/p1k3/archives/2011/1/1:&lt;freeverse&gt;six years on
</code></pre>
<p>Still, it&rsquo;s a decent start. I can see paths to the files I have to check, and
usually a first line. Since I use a fancy text editor, I can just go down the
list opening each file in a new window and copying the stuff I&rsquo;m interested in
to a new file.</p>
<p>This is good enough for government work, but what if instead of jumping around
between hundreds of files, I&rsquo;d rather read everything in one file and just weed
out the bad ones as I go?</p>
<pre><code>$ cat `grep -ril '&lt;freeverse&gt;' ~/p1k3/archives` &gt; ~/possible_poems_full
</code></pre>
<p>This probably bears some explaining. <code>grep</code> is still doing all the real work
here. The main difference from before is that <code>-l</code> tells grep to just list any
files it finds which contain a match.</p>
<pre><code>`grep -ril '&lt;freeverse&gt;' ~/p1k3/archives`
</code></pre>
<p>Notice those backticks around the grep command? This part is a little
trippier. It turns out that if you put backticks around something in a
command, it&rsquo;ll get executed and replaced with its result, which in turn gets
executed as part of the larger command. So what we&rsquo;re really saying is
something like:</p>
<pre><code>$ cat [all of the files in the blog directory with &lt;freeverse&gt; in them]
</code></pre>
<p>Did you catch that? I just wrote a command that rewrote itself as a
<em>different</em>, more specific command. And it appears to have worked on the
first try:</p>
<pre><code>$ wc ~/possible_poems_full
17628 80980 528699 /home/brennen/possible_poems_full
</code></pre>
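<p>You&rsquo;ll also see this written with <code>$( )</code> in place of backticks. The two forms do the same job, and the newer one nests more readably. A sketch against three throwaway files standing in for blog entries:</p>

```shell
# Three hypothetical entries; only two contain the marker tag
tmp=$(mktemp -d)
printf '<freeverse>one</freeverse>\n' > "$tmp/a"
printf 'no poem here\n' > "$tmp/b"
printf '<freeverse>two</freeverse>\n' > "$tmp/c"

# Backticks and $( ) both run the inner command and substitute its output
old_style=`grep -l '<freeverse>' "$tmp"/*`
new_style=$(grep -l '<freeverse>' "$tmp"/*)

# $( ) can sit inside another $( ) without any escaping
combined=$(cat $(grep -l '<freeverse>' "$tmp"/*))

printf '%s\n' "$combined"
rm -r "$tmp"
```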
<p>Welcome to wizard school.</p>
<hr />
<h1><a name=programmerthink href=#programmerthink>#</a> 3. programmerthink</h1>
<p>In the <a href="#a-literary-problem">preceding chapter</a>, I worked through accumulating
a big piece of text from some other, smaller texts. I started with a bunch of
files and wound up with one big file called <code>possible_poems_full</code>.</p>
<p>Let&rsquo;s talk for a minute about how programmers approach problems like this one.
What I&rsquo;ve just done is sort of an old-school humanities take on things:
Metaphorically speaking, I took a book off the shelf and hauled it down to the
copy machine to xerox a bunch of pages, and now I&rsquo;m going to start in on them
with a highlighter and some Post-Its or something. A process like this will
often trigger a cascade of questions in the programmer-mind:</p>
<ul>
<li>What if, halfway through the project, I realize my selection criteria were all
wrong and have to backtrack?</li>
<li>What if I discover corrections that also need to be made in the source documents?</li>
<li>What if I want to access metadata, like the original location of a file?</li>
<li>What if I want to quickly re-order the poems according to some new criteria?</li>
<li>Why am I storing the same text in two different places?</li>
</ul>
<p>A unifying theme of these questions is that they could all be answered by
involving a little more abstraction.</p>
<p style="text-align:center;"></p>
<p>Some kinds of abstraction are so common in the physical world that we can
forget they&rsquo;re part of a sophisticated technology. For example, a good deal of
bicycle maintenance can be accomplished with a cheap multi-tool containing a
few different sizes of hex wrench and a couple of screwdrivers.</p>
<p>A hex wrench or screwdriver doesn&rsquo;t really know anything about bicycles. All
it <em>really</em> knows about is fitting into a space and allowing torque to be
applied. Standardized fasteners and adjustment mechanisms on a bicycle ensure
that the work can be done anywhere, by anyone with a certain set of tools.
Standard tools mean that if you can work on a particular bike, you can work on
<em>most</em> bikes, and even on things that aren&rsquo;t bikes at all, but were designed by
people with the same abstractions in mind.</p>
<p>The relationship between a wrench, a bolt, and the purpose of a bolt is a lot
like something we call <em>indirection</em> in software. Programs like <code>grep</code> or
<code>cat</code> don&rsquo;t really know anything about poetry. All they <em>really</em> know about is
finding lines of text in input, or sticking inputs together. Files, lines, and
text are like standardized fasteners that allow a user who can work on one kind
of data (be it poetry, a list of authors, the source code of a program) to use
the same tools for other problems and other data.</p>
<p style="text-align:center;"></p>
<p>When I first started writing stuff on the web, I edited a page &mdash; a single HTML
file &mdash; by hand. When the entries on my nascent blog got old, I manually
cut-and-pasted them to archive files with names like <code>old_main97.html</code>, which
held all of the stuff I&rsquo;d written in 1997.</p>
<p>I&rsquo;m not holding this up as an example of youthful folly. In fact, it worked
fine, and just having a single, static file that you can open in any text
editor has turned out to be a <em>lot</em> more future-proof than the sophisticated
blogging software people were starting to write at the time.</p>
<p>And yet. Something about this habit nagged at my developing programmer mind
after a few years. It was just a little bit too manual and repetitive, a
little bit silly to have to write things like a table of contents by hand, or
move entries around by copy-and-pasting them to different files. Since I knew
the date for each entry, and wanted to make them navigable on that basis, why
not define a directory structure for the years and months, and then write a
file to hold each day? That way, all I&rsquo;d have to do is concatenate the files
in one directory to display any given month:</p>
<pre><code>$ cat ~/p1k3/archives/2014/1/* | head -10
&lt;h1&gt;Sunday, January 12&lt;/h1&gt;
&lt;h2&gt;the one casey is waiting for&lt;/h2&gt;
after a while
the thing about drinking
is that it just feeds
what you drink to kill
and kills
</code></pre>
<p>I ultimately wound up writing a few thousand lines of Perl to do the actual
work, but the essential idea of the thing is still little more than invoking
<code>cat</code> on some stuff.</p>
<p>I didn&rsquo;t know the word for it at the time, but what I was reaching for was a
kind of indirection. By putting blog posts in a specific directory layout, I
was creating a simple model of the temporal structure that I considered their
most important property. Now, if I want to write commands that ask questions
about my blog posts or re-combine them in certain ways, I can address my
concerns to this model. Maybe, for example, I want a rough idea how many words
I&rsquo;ve written in blog posts so far in 2014:</p>
<pre><code>$ find ~/p1k3/archives/2014/ -type f | xargs cat | wc -w
</code></pre>
<p><code>xargs</code> is not the most intuitive command, but it&rsquo;s useful and common enough to
explain here. At the end of last chapter, when I said:</p>
<pre><code>$ cat `grep -ril '&lt;freeverse&gt;' ~/p1k3/archives` &gt; ~/possible_poems_full
</code></pre>
<p>I could also have written this as:</p>
<pre><code>$ grep -ril '&lt;freeverse&gt;' ~/p1k3/archives | xargs cat &gt; ~/possible_poems_full
</code></pre>
<p>What <code>xargs</code> does here is take its input, which starts like:</p>
<pre><code>/home/brennen/p1k3/archives/2002/10/16
/home/brennen/p1k3/archives/2002/10/27
/home/brennen/p1k3/archives/2002/10/10
</code></pre>
<p>&hellip;and run <code>cat</code> on all the things in it:</p>
<pre><code>cat /home/brennen/p1k3/archives/2002/10/16 /home/brennen/p1k3/archives/2002/10/27 /home/brennen/p1k3/archives/2002/10/10 ...
</code></pre>
<p>It can be a better idea to use <code>xargs</code>, because while backticks are
incredibly useful, they have some limitations. If you&rsquo;re dealing with a very
large list of files, for example, you might exceed the maximum allowed length
for arguments to a command on your system. <code>xargs</code> is smart enough to know
that limit and run <code>cat</code> more than once if needed.</p>
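<p>You can watch the batching happen by forcing a small batch size with <code>-n</code>. This is only an illustration: left alone, <code>xargs</code> picks its own, much larger limit.</p>

```shell
# With no limit given, five short arguments fit in a single echo command
all_at_once=$(printf '%s\n' one two three four five | xargs echo)

# -n2 caps each command at two arguments, so echo runs three times
batched=$(printf '%s\n' one two three four five | xargs -n2 echo)

printf '%s\n\n%s\n' "$all_at_once" "$batched"
```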
<p><code>xargs</code> is actually sort of a pain to think about, and will make you jump
through some irritating hoops if you have spaces or other weirdness in your
filenames, but I wind up using it quite a bit.</p>
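<p>One common way through the spaces problem, assuming your <code>find</code> and <code>xargs</code> support <code>-print0</code> and <code>-0</code> (the GNU and BSD versions do), is to separate filenames with NUL characters instead of newlines. The files below are throwaway examples:</p>

```shell
tmp=$(mktemp -d)
printf 'three short words\n' > "$tmp/plain"
printf 'two more\n' > "$tmp/name with spaces"

# NUL-separated names survive spaces in filenames intact
words_xargs=$(find "$tmp" -type f -print0 | xargs -0 cat | wc -w | tr -d ' ')

# An alternative that skips xargs: let find batch the arguments itself
words_exec=$(find "$tmp" -type f -exec cat {} + | wc -w | tr -d ' ')

echo "$words_xargs words via xargs -0, $words_exec via -exec"
rm -r "$tmp"
```

<p>A plain <code>find | xargs cat</code> on the same directory would split <code>name with spaces</code> into three nonexistent filenames.</p>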
<p>Maybe I want to see a table of contents:</p>
<!-- exec -->
<pre><code>$ find ~/p1k3/archives/2014/ -type d | xargs ls -v | head -10
</code></pre>
<!-- end -->
<p>Or find the subtitles I used in 2012:</p>
<!-- exec -->
<pre><code>$ find ~/p1k3/archives/2012/ -type f | xargs perl -ne 'print "$1\n" if m{&lt;h2&gt;(.*?)&lt;/h2&gt;}'
this poem again
i'll do better next time
timebinding animals
more observations on gear nerdery &amp;amp; utility fetishism
A miracle, in fact, means work
&lt;em&gt;technical notes for late october&lt;/em&gt;, or &lt;em&gt;it gets dork out earlier these days&lt;/em&gt;
light enough to travel
"figures like Heinlein and Gingrich"
</code></pre>
<!-- end -->
<p>The crucial thing about this is that the filesystem <em>itself</em> is just like <code>cat</code>
and <code>grep</code>: It doesn&rsquo;t know anything about blogs (or poetry), and it&rsquo;s
basically indifferent to the actual <em>structure</em> of a file like
<code>~/p1k3/archives/2014/1/12</code>. What the filesystem knows is that there are files
with certain names in certain places. It need not know anything about the
<em>meaning</em> of those names in order to be useful; in fact, it&rsquo;s best if it stays
agnostic about the question, for this enables us to assign our own meaning to a
structure and manipulate that structure with standard tools.</p>
<p style="text-align:center;"></p>
<p>Back to the problem at hand: I have this collection of files, and I know how
to extract the ones that contain poems. My goal is to see all the poems and
collect the subset of them that I still find worthwhile. Just knowing how to
grep and then edit a big file solves my problem, in a basic sort of way. And
yet: Something about this nags at my mind. I find that, just as I can already
use standard tools and the filesystem to ask questions about all of my blog
posts in a given year or month, I would like to be able to ask questions about
the set of interesting poems.</p>
<p>If I want the freedom to execute many different sorts of commands against this
set of poems, it begins to seem that I need a model.</p>
<p>When programmers talk about models, they often mean something that people in
the sciences would recognize: We find ways to represent the arrangement of
facts so that we can think about them. A structured representation of things
often means that we can <em>change</em> those things, or at least derive new
understanding of them.</p>
<p style="text-align:center;"></p>
<p>At this point in the narrative, I could pretend that my next step is
immediately obvious, but in fact it&rsquo;s not. I spend a couple of days thinking
off and on about how to proceed, scribbling notes during bus rides and while
drinking beers at the pizza joint down the street. I assess and discard ideas
which fall into a handful of broad approaches:</p>
<ul>
<li>Store blog entries in a relational database system which would allow me to
associate them with data like &ldquo;this entry is in a collection called &lsquo;ok
poems&rsquo;&rdquo;.</li>
<li>Selectively build up a file containing the list of files with ok poems, and use
it to do other tasks.</li>
<li>Define a format for metadata that lives within entry files.</li>
<li>Turn each interesting file into a directory of its own which contains a file
with the original text and another file with metadata.</li>
</ul>
<p>I discard the relational database idea immediately: I like working with files,
and I don&rsquo;t feel like abandoning a model that&rsquo;s served me well for my entire
adult life.</p>
<p>Building up an index file to point at the other files I&rsquo;m working with has a
certain appeal. I&rsquo;m already most of the way there with the <code>grep</code> output in
<code>possible_poems</code>. It would be easy to write shell commands to add, remove,
sort, and search entries. Still, it doesn&rsquo;t feel like a very satisfying
solution unto itself. I&rsquo;d like to know that an entry is part of the collection
just by looking at the entry, without having to cross-reference it to a list
somewhere else.</p>
<p>What about putting some meaningful text in the file itself? I thought about
a bunch of different ways to do this, some of them really complicated, and
eventually arrived at this:</p>
<pre><code>&lt;!-- collection: ok-poems --&gt;
</code></pre>
<p>The <code>&lt;!-- --&gt;</code> bits are how you define a comment in HTML, which means that
neither my blog code nor web browsers nor my text editor have to know anything
about the format, but I can easily find files with certain values. Check it:</p>
<pre><code>$ find ~/p1k3/archives -type f | xargs perl -ne 'print "$ARGV: $1 -&gt; $2\n" if m{&lt;!-- ([a-z]+): (.*?) --&gt;};'
/home/brennen/p1k3/archives/2014/2/9: collection -&gt; ok-poems
</code></pre>
<p>That&rsquo;s an ugly one-liner, and I haven&rsquo;t explained half of what it does, but the
comment format actually seems pretty workable for this. It&rsquo;s a little tacky to
look at, but it&rsquo;s simple and searchable.</p>
<p>Before we settle, though, let&rsquo;s turn to the notion of making each entry into a
directory that can contain some structured metadata in a separate file.
Imagine something like:</p>
<pre><code>$ ls ~/p1k3/archives/2013/2/9
index Meta
</code></pre>
<p>Here I use the name &ldquo;index&rdquo; for the main part of the entry because it&rsquo;s a
convention of web sites for the top-level page in a directory to be called
something like <code>index.html</code>. As it happens, my blog software already supports
this kind of file layout for entries which contain multiple parts, image files,
and so forth.</p>
<pre><code>$ head ~/p1k3/archives/2013/2/9/index
&lt;h1&gt;saturday, february 9&lt;/h1&gt;
midwinter midafternoon; depressed as hell
sitting in a huge cabin in the rich-people mountains
writing a sprawl, pages, of melancholic midlife bullshit
outside the snow gives way to broken clouds and the
clear unyielding light of the high country sun fills
$ cat ~/p1k3/archives/2013/2/9/Meta
collection: ok-poems
</code></pre>
<p>It would then be easy to <code>find</code> files called <code>Meta</code> and grep them for
<code>collection: ok-poems</code>.</p>
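<p>Something like the following would do it. This is a sketch against a throwaway directory tree rather than the real archives, and the <code>drafts</code> value is invented for contrast:</p>

```shell
# Build a tiny stand-in archive with two entry directories
tmp=$(mktemp -d)
mkdir -p "$tmp/2013/2/9" "$tmp/2013/2/10"
printf 'collection: ok-poems\n' > "$tmp/2013/2/9/Meta"
printf 'collection: drafts\n' > "$tmp/2013/2/10/Meta"   # hypothetical second value

# find the Meta files, keep the ones with the right line (grep -l lists
# matching filenames), then reduce each to the directory that holds it
ok_entries=$(find "$tmp" -type f -name Meta \
  | xargs grep -l '^collection: ok-poems$' \
  | xargs -n1 dirname)

printf '%s\n' "$ok_entries"
rm -r "$tmp"
```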
<p>What if I put metadata right in the filename itself, and dispense with the grep
altogether?</p>
<pre><code>$ ls ~/p1k3/archives/2013/2/9
index meta-ok-poem
$ find ~/p1k3/archives -name 'meta-ok-poem'
</code></pre>
<p>There&rsquo;s a lot to like about this. For one thing, it&rsquo;s immediately visible in a
directory listing. For another, it doesn&rsquo;t require searching through thousands
of lines of text to extract a specific string. If a directory has a
<code>meta-ok-poem</code> in it, I can be pretty sure that it will contain an interesting
<code>index</code>.</p>
<p>What are the downsides? Well, it requires transforming lots of text files into
directories-containing-files. I might automate that process, but it&rsquo;s still a
little tedious and it makes the layout of the entry archive more complicated
overall. There&rsquo;s a cost to doing things this way. It lets me extend my
existing model of a blog entry to include arbitrary metadata, but it also adds
steps to writing or finding blog entries.</p>
<p>Abstractions usually cost you something. Is this one worth the hassle?
Sometimes the best way to answer that question is to start writing code that
handles a given abstraction.</p>
<hr />
<h1><a name=script href=#script>#</a> 4. script</h1>
<p>Back in chapter 1, I said that &ldquo;the way you use the computer is often just to write
little programs that invoke other programs&rdquo;. In fact, we&rsquo;ve already gone over a
bunch of these. Grepping through the text of a previous chapter should pull
up some good examples:</p>
<!-- exec -->
<pre><code>$ grep -E '\$ [a-z]+.*\| ' ../literary_environment/index.md
$ sort authors_* | uniq -c
$ sort authors_* | uniq &gt; ./all_authors
$ find ~/p1k3/archives/2010/11 -regextype egrep -regex '.*([0-9]+|index)' -type f | xargs wc -w | tail -1
$ sort authors_* | uniq | wc -l
$ sort colors | uniq -i | tail -1
$ cut -d' ' -f1 ./authors_* | sort | uniq -ci | sort -n | tail -3
$ sort -u ./authors_* | cut -d' ' -f1 | uniq -ci | sort -n | tail -3
$ sort -k1 all_authors.tsv | expand -t14
$ paste firstnames lastnames | sort -k2 | expand -t12
$ cat ./authors_* | grep 'Vanessa'
</code></pre>
<!-- end -->
<p>None of these one-liners do all that much, but they all take input of one sort
or another and apply one or more transformations to it. They&rsquo;re little formal
sentences describing how to make one thing into another, which is as good a
definition of programming as most. Or at least this is a good way to describe
programming-in-the-small. (A lot of the programs we use day-to-day are more
like essays, novels, or interminable Fantasy series where every character you
like dies horribly than they are like individual sentences.)</p>
<p>One-liners like these are all well and good when you&rsquo;re staring at a terminal,
trying to figure something out - but what about when you&rsquo;ve already figured it out and
you want to repeat it in the future?</p>
<p>It turns out that Bash has you covered. Since shell commands are just text,
they can live in a text file as easily as they can be typed.</p>
<h2><a name=learn-you-an-editor href=#learn-you-an-editor>#</a> learn you an editor</h2>
<p>We&rsquo;ve skirted the topic so far, but now that we&rsquo;re talking about writing out
text files in earnest, you&rsquo;re going to want a text editor.</p>
<p>My editor is where I spend most of my time that isn&rsquo;t in a web browser, because
it&rsquo;s where I write both code and prose. It turns out that the features which
make a good code editor overlap a lot with the ones that make a good editor of
English sentences.</p>
<p>So what should you use? Well, there have been other contenders in recent
years, but in truth nothing comes close to dethroning the Great Old Ones of
text editing. Emacs is a creature both primal and sophisticated, like an
avatar of some interstellar civilization that evolved long before multicellular
life existed on earth and seeded the galaxy with incomprehensible artefacts and
colossal engineering projects. Vim is like a lovable chainsaw-studded robot
with the most elegant keyboard interface in history secretly emblazoned on its
shining diamond heart.</p>
<p>It&rsquo;s worth the time it takes to learn one of the serious editors, but there are
easier places to start. Nano, for example, is easy to pick up, and should be
available on most systems. To start it, just say:</p>
<pre><code>$ nano file
</code></pre>
<p>You should see something like this:</p>
<p style="text-align:center;"> <img src="images/nano.png" alt="nano" /></p>
<p>Arrow keys will move your cursor around, and typing stuff will make it appear
in the file. This is pretty much like every other editor you&rsquo;ve ever used. If
you haven&rsquo;t used Nano before, that stuff along the bottom of the terminal is a
reference to the most commonly used commands. <code>^</code> is a convention for &ldquo;Ctrl&rdquo;,
so <code>^O</code> means Ctrl-o (the case of the letter doesn&rsquo;t actually matter), which
will save the file you&rsquo;re working on. Ctrl-x will quit, which is probably the
first important thing to know about any given editor.</p>
<h2><a name=d-i-y-utilities href=#d-i-y-utilities>#</a> d.i.y. utilities</h2>
<p>So back to putting commands in text files. Here&rsquo;s a file I just created in
my editor:</p>
<!-- exec -->
<pre><code>$ cat okpoems
#!/bin/bash

# find all the marker files and get the name of
# the directory containing each
find ~/p1k3/archives -name 'meta-ok-poem' | xargs -n1 dirname

exit 0
</code></pre>
<!-- end -->
<p>This is known as a script. There are a handful of things to notice here.
First, there&rsquo;s this fragment:</p>
<pre><code>#!/bin/bash
</code></pre>
<p>The <code>#!</code> right at the beginning, followed by the path to a program, is a
special sequence that lets the kernel know what program should be used to
interpret the contents of the file. <code>/bin/bash</code> is the path on the filesystem
where Bash itself lives. You might see this referred to as a shebang or a hash
bang.</p>
<p>Lines that start with a <code>#</code> are comments, used to describe the code to a human
reader. The <code>exit 0</code> tells Bash that the currently running script should exit
with a status of 0, which basically means &ldquo;nothing went wrong&rdquo;.</p>
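<p>You can inspect an exit status yourself: the shell keeps the most recent one in <code>$?</code>. A quick sketch, using <code>sh -c</code> to run tiny stand-in scripts:</p>

```shell
# "sh -c" runs a throwaway one-line script; both of these are stand-ins
sh -c 'exit 0'
ok_status=$?

# "||" runs its right-hand side only when the left side fails,
# and $? there still holds the failing status
sh -c 'exit 64' || bad_status=$?

echo "succeeded with $ok_status, failed with $bad_status"

# if/then branches on the same status: 0 counts as true
if sh -c 'exit 0'; then
  verdict='nothing went wrong'
else
  verdict='something went wrong'
fi
echo "$verdict"
```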
<p>If you examine the directory listing for <code>okpoems</code>, you&rsquo;ll see something
like this:</p>
<!-- exec -->
<pre><code>$ ls -l okpoems
-rwxrwxr-x 1 brennen brennen 163 Apr 19 00:08 okpoems
</code></pre>
<!-- end -->
<p>That looks pretty cryptic. For the moment, just remember that those little
<code>x</code>s in the first bit mean that the file has been marked e<strong>x</strong>ecutable. We
accomplish this by saying something like:</p>
<pre><code>$ chmod +x ./okpoems
</code></pre>
<p>Once that&rsquo;s done, the executable bit and the shebang line together mean that
typing <code>./okpoems</code> will have the same effect as typing <code>bash okpoems</code>:</p>
<!-- exec -->
<pre><code>$ ./okpoems
</code></pre>
<!-- end -->
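<p>The whole sequence can be tried safely in a temporary directory. A sketch with a hypothetical two-line script called <code>hello</code> (it uses <code>/bin/sh</code> rather than <code>/bin/bash</code> so that it runs on systems where Bash lives somewhere else):</p>

```shell
tmp=$(mktemp -d)

# Write a hypothetical script: a shebang line plus one command
printf '#!/bin/sh\necho hello from a script\n' > "$tmp/hello"

# Mark it executable, then check that the mark took
chmod +x "$tmp/hello"
after=no
[ -x "$tmp/hello" ] && after=yes

# Running it by path relies on the x bit and the shebang line...
direct=$("$tmp/hello")

# ...while naming the interpreter explicitly needs neither
via_sh=$(sh "$tmp/hello")

echo "executable: $after"
echo "$direct"
rm -r "$tmp"
```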
<h2><a name=heavy-lifting href=#heavy-lifting>#</a> heavy lifting</h2>
<p><code>okpoems</code> demonstrates the basics, but it doesn&rsquo;t do very much. Here&rsquo;s
a script with a little more substance to it:</p>
<!-- exec -->
<pre><code>$ cat markpoem
#!/bin/bash

# $1 is the first parameter to our script
POEM=$1

# Complain and exit if we weren't given a path:
if [ ! $POEM ]; then
  echo 'usage: markpoem &lt;path&gt;'

  # Confusingly, an exit status of 0 means to the shell that everything went
  # fine, while any other number means that something went wrong.
  exit 64
fi

if [ ! -e $POEM ]; then
  echo "$POEM not found"
  exit 66
fi

echo "marking $POEM an ok poem"

POEM_BASENAME=$(basename $POEM)

# If the target is a plain file instead of a directory, make it into
# a directory and move the content into $POEM/index:
if [ -f $POEM ]; then
  echo "making $POEM into a directory, moving content to"
  echo " $POEM/index"
  TEMPFILE="/tmp/$POEM_BASENAME.$(date +%s.%N)"
  mv $POEM $TEMPFILE
  mkdir $POEM
  mv $TEMPFILE $POEM/index
fi

if [ -d $POEM ]; then
  # touch(1) will either create the file or update its timestamp:
  touch $POEM/meta-ok-poem
else
  echo "something broke - why isn't $POEM a directory?"
  file $POEM
fi

# Signal that all is copacetic:
echo kthxbai
exit 0
</code></pre>
<!-- end -->
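<p>The bracketed tests doing most of the work in <code>markpoem</code> can be tried out on
their own. A small sketch, assuming a scratch directory from <code>mktemp</code>;
<code>describe</code> is a hypothetical helper, and unlike the script above it wraps its
variables in double quotes so that paths containing spaces survive:</p>

```shell
# describe is a hypothetical helper that reports what kind of thing a path
# is, using the same -e, -d, and -f tests that markpoem relies on.
describe() {
  if [ ! -e "$1" ]; then
    echo "$1: not found"
  elif [ -d "$1" ]; then
    echo "$1: directory"
  elif [ -f "$1" ]; then
    echo "$1: plain file"
  else
    echo "$1: something else"
  fi
}

scratch=$(mktemp -d)
touch "$scratch/entry"

describe "$scratch"           # reported as a directory
describe "$scratch/entry"     # reported as a plain file
describe "$scratch/missing"   # reported as not found
```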
<p>Both of these scripts are imperfect, but they were quick to write, they&rsquo;re made
out of standard commands, and I don&rsquo;t yet hate myself for them: All signs that
I&rsquo;m not totally on the wrong track with the <code>meta-ok-poem</code> abstraction, and
could live with it as part of an ongoing writing project. <code>okpoems</code> and
<code>markpoem</code> would also be easy to use with custom keybindings in my editor. In
a few more lines of code, I can build a system to wade through the list of
candidate files and quickly mark the interesting ones.</p>
<h2><a name=generality href=#generality>#</a> generality</h2>
<p>So what&rsquo;s lacking here? Well, probably a bunch of things, feature-wise. I can
imagine writing a script to unmark a poem, for example. That said, there&rsquo;s one
really glaring problem. &ldquo;Ok poem&rdquo; is only one kind of property a blog entry
might possess. Suppose I wanted a way to express that a poem is terrible?</p>
<p>It turns out I already know how to add properties to an entry. If I generalize
just a little, the tools become much more flexible.</p>
<!-- exec -->
<pre><code>$ ./addprop /home/brennen/p1k3/archives/2012/3/26 meta-terrible-poem
marking /home/brennen/p1k3/archives/2012/3/26 with meta-terrible-poem
</code></pre>
<!-- end -->
<!-- exec -->
<pre><code>$ ./findprop meta-terrible-poem
</code></pre>
<!-- end -->
<p><code>addprop</code> is only a little different from <code>markpoem</code>. It takes two parameters
instead of one - the target entry and a property to add.</p>
<!-- exec -->
<pre><code>$ cat addprop
#!/bin/bash

ENTRY=$1
PROPERTY=$2

# Complain and exit if we weren't given a path and a property:
if [[ ! $ENTRY || ! $PROPERTY ]]; then
  echo "usage: addprop &lt;path&gt; &lt;property&gt;"
  exit 64
fi

if [ ! -e $ENTRY ]; then
  echo "$ENTRY not found"
  exit 66
fi

echo "marking $ENTRY with $PROPERTY"

# If the target is a plain file instead of a directory, make it into
# a directory and move the content into $ENTRY/index:
if [ -f $ENTRY ]; then
  echo "making $ENTRY into a directory, moving content to"
  echo " $ENTRY/index"

  # Get a safe temporary file:
  TEMPFILE=$(mktemp)

  mv $ENTRY $TEMPFILE
  mkdir $ENTRY
  mv $TEMPFILE $ENTRY/index
fi

if [ -d $ENTRY ]; then
  touch $ENTRY/$PROPERTY
else
  echo "something broke - why isn't $ENTRY a directory?"
  file $ENTRY
fi

echo kthxbai
exit 0
</code></pre>
<!-- end -->
<p>Meanwhile, <code>findprop</code> is more or less <code>okpoems</code>, but with a parameter for the
property to find:</p>
<!-- exec -->
<pre><code>$ cat findprop
#!/bin/bash

if [ ! $1 ]
then
  echo "usage: findprop &lt;property&gt;"
  exit
fi

# find all the marker files and get the name of
# the directory containing each
find ~/p1k3/archives -name $1 | xargs -n1 dirname

exit 0
</code></pre>
<!-- end -->
<p>These scripts aren&rsquo;t much more complicated than their poem-specific
counterparts, but now they can be used to solve problems I haven&rsquo;t even thought
of yet, and included in other scripts that need their functionality.</p>
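<p>The script to unmark an entry that I imagined a minute ago falls out of the
same generalization. Here&rsquo;s a minimal sketch, assuming a property is nothing
more than an empty file inside the entry&rsquo;s directory. <code>rmprop</code> is a
hypothetical name, written as a function so it&rsquo;s easy to try, and it isn&rsquo;t one
of the scripts that exists alongside this book:</p>

```shell
# rmprop: a hypothetical counterpart to addprop.
# usage: rmprop <path> <property>
rmprop() {
  entry=$1
  property=$2

  # Complain and bail out if we weren't given a path and a property:
  if [ -z "$entry" ] || [ -z "$property" ]; then
    echo "usage: rmprop <path> <property>"
    return 64
  fi

  if [ ! -e "$entry/$property" ]; then
    echo "$entry is not marked with $property"
    return 66
  fi

  echo "removing $property from $entry"
  rm "$entry/$property"
}

# Try it out on a scratch directory standing in for an entry:
scratch=$(mktemp -d)
touch "$scratch/meta-terrible-poem"
rmprop "$scratch" meta-terrible-poem
```

<p>The statuses 64 and 66 are borrowed from <code>addprop</code>, so that a script calling
either one can treat their failures the same way.</p>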
<hr />
<h1><a name=general-purpose-programmering href=#general-purpose-programmering>#</a> 5. general purpose programmering</h1>
<p>I didn&rsquo;t set out to write a book about programming, <em>as such</em>, but because
programming and the command line are so inextricably linked, this text
draws near the subject almost of its own accord.</p>
<p>If you&rsquo;re not terribly interested in programming, this chapter can easily
enough be skipped. It&rsquo;s more in the way of philosophical rambling than
concrete instruction, and will be of most use to those with an existing
background in writing code.</p>
<p style="text-align:center;"> *</p>
<p>If you&rsquo;ve used computers for more than a few years, you&rsquo;re probably viscerally
aware that most software is fragile and most systems decay. In the time since
I took my first tentative steps into the little world of a computer (a friend&rsquo;s
dad&rsquo;s unidentifiable gaming machine, my own father&rsquo;s blue monochrome Zenith
laptop, the Apple II) the churn has been overwhelming. By now I&rsquo;ve learned my
way around vastly more software &mdash; operating systems, programming languages and
development environments, games, editors, chat clients, mail systems &mdash; than I
presently could use if I wanted to. Most of it has gone the way of some
ancient civilization, surviving (if at all) only in faint, half-understood
cultural echoes and occasional museum-piece displays. Every user of technology
becomes, in time, a refugee from an irretrievably recent past.</p>
<p>And yet, despite all this, the shell endures. Most of the ideas in this book
are older than I am. Most of them could have been applied in 1994 or
thereabouts, when I first logged on to multiuser systems running AT&amp;T Unix.
Since the early 1990s, systems built on a fundamental substrate of Unix-like
behavior and abstractions have proliferated wildly, becoming foundational at
once to the modern web, the ecosystem of free and open software, and the
technological dominance ca. 2014 of companies like Apple, Google, and Facebook.</p>
<p>Why is this, exactly?</p>
<p style="text-align:center;"> *</p>
<p>As I&rsquo;ve said (and hopefully shown), the commands you write in your shell
are essentially little programs. Like other programs, they can be stored
for later use and recombined with other commands, creating new uses for
your ideas.</p>
<p>It would be hard to say that there&rsquo;s any <em>one</em> reason command line environments
remain so vital after decades of evolution and hard-won refinement in computer
interfaces, but it seems like this combinatory nature is somewhere near the
heart of it. The command line often lacks the polish of other interfaces we
depend on, but in exchange it offers a richness and freedom of expression
rarely seen elsewhere, and invites its users to build upon its basic facilities.</p>
<p>What is it that makes last chapter&rsquo;s <code>addprop</code> preferable to the more specific
<code>markpoem</code>? Let&rsquo;s look at an alternative implementation of <code>markpoem</code>:</p>
<!-- exec -->
<pre><code>$ cat simple_markpoem
addprop $1 meta-ok-poem
</code></pre>
<!-- end -->
<p>Is this script trivial? Absolutely. It&rsquo;s so trivial that it barely seems to
exist, because I already wrote <code>addprop</code> to do all the heavy lifting and play
well with others, freeing us to imagine new uses for its central idea without
worrying about the implementation details.</p>
<p>Unlike <code>markpoem</code>, <code>addprop</code> doesn&rsquo;t know anything about poetry. All it knows
about, in fact, is putting a file (or three) in a particular place. And this
is in keeping with a basic insight of Unix: Pieces of software that do one
very simple thing generalize well. Good command line tools are like a hex
wrench, a hammer, a utility knife: They embody knowledge of turning, of
striking, of cutting &mdash; and with this kind of knowledge at hand, the user can
change the world even though no individual tool is made with complete knowledge
of the world as a whole. There&rsquo;s a lot of power in the accumulation of small
competencies.</p>
<p>Of course, if your code is only good at one thing, to be of any use, it has to
talk to code that&rsquo;s good at other things. There&rsquo;s another basic insight in the
Unix tradition: Tools should be composable. All those little programs have to
share some assumptions, have to speak some kind of trade language, in order to
combine usefully. Which is how we&rsquo;ve arrived at standard IO, pipelines,
filesystems, and text as a lowest-common-denominator medium of exchange. If
you think about most of these things, they have some very rough edges, but they
give otherwise simple tools ways to communicate without becoming
super-complicated along the way.</p>
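<p>As a small illustration of that trade language at work, here are three
programs that know nothing about one another cooperating over a pipe. It&rsquo;s a
sketch with made-up input, and only the standard <code>printf</code>, <code>sort</code>, and <code>uniq</code>
are involved:</p>

```shell
# printf emits lines of text, sort puts them in order, and uniq drops
# repeated lines that end up next to each other.
printf 'hammer\nwrench\nhammer\nknife\n' | sort | uniq
# prints:
# hammer
# knife
# wrench
```

<p>None of the three was written with tools in mind. Lines of text were enough
of a shared assumption.</p>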
<p style="text-align:center;"> *</p>
<p>What is the command line?</p>
<p>The command line is an environment of tool use.</p>
<p>So are kitchens, workshops, libraries, and programming languages.</p>
<p style="text-align:center;"> *</p>
<p>Here&rsquo;s a confession: I don&rsquo;t like writing shell scripts very much, and I
can&rsquo;t blame anyone else for feeling the same way.</p>
<p>That doesn&rsquo;t mean you shouldn&rsquo;t <em>know</em> about them, or that you shouldn&rsquo;t
<em>write</em> them. I write little ones all the time, and the ability to puzzle
through other people&rsquo;s scripts comes in handy. Oftentimes, the best, most
tasteful way to automate something is to build a script out of the commonly
available commands. The standard tools are already there on millions of
machines. Many of them have been pretty well understood for a generation, and
most will probably be around for a generation or three to come. They do neat
stuff. Scripts let you build on ideas you&rsquo;ve already worked out, and give
repeatable operations a memorable, user-friendly name. They encourage reuse of
existing programs, and help express your ideas to people who&rsquo;ll come after you.</p>
<p>One of the reliable markers of powerful software is that it can be scripted: It
extends to its users some of the same power that its authors used in creating
it. Scriptable software is to some extent <em>living</em> software. It&rsquo;s a book that
you, the reader, get to help write.</p>
<p>In all these ways, shell scripts are wonderful, a little bit magical, and
quietly indispensable to the machinery of modern civilization.</p>
<p>Unfortunately, in all the ways that a shell like Bash is weird, finicky, and
covered in 40 years of incidental cruft, long-form Bash scripts are even worse.
Bash is a useful glue language, particularly if you&rsquo;re already comfortable
wiring commands together. Syntactic and conceptual innovations like pipes are
beautiful and necessary. What Bash is <em>not</em>, despite its power, is a very good
general purpose programming language. It&rsquo;s just not especially good at things
like math, or complex data structures, or not looking like a punctuation-heavy
variety of alphabet soup.</p>
<p>It turns out that there&rsquo;s a threshold of complexity beyond which life becomes
easier if you switch from shell scripting to a more robust language. Just
where this threshold is located varies a lot between users and problems, but I
often think about switching languages before a script gets bigger than I can
view on my screen all at once. <code>addprop</code> is a good example:</p>
<!-- exec -->
<pre><code>$ wc -l ../script/addprop
41 ../script/addprop
</code></pre>
<!-- end -->
<p>41 lines is a touch over what fits on one screen in the editor I usually use.
If I were going to add much in the way of features, I&rsquo;d think pretty hard about
porting it to another language first.</p>
<p>What&rsquo;s cool is that if you know a language like C, Python, Perl, Ruby, PHP, or
JavaScript, your code can participate in the shell environment as a first class
citizen simply by respecting the conventions of standard IO, files, and command
line arguments. Often, in order to create a useful utility, it&rsquo;s only
necessary to deal with <code>STDIN</code>, or operate on a particular sort of file, and
most languages offer simple conventions for doing these things.</p>
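<p>Those conventions are small enough to sketch in a few lines. I&rsquo;ll stay in
shell here for consistency, but the same shape carries over to any of the
languages above: read lines from standard input, take options from the
arguments, write to standard output, and report trouble through the exit
status. <code>prefix_lines</code> is a made-up name:</p>

```shell
# prefix_lines: a hypothetical filter.
# Reads lines on standard input and writes each one back out with a prefix.
# usage: prefix_lines <prefix>
prefix_lines() {
  if [ $# -ne 1 ]; then
    # Complaints go to standard error so they don't pollute a pipeline.
    echo "usage: prefix_lines <prefix>" >&2
    return 64
  fi

  while IFS= read -r line; do
    printf '%s%s\n' "$1" "$line"
  done
}

printf 'files\ntext\npipes\n' | prefix_lines '- '
# prints:
# - files
# - text
# - pipes
```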
<p style="text-align:center;"> *</p>
<p>I think the shell can be taught and understood as a humane environment, despite
all of its ugliness and complication, because it offers the materials of its
own construction to its users, whatever their concerns. The writer, the
philosopher, the scientist, the programmer: Files and text and pipes know
little enough about these things, but in their very indifference to the
specifics of any one complex purpose, they&rsquo;re adaptable to the basic needs of
many. Simple utilities which enact simple kinds of knowledge survive and
recombine because there is a wisdom to be found in small things.</p>
<p>Files and text know nothing about poetry, nothing in particular of the human
soul. Neither do pen and ink, printing presses or codex books, but somehow we
got Shakespeare and Montaigne.</p>
<hr />
<h1><a name=one-of-these-things-is-not-like-the-others href=#one-of-these-things-is-not-like-the-others>#</a> 6. one of these things is not like the others</h1>
<p>If you&rsquo;re the sort of person who took a few detours into the history of
religion in college, you might be familiar with some of the ways people used to
do textual comparison. When pen, paper, and typesetting were what scholars had
to work with, they did some fairly sophisticated things in order to expose the
relationships between multiple pieces of text.</p>
<p style="text-align:center;"> <img src="images/throckmorton_small.jpg" height=320 width=470></p>
<p>Here&rsquo;s a book I got in college: <em>Gospel Parallels: A Comparison of the
Synoptic Gospels</em>, Burton H. Throckmorton, Jr., Ed. It breaks up three books
from the New Testament by the stories and themes that they contain, and shows
the overlapping sections of each book that contain parallel texts. You can
work your way through and see what parts only show up in one book, or in two
but not the other, or in all three. Pages are arranged like so:</p>
<pre><code>| MAT   | MAR   | LUK   |
| Stuff |       |       |
|       | Stuff |       |
|       | Stuff | Stuff |
|       | Stuff |       |
|       | Stuff |       |
|       |       |       |
</code></pre>
<p>The way I understand it, a book like this one only scratches the surface of the
field. Tools like this support a lot of theory about which books copied each
other and how, and what other sources they might have copied that we&rsquo;ve since
lost.</p>
<p>This is some <em>incredibly</em> dry material, even if you kind of dig thinking about
the questions it addresses. It takes a special temperament to actually sit
poring over fragmentary texts in ancient languages and do these painstaking
comparisons. Even if you&rsquo;re a writer or editor and work with a lot of
revisions of a text, there&rsquo;s a good chance you rarely do this kind of
comparison on your own work, because that shit is <em>tedious</em>.</p>
<h2><a name=diff href=#diff>#</a> diff</h2>
<p>It turns out that academics aren&rsquo;t the only people who need tools for comparing
different versions of a text. Working programmers, in fact, need to do this
<em>constantly</em>. Programmers are also happiest when putting off the <em>actual</em> task
at hand to solve some incidental problem that cropped up along the way, so by
now there are a lot of ways to say &ldquo;here&rsquo;s how this file is different from this
file&rdquo;, or &ldquo;here&rsquo;s how this file is different from itself a year ago&rdquo;.</p>
<p>Let&rsquo;s look at a couple of shell scripts from an earlier chapter:</p>
<!-- exec -->
<pre><code>$ cat ../script/okpoems
#!/bin/bash

# find all the marker files and get the name of
# the directory containing each
find ~/p1k3/archives -name 'meta-ok-poem' | xargs -n1 dirname

exit 0
</code></pre>
<!-- end -->
<!-- exec -->
<pre><code>$ cat ../script/findprop
#!/bin/bash

if [ ! $1 ]
then
  echo "usage: findprop &lt;property&gt;"
  exit
fi

# find all the marker files and get the name of
# the directory containing each
find ~/p1k3/archives -name $1 | xargs -n1 dirname

exit 0
</code></pre>
<!-- end -->
<p>It&rsquo;s pretty obvious these are similar files, but do we know what <em>exactly</em>
changed between them at a glance? It wouldn&rsquo;t be hard to figure out, once. If
you wanted to be really certain about it, you could print them out, set them
side by side, and go over them with a highlighter.</p>
<p>Now imagine doing that for a bunch of files, some of them hundreds or thousands
of lines long. I&rsquo;ve actually done that before, colored markers and all, but I
didn&rsquo;t feel smart while I was doing it. This is a job for software.</p>
<!-- exec -->
<pre><code>$ diff ../script/okpoems ../script/findprop
2a3,8
&gt; if [ ! $1 ]
&gt; then
&gt;   echo "usage: findprop &lt;property&gt;"
&gt;   exit
&gt; fi
&gt; 
5c11
&lt; find ~/p1k3/archives -name 'meta-ok-poem' | xargs -n1 dirname
---
&gt; find ~/p1k3/archives -name $1 | xargs -n1 dirname
</code></pre>
<!-- end -->
<p>That&rsquo;s not the most human-friendly output, but it&rsquo;s a little simpler than it
seems at first glance. It&rsquo;s basically just a way of describing the changes
needed to turn <code>okpoems</code> into <code>findprop</code>. The string <code>2a3,8</code> can be read as
&ldquo;at line 2, add lines 3 through 8&rdquo;. Lines with a <code>&gt;</code> in front of them are
added. <code>5c11</code> can be read as &ldquo;line 5 in the original file becomes line 11 in
the new file&rdquo;, and the <code>&lt;</code> line is replaced with the <code>&gt;</code> line. If you wanted,
you could take a copy of the original file and apply these instructions by hand
in your text editor, and you&rsquo;d wind up with the new file.</p>
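<p>The notation is easier to absorb on a pair of files small enough to hold in
your head. Here&rsquo;s a sketch with invented contents, assuming <code>mktemp</code> is
available:</p>

```shell
scratch=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$scratch/before"
printf 'one\n2\nthree\nfour\n' > "$scratch/after"

# diff exits with status 1 when the files differ, which isn't an error here.
diff "$scratch/before" "$scratch/after" || true
# prints:
# 2c2
# < two
# ---
# > 2
# 3a4
# > four
```

<p>Read <code>2c2</code> as &ldquo;line 2 changes into line 2&rdquo;, and <code>3a4</code> as &ldquo;after line 3, add
what becomes line 4&rdquo;.</p>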
<p>A lot of people (me included) prefer what&rsquo;s known as a &ldquo;unified&rdquo; diff, because
it&rsquo;s easier to read and offers context for the changed lines. We can ask for
one of these with <code>diff -u</code>:</p>
<!-- exec -->
<pre><code>$ diff -u ../script/okpoems ../script/findprop
--- ../script/okpoems 2014-04-19 00:08:03.321230818 -0600
+++ ../script/findprop 2014-04-21 21:51:29.360846449 -0600
@@ -1,7 +1,13 @@
 #!/bin/bash
 
+if [ ! $1 ]
+then
+  echo "usage: findprop &lt;property&gt;"
+  exit
+fi
+
 # find all the marker files and get the name of
 # the directory containing each
-find ~/p1k3/archives -name 'meta-ok-poem' | xargs -n1 dirname
+find ~/p1k3/archives -name $1 | xargs -n1 dirname
 
 exit 0
</code></pre>
<!-- end -->
<p>That&rsquo;s a little longer, and has some metadata we might not always care about,
but if you look for lines starting with <code>+</code> and <code>-</code>, it&rsquo;s easy to read as
&ldquo;added these, took away these&rdquo;. This diff tells us at a glance that we added
some lines to complain if we didn&rsquo;t get a command line argument, and replaced
<code>'meta-ok-poem'</code> in the <code>find</code> command with that argument. Since it shows us
some context, we have a pretty good idea where those lines are in the file
and what they&rsquo;re for.</p>
<p>What if we don&rsquo;t care exactly <em>how</em> the files differ, but only whether they
do?</p>
<!-- exec -->
<pre><code>$ diff -q ../script/okpoems ../script/findprop
Files ../script/okpoems and ../script/findprop differ
</code></pre>
<!-- end -->
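<p><code>diff</code> also gives its verdict through its exit status: 0 when the files
match, 1 when they differ. That makes the question easy to ask from inside a
script. A sketch with throwaway files; <code>same_files</code> is a made-up name:</p>

```shell
# same_files is a hypothetical helper: succeed if two files have identical
# contents, fail otherwise. The message from -q is thrown away, since only
# the exit status matters here.
same_files() {
  diff -q "$1" "$2" > /dev/null
}

scratch=$(mktemp -d)
printf 'light\n' > "$scratch/a"
printf 'light\n' > "$scratch/b"
printf 'darkness\n' > "$scratch/c"

if same_files "$scratch/a" "$scratch/b"; then echo "a and b match"; fi
if ! same_files "$scratch/a" "$scratch/c"; then echo "a and c differ"; fi
```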
<p>I use <code>diff</code> a lot in the course of my day job, because I spend a lot of time
needing to know just how two programs differ. Just as importantly, I often
need to know how (or whether!) the <em>output</em> of programs differs. As a concrete
example, I want to make sure that <code>findprop meta-ok-poem</code> is really a suitable
replacement for <code>okpoems</code>. Since I expect their output to be identical, I can
do this:</p>
<!-- exec -->
<pre><code>$ ../script/okpoems &gt; okpoem_output
</code></pre>
<!-- end -->
<!-- exec -->
<pre><code>$ ../script/findprop meta-ok-poem &gt; findprop_output
</code></pre>
<!-- end -->
<!-- exec -->
<pre><code>$ diff -s okpoem_output findprop_output
Files okpoem_output and findprop_output are identical
</code></pre>
<!-- end -->
<p>The <code>-s</code> just means that <code>diff</code> should explicitly tell us if files are the
<strong>s</strong>ame. Otherwise, it&rsquo;d output nothing at all, because there aren&rsquo;t any
differences.</p>
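<p>One of the two temporary files can be skipped, since <code>diff</code> will read
standard input in place of any file named <code>-</code>. A sketch, with two invented
functions standing in for the programs being compared:</p>

```shell
# Stand-ins for two programs whose output should match (both hypothetical):
old_way() { printf 'entry-1\nentry-2\n'; }
new_way() { printf 'entry-1\nentry-2\n'; }

# Save one program's output to a file...
saved=$(mktemp)
old_way > "$saved"

# ...and pipe the other's straight into diff, which reads it in place of "-".
if new_way | diff "$saved" - > /dev/null; then
  echo "output is identical"
else
  echo "output differs"
fi
```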
<p>As with many other tools, <code>diff</code> doesn&rsquo;t very much care whether it&rsquo;s looking at
shell scripts or a list of filenames or what-have-you. If you read the man
page, you&rsquo;ll find some features geared towards people writing C-like
programming languages, but its real specialty is just text files with lines
made out of characters, which works well for lots of code, but certainly could
be applied to English prose.</p>
<p>Since I have a couple of versions ready to hand, let&rsquo;s apply this to a text
with some well-known variations and a bit of a literary legacy. Here&rsquo;s the
first day of the Genesis creation narrative in a couple of English
translations:</p>
<!-- exec -->
<pre><code>$ cat genesis_nkj
In the beginning God created the heavens and the earth. The earth was without
form, and void; and darkness was on the face of the deep. And the Spirit of
God was hovering over the face of the waters. Then God said, "Let there be
light"; and there was light. And God saw the light, that it was good; and God
divided the light from the darkness. God called the light Day, and the darkness
He called Night. So the evening and the morning were the first day.
</code></pre>
<!-- end -->
<!-- exec -->
<pre><code>$ cat genesis_nrsv
In the beginning when God created the heavens and the earth, the earth was a
formless void and darkness covered the face of the deep, while a wind from
God swept over the face of the waters. Then God said, "Let there be light";
and there was light. And God saw that the light was good; and God separated
the light from the darkness. God called the light Day, and the darkness he
called Night. And there was evening and there was morning, the first day.
</code></pre>
<!-- end -->
<p>What happens if we diff them?</p>
<!-- exec -->
<pre><code>$ diff -u genesis_nkj genesis_nrsv
--- genesis_nkj 2014-05-11 16:28:29.692508461 -0600
+++ genesis_nrsv 2014-05-11 16:28:29.744508459 -0600
@@ -1,6 +1,6 @@
-In the beginning God created the heavens and the earth. The earth was without
-form, and void; and darkness was on the face of the deep. And the Spirit of
-God was hovering over the face of the waters. Then God said, "Let there be
-light"; and there was light. And God saw the light, that it was good; and God
-divided the light from the darkness. God called the light Day, and the darkness
-He called Night. So the evening and the morning were the first day.
+In the beginning when God created the heavens and the earth, the earth was a
+formless void and darkness covered the face of the deep, while a wind from
+God swept over the face of the waters. Then God said, "Let there be light";
+and there was light. And God saw that the light was good; and God separated
+the light from the darkness. God called the light Day, and the darkness he
+called Night. And there was evening and there was morning, the first day.
</code></pre>
<!-- end -->
<p>Kind of useless, right? If a given line differs by so much as a character,
it&rsquo;s not the same line. This highlights the limitations of <code>diff</code> for comparing
things that</p>
<ul>
<li>aren&rsquo;t logically grouped by line</li>
<li>aren&rsquo;t easily thought of as versions of the same text with some lines changed</li>
</ul>
<p>We could edit the files into a more logically defined structure, like
one-line-per-verse, and try again:</p>
<!-- exec -->
<pre><code>$ diff -u genesis_nkj_by_verse genesis_nrsv_by_verse
--- genesis_nkj_by_verse 2014-05-11 16:51:14.312457198 -0600
+++ genesis_nrsv_by_verse 2014-05-11 16:53:02.484453134 -0600
@@ -1,5 +1,5 @@
-In the beginning God created the heavens and the earth.
-The earth was without form, and void; and darkness was on the face of the deep. And the Spirit of God was hovering over the face of the waters.
+In the beginning when God created the heavens and the earth,
+the earth was a formless void and darkness covered the face of the deep, while a wind from God swept over the face of the waters.
Then God said, "Let there be light"; and there was light.
-And God saw the light, that it was good; and God divided the light from the darkness.
-God called the light Day, and the darkness He called Night. So the evening and the morning were the first day.
+And God saw that the light was good; and God separated the light from the darkness.
+God called the light Day, and the darkness he called Night. And there was evening and there was morning, the first day.
</code></pre>
<!-- end -->
<p>It might be a little more descriptive, but editing all that text just for a
quick comparison felt suspiciously like work, and anyway the output still
doesn&rsquo;t seem very useful.</p>
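<p>One way around the hand-editing, sketched here on a short excerpt, is to let
<code>tr</code> put every word on a line of its own before handing the result to <code>diff</code>.
This assumes words are separated only by spaces and newlines;
<code>one_word_per_line</code> is a made-up name:</p>

```shell
# one_word_per_line is a hypothetical helper: turn spaces into newlines and
# squeeze any run of newlines down to one, leaving a single word per line.
one_word_per_line() {
  tr -s ' \n' '\n\n' < "$1"
}

scratch=$(mktemp -d)
printf 'and God divided\nthe light from the darkness.\n' > "$scratch/nkj"
printf 'and God separated the\nlight from the darkness.\n' > "$scratch/nrsv"

one_word_per_line "$scratch/nkj" > "$scratch/nkj_words"
one_word_per_line "$scratch/nrsv" > "$scratch/nrsv_words"

# Only the word that changed shows up, wherever the original lines broke.
diff "$scratch/nkj_words" "$scratch/nrsv_words" || true
# prints:
# 3c3
# < divided
# ---
# > separated
```

<p>The cost is that the output loses the sentence around each change, which
gets hard to follow on anything longer than a few lines.</p>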
<h2><a name=wdiff href=#wdiff>#</a> wdiff</h2>
<p>For cases like this, I&rsquo;m fond of a tool called <code>wdiff</code>:</p>
<!-- exec -->
<pre><code>$ wdiff genesis_nkj genesis_nrsv
In the beginning {+when+} God created the heavens and the [-earth. The-] {+earth, the+} earth was [-without
form, and void;-] {+a
formless void+} and darkness [-was on-] {+covered+} the face of the [-deep. And the Spirit of-] {+deep, while a wind from+}
God [-was hovering-] {+swept+} over the face of the waters. Then God said, "Let there be light";
and there was light. And God saw [-the light,-] that [-it-] {+the light+} was good; and God
[-divided-] {+separated+}
the light from the darkness. God called the light Day, and the darkness
[-He-] {+he+}
called Night. [-So the-] {+And there was+} evening and [-the morning were-] {+there was morning,+} the first day.
</code></pre>
<!-- end -->
<p>Deleted words are surrounded by <code>[- -]</code> and inserted ones by <code>{+ +}</code>. You can
even ask it to spit out HTML tags for insertion and deletion&hellip;</p>
<pre><code>$ wdiff -w '&lt;del&gt;' -x '&lt;/del&gt;' -y '&lt;ins&gt;' -z '&lt;/ins&gt;' genesis_nkj genesis_nrsv
</code></pre>
<p>&hellip;and come up with something your browser will render like this:</p>
<p>In the beginning <ins>when</ins> God created the heavens and the <del>earth. The</del> <ins>earth, the</ins> earth was <del>without
form, and void;</del> <ins>a
formless void</ins> and darkness <del>was on</del> <ins>covered</ins> the face of the <del>deep. And the Spirit of</del> <ins>deep, while a wind from</ins>
God <del>was hovering</del> <ins>swept</ins> over the face of the waters. Then God said, "Let there be light";
and there was light. And God saw <del>the light,</del> that <del>it</del> <ins>the light</ins> was good; and God
<del>divided</del> <ins>separated</ins>
the light from the darkness. God called the light Day, and the darkness
<del>He</del> <ins>he</ins>
called Night. <del>So the</del> <ins>And there was</ins> evening and <del>the morning were</del> <ins>there was morning,</ins> the first day.</p>
<p>Burton H. Throckmorton, Jr. this ain&rsquo;t. Still, it has its uses.</p>
<hr />
<h1><a name=the-command-line-as-as-a-shared-world href=#the-command-line-as-as-a-shared-world>#</a> 7. the command line as a shared world</h1>
<p>In an earlier chapter, I wrote:</p>
<blockquote><p>You can think of the shell as a kind of environment you inhabit, in much
the way your character inhabits an adventure game.</p></blockquote>
<p>It turns out that sometimes there are other human inhabitants of this
environment.</p>
<p>Unix was built on a model known as &ldquo;time-sharing&rdquo;. This is an idea with a lot
of history, but the very short version is that when computers were rare and
expensive, it made sense for lots of people to be able to use them at once.
This is part of the story of how ideas like e-mail and chat were originally
born, well before networks took over the world: As ways for the many users of
one computer to communicate on the same machine.</p>
<p>Says Dennis Ritchie:</p>
<blockquote><p>What we wanted to preserve was not just a good environment in which to do
programming, but a system around which a fellowship could form. We knew from
experience that the essence of communal computing, as supplied by
remote-access, time-shared machines, is not just to type programs into a
terminal instead of a keypunch, but to encourage close communication.</p></blockquote>
<p>Times have changed, and while it&rsquo;s mundane to use software that&rsquo;s shared
between many users, it&rsquo;s not nearly as common as it once was for a bunch of us
to be logged into the same computer all at once.</p>
<p style="text-align:center;"> *</p>
<p>In the mid 1990s, when I was first exposed to Unix, it was by opening up a
program called NCSA Telnet on one of the Macs at school and connecting to a
server called mother.esu1.k12.ne.us.</p>
<p>NCSA Telnet was a terminal, not unlike the kind that you use to open a shell on
your own Linux computer, a piece of software that itself emulated actual,
physical hardware from an earlier era. Hardware terminals were basically very
simple computers with keyboards, screens, and just enough networking brains to
talk to a <em>real</em> computer somewhere else. You&rsquo;ll still come across these
scattered around big institutional environments. The last time I looked over
the shoulder of an airline checkin desk clerk, for example, I saw green
monochrome text that was probably coming from an IBM mainframe somewhere
far away.</p>
<p>Part of what was exciting about being logged into a computer somewhere else
was that you could <em>talk to people</em>.</p>
<p style="text-align:center;"> *</p>
<p><em>{This chapter is a work in progress.}</em></p>
<hr />
<h1><a name=the-command-line-and-the-web href=#the-command-line-and-the-web>#</a> 8. the command line and the web</h1>
<p>Web browsers are really complicated these days. They&rsquo;re full of rendering
engines, audio and video players, programming languages, development tools,
databases &mdash; you name it, and there&rsquo;s a fair chance it&rsquo;s in there somewhere.
The modern web browser is kitchen sink software, and to make matters worse, it
is <em>totally surrounded</em> by technobabble. It can take <em>years</em> to come to terms
with the ocean of words about web stuff and sort out the meaningful ones from
the snake oil and bureaucratic mysticism.</p>
<p>All of which can make the web itself seem like a really complicated landscape,
and obscure the simplicity of its basic design, which is this:</p>
<p>Some programs pass text around to one another.</p>
<p>Which might sound familiar.</p>
<p>The gist of it is that the web is made out of URLs, &ldquo;Uniform Resource
Locators&rdquo;, which are paths to things. If you squint, these look kind of like
paths to files on your filesystem. When you visit a URL in your browser, it
asks a server for a certain path, and the server gives it back some text. When
you click a button to submit a form, your browser sends some text to the server
and waits to see what it says back. The text that gets passed around is
(usually) written in a language with particular significance to web browsers,
but if you look at it directly, it&rsquo;s a format that humans can understand.</p>
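<p>For a sense of how plain that text is, here&rsquo;s roughly what a browser sends
when it asks for a page. This is a sketch, assuming an unencrypted HTTP/1.1
request and leaving out the many extra headers a real browser adds;
<code>http_request</code> is a made-up name:</p>

```shell
# http_request is a hypothetical helper that prints the text of a bare-bones
# HTTP/1.1 GET request. Each line ends in a carriage return plus a newline,
# and an empty line marks the end of the request.
# usage: http_request <host> <path>
http_request() {
  host=$1
  path=$2
  printf 'GET %s HTTP/1.1\r\n' "$path"
  printf 'Host: %s\r\n' "$host"
  printf 'Connection: close\r\n'
  printf '\r\n'
}

http_request p1k3.com /hello_world.html
```

<p>Piped into a program that can open a network connection (something like
<code>nc p1k3.com 80</code>, where it&rsquo;s installed), that text should get some text back
from the server, though these days the answer may well be a redirect to the
encrypted version of the site.</p>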
<p>Let&rsquo;s illustrate this. I&rsquo;ve written a really simple web page that lives at
<a href="http://p1k3.com/hello_world.html"><code>http://p1k3.com/hello_world.html</code></a>.</p>
<pre><code>$ curl 'https://p1k3.com/hello_world.html'
&lt;title&gt;hello, world&lt;/title&gt;
&lt;h1&gt;hi everybody&lt;/h1&gt;
&lt;p&gt;How are things?&lt;/p&gt;
</code></pre>
<p><code>curl</code> is a program with lots and lots of features &mdash; it too is a little bit
of a kitchen sink &mdash; but it has one core purpose, which is to grab things from
URLs and spit them back out. It&rsquo;s a little bit like <code>cat</code> for things that live
on the web. Try the above command with just about any URL you can think of,
and you&rsquo;ll probably get <em>something</em> back. Let&rsquo;s try this book:</p>
<pre><code>$ curl 'https://p1k3.com/userland-book/' | head
&lt;!DOCTYPE html&gt;
&lt;html lang=en&gt;
&lt;meta charset="utf-8"&gt;
&lt;title&gt;userland: a book about the command line for humans&lt;/title&gt;
&lt;link rel=stylesheet href="userland.css" /&gt;
&lt;script src="js/jquery.js" type="text/javascript"&gt;&lt;/script&gt;
</code></pre>
<p><code>hello_world.html</code> and <code>userland-book</code> are both written in HyperText Markup
Language. HTML is just text with a specific kind of structure. It&rsquo;s been
around for quite a while now, and has grown up a lot in 20 years, but at heart
it still looks a lot <a href="http://info.cern.ch/hypertext/WWW/TheProject.html">like it did in 1991</a>.</p>
<p>The basic idea is that the contents of a web page are marked up with tags.
A tag looks like this:</p>
<pre><code>&lt;title&gt;hi!&lt;/title&gt; -,
   |    |           |
   |    `- content  |
   |                `- closing tag
   `-opening tag
</code></pre>
<p>Sometimes you&rsquo;ll see tags with what are known as &ldquo;attributes&rdquo;:</p>
<pre><code>&lt;a href="https://p1k3.com/userland-book"&gt;userland&lt;/a&gt;
</code></pre>
<p>This is how links are written in HTML. <code>href="..."</code> tells the browser where to
go when the user clicks on &ldquo;<a href="http://p1k3.com/userland-book">userland</a>&rdquo;.</p>
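<p>Since an attribute is just more text inside the tag, ordinary text tools can pull it back out. What follows is a rough sketch, not a real HTML parser: the <code>list_hrefs</code> function name and the second link are made up for the example, and it assumes double-quoted attributes and a <code>grep</code> that understands <code>-o</code> (the GNU and BSD versions both do).</p>

```shell
# Hypothetical helper: print the value of every href="..." on standard input.
list_hrefs() {
  # grep -o prints only the part of each line that matched;
  # cut then keeps whatever sits between the first pair of double quotes.
  grep -o 'href="[^"]*"' | cut -d'"' -f2
}

# A tiny stand-in for a real page. The example.com link is invented.
page='<p>Read <a href="https://p1k3.com/userland-book">userland</a> or
<a href="https://example.com/">something else</a>.</p>'

printf '%s\n' "$page" | list_hrefs
```

<p>Run as written, that prints the two URLs, one per line. It&rsquo;ll miss single-quoted or unquoted attributes (like the <code>lang=en</code> at the top of this book), so treat it as a toy.</p>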
<p>Tags are a way to describe not so much what something <em>looks like</em> as what
something <em>means</em>. Browsers are, in large part, big collections of knowledge
about the meanings of tags and ways to represent those meanings.</p>
<p>While the browser you use day-to-day has (probably) a graphical interface and
does all sorts of things impossible to render in a terminal, some of the
earliest web browsers were entirely text-based, and text-mode browsers still
exist. Lynx, which originated at the University of Kansas in the early 1990s,
is still actively maintained:</p>
<pre><code>$ lynx -dump 'http://p1k3.com/userland-book/' | head
[1]# a book about the command line for humans
Late last year, [2]a side trip into text utilities got me thinking
about how much my writing habits depend on the Linux command line. This
struck me as a good hook for talking about the tools I use every day
with an audience of mixed technical background.
</code></pre>
<p>If you invoke Lynx without any options, it&rsquo;ll start up in interactive mode, and
you can navigate between links with the arrow keys. <code>lynx -dump</code> spits a
rendered version of a page to standard output, with links annotated in square
brackets and printed as footnotes. Another useful option here is <code>-listonly</code>,
which will print just the list of links contained within a page:</p>
<pre><code>$ lynx -dump -listonly 'http://p1k3.com/userland-book/' | head
2. http://p1k3.com/2013/8/4
3. http://p1k3.com/userland-book.git
4. https://github.com/brennen/userland-book
5. http://p1k3.com/userland-book/
6. https://twitter.com/brennen
9. http://p1k3.com/userland-book/#a-book-about-the-command-line-for-humans
10. http://p1k3.com/userland-book/#copying
</code></pre>
<p>An alternative to Lynx is w3m, which copes a little more gracefully with the
complexities of modern web layout.</p>
<pre><code>$ w3m -dump 'http://p1k3.com/userland-book/' | head
# a book about the command line for humans
Late last year, a side trip into text utilities got me thinking about how much
my writing habits depend on the Linux command line. This struck me as a good
hook for talking about the tools I use every day with an audience of mixed
technical background.
</code></pre>
<p>Neither of these tools can easily replace enormously capable applications like
Chrome or Firefox, but they have their place in the toolbox, and help to
demonstrate how the web is built (in part) on principles we&rsquo;ve already seen at
work.</p>
<hr />
<h1><a name=a-miscellany-of-tools-and-techniques href=#a-miscellany-of-tools-and-techniques>#</a> 9. a miscellany of tools and techniques</h1>
<h2><a name=dict href=#dict>#</a> dict</h2>
<p>Want to know the definition of a word, or find useful synonyms?</p>
<pre><code>$ dict concatenate | head -10
4 definitions found
From The Collaborative International Dictionary of English v.0.48 [gcide]:
Concatenate \Con*cat"e*nate\ (k[o^]n*k[a^]t"[-e]*n[=a]t), v. t.
[imp. &amp; p. p. {Concatenated}; p. pr. &amp; vb. n.
{Concatenating}.] [L. concatenatus, p. p. of concatenare to
concatenate. See {Catenate}.]
To link together; to unite in a series or chain, as things
depending on one another.
</code></pre>
<h2><a name=aspell href=#aspell>#</a> aspell</h2>
<p>Need to interactively spell-check your presentation notes?</p>
<pre><code>$ aspell check presentation
</code></pre>
<p>Just want a list of potentially misspelled words in a given file?</p>
<!-- exec -->
<pre><code>$ aspell list &lt; ../literary_environment/index.md | sort | uniq -ci | sort -nr | head -5
40 td
24 Veselka
17 Reuel
16 Brunner
15 Tiptree
</code></pre>
<!-- end -->
<h2><a name=mostcommon href=#mostcommon>#</a> mostcommon</h2>
<p>Something like that last sequence sure does seem to show up a lot in my work:
Spit out the <em>n</em> most common lines in the input, one way or another. Here&rsquo;s
a little script to be less repetitive about it.</p>
<!-- exec -->
<pre><code>$ aspell list &lt; ../literary_environment/index.md | ./mostcommon -i -n5
40 td
24 Veselka
17 Reuel
16 Brunner
15 Tiptree
</code></pre>
<!-- end -->
<p>This turns out to be pretty simple:</p>
<!-- exec -->
<pre><code>$ cat ./mostcommon
#!/usr/bin/env bash

# Optionally specify number of lines to show, defaulting to 10:
TOSHOW=10
CASEOPT=""
while getopts ":in:" opt; do
  case $opt in
    i) CASEOPT="-i" ;;      # ignore case when counting duplicates
    n) TOSHOW=$OPTARG ;;    # number of lines to show
    \?)
      echo "Invalid option: -$OPTARG" &gt;&amp;2
      exit 1
      ;;
    :)
      echo "Option -$OPTARG requires an argument." &gt;&amp;2
      exit 1
      ;;
  esac
done

# sort and then uniqify STDIN,
# sort numerically on the first field,
# chop off everything but $TOSHOW lines of input
sort &lt; /dev/stdin | uniq -c $CASEOPT | sort -k1 -nr | head -$TOSHOW
</code></pre>
<!-- end -->
<p>Notice, though, that it doesn&rsquo;t handle opening files directly. If you wanted
to find the most common lines in a file with it, you&rsquo;d have to say something
like <code>mostcommon &lt; filename</code> in order to redirect the file to <code>mostcommon</code>&rsquo;s
standard input.</p>
<p>Also notice that most of the script is boilerplate for handling a couple of
options. The work is all done in a one-liner. Worth it? Maybe not, but an
interesting exercise.</p>
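<p>As for opening files directly, one possible tweak is to hand any filenames along to <code>sort</code>, which reads standard input when it isn&rsquo;t given any. This is only a sketch under that assumption: the <code>mostcommon_files</code> name is made up, it&rsquo;s written as a shell function rather than a script, and the option handling is left out to keep it short.</p>

```shell
# Hypothetical variant of the mostcommon pipeline that also accepts filenames.
mostcommon_files() {
  # sort reads the named files, or standard input if there are none
  sort -- "$@" | uniq -c | sort -k1 -nr | head -n 10
}

# Make a throwaway file to try it on:
tmpfile=$(mktemp)
printf '%s\n' pear apple pear fig pear apple > "$tmpfile"

mostcommon_files "$tmpfile"      # read the file by name
mostcommon_files < "$tmpfile"    # redirecting to standard input still works

rm -f "$tmpfile"
```

<p>Both calls should list <code>pear</code> first with a count of 3, then <code>apple</code>, then <code>fig</code>.</p>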
<h2><a name=cal-and-ncal href=