- <!DOCTYPE html>
- <html lang=en>
- <head>
- <meta charset="utf-8">
- <title>userland: a book about the command line for humans</title>
- <link rel=stylesheet href="userland.css" />
- <link rel="alternate" type="application/atom+xml" title="changes" href="//p1k3.com/userland-book/feed.xml" />
- <script src="js/jquery.js" type="text/javascript"></script>
- </head>
-
- <body>
-
- <h1 class=bigtitle>userland</h1>
- <hr />
-
- <h1><a name=a-book-about-the-command-line-for-humans href=#a-book-about-the-command-line-for-humans>#</a> a book about the command line for humans</h1>
-
- <p>In the fall of 2013, <a href="//p1k3.com/2013/8/4">thinking about</a> text utilities got
- me thinking in turn about how my writing habits depend on the Linux command
- line. This seems like a good hook for explaining some tools I use every day,
- so now I’m writing a short, haphazard book.</p>
-
- <p>This isn’t a book about system administration, writing complex software, or
- becoming a wizard. I am not a wizard, and I don’t subscribe to the idea that
- wizardry is required to use these tools. In fact, I barely know what I’m doing
- most of the time. I still get some stuff done.</p>
-
- <p>This is a work in progress. It probably gets some stuff wrong.</p>
-
- <p>– bpb / <a href="https://p1k3.com">p1k3</a> / <a href="https://twitter.com/brennen">@brennen</a></p>
-
- <div class=details>
- <h2 class=clicker><a name=contents href=#contents>#</a> contents</h2>
- <div class=full>
- <div class=contents><ul>
- <li><a href="#a-book-about-the-command-line-for-humans">a book about the command line for humans</a>
-
- <ul>
- <li><a href="#contents">contents</a></li>
- </ul>
- </li>
- <li><a href="#get-you-a-shell">0. get you a shell</a>
-
- <ul>
- <li><a href="#get-an-account-on-a-social-unix-server">get an account on a social unix server</a></li>
- <li><a href="#use-a-raspberry-pi-or-beaglebone">use a raspberry pi or beaglebone</a></li>
- <li><a href="#use-a-virtual-machine">use a virtual machine</a></li>
- </ul>
- </li>
- <li><a href="#the-command-line-as-literary-environment">1. the command line as literary environment</a>
-
- <ul>
- <li><a href="#terms-and-definitions">terms and definitions</a></li>
- <li><a href="#twisty-little-passages">twisty little passages</a></li>
- <li><a href="#cat">cat</a></li>
- <li><a href="#wildcards">wildcards</a></li>
- <li><a href="#sort">sort</a></li>
- <li><a href="#options">options</a></li>
- <li><a href="#uniq">uniq</a></li>
- <li><a href="#standard-IO">standard IO</a></li>
- <li><a href="#code-help-code-and-man-pages"><code>--help</code> and man pages</a></li>
- <li><a href="#wc">wc</a></li>
- <li><a href="#head-tail-and-cut">head, tail, and cut</a></li>
- <li><a href="#tab-separated-values">tab separated values</a></li>
- <li><a href="#finding-text-grep">finding text: grep</a></li>
- <li><a href="#now-you-have-n-problems">now you have n problems</a></li>
- </ul>
- </li>
- <li><a href="#a-literary-problem">2. a literary problem</a></li>
- <li><a href="#programmerthink">3. programmerthink</a></li>
- <li><a href="#script">4. script</a>
-
- <ul>
- <li><a href="#learn-you-an-editor">learn you an editor</a></li>
- <li><a href="#d-i-y-utilities">d.i.y. utilities</a></li>
- <li><a href="#heavy-lifting">heavy lifting</a></li>
- <li><a href="#generality">generality</a></li>
- </ul>
- </li>
- <li><a href="#general-purpose-programmering">5. general purpose programmering</a></li>
- <li><a href="#one-of-these-things-is-not-like-the-others">6. one of these things is not like the others</a>
-
- <ul>
- <li><a href="#diff">diff</a></li>
- <li><a href="#wdiff">wdiff</a></li>
- </ul>
- </li>
- <li><a href="#the-command-line-as-as-a-shared-world">7. the command line as a shared world</a></li>
- <li><a href="#the-command-line-and-the-web">8. the command line and the web</a></li>
- <li><a href="#a-miscellany-of-tools-and-techniques">9. a miscellany of tools and techniques</a>
-
- <ul>
- <li><a href="#dict">dict</a></li>
- <li><a href="#aspell">aspell</a></li>
- <li><a href="#mostcommon">mostcommon</a></li>
- <li><a href="#cal-and-ncal">cal and ncal</a></li>
- <li><a href="#seq">seq</a></li>
- <li><a href="#shuf">shuf</a></li>
- <li><a href="#ptx">ptx</a></li>
- <li><a href="#figlet">figlet</a></li>
- <li><a href="#cowsay">cowsay</a></li>
- </ul>
- </li>
- <li><a href="#endmatter">endmatter</a>
-
- <ul>
- <li><a href="#further-reading">further reading</a></li>
- <li><a href="#code">code</a></li>
- <li><a href="#copying">copying</a></li>
- </ul>
- </li>
- </ul>
-
- </div>
- </div>
- </div>
-
-
- <hr />
-
- <h1><a name=get-you-a-shell href=#get-you-a-shell>#</a> 0. get you a shell</h1>
-
- <p>You don’t have to have a shell at hand to get something out of this book.
- Still, as with most practical subjects, you’ll learn more if you try things out
- as you go. You shouldn’t feel guilty about skipping this section. It will
- always be here later if you need it.</p>
-
- <p>Not so long ago, it was common for schools and ISPs to hand out shell accounts
- on big shared systems. People learned the command line as a side effect of
- reading their e-mail.</p>
-
- <p>That doesn’t happen as often now, but in the meanwhile computers have become
- relatively cheap and free software is abundant. If you’re reading this on the
- web, you can probably get access to a shell. Some options follow.</p>
-
- <h2><a name=get-an-account-on-a-social-unix-server href=#get-an-account-on-a-social-unix-server>#</a> get an account on a social unix server</h2>
-
- <p>Check out <a href="https://tilde.town/">tilde.town</a>:</p>
-
- <blockquote><p>tilde.town is an intentional digital community for making art, socializing, and
- learning. Unlike many online spaces, users interact with tilde.town through a
- direct connection instead of a web site. This means using a tool called ssh and
- other text based tools.</p></blockquote>
-
- <h2><a name=use-a-raspberry-pi-or-beaglebone href=#use-a-raspberry-pi-or-beaglebone>#</a> use a raspberry pi or beaglebone</h2>
-
- <p>Do you have a single-board computer lying around? Perfect. If you already
- run the standard Raspbian, Debian on a BeagleBone, or a similar-enough Linux,
- you don’t need much else. I wrote most of this text on a Raspberry Pi, and the
- example commands should all work there.</p>
-
- <h2><a name=use-a-virtual-machine href=#use-a-virtual-machine>#</a> use a virtual machine</h2>
-
- <p>A few options:</p>
-
- <ul>
- <li><a href="https://docs.vagrantup.com/v2/getting-started/index.html">Use Vagrant to spin up a machine in Virtualbox</a></li>
- <li><a href="https://www.digitalocean.com/community/tutorials/how-to-create-your-first-digitalocean-droplet-virtual-server">Use DigitalOcean to create a remotely-hosted VM running Linux</a></li>
- </ul>
-
-
- <hr />
-
- <h1><a name=the-command-line-as-literary-environment href=#the-command-line-as-literary-environment>#</a> 1. the command line as literary environment</h1>
-
- <p>There are a lot of ways to structure an introduction to the command line. I’m
- going to start with writing as a point of departure because, aside from web
- development, it’s what I use a computer for most. I want to shine a light on
- the humane potential of ideas that are usually understood as nerd trivia.
- Computers have utterly transformed the practice of writing within the space of
- my lifetime, but it seems to me that writers as a class miss out on many of the
- software tools and patterns taken as a given in more “technical” fields.</p>
-
- <p>Writing, particularly writing of any real scope or complexity, is very much a
- technical task. It makes demands, both physical and psychological, of its
- practitioners. As with woodworkers, graphic artists, and farmers, writers
- exhibit strong preferences in their tools, materials, and environment, and they
- do so because they’re engaged in a physically and cognitively challenging task.</p>
-
- <p>My thesis is that the modern Linux command line is a pretty good environment
- for working with English prose and prosody, and that maybe this will illuminate
- the ways it could be useful in your own work with a computer, whatever that
- work happens to be.</p>
-
- <h2><a name=terms-and-definitions href=#terms-and-definitions>#</a> terms and definitions</h2>
-
- <p>What software are we actually talking about when we say “the command line”?</p>
-
- <p>For the purposes of this discussion, we’re talking about an environment built
- on a very old paradigm called Unix.</p>
-
- <p style="text-align:center;"> <img src="images/jp_unix.jpg" height=320 width=470></p>
-
- <p>…except what classical Unix really looks like is this:</p>
-
- <p style="text-align:center;"> <img src="images/blinking.gif" width=470></p>
-
- <p>The Unix-like environment we’re going to use isn’t very classical, really.
- It’s an operating system kernel called Linux, combined with a bunch of things
- written by other people (people in the GNU and Debian projects, and many
- others). Purists will tell you that this isn’t properly Unix at all. In
- strict historical terms they’re right, or at least a certain kind of right, but
- for the purposes of my cultural agenda I’m going to ignore them right now.</p>
-
- <p style="text-align:center;"> <img src="images/debian.png"></p>
-
- <p>This is what’s called a shell. There are many different shells, but they
- pretty much all operate on the same idea: You navigate a filesystem and run
- programs by typing commands. Commands can be combined in various ways to make
- programs of their own, and in fact the way you use the computer is often just
- to write little programs that invoke other programs, turtles-all-the-way-down
- style.</p>
-
- <p>The standard shell these days is something called Bash, so we’ll use Bash.
- It’s what you’ll most often see in the wild. Like most shells, Bash is ugly
- and stupid in more ways than it is possible to easily summarize. It’s also an
- incredibly powerful and expressive piece of software.</p>
-
- <h2><a name=twisty-little-passages href=#twisty-little-passages>#</a> twisty little passages</h2>
-
- <p>Have you ever played a text-based adventure game or MUD, of the kind that
- describes a setting and takes commands for movement and so on? Readers of a
- certain age and temperament might recognize the opening of Crowther &amp; Woods'
- <em>Adventure</em>, the great-granddaddy of text adventure games:</p>
-
- <pre><code>YOU ARE STANDING AT THE END OF A ROAD BEFORE A SMALL BRICK BUILDING.
- AROUND YOU IS A FOREST. A SMALL STREAM FLOWS OUT OF THE BUILDING AND
- DOWN A GULLY.
-
- > GO EAST
-
- YOU ARE INSIDE A BUILDING, A WELL HOUSE FOR A LARGE SPRING.
-
- THERE ARE SOME KEYS ON THE GROUND HERE.
-
- THERE IS A SHINY BRASS LAMP NEARBY.
-
- THERE IS FOOD HERE.
-
- THERE IS A BOTTLE OF WATER HERE.
- </code></pre>
-
- <p>You can think of the shell as a kind of environment you inhabit, in much the
- way your character inhabits an adventure game. The difference is that instead
- of navigating around virtual rooms and hallways with commands like <code>LOOK</code> and
- <code>EAST</code>, you navigate between directories by typing commands like <code>ls</code> and <code>cd
- notes</code>:</p>
-
- <pre><code>$ ls
- code Downloads notes p1k3 photos scraps userland-book
- $ cd notes
- $ ls
- notes.txt sparkfun TODO.txt
- </code></pre>
-
- <p><code>ls</code> lists files. Some files are directories, which means they can contain
- other files, and you can step inside of them by typing <code>cd</code> (for <strong>c</strong>hange
- <strong>d</strong>irectory).</p>
-
- <p>In the Macintosh and Windows world, directories have been called
- “folders” for a long time now. This isn’t the <em>worst</em> metaphor for what’s
- going on, and it’s so pervasive by now that it’s not worth fighting about.
- It’s also not exactly a <em>great</em> metaphor, since computer filesystems aren’t
- built very much like the filing cabinets of yore. A directory acts a lot like
- a container of some sort, but it’s an infinitely expandable one which may
- contain nested sub-spaces much larger than itself. Directories are frequently
- like the TARDIS: Bigger on the inside.</p>
-
- <h2><a name=cat href=#cat>#</a> cat</h2>
-
- <p>When you’re in the shell, you have many tools at your disposal - programs that
- can be used on many different files, or chained together with other programs.
- They tend to have weird, cryptic names, but a lot of them do very simple
- things. Tasks that might be a menu item in a big program like Word, like
- counting the number of words in a document or finding a particular phrase, are
- often programs unto themselves. We’ll start with something even more basic
- than that.</p>
-
- <p>Suppose you have some files, and you’re curious what’s in them. For example,
- suppose you’ve got a list of authors you’re planning to reference, and you just
- want to check its contents real quick-like. This is where our friend <code>cat</code>
- comes in:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ cat authors_sff
- Ursula K. Le Guin
- Jo Walton
- Pat Cadigan
- John Ronald Reuel Tolkien
- Vanessa Veselka
- James Tiptree, Jr.
- John Brunner
- </code></pre>
-
- <!-- end -->
-
-
- <p>“Why,” you might be asking, “is the command to dump out the contents of a file
- to a screen called <code>cat</code>? What do felines have to do with anything?”</p>
-
- <p>It turns out that <code>cat</code> is actually short for “catenate”, which is a long
- word basically meaning “stick things together”. In programming, we usually
- refer to sticking two bits of text together as “string concatenation”, probably
- because programmers like to feel like they’re being very precise about very
- simple actions.</p>
-
- <p>Suppose you wanted to see the contents of a <em>set</em> of author lists:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ cat authors_sff authors_contemporary_fic authors_nat_hist
- Ursula K. Le Guin
- Jo Walton
- Pat Cadigan
- John Ronald Reuel Tolkien
- Vanessa Veselka
- James Tiptree, Jr.
- John Brunner
- Eden Robinson
- Vanessa Veselka
- Miriam Toews
- Gwendolyn L. Waring
- </code></pre>
-
- <!-- end -->
-
-
- <h2><a name=wildcards href=#wildcards>#</a> wildcards</h2>
-
- <p>We’re working with three filenames: <code>authors_sff</code>, <code>authors_contemporary_fic</code>,
- and <code>authors_nat_hist</code>. That’s an awful lot of typing every time we want to do
- something to all three files. Fortunately, our shell offers a shorthand for
- “all the files that start with <code>authors_</code>”:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ cat authors_*
- Eden Robinson
- Vanessa Veselka
- Miriam Toews
- Gwendolyn L. Waring
- Ursula K. Le Guin
- Jo Walton
- Pat Cadigan
- John Ronald Reuel Tolkien
- Vanessa Veselka
- James Tiptree, Jr.
- John Brunner
- </code></pre>
-
- <!-- end -->
-
-
- <p>In Bash-land, <code>*</code> basically means “anything”, and is known in the vernacular,
- somewhat poetically, as a “wildcard”. You should always be careful with
- wildcards, especially if you’re doing anything destructive. They can and will
- surprise the unwary. Still, once you’re used to the idea, they will save you a
- lot of RSI.</p>
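
<p>One hedge against wildcard surprises, sketched here with made-up file names in a scratch directory (the directory and <code>notes.txt</code> are assumptions for illustration): hand the pattern to <code>echo</code>, which simply prints its arguments. The shell expands <code>authors_*</code> before <code>echo</code> ever runs, so you see the same list of names any other command would receive.</p>

```shell
# Work in a throwaway directory so nothing real is touched.
cd "$(mktemp -d)"

# touch creates empty files; these names are made up for the demo.
touch authors_sff authors_nat_hist notes.txt

# Prints the names the pattern expands to, and nothing else happens.
echo authors_*
```

<p>If the preview lists only the files you meant, the same pattern is probably safe to give to something less forgiving, like <code>rm</code>.</p>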
-
- <h2><a name=sort href=#sort>#</a> sort</h2>
-
- <p>There’s a problem here. Our author list is out of order, and thus confusing to
- reference. Fortunately, since one of the most basic things you can do to a
- list is to sort it, someone else has already solved this problem for us.
- Here’s a command that will give us some organization:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ sort authors_*
- Eden Robinson
- Gwendolyn L. Waring
- James Tiptree, Jr.
- John Brunner
- John Ronald Reuel Tolkien
- Jo Walton
- Miriam Toews
- Pat Cadigan
- Ursula K. Le Guin
- Vanessa Veselka
- Vanessa Veselka
- </code></pre>
-
- <!-- end -->
-
-
- <p>Does it bother you that they aren’t sorted by last name? Me too. As a partial
- solution, we can ask <code>sort</code> to use the second “field” in each line as its sort
- <strong>k</strong>ey (by default, sort treats whitespace as a division between fields):</p>
-
- <!-- exec -->
-
-
- <pre><code>$ sort -k2 authors_*
- John Brunner
- Pat Cadigan
- Ursula K. Le Guin
- Gwendolyn L. Waring
- Eden Robinson
- John Ronald Reuel Tolkien
- James Tiptree, Jr.
- Miriam Toews
- Vanessa Veselka
- Vanessa Veselka
- Jo Walton
- </code></pre>
-
- <!-- end -->
-
-
- <p>That’s closer, right? It sorted on “Cadigan” and “Veselka” instead of “Pat”
- and “Vanessa”. (Of course, it’s still far from perfect, because the
- second field in each line isn’t necessarily the person’s last name.)</p>
-
- <h2><a name=options href=#options>#</a> options</h2>
-
- <p>Above, when we wanted to ask <code>sort</code> to behave differently, we gave it what is
- known as an option. Most programs with command-line interfaces will allow
- their behavior to be changed by adding various options. Options usually
- (but not always!) look like <code>-o</code> or <code>--option</code>.</p>
-
- <p>For example, if we wanted to see just the unique lines, irrespective of case,
- for a file called colors:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ cat colors
- RED
- blue
- red
- BLUE
- Green
- green
- GREEN
- </code></pre>
-
- <!-- end -->
-
-
- <p>We could write this:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ sort -uf colors
- blue
- Green
- RED
- </code></pre>
-
- <!-- end -->
-
-
- <p>Here <code>-u</code> stands for <strong>u</strong>nique and <code>-f</code> stands for <strong>f</strong>old case, which means
- to treat upper- and lower-case letters as the same for comparison purposes. You’ll
- often see a group of short options following the <code>-</code> like this.</p>
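
<p>As a rough check that bundling is only a shorthand, here is a sketch that runs <code>sort</code> both ways on a copy of the colors list and compares the results. The file names <code>bundled</code> and <code>separate</code> are made up for the demo, and the <code>></code> that saves output into a file is explained under standard IO.</p>

```shell
# Build a scratch copy of the colors file, one word per line.
cd "$(mktemp -d)"
printf '%s\n' RED blue red BLUE Green green GREEN > colors

# Short options bundled behind a single dash...
sort -uf colors > bundled

# ...and the same two options given one at a time.
sort -u -f colors > separate

# cmp stays silent and succeeds when two files are identical.
cmp bundled separate && echo 'same output either way'
```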
-
- <h2><a name=uniq href=#uniq>#</a> uniq</h2>
-
- <p>Did you notice how Vanessa Veselka shows up twice in our list of authors?
- That’s useful if we want to remember that she’s in more than one category, but
- it’s redundant if we’re just worried about membership in the overall set of
- authors. We can make sure our list doesn’t contain repeating lines by using
- <code>sort</code>, just like with that list of colors:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ sort -u -k2 authors_*
- John Brunner
- Pat Cadigan
- Ursula K. Le Guin
- Gwendolyn L. Waring
- Eden Robinson
- John Ronald Reuel Tolkien
- James Tiptree, Jr.
- Miriam Toews
- Vanessa Veselka
- Jo Walton
- </code></pre>
-
- <!-- end -->
-
-
- <p>But there’s another approach to this — <code>sort</code> is good at only displaying a line
- once, but suppose we wanted to see a count of how many different lists an
- author shows up on? <code>sort</code> doesn’t do that, but a command called <code>uniq</code> does,
- if you give it the option <code>-c</code> for <strong>c</strong>ount.</p>
-
- <p><code>uniq</code> moves through the lines in its input, and if it sees a line more than
- once in sequence, it will only print that line once. If you have a bunch of
- files and you just want to see the unique lines across all of those files, you
- probably need to run them through <code>sort</code> first. How do you do that?</p>
-
- <!-- exec -->
-
-
- <pre><code>$ sort authors_* | uniq -c
- 1 Eden Robinson
- 1 Gwendolyn L. Waring
- 1 James Tiptree, Jr.
- 1 John Brunner
- 1 John Ronald Reuel Tolkien
- 1 Jo Walton
- 1 Miriam Toews
- 1 Pat Cadigan
- 1 Ursula K. Le Guin
- 2 Vanessa Veselka
- </code></pre>
-
- <!-- end -->
-
-
- <h2><a name=standard-IO href=#standard-IO>#</a> standard IO</h2>
-
- <p>The <code>|</code> is called a “pipe”. In the command above, it tells your shell that
- instead of printing the output of <code>sort authors_*</code> right to your terminal, it
- should send it to <code>uniq -c</code>.</p>
-
- <p style="text-align:center;"> <img src="images/pipe.gif"></p>
-
- <p>Pipes are some of the most important magic in the shell. When the people who
- built Unix in the first place give interviews about the stuff they remember
- from the early days, a lot of them reminisce about the invention of pipes and
- all of the new stuff it immediately made possible.</p>
-
- <p>Pipes help you control a thing called “standard IO”. In the world of the
- command line, programs take <strong>i</strong>nput and produce <strong>o</strong>utput. A pipe is a way
- to hook the output from one program to the input of another.</p>
-
- <p>Unlike a lot of the weirdly named things you’ll encounter in software, the
- metaphor here is obvious and makes pretty good sense. It even kind of looks
- like a physical pipe.</p>
-
- <p>What if, instead of sending the output of one program to the input of another,
- you’d like to store it in a file for later use?</p>
-
- <p>Check it out:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ sort authors_* | uniq > ./all_authors
- </code></pre>
-
- <!-- end -->
-
-
-
-
- <!-- exec -->
-
-
- <pre><code>$ cat all_authors
- Eden Robinson
- Gwendolyn L. Waring
- James Tiptree, Jr.
- John Brunner
- John Ronald Reuel Tolkien
- Jo Walton
- Miriam Toews
- Pat Cadigan
- Ursula K. Le Guin
- Vanessa Veselka
- </code></pre>
-
- <!-- end -->
-
-
- <p>I like to think of the <code>></code> as looking like a little funnel. It can be
- dangerous — you should always make sure that you’re not going to clobber
- an existing file you actually want to keep.</p>
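
<p>If that worries you, the shell has a setting called <code>noclobber</code> that makes <code>></code> refuse to overwrite a file that already exists. A minimal sketch, assuming Bash and using a hypothetical file named <code>precious</code> in a scratch directory:</p>

```shell
cd "$(mktemp -d)"
echo 'keep me' > precious

# Turn on noclobber: from here on, > will not overwrite existing files.
set -o noclobber

# This redirect fails and leaves the file alone. The braces are only
# there so the shell's complaint can be sent to /dev/null.
{ echo 'oops' > precious; } 2>/dev/null || echo 'refused to overwrite'
after_refusal=$(cat precious)

# >| means "yes, really" and overwrites anyway.
echo 'on purpose' >| precious
after_force=$(cat precious)

# Back to the default behavior.
set +o noclobber
```

<p>Whether to leave this on all the time is a matter of taste. It gets in the way of scripts that expect to overwrite their own output.</p>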
-
- <p>If you want to tack more stuff on to the end of an existing file, you can use
- <code>>></code> instead. To test that, let’s use <code>echo</code>, which prints out whatever string
- you give it on a line by itself:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ echo 'hello' > hello_world
- </code></pre>
-
- <!-- end -->
-
-
-
-
- <!-- exec -->
-
-
- <pre><code>$ echo 'world' >> hello_world
- </code></pre>
-
- <!-- end -->
-
-
-
-
- <!-- exec -->
-
-
- <pre><code>$ cat hello_world
- hello
- world
- </code></pre>
-
- <!-- end -->
-
-
- <p>You can also take a file and pull it directly back into the input of a given
- program, which is a bit like a funnel going the other direction:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ nl < all_authors
- 1 Eden Robinson
- 2 Gwendolyn L. Waring
- 3 James Tiptree, Jr.
- 4 John Brunner
- 5 John Ronald Reuel Tolkien
- 6 Jo Walton
- 7 Miriam Toews
- 8 Pat Cadigan
- 9 Ursula K. Le Guin
- 10 Vanessa Veselka
- </code></pre>
-
- <!-- end -->
-
-
- <p><code>nl</code> is just a way to <strong>n</strong>umber <strong>l</strong>ines. This command accomplishes pretty much
- the same thing as <code>cat all_authors | nl</code>, or <code>nl all_authors</code>. You won’t see
- <code>&lt;</code> used as often as <code>|</code> and <code>></code>, since most utilities can read files on their
- own, but it can save you from typing <code>cat</code> quite so often.</p>
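
<p>To see that the three spellings really do agree, here is a small sketch that runs each one on a shortened, made-up copy of <code>all_authors</code> and compares the results. The <code>via_*</code> file names are assumptions for the demo.</p>

```shell
cd "$(mktemp -d)"
printf '%s\n' 'Eden Robinson' 'Jo Walton' 'Pat Cadigan' > all_authors

# The same file reaches nl three different ways.
nl < all_authors > via_redirect      # the shell feeds the file to nl's input
cat all_authors | nl > via_pipe      # cat reads the file, a pipe hands it to nl
nl all_authors > via_argument        # nl opens the file itself

# cmp succeeds quietly when two files are byte-for-byte identical.
cmp via_redirect via_pipe && cmp via_pipe via_argument && echo 'all three match'
```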
-
- <p>We’ll use these features liberally from here on out.</p>
-
- <h2><a name=code-help-code-and-man-pages href=#code-help-code-and-man-pages>#</a> <code>--help</code> and man pages</h2>
-
- <p>You can change the behavior of most tools by giving them different options.
- This is all well and good if you already know what options are available,
- but what if you don’t?</p>
-
- <p>Often, you can ask the tool itself:</p>
-
- <pre><code>$ sort --help
- Usage: sort [OPTION]... [FILE]...
-   or:  sort [OPTION]... --files0-from=F
- Write sorted concatenation of all FILE(s) to standard output.
-
- Mandatory arguments to long options are mandatory for short options too.
- Ordering options:
-
-   -b, --ignore-leading-blanks  ignore leading blanks
-   -d, --dictionary-order      consider only blanks and alphanumeric characters
-   -f, --ignore-case           fold lower case to upper case characters
-   -g, --general-numeric-sort  compare according to general numerical value
-   -i, --ignore-nonprinting    consider only printable characters
-   -M, --month-sort            compare (unknown) < 'JAN' < ... < 'DEC'
-   -h, --human-numeric-sort    compare human readable numbers (e.g., 2K 1G)
-   -n, --numeric-sort          compare according to string numerical value
-   -R, --random-sort           sort by random hash of keys
-       --random-source=FILE    get random bytes from FILE
-   -r, --reverse               reverse the result of comparisons
- </code></pre>
-
- <p>…and so on. (It goes on for a while in this vein.)</p>
-
- <p>If that doesn’t work, or doesn’t provide enough info, the next thing to try is
- called a man page. (“man” is short for “manual”. It’s sort of an unfortunate
- abbreviation.)</p>
-
- <pre><code>$ man sort
-
- SORT(1)                          User Commands                         SORT(1)
-
-
-
- NAME
-        sort - sort lines of text files
-
- SYNOPSIS
-        sort [OPTION]... [FILE]...
-        sort [OPTION]... --files0-from=F
-
- DESCRIPTION
-        Write sorted concatenation of all FILE(s) to standard output.
- </code></pre>
-
- <p>…and so on. Manual pages vary in quality, and it can take a while to get
- used to reading them, but they’re very often the best place to look for help.</p>
-
- <p>If you’re not sure what <em>program</em> you want to use to solve a given problem, you
- might try searching all the man pages on the system for a keyword. <code>man</code>
- itself has an option to let you do this - <code>man -k keyword</code> - but most systems
- also have a shortcut called <code>apropos</code>, which I like to use because it’s easy to
- remember if you imagine yourself saying “apropos of [some problem I have]…”</p>
-
- <!-- exec -->
-
-
- <pre><code>$ apropos -s1 sort
- apt-sortpkgs (1) - Utility to sort package index files
- bunzip2 (1) - a block-sorting file compressor, v1.0.6
- bzip2 (1) - a block-sorting file compressor, v1.0.6
- comm (1) - compare two sorted files line by line
- sort (1) - sort lines of text files
- tsort (1) - perform topological sort
- </code></pre>
-
- <!-- end -->
-
-
- <p>It’s useful to know that the manual represented by <code>man</code> has numbered sections
- for different kinds of manual pages. Most of what the average user needs to
- know about lives in section 1, “User Commands”, so you’ll often see the names
- of different tools written like <code>sort(1)</code> or <code>cat(1)</code>. This can be a good way
- to make it clear in writing that you’re talking about a specific piece of
- software rather than a verb or a small carnivorous mammal. (I specified <code>-s1</code>
- for section 1 above just to cut down on clutter, though in practice I usually
- don’t bother.)</p>
-
- <p>Like other literary traditions, Unix is littered with this sort of convention.
- This one just happens to date from a time when the manual was still a physical
- book.</p>
-
- <h2><a name=wc href=#wc>#</a> wc</h2>
-
- <p><code>wc</code> stands for <strong>w</strong>ord <strong>c</strong>ount. It does about what you’d expect - it
- counts the number of words in its input.</p>
-
- <pre><code>$ wc index.md
- 736 4117 24944 index.md
- </code></pre>
-
- <p>736 is the number of lines, 4117 the number of words, and 24944 the number of
- bytes (which, for plain ASCII text, is also the number of characters) in the file
- I’m writing right now. I use this constantly. Most
- obviously, it’s a good way to get an idea of how much you’ve written. <code>wc</code> is
- the tool I used to track my progress the last time I tried National Novel
- Writing Month:</p>
-
- <pre><code>$ find ~/p1k3/archives/2010/11 -regextype egrep -regex '.*([0-9]+|index)' -type f | xargs wc -w | tail -1
- 6585 total
- </code></pre>
-
- <!-- exec -->
-
-
- <pre><code>$ cowsay 'embarrassing.'
-  _______________
- < embarrassing. >
-  ---------------
-         \   ^__^
-          \  (oo)\_______
-             (__)\       )\/\
-                 ||----w |
-                 ||     ||
- </code></pre>
-
- <!-- end -->
-
-
- <p>Anyway. The less obvious thing about <code>wc</code> is that you can use it to count the
- output of other commands. Want to know <em>how many</em> unique authors we have?</p>
-
- <!-- exec -->
-
-
- <pre><code>$ sort authors_* | uniq | wc -l
- 10
- </code></pre>
-
- <!-- end -->
-
-
- <p>This kind of thing is trivial, but it comes in handy more often than you might
- think.</p>
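
<p><code>wc</code> will also hand over just one of its three numbers if you ask with <code>-l</code> (lines), <code>-w</code> (words), or <code>-c</code> (bytes). A small sketch, using a made-up two-line file called <code>sample</code>:</p>

```shell
cd "$(mktemp -d)"
printf 'one two\nthree\n' > sample

# Reading from standard input keeps the file name out of the output,
# so each command prints a bare number.
lines=$(wc -l < sample)
words=$(wc -w < sample)
bytes=$(wc -c < sample)

echo "$lines lines, $words words, $bytes bytes"
```

<p>Some versions of <code>wc</code> pad their numbers with leading spaces, so the spacing of that last line can vary from system to system.</p>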
-
- <h2><a name=head-tail-and-cut href=#head-tail-and-cut>#</a> head, tail, and cut</h2>
-
- <p>Remember our old pal <code>cat</code>, which just splats everything it’s given back to
- standard output?</p>
-
- <p>Sometimes you’ve got a piece of output that’s more than you actually want to
- deal with at once. Maybe you just want to glance at the first few lines in a
- file:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ head -3 colors
- RED
- blue
- red
- </code></pre>
-
- <!-- end -->
-
-
- <p>…or maybe you want to see the last thing in a list:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ sort colors | uniq -i | tail -1
- red
- </code></pre>
-
- <!-- end -->
-
-
- <p>…or maybe you’re only interested in the first “field” in some list. You might
- use <code>cut</code> here, asking it to treat spaces as delimiters between fields and
- return only the first field for each line of its input:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ cut -d' ' -f1 ./authors_*
- Eden
- Vanessa
- Miriam
- Gwendolyn
- Ursula
- Jo
- Pat
- John
- Vanessa
- James
- John
- </code></pre>
-
- <!-- end -->
-
-
- <p>Suppose we’re curious what the few most commonly occurring first names on our
- author list are? Here’s an approach, silly but effective, that combines a lot
- of what we’ve discussed so far and looks like plenty of one-liners I wind up
- writing in real life:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ cut -d' ' -f1 ./authors_* | sort | uniq -ci | sort -n | tail -3
- 1 Ursula
- 2 John
- 2 Vanessa
- </code></pre>
-
- <!-- end -->
-
-
- <p>Let’s walk through this one step by step:</p>
-
- <p>First, we have <code>cut</code> extract the first field of each line in our author lists.</p>
-
- <pre><code>cut -d' ' -f1 ./authors_*
- </code></pre>
-
- <p>Then we sort these results</p>
-
- <pre><code>| sort
- </code></pre>
-
- <p>and pass them to <code>uniq</code>, asking it for a case-insensitive count of each
- repeated line</p>
-
- <pre><code>| uniq -ci
- </code></pre>
-
- <p>then sort again, numerically,</p>
-
- <pre><code>| sort -n
- </code></pre>
-
- <p>and finally, we chop off everything but the last three lines:</p>
-
- <pre><code>| tail -3
- </code></pre>
-
- <p>If you wanted to make sure to count an individual author’s first name
- only once, even if that author appears more than once in the files,
- you could instead do:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ sort -u ./authors_* | cut -d' ' -f1 | uniq -ci | sort -n | tail -3
- 1 Ursula
- 1 Vanessa
- 2 John
- </code></pre>
-
- <!-- end -->
-
-
- <h2><a name=tab-separated-values href=#tab-separated-values>#</a> tab separated values</h2>
-
- <p>Notice above how we had to tell <code>cut</code> that “fields” in <code>authors_*</code> are
- delimited by spaces? It turns out that if you don’t use <code>-d</code>, <code>cut</code> defaults
- to using tab characters for a delimiter.</p>
-
- <p>Tab characters are sort of weird little animals. You can’t usually <em>see</em> them
- directly — they’re like a space character that takes up more than one space
- when displayed. By convention, one tab is usually rendered as 8 spaces, but
- it’s up to the software that’s displaying the character what it wants to do.</p>
-
- <p>(In fact, it’s more complicated than that: Tabs are often rendered as marking
- <em>tab stops</em>, which is a concept I remember from 7th grade typing classes, but
- haven’t actually thought about in my day-to-day life for nearly 20 years.)</p>
-
- <p>Here’s a version of our <code>all_authors</code> that’s been rearranged so that the first
- field is the author’s last name, the second is their first name, the third is
- their middle name or initial (if we know it) and the fourth is any suffix.
- Fields are separated by a single tab character:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ cat all_authors.tsv
- Robinson        Eden
- Waring  Gwendolyn       L.
- Tiptree James           Jr.
- Brunner John
- Tolkien John    Ronald Reuel
- Walton  Jo
- Toews   Miriam
- Cadigan Pat
- Le Guin Ursula  K.
- Veselka Vanessa
- </code></pre>
-
- <!-- end -->
-
-
- <p>That looks kind of garbled, right? In order to make it a little more obvious
- what’s happening, let’s use <code>cat -T</code>, which displays tab characters as <code>^I</code>:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ cat -T all_authors.tsv
- Robinson^IEden
- Waring^IGwendolyn^IL.
- Tiptree^IJames^I^IJr.
- Brunner^IJohn
- Tolkien^IJohn^IRonald Reuel
- Walton^IJo
- Toews^IMiriam
- Cadigan^IPat
- Le Guin^IUrsula^IK.
- Veselka^IVanessa
- </code></pre>
-
- <!-- end -->
-
-
- <p>It looks odd when displayed because some names are at or nearly at 8 characters long.
- “Robinson”, at 8 characters, overshoots the first tab stop, so “Eden” gets indented
- further than other first names, and so on.</p>
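-
- <p>If tab stops still seem slippery, it can help to make the padding visible. A
- small sketch, using two made-up sample rows and <code>tr</code> to swap every space
- for a dot once <code>expand</code> has done its work:</p>
-
- <pre><code># expand with no options assumes a tab stop every 8 columns.
- # tr then turns each space into a dot so the padding can be counted.
- printf 'Robinson\tEden\nWaring\tGwendolyn\n' | expand | tr ' ' '.'
- </code></pre>
-
- <p>“Waring” is 6 characters, so it only needs 2 dots to reach column 8. “Robinson”
- already fills all 8 columns, so its tab has to travel a full 8 more to reach the
- next stop.</p>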
-
- <p>Fortunately, in order to make this more human-readable, we can pass it through
- <code>expand</code>, which replaces each tab with enough spaces to reach the next tab stop
- (every 8 columns by default; <code>-t</code> sets a different width):</p>
-
- <!-- exec -->
-
-
- <pre><code>$ expand -t14 all_authors.tsv
- Robinson Eden
- Waring Gwendolyn L.
- Tiptree James Jr.
- Brunner John
- Tolkien John Ronald Reuel
- Walton Jo
- Toews Miriam
- Cadigan Pat
- Le Guin Ursula K.
- Veselka Vanessa
- </code></pre>
-
- <!-- end -->
-
-
- <p>Now it’s easy to sort by last name:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ sort -k1 all_authors.tsv | expand -t14
- Brunner John
- Cadigan Pat
- Le Guin Ursula K.
- Robinson Eden
- Tiptree James Jr.
- Toews Miriam
- Tolkien John Ronald Reuel
- Veselka Vanessa
- Walton Jo
- Waring Gwendolyn L.
- </code></pre>
-
- <!-- end -->
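-
- <p>Sorting on the first column works without any extra options, because
- <code>sort</code> splits fields on whitespace by default. That default gets shaky as soon
- as a field like “Le Guin” contains a space and you want some other column. Here’s
- a sketch of sorting by first name instead, on a few sample rows; the variable
- name <code>tab</code> is just something I made up to hold a literal tab character:</p>
-
- <pre><code># Sample rows in the same layout as all_authors.tsv: last name, tab, first name.
- tab=$(printf '\t')
-
- # -t says "fields are separated by this character" (a real tab);
- # -k2,2 says "compare on the second field only".
- # LC_ALL=C keeps the ordering the same from one system to the next.
- printf 'Le Guin\tUrsula\nWalton\tJo\nBrunner\tJohn\n' | LC_ALL=C sort -t "$tab" -k2,2
- </code></pre>
-
- <p>That should list Jo Walton first, then John Brunner, then Ursula Le Guin.</p>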
-
-
- <p>Or just extract middle names and initials:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ cut -f3 all_authors.tsv
-
- L.
-
-
- Ronald Reuel
-
-
-
- K.
- </code></pre>
-
- <!-- end -->
-
-
- <p>It probably won’t surprise you to learn that there’s a corresponding <code>paste</code>
- command, which takes two or more files and stitches them together with tab
- characters. Let’s extract a couple of things from our author list and put them
- back together in a different order:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ cut -f1 all_authors.tsv > lastnames
- </code></pre>
-
- <!-- end -->
-
-
-
-
- <!-- exec -->
-
-
- <pre><code>$ cut -f2 all_authors.tsv > firstnames
- </code></pre>
-
- <!-- end -->
-
-
-
-
- <!-- exec -->
-
-
- <pre><code>$ paste firstnames lastnames | sort -k2 | expand -t12
- John Brunner
- Pat Cadigan
- Ursula Le Guin
- Eden Robinson
- James Tiptree
- Miriam Toews
- John Tolkien
- Vanessa Veselka
- Jo Walton
- Gwendolyn Waring
- </code></pre>
-
- <!-- end -->
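-
- <p>Tabs are only what <code>paste</code> uses by default. It also takes a <code>-d</code>
- option, much like <code>cut</code> does. A small sketch, using a scratch directory and two
- tiny files invented for the occasion so it doesn’t touch the real
- <code>firstnames</code> and <code>lastnames</code>:</p>
-
- <pre><code># Work in a scratch directory so nothing real gets overwritten.
- tmp=$(mktemp -d)
- printf 'Ursula\nJo\n' > "$tmp/first"
- printf 'Le Guin\nWalton\n' > "$tmp/last"
-
- # -d' ' joins the corresponding lines with a space instead of a tab.
- paste -d' ' "$tmp/first" "$tmp/last"
-
- rm -r "$tmp"
- </code></pre>
-
- <p>This should print <code>Ursula Le Guin</code> and <code>Jo Walton</code>, one per line.</p>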
-
-
- <p>As these examples show, TSV is something very like a primitive spreadsheet: A
- way to represent information in columns and rows. In fact, it’s a close cousin
- of CSV, which is often used as a lowest-common-denominator format for
- transferring spreadsheets, and which represents data something like this:</p>
-
- <pre><code>last,first,middle,suffix
- Tolkien,John,Ronald Reuel,
- Tiptree,James,,Jr.
- </code></pre>
-
- <p>The advantage of tabs is that they’re supported by a bunch of the standard
- tools. A disadvantage is that they’re kind of ugly and can be weird to deal
- with, but they’re useful anyway, and character-delimited rows are often a
- good-enough way to hack your way through problems that call for basic
- structure.</p>
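-
- <p>For simple cases, the same tools will chew on CSV if you tell them about the
- comma. A sketch using the three sample lines from above, with the caveat that it
- falls apart as soon as a field contains a quoted comma of its own:</p>
-
- <pre><code># Pull the second column ("first") out of comma-separated rows.
- # This is naive: cut knows nothing about CSV quoting rules.
- printf 'last,first,middle,suffix\nTolkien,John,Ronald Reuel,\nTiptree,James,,Jr.\n' | cut -d, -f2
- </code></pre>
-
- <p>That should print <code>first</code>, <code>John</code>, and <code>James</code>, one per line.</p>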
-
- <h2><a name=finding-text-grep href=#finding-text-grep>#</a> finding text: grep</h2>
-
- <p>After all those contortions, what if you actually just want to see <em>which lists</em>
- an individual author appears on?</p>
-
- <!-- exec -->
-
-
- <pre><code>$ grep 'Vanessa' ./authors_*
- ./authors_contemporary_fic:Vanessa Veselka
- ./authors_sff:Vanessa Veselka
- </code></pre>
-
- <!-- end -->
-
-
- <p><code>grep</code> takes a string to search for and, optionally, a list of files to search
- in. If you don’t specify files, it’ll look through standard input instead:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ cat ./authors_* | grep 'Vanessa'
- Vanessa Veselka
- Vanessa Veselka
- </code></pre>
-
- <!-- end -->
-
-
- <p>Most of the time, piping the output of <code>cat</code> to <code>grep</code> is considered silly,
- because <code>grep</code> knows how to find things in files on its own. Many thousands of
- words have been written on this topic by leading lights of the nerd community.</p>
-
- <p>You’ve probably noticed that this result doesn’t contain filenames (and thus
- isn’t very useful to us). That’s because all <code>grep</code> saw was the lines in the
- files, not the names of the files themselves.</p>
-
- <h2><a name=now-you-have-n-problems href=#now-you-have-n-problems>#</a> now you have n problems</h2>
-
- <p>To close out this introductory chapter, let’s spend a little time on a topic
- that will likely vex, confound, and (occasionally) delight you for as long as
- you are acquainted with the command line.</p>
-
- <p>When I was talking about <code>grep</code> a moment ago, I fudged the details more than a
- little by saying that it expects a string to search for. What <code>grep</code>
- <em>actually</em> expects is a <em>pattern</em>. Moreover, it expects a specific kind of
- pattern, what’s known as a <em>regular expression</em>, a cumbersome phrase frequently
- shortened to regex.</p>
-
- <p>There’s a lot of theory about what makes up a regular expression. Fortunately,
- very little of it matters to the short version that will let you get useful
- stuff done. The short version is that a regex is like using wildcards in the
- shell to match groups of files, but for text in general and with more magic.</p>
-
- <!-- exec -->
-
-
- <pre><code>$ grep 'Jo.*' ./authors_*
- ./authors_sff:Jo Walton
- ./authors_sff:John Ronald Reuel Tolkien
- ./authors_sff:John Brunner
- </code></pre>
-
- <!-- end -->
-
-
- <p>The pattern <code>Jo.*</code> says that we’re looking for lines which contain a literal
- <code>Jo</code>, followed by any quantity (including none) of any character. In a regex,
- <code>.</code> means “any single character” and <code>*</code> means “any amount of the preceding thing”.</p>
-
- <p><code>.</code> and <code>*</code> are magical. In the particular dialect of regexen understood
- by <code>grep</code>, other magical things include:</p>
-
- <table>
- <tr><td><code>^</code> </td> <td>start of a line </td></tr>
- <tr><td><code>$</code> </td> <td>end of a line </td></tr>
- <tr><td><code>[abc]</code></td> <td>one of a, b, or c </td></tr>
- <tr><td><code>[a-z]</code></td> <td>a character in the range a through z</td></tr>
- <tr><td><code>[0-9]</code></td> <td>a character in the range 0 through 9</td></tr>
-
- <tr><td><code>+</code> </td> <td>one or more of the preceding thing </td></tr>
- <tr><td><code>?</code> </td> <td>0 or 1 of the preceding thing </td></tr>
- <tr><td><code>*</code> </td> <td>any number of the preceding thing </td></tr>
-
- <tr><td><code>(foo|bar)</code></td> <td>"foo" or "bar"</td></tr>
- <tr><td><code>(foo)?</code></td> <td>optional "foo"</td></tr>
- </table>
-
-
- <p>It’s actually a little more complicated than that: By default, if you want to
- use a lot of the magical characters, you have to prefix them with <code>\</code>. This is
- both ugly and confusing, so unless you’re writing a very simple pattern, it’s
- often easiest to call <code>grep -E</code>, for <strong>E</strong>xtended regular expressions, which
- means that lots of characters will have special meanings.</p>
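-
- <p>To make that concrete, here’s one pattern written both ways. It’s a sketch run
- against two made-up lowercase lines rather than the real author files:</p>
-
- <pre><code># Basic syntax (plain grep): the braces need backslashes to mean "exactly 4 times".
- printf 'eden robinson\njo walton\n' | grep '^[a-z]\{4\} '
-
- # Extended syntax (grep -E): the same pattern without the backslashes.
- printf 'eden robinson\njo walton\n' | grep -E '^[a-z]{4} '
- </code></pre>
-
- <p>Both versions should print only <code>eden robinson</code>.</p>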
-
- <p>Authors with 4-letter first names:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ grep -iE '^[a-z]{4} ' ./authors_*
- ./authors_contemporary_fic:Eden Robinson
- ./authors_sff:John Ronald Reuel Tolkien
- ./authors_sff:John Brunner
- </code></pre>
-
- <!-- end -->
-
-
- <p>A count of authors named John:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ grep -c '^John ' ./all_authors
- 2
- </code></pre>
-
- <!-- end -->
-
-
- <p>Lines in this file matching the words “magic” or “magical”:</p>
-
- <pre><code>$ grep -iE 'magic(al)?' ./index.md
- Pipes are some of the most important magic in the shell. When the people who
- shell to match groups of files, but with more magic.
- `.` and `*` are magical. In the particular dialect of regexen understood
- by `grep`, other magical things include:
- use a lot of the magical characters, you have to prefix them with `\`. This is
- Lines in this file matching the words "magic" or "magical":
- $ grep -iE 'magic(al)?' ./index.md
- </code></pre>
-
- <p>Find some “-agic” words in a big list of words:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ grep -iE '(m|tr|pel)agic' /usr/share/dict/words
- magic
- magic's
- magical
- magically
- magician
- magician's
- magicians
- pelagic
- tragic
- tragically
- tragicomedies
- tragicomedy
- tragicomedy's
- </code></pre>
-
- <!-- end -->
-
-
- <p><code>grep</code> isn’t the only - or even the most important - tool that makes use of
- regular expressions, but it’s a good place to start because it’s one of the
- fundamental building blocks for so many other operations. Filtering lists of
- things, matching patterns within collections, and writing concise descriptions
- of how text should be transformed are at the heart of a practical approach to
- Unix-like systems. Regexen turn out to be a seductively powerful way to do
- these things - so much so that they’ve crept their way into text editors,
- databases, and full-featured programming languages.</p>
-
- <p>There’s a dark side to all of this, for the truth about regular expressions is
- that they are ugly, inconsistent, brittle, and <em>incredibly</em> difficult to think
- clearly about. They take years to master and reward the wielder with great
- power, but they are also a trap: a temptation towards the path of cleverness
- masquerading as wisdom.</p>
-
- <p style="text-align:center;"> ✑</p>
-
- <p>I’ll be returning to this theme, but for the time being let’s move on. Now
- that we’ve established, however haphazardly, some of the basics, let’s consider
- their application to a real-world task.</p>
-
- <hr />
-
- <h1><a name=a-literary-problem href=#a-literary-problem>#</a> 2. a literary problem</h1>
-
- <p>The <a href="../literary_environment">previous chapter</a> introduced a bunch of tools
- using contrived examples. Now we’ll look at a real problem, and work through a
- solution by building on tools we’ve already covered.</p>
-
- <p>So on to the problem: I write poetry.</p>
-
- <p>{rimshot dot wav}</p>
-
- <p>Most of the poems I have written are not very good, but lately I’ve been
- thinking that I’d like to comb through the last ten years' worth and pull
- the least-embarrassing stuff into a single collection.</p>
-
- <p>I’ve hinted at how the contents of my blog are stored as files, but let’s take
- a look at the whole thing:</p>
-
- <pre><code>$ ls -F ~/p1k3/archives/
- 1997/ 2003/ 2009/ bones/ meta/
- 1998/ 2004/ 2010/ chapbook/ winfield/
- 1999/ 2005/ 2011/ cli/ wip/
- 2000/ 2006/ 2012/ colophon/
- 2001/ 2007/ 2013/ europe/
- 2002/ 2008/ 2014/ hack/
- </code></pre>
-
- <p>(<code>ls</code>, again, just lists files. <code>-F</code> tells it to append a character that shows
- us what type of file we’re looking at, such as a trailing / for directories.
- <code>~</code> is a shorthand that means “my home directory”, which in this case is
- <code>/home/brennen</code>.)</p>
-
- <p>Each of the directories here holds other directories. The ones for each year
- have sub-directories for the months of the year, which in turn contain files
- for the days. The files are just little pieces of HTML and Markdown and some
- other stuff. Many years ago, before I had much of an idea how to program, I
- wrote a script to glue them all together into a web page and serve them up to
- visitors. This all sounds complicated, but all it really means is that if I
- want to write a blog entry, I just open a file and type some stuff. Here’s an
- example for March 1st:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ cat ~/p1k3/archives/2014/3/1
- <h1>Saturday, March 1</h1>
-
- <markdown>
- Sometimes I'm going along on a Saturday morning, still a little dazed from the
- night before, and I think something like "I should just go write a detailed
- analysis of hooded sweatshirts". Mostly these thoughts don't survive contact
- with an actual keyboard. It's almost certainly for the best.
- </markdown>
- </code></pre>
-
- <!-- end -->
-
-
- <p>And here’s an older one that contains a short poem:</p>
-
- <!-- took this one out of exec block 'cause later i
- made a dir out of it... -->
-
-
- <pre><code>$ cat ~/p1k3/archives/2012/10/9
- <h1>tuesday, october 9</h1>
-
- <freeverse>i am a stateful machine
- i exist in a manifold of consequence
- a clattering miscellany of impure functions
- and side effects</freeverse>
- </code></pre>
-
- <p>Notice that <code><freeverse></code> bit? It kind of looks like an HTML tag, but it’s
- not. What it actually does is tell my blog script that it should format the
- text it contains like a poem. The specifics don’t matter for our purposes
- (yet), but this convention is going to come in handy, because the first thing I
- want to do is get a list of all the entries that contain poems.</p>
-
- <p>Remember <code>grep</code>?</p>
-
- <pre><code>$ grep -ri '<freeverse>' ~/p1k3/archives > ~/possible_poems
- </code></pre>
-
- <p>Let’s step through this bit by bit:</p>
-
- <p>First, I’m asking <code>grep</code> to search <strong>r</strong>ecursively, <strong>i</strong>gnoring case.
- “Recursively” just means that every time the program finds a directory, it
- should descend into that directory and search in any files there as well.</p>
-
- <pre><code>grep -ri
- </code></pre>
-
- <p>Next comes a pattern to search for. It’s in single quotes because the
- characters <code><</code> and <code>></code> have a special meaning to the shell, and here we need
- the shell to understand that it should treat them as literal angle brackets
- instead.</p>
-
- <pre><code>'<freeverse>'
- </code></pre>
-
- <p>This is the path I want to search:</p>
-
- <pre><code>~/p1k3/archives
- </code></pre>
-
- <p>Finally, because there are so many entries to search, I know the process will
- be slow and produce a large list, so I tell the shell to redirect the output to
- a file called <code>possible_poems</code> in my home directory:</p>
-
- <pre><code>> ~/possible_poems
- </code></pre>
-
- <p>This is quite a few instances…</p>
-
- <pre><code>$ wc -l ~/possible_poems
- 679 /home/brennen/possible_poems
- </code></pre>
-
- <p>…and it’s also not super-pretty to look at:</p>
-
- <pre><code>$ head -5 ~/possible_poems
- /home/brennen/p1k3/archives/2011/10/14:<freeverse>i've got this friend has a real knack
- /home/brennen/p1k3/archives/2011/4/25:<freeverse>i can't claim to strive for it
- /home/brennen/p1k3/archives/2011/8/10:<freeverse>one diminishes or becomes greater
- /home/brennen/p1k3/archives/2011/8/12:<freeverse>
- /home/brennen/p1k3/archives/2011/1/1:<freeverse>six years on
- </code></pre>
-
- <p>Still, it’s a decent start. I can see paths to the files I have to check, and
- usually a first line. Since I use a fancy text editor, I can just go down the
- list opening each file in a new window and copying the stuff I’m interested in
- to a new file.</p>
-
- <p>This is good enough for government work, but what if instead of jumping around
- between hundreds of files, I’d rather read everything in one file and just weed
- out the bad ones as I go?</p>
-
- <pre><code>$ cat `grep -ril '<freeverse>' ~/p1k3/archives` > ~/possible_poems_full
- </code></pre>
-
- <p>This probably bears some explaining. <code>grep</code> is still doing all the real work
- here. The main difference from before is that <code>-l</code> tells grep to just list any
- files it finds which contain a match.</p>
-
- <pre><code>`grep -ril '<freeverse>' ~/p1k3/archives`
- </code></pre>
-
- <p>Notice those backticks around the grep command? This part is a little
- trippier. It turns out that if you put backticks around something in a
- command, it’ll get executed and replaced with its output, which in turn
- becomes part of the larger command. So what we’re really saying is
- something like:</p>
-
- <pre><code>$ cat [all of the files in the blog directory with <freeverse> in them]
- </code></pre>
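-
- <p>Most shells you’re likely to meet also accept <code>$( )</code> as another spelling
- of backticks. A tiny sketch, using <code>date</code> as a stand-in for the
- <code>grep</code> above since it works the same on any machine:</p>
-
- <pre><code># Backticks: date runs first, and its output is dropped into the echo command.
- echo "This year is `date +%Y`"
-
- # The same thing with $( ), which is easier to spot and to nest.
- echo "This year is $(date +%Y)"
- </code></pre>
-
- <p>Both lines should print the same sentence.</p>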
-
- <p>Did you catch that? I just wrote a command that rewrote itself as a
- <em>different</em>, more specific command. And it appears to have worked on the
- first try:</p>
-
- <pre><code>$ wc ~/possible_poems_full
- 17628 80980 528699 /home/brennen/possible_poems_full
- </code></pre>
-
- <p>Welcome to wizard school.</p>
-
- <hr />
-
- <h1><a name=programmerthink href=#programmerthink>#</a> 3. programmerthink</h1>
-
- <p>In the <a href="#a-literary-problem">preceding chapter</a>, I worked through accumulating
- a big piece of text from some other, smaller texts. I started with a bunch of
- files and wound up with one big file called <code>possible_poems_full</code>.</p>
-
- <p>Let’s talk for a minute about how programmers approach problems like this one.
- What I’ve just done is sort of an old-school humanities take on things:
- Metaphorically speaking, I took a book off the shelf and hauled it down to the
- copy machine to xerox a bunch of pages, and now I’m going to start in on them
- with a highlighter and some Post-Its or something. A process like this will
- often trigger a cascade of questions in the programmer-mind:</p>
-
- <ul>
- <li>What if, halfway through the project, I realize my selection criteria were all
- wrong and have to backtrack?</li>
- <li>What if I discover corrections that also need to be made in the source documents?</li>
- <li>What if I want to access metadata, like the original location of a file?</li>
- <li>What if I want to quickly re-order the poems according to some new criteria?</li>
- <li>Why am I storing the same text in two different places?</li>
- </ul>
-
-
- <p>A unifying theme of these questions is that they could all be answered by
- involving a little more abstraction.</p>
-
- <p style="text-align:center;"> ★</p>
-
- <p>Some kinds of abstraction are so common in the physical world that we can
- forget they’re part of a sophisticated technology. For example, a good deal of
- bicycle maintenance can be accomplished with a cheap multi-tool containing a
- few different sizes of hex wrench and a couple of screwdrivers.</p>
-
- <p>A hex wrench or screwdriver doesn’t really know anything about bicycles. All
- it <em>really</em> knows about is fitting into a space and allowing torque to be
- applied. Standardized fasteners and adjustment mechanisms on a bicycle ensure
- that the work can be done anywhere, by anyone with a certain set of tools.
- Standard tools mean that if you can work on a particular bike, you can work on
- <em>most</em> bikes, and even on things that aren’t bikes at all, but were designed by
- people with the same abstractions in mind.</p>
-
- <p>The relationship between a wrench, a bolt, and the purpose of a bolt is a lot
- like something we call <em>indirection</em> in software. Programs like <code>grep</code> or
- <code>cat</code> don’t really know anything about poetry. All they <em>really</em> know about is
- finding lines of text in input, or sticking inputs together. Files, lines, and
- text are like standardized fasteners that allow a user who can work on one kind
- of data (be it poetry, a list of authors, the source code of a program) to use
- the same tools for other problems and other data.</p>
-
- <p style="text-align:center;"> ★</p>
-
- <p>When I first started writing stuff on the web, I edited a page — a single HTML
- file — by hand. When the entries on my nascent blog got old, I manually
- cut-and-pasted them to archive files with names like <code>old_main97.html</code>, which
- held all of the stuff I’d written in 1997.</p>
-
- <p>I’m not holding this up as an example of youthful folly. In fact, it worked
- fine, and just having a single, static file that you can open in any text
- editor has turned out to be a <em>lot</em> more future-proof than the sophisticated
- blogging software people were starting to write at the time.</p>
-
- <p>And yet. Something about this habit nagged at my developing programmer mind
- after a few years. It was just a little bit too manual and repetitive, a
- little bit silly to have to write things like a table of contents by hand, or
- move entries around by copy-and-pasting them to different files. Since I knew
- the date for each entry, and wanted to make them navigable on that basis, why
- not define a directory structure for the years and months, and then write a
- file to hold each day? That way, all I’d have to do is concatenate the files
- in one directory to display any given month:</p>
-
- <pre><code>$ cat ~/p1k3/archives/2014/1/* | head -10
- <h1>Sunday, January 12</h1>
-
- <h2>the one casey is waiting for</h2>
-
- <freeverse>
- after a while
- the thing about drinking
- is that it just feeds
- what you drink to kill
- and kills
- </code></pre>
-
- <p>I ultimately wound up writing a few thousand lines of Perl to do the actual
- work, but the essential idea of the thing is still little more than invoking
- <code>cat</code> on some stuff.</p>
-
- <p>I didn’t know the word for it at the time, but what I was reaching for was a
- kind of indirection. By putting blog posts in a specific directory layout, I
- was creating a simple model of the temporal structure that I considered their
- most important property. Now, if I want to write commands that ask questions
- about my blog posts or re-combine them in certain ways, I can address my
- concerns to this model. Maybe, for example, I want a rough idea how many words
- I’ve written in blog posts so far in 2014:</p>
-
- <pre><code>$ find ~/p1k3/archives/2014/ -type f | xargs cat | wc -w
- 6677
- </code></pre>
-
- <p><code>xargs</code> is not the most intuitive command, but it’s useful and common enough to
- explain here. At the end of the last chapter, when I wrote:</p>
-
- <pre><code>$ cat `grep -ril '<freeverse>' ~/p1k3/archives` > ~/possible_poems_full
- </code></pre>
-
- <p>I could also have written this as:</p>
-
- <pre><code>$ grep -ril '<freeverse>' ~/p1k3/archives | xargs cat > ~/possible_poems_full
- </code></pre>
-
- <p>What <code>xargs</code> does is take its input, which starts like:</p>
-
- <pre><code>/home/brennen/p1k3/archives/2002/10/16
- /home/brennen/p1k3/archives/2002/10/27
- /home/brennen/p1k3/archives/2002/10/10
- </code></pre>
-
- <p>…and run <code>cat</code> on all the things in it:</p>
-
- <pre><code>cat /home/brennen/p1k3/archives/2002/10/16 /home/brennen/p1k3/archives/2002/10/27 /home/brennen/p1k3/archives/2002/10/10 ...
- </code></pre>
-
- <p>It can be a better idea to use <code>xargs</code>, because while backticks are
- incredibly useful, they have some limitations. If you’re dealing with a very
- large list of files, for example, you might exceed the maximum allowed length
- for arguments to a command on your system. <code>xargs</code> is smart enough to know
- that limit and run <code>cat</code> more than once if needed.</p>
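-
- <p>You can watch that splitting happen on a small scale by setting an
- artificially low limit with <code>-n</code>. A sketch, with <code>echo</code> standing in for
- <code>cat</code> and five single letters standing in for filenames:</p>
-
- <pre><code># -n 2 caps each run at two arguments, so echo gets started three times here.
- printf '%s\n' a b c d e | xargs -n 2 echo
- </code></pre>
-
- <p>This should print <code>a b</code>, <code>c d</code>, and <code>e</code> on separate lines, one line per run of
- <code>echo</code>.</p>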
-
- <p><code>xargs</code> is actually sort of a pain to think about, and will make you jump
- through some irritating hoops if you have spaces or other weirdness in your
- filenames, but I wind up using it quite a bit.</p>
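-
- <p>One of the more common hoops looks like this. It’s a sketch, and it assumes your
- <code>find</code> and <code>xargs</code> understand <code>-print0</code> and <code>-0</code>; the GNU and BSD
- versions do, but that’s worth checking on your own system. The scratch files here
- are invented for the example:</p>
-
- <pre><code># Make a scratch directory with one awkwardly named file in it.
- tmp=$(mktemp -d)
- printf 'one two\n' > "$tmp/a file"
- printf 'three\n' > "$tmp/plain"
-
- # -print0 and -0 separate names with a NUL byte instead of whitespace,
- # so "a file" arrives at cat as one name rather than two.
- find "$tmp" -type f -print0 | xargs -0 cat | wc -w
-
- rm -r "$tmp"
- </code></pre>
-
- <p>The word count should come out to 3. Without <code>-print0</code> and <code>-0</code>, <code>cat</code>
- would go looking for two files that don’t exist.</p>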
-
- <p>Maybe I want to see a table of contents:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ find ~/p1k3/archives/2014/ -type d | xargs ls -v | head -10
- /home/brennen/p1k3/archives/2014/:
- 1
- 2
- 3
- 4
-
- /home/brennen/p1k3/archives/2014/1:
- 5
- 12
- 14
- </code></pre>
-
- <!-- end -->
-
-
- <p>Or find the subtitles I used in 2012:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ find ~/p1k3/archives/2012/ -type f | xargs perl -ne 'print "$1\n" if m{<h2>(.*?)</h2>}'
- pursuit
- fragment
- this poem again
- i'll do better next time
- timebinding animals
- more observations on gear nerdery &amp; utility fetishism
- thrift
- A miracle, in fact, means work
- <em>technical notes for late october</em>, or <em>it gets dork out earlier these days</em>
- radio
- light enough to travel
- 12:06am
- "figures like Heinlein and Gingrich"
- </code></pre>
-
- <!-- end -->
-
-
- <p>The crucial thing about this is that the filesystem <em>itself</em> is just like <code>cat</code>
- and <code>grep</code>: It doesn’t know anything about blogs (or poetry), and it’s
- basically indifferent to the actual <em>structure</em> of a file like
- <code>~/p1k3/archives/2014/1/12</code>. What the filesystem knows is that there are files
- with certain names in certain places. It need not know anything about the
- <em>meaning</em> of those names in order to be useful; in fact, it’s best if it stays
- agnostic about the question, for this enables us to assign our own meaning to a
- structure and manipulate that structure with standard tools.</p>
-
- <p style="text-align:center;"> ★</p>
-
- <p>Back to the problem at hand: I have this collection of files, and I know how
- to extract the ones that contain poems. My goal is to see all the poems and
- collect the subset of them that I still find worthwhile. Just knowing how to
- grep and then edit a big file solves my problem, in a basic sort of way. And
- yet: Something about this nags at my mind. I find that, just as I can already
- use standard tools and the filesystem to ask questions about all of my blog
- posts in a given year or month, I would like to be able to ask questions about
- the set of interesting poems.</p>
-
- <p>If I want the freedom to execute many different sorts of commands against this
- set of poems, it begins to seem that I need a model.</p>
-
- <p>When programmers talk about models, they often mean something that people in
- the sciences would recognize: We find ways to represent the arrangement of
- facts so that we can think about them. A structured representation of things
- often means that we can <em>change</em> those things, or at least derive new
- understanding of them.</p>
-
- <p style="text-align:center;"> ★</p>
-
- <p>At this point in the narrative, I could pretend that my next step is
- immediately obvious, but in fact it’s not. I spend a couple of days thinking
- off and on about how to proceed, scribbling notes during bus rides and while
- drinking beers at the pizza joint down the street. I assess and discard ideas
- which fall into a handful of broad approaches:</p>
-
- <ul>
- <li>Store blog entries in a relational database system which would allow me to
- associate them with data like “this entry is in a collection called ‘ok
- poems’”.</li>
- <li>Selectively build up a file containing the list of files with ok poems, and use
- it to do other tasks.</li>
- <li>Define a format for metadata that lives within entry files.</li>
- <li>Turn each interesting file into a directory of its own which contains a file
- with the original text and another file with metadata.</li>
- </ul>
-
-
- <p>I discard the relational database idea immediately: I like working with files,
- and I don’t feel like abandoning a model that’s served me well for my entire
- adult life.</p>
-
- <p>Building up an index file to point at the other files I’m working with has a
- certain appeal. I’m already most of the way there with the <code>grep</code> output in
- <code>possible_poems</code>. It would be easy to write shell commands to add, remove,
- sort, and search entries. Still, it doesn’t feel like a very satisfying
- solution unto itself. I’d like to know that an entry is part of the collection
- just by looking at the entry, without having to cross-reference it to a list
- somewhere else.</p>
-
- <p>What about putting some meaningful text in the file itself? I thought about
- a bunch of different ways to do this, some of them really complicated, and
- eventually arrived at this:</p>
-
- <pre><code><!-- collection: ok-poems -->
- </code></pre>
-
- <p>The <code><!-- --></code> bits are how you define a comment in HTML, which means that
- neither my blog code nor web browsers nor my text editor have to know anything
- about the format, but I can easily find files with certain values. Check it:</p>
-
- <pre><code>$ find ~/p1k3/archives -type f | xargs perl -ne 'print "$ARGV: $1 -> $2\n" if m{<!-- ([a-z]+): (.*?) -->};'
- /home/brennen/p1k3/archives/2014/2/9: collection -> ok-poems
- </code></pre>
-
- <p>That’s an ugly one-liner, and I haven’t explained half of what it does, but the
- comment format actually seems pretty workable for this. It’s a little tacky to
- look at, but it’s simple and searchable.</p>
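-
- <p>If all I want is the list of files tagged for one collection, plain
- <code>grep -rl</code> from the last chapter will do. Here’s a sketch against a scratch
- directory with two invented entries, rather than my real archives:</p>
-
- <pre><code># Two stand-in entries: one carries the comment, one doesn't.
- tmp=$(mktemp -d)
- printf '%s\n' '<!-- collection: ok-poems -->' '<freeverse>...</freeverse>' > "$tmp/9"
- printf '%s\n' '<h1>no poem today</h1>' > "$tmp/10"
-
- # -r searches the directory recursively; -l lists only the names of matching files.
- grep -rl 'collection: ok-poems' "$tmp"
-
- rm -r "$tmp"
- </code></pre>
-
- <p>Only the path ending in <code>/9</code> should come back.</p>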
-
- <p>Before we settle, though, let’s turn to the notion of making each entry into a
- directory that can contain some structured metadata in a separate file.
- Imagine something like:</p>
-
- <pre><code>$ ls ~/p1k3/archives/2013/2/9
- index Meta
- </code></pre>
-
- <p>Here I use the name “index” for the main part of the entry because it’s a
- convention of web sites for the top-level page in a directory to be called
- something like <code>index.html</code>. As it happens, my blog software already supports
- this kind of file layout for entries which contain multiple parts, image files,
- and so forth.</p>
-
- <pre><code>$ head ~/p1k3/archives/2013/2/9/index
- <h1>saturday, february 9</h1>
-
- <freeverse>
- midwinter midafternoon; depressed as hell
- sitting in a huge cabin in the rich-people mountains
- writing a sprawl, pages, of melancholic midlife bullshit
-
- outside the snow gives way to broken clouds and the
- clear unyielding light of the high country sun fills
-
- $ cat ~/p1k3/archives/2013/2/9/Meta
- collection: ok-poems
- </code></pre>
-
- <p>It would then be easy to <code>find</code> files called <code>Meta</code> and grep them for
- <code>collection: ok-poems</code>.</p>
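-
- <p>Roughly like so. This is a sketch against a scratch directory; the second entry
- and its <code>drafts</code> collection are made up to give <code>grep</code> something to
- reject:</p>
-
- <pre><code># Build two stand-in entries, each a directory with a Meta file.
- tmp=$(mktemp -d)
- mkdir -p "$tmp/2013/2/9" "$tmp/2013/2/10"
- printf 'collection: ok-poems\n' > "$tmp/2013/2/9/Meta"
- printf 'collection: drafts\n' > "$tmp/2013/2/10/Meta"
-
- # find lists every Meta file; grep -l keeps the ones naming our collection.
- find "$tmp" -name Meta | xargs grep -l 'collection: ok-poems'
-
- rm -r "$tmp"
- </code></pre>
-
- <p>Only the <code>Meta</code> file under <code>2013/2/9</code> should be listed.</p>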
-
- <p>What if I put metadata right in the filename itself, and dispense with the grep
- altogether?</p>
-
- <pre><code>$ ls ~/p1k3/archives/2013/2/9
- index meta-ok-poem
-
- $ find ~/p1k3/archives -name 'meta-ok-poem'
- /home/brennen/p1k3/archives/2013/2/9/meta-ok-poem
- </code></pre>
-
- <p>There’s a lot to like about this. For one thing, it’s immediately visible in a
- directory listing. For another, it doesn’t require searching through thousands
- of lines of text to extract a specific string. If a directory has a
- <code>meta-ok-poem</code> in it, I can be pretty sure that it will contain an interesting
- <code>index</code>.</p>
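-
- <p>Getting from the marker files back to the poems is then a short hop. A sketch,
- again in a scratch directory with invented contents, and assuming none of the
- paths contain newlines:</p>
-
- <pre><code># Two stand-in entries; only the first one has a marker file.
- tmp=$(mktemp -d)
- mkdir -p "$tmp/2013/2/9" "$tmp/2013/2/10"
- printf 'a poem worth keeping\n' > "$tmp/2013/2/9/index"
- touch "$tmp/2013/2/9/meta-ok-poem"
- printf 'not so much\n' > "$tmp/2013/2/10/index"
-
- # For each marker, print the index file sitting in the same directory.
- find "$tmp" -name meta-ok-poem | while read -r marker; do
-     cat "$(dirname "$marker")/index"
- done
-
- rm -r "$tmp"
- </code></pre>
-
- <p>This should print <code>a poem worth keeping</code> and nothing else.</p>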
-
- <p>What are the downsides? Well, it requires transforming lots of text files into
- directories-containing-files. I might automate that process, but it’s still a
- little tedious and it makes the layout of the entry archive more complicated
- overall. There’s a cost to doing things this way. It lets me extend my
- existing model of a blog entry to include arbitrary metadata, but it also adds
- steps to writing or finding blog entries.</p>
-
- <p>Abstractions usually cost you something. Is this one worth the hassle?
- Sometimes the best way to answer that question is to start writing code that
- handles a given abstraction.</p>
-
- <hr />
-
- <h1><a name=script href=#script>#</a> 4. script</h1>
-
- <p>Back in chapter 1, I said that “the way you use the computer is often just to write
- little programs that invoke other programs”. In fact, we’ve already gone over a
- bunch of these. Grepping through the text of a previous chapter should pull
- up some good examples:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ grep -E '\$ [a-z]+.*\| ' ../literary_environment/index.md
- $ sort authors_* | uniq -c
- $ sort authors_* | uniq > ./all_authors
- $ find ~/p1k3/archives/2010/11 -regextype egrep -regex '.*([0-9]+|index)' -type f | xargs wc -w | tail -1
- $ sort authors_* | uniq | wc -l
- $ sort colors | uniq -i | tail -1
- $ cut -d' ' -f1 ./authors_* | sort | uniq -ci | sort -n | tail -3
- $ sort -u ./authors_* | cut -d' ' -f1 | uniq -ci | sort -n | tail -3
- $ sort -k1 all_authors.tsv | expand -t14
- $ paste firstnames lastnames | sort -k2 | expand -t12
- $ cat ./authors_* | grep 'Vanessa'
- </code></pre>
-
- <!-- end -->
-
-
- <p>None of these one-liners do all that much, but they all take input of one sort
- or another and apply one or more transformations to it. They’re little formal
- sentences describing how to make one thing into another, which is as good a
- definition of programming as most. Or at least this is a good way to describe
- programming-in-the-small. (A lot of the programs we use day-to-day are more
- like essays, novels, or interminable Fantasy series where every character you
- like dies horribly than they are like individual sentences.)</p>
-
- <p>One-liners like these are all well and good when you’re staring at a terminal,
- trying to figure something out - but what about when you’ve already figured it out and
- you want to repeat it in the future?</p>
-
- <p>It turns out that Bash has you covered. Since shell commands are just text,
- they can live in a text file as easily as they can be typed.</p>
-
- <h2><a name=learn-you-an-editor href=#learn-you-an-editor>#</a> learn you an editor</h2>
-
- <p>We’ve skirted the topic so far, but now that we’re talking about writing out
- text files in earnest, you’re going to want a text editor.</p>
-
- <p>My editor is where I spend most of my time that isn’t in a web browser, because
- it’s where I write both code and prose. It turns out that the features which
- make a good code editor overlap a lot with the ones that make a good editor of
- English sentences.</p>
-
- <p>So what should you use? Well, there have been other contenders in recent
- years, but in truth nothing comes close to dethroning the Great Old Ones of
- text editing. Emacs is a creature both primal and sophisticated, like an
- avatar of some interstellar civilization that evolved long before multicellular
- life existed on earth and seeded the galaxy with incomprehensible artefacts and
- colossal engineering projects. Vim is like a lovable chainsaw-studded robot
- with the most elegant keyboard interface in history secretly emblazoned on its
- shining diamond heart.</p>
-
- <p>It’s worth the time it takes to learn one of the serious editors, but there are
- easier places to start. Nano, for example, is easy to pick up, and should be
- available on most systems. To start it, just say:</p>
-
- <pre><code>$ nano file
- </code></pre>
-
- <p>You should see something like this:</p>
-
- <p style="text-align:center;"> <img src="images/nano.png" alt="nano" /></p>
-
- <p>Arrow keys will move your cursor around, and typing stuff will make it appear
- in the file. This is pretty much like every other editor you’ve ever used. If
- you haven’t used Nano before, that stuff along the bottom of the terminal is a
- reference to the most commonly used commands. <code>^</code> is a convention for “Ctrl”,
- so <code>^O</code> means Ctrl-o (the case of the letter doesn’t actually matter), which
- will save the file you’re working on. Ctrl-x will quit, which is probably the
- first important thing to know about any given editor.</p>
-
- <h2><a name=d-i-y-utilities href=#d-i-y-utilities>#</a> d.i.y. utilities</h2>
-
- <p>So back to putting commands in text files. Here’s a file I just created in
- my editor:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ cat okpoems
- #!/bin/bash
-
- # find all the marker files and get the name of
- # the directory containing each
- find ~/p1k3/archives -name 'meta-ok-poem' | xargs -n1 dirname
-
- exit 0
- </code></pre>
-
- <!-- end -->
-
-
- <p>This is known as a script. There are a handful of things to notice here.
- First, there’s this fragment:</p>
-
- <pre><code>#!/bin/bash
- </code></pre>
-
- <p>The <code>#!</code> right at the beginning, followed by the path to a program, is a
- special sequence that lets the kernel know what program should be used to
- interpret the contents of the file. <code>/bin/bash</code> is the path on the filesystem
- where Bash itself lives. You might see this referred to as a shebang or a hash
- bang.</p>
-
- <p>Lines that start with a <code>#</code> are comments, used to describe the code to a human
- reader. The <code>exit 0</code> tells Bash that the currently running script should exit
- with a status of 0, which basically means “nothing went wrong”.</p>
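- 
- <p>If you want to see exit statuses for yourself, Bash keeps the status of the most recently run command in the special parameter <code>$?</code>. Here’s a minimal sketch. The <code>report</code> function is a made-up helper for illustration, and <code>true</code> and <code>false</code> are standard commands that do nothing except succeed and fail:</p>

```shell
#!/bin/bash

# report is a hypothetical helper: it runs whatever command it's given,
# then prints the exit status that command left behind.
report() {
  local status=0
  # $? holds the status of the last command; grab it if the command fails.
  "$@" || status=$?
  echo "'$*' exited with status $status"
}

report true
report false

# A child shell that exits the way a script saying "exit 64" would:
report bash -c 'exit 64'
```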
-
- <p>If you examine the directory listing for <code>okpoems</code>, you’ll see something
- important:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ ls -l okpoems
- -rwxrwxr-x 1 brennen brennen 163 Apr 19 00:08 okpoems
- </code></pre>
-
- <!-- end -->
-
-
- <p>That looks pretty cryptic. For the moment, just remember that those little
- <code>x</code>s in the first bit mean that the file has been marked e<strong>x</strong>ecutable. We
- accomplish this by saying something like:</p>
-
- <pre><code>$ chmod +x ./okpoems
- </code></pre>
-
- <p>Once that’s done, the executable bit and the shebang line together mean that typing
- <code>./okpoems</code> will have the same effect as typing <code>bash okpoems</code>:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ ./okpoems
- /home/brennen/p1k3/archives/2013/2/9
- /home/brennen/p1k3/archives/2012/3/17
- /home/brennen/p1k3/archives/2012/3/26
- </code></pre>
-
- <!-- end -->
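- 
- <p>To try the whole sequence without touching any real files, something like the following should work. It’s only a sketch: the script name <code>hello</code> and the scratch directory are arbitrary, and it assumes your temporary directory allows running programs.</p>

```shell
#!/bin/bash

# Work in a throwaway directory so nothing real gets clobbered.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Write a two-line script called hello (the name is arbitrary).
printf '%s\n' '#!/bin/bash' 'echo "hello from a script"' > hello

# The file isn't executable yet, so ./hello would fail with
# "Permission denied". Handing the file to bash directly still works:
via_bash=$(bash hello)

# Mark it executable. Now the kernel reads the shebang line to find bash.
chmod +x ./hello
via_shebang=$(./hello)

echo "bash hello: $via_bash"
echo "./hello:    $via_shebang"
```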
-
-
- <h2><a name=heavy-lifting href=#heavy-lifting>#</a> heavy lifting</h2>
-
- <p><code>okpoems</code> demonstrates the basics, but it doesn’t do very much. Here’s
- a script with a little more substance to it:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ cat markpoem
- #!/bin/bash
-
- # $1 is the first parameter to our script
- POEM=$1
-
- # Complain and exit if we weren't given a path:
- if [ -z "$POEM" ]; then
-   echo 'usage: markpoem <path>'
- 
-   # Confusingly, an exit status of 0 means to the shell that everything went
-   # fine, while any other number means that something went wrong.
-   exit 64
- fi
- 
- if [ ! -e "$POEM" ]; then
-   echo "$POEM not found"
-   exit 66
- fi
- 
- echo "marking $POEM an ok poem"
- 
- POEM_BASENAME=$(basename "$POEM")
- 
- # If the target is a plain file instead of a directory, make it into
- # a directory and move the content into $POEM/index:
- if [ -f "$POEM" ]; then
-   echo "making $POEM into a directory, moving content to"
-   echo " $POEM/index"
-   TEMPFILE="/tmp/$POEM_BASENAME.$(date +%s.%N)"
-   mv "$POEM" "$TEMPFILE"
-   mkdir "$POEM"
-   mv "$TEMPFILE" "$POEM/index"
- fi
- 
- if [ -d "$POEM" ]; then
-   # touch(1) will either create the file or update its timestamp:
-   touch "$POEM/meta-ok-poem"
- else
-   echo "something broke - why isn't $POEM a directory?"
-   file "$POEM"
- fi
-
- # Signal that all is copacetic:
- echo kthxbai
- exit 0
- </code></pre>
-
- <!-- end -->
-
-
- <p>Both of these scripts are imperfect, but they were quick to write, they’re made
- out of standard commands, and I don’t yet hate myself for them: All signs that
- I’m not totally on the wrong track with the <code>meta-ok-poem</code> abstraction, and
- could live with it as part of an ongoing writing project. <code>okpoems</code> and
- <code>markpoem</code> would also be easy to use with custom keybindings in my editor. In
- a few more lines of code, I can build a system to wade through the list of
- candidate files and quickly mark the interesting ones.</p>
-
- <h2><a name=generality href=#generality>#</a> generality</h2>
-
- <p>So what’s lacking here? Well, probably a bunch of things, feature-wise. I can
- imagine writing a script to unmark a poem, for example. That said, there’s one
- really glaring problem. “Ok poem” is only one kind of property a blog entry
- might possess. Suppose I wanted a way to express that a poem is terrible?</p>
-
- <p>It turns out I already know how to add properties to an entry. If I generalize
- just a little, the tools become much more flexible.</p>
-
- <!-- exec -->
-
-
- <pre><code>$ ./addprop /home/brennen/p1k3/archives/2012/3/26 meta-terrible-poem
- marking /home/brennen/p1k3/archives/2012/3/26 with meta-terrible-poem
- kthxbai
- </code></pre>
-
- <!-- end -->
-
-
-
-
- <!-- exec -->
-
-
- <pre><code>$ ./findprop meta-terrible-poem
- /home/brennen/p1k3/archives/2012/3/26
- </code></pre>
-
- <!-- end -->
-
-
- <p><code>addprop</code> is only a little different from <code>markpoem</code>. It takes two parameters
- instead of one - the target entry and a property to add.</p>
-
- <!-- exec -->
-
-
- <pre><code>$ cat addprop
- #!/bin/bash
-
- ENTRY=$1
- PROPERTY=$2
-
- # Complain and exit if we weren't given a path and a property:
- if [[ ! $ENTRY || ! $PROPERTY ]]; then
-   echo "usage: addprop <path> <property>"
-   exit 64
- fi
- 
- if [ ! -e "$ENTRY" ]; then
-   echo "$ENTRY not found"
-   exit 66
- fi
- 
- echo "marking $ENTRY with $PROPERTY"
- 
- # If the target is a plain file instead of a directory, make it into
- # a directory and move the content into $ENTRY/index:
- if [ -f "$ENTRY" ]; then
-   echo "making $ENTRY into a directory, moving content to"
-   echo " $ENTRY/index"
- 
-   # Get a safe temporary file:
-   TEMPFILE=$(mktemp)
- 
-   mv "$ENTRY" "$TEMPFILE"
-   mkdir "$ENTRY"
-   mv "$TEMPFILE" "$ENTRY/index"
- fi
- 
- if [ -d "$ENTRY" ]; then
-   touch "$ENTRY/$PROPERTY"
- else
-   echo "something broke - why isn't $ENTRY a directory?"
-   file "$ENTRY"
- fi
-
- echo kthxbai
- exit 0
- </code></pre>
-
- <!-- end -->
-
-
- <p>Meanwhile, <code>findprop</code> is more or less <code>okpoems</code>, but with a parameter for the
- property to find:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ cat findprop
- #!/bin/bash
-
- if [ ! $1 ]
- then
- echo "usage: findprop <property>"
- exit
- fi
-
- # find all the marker files and get the name of
- # the directory containing each
- find ~/p1k3/archives -name $1 | xargs -n1 dirname
-
- exit 0
- </code></pre>
-
- <!-- end -->
-
-
- <p>These scripts aren’t much more complicated than their poem-specific
- counterparts, but now they can be used to solve problems I haven’t even thought
- of yet, and included in other scripts that need their functionality.</p>
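- 
- <p>The same generality covers the “unmark” idea from earlier. What follows is a rough sketch of a hypothetical <code>rmprop</code>, not something that exists in my archive: the name, the messages, and the scratch directory are all invented here, and the exit statuses are just borrowed from <code>addprop</code>. It’s written as a function so it’s easy to try out in one go.</p>

```shell
#!/bin/bash

# rmprop is a hypothetical counterpart to addprop: it removes a marker
# file from an entry directory.
rmprop() {
  local entry=$1
  local property=$2

  # Complain if we weren't given a path and a property:
  if [ -z "$entry" ] || [ -z "$property" ]; then
    echo "usage: rmprop <path> <property>" >&2
    return 64
  fi

  # Nothing to do if the entry was never marked with this property:
  if [ ! -e "$entry/$property" ]; then
    echo "$entry has no $property" >&2
    return 66
  fi

  echo "removing $property from $entry"
  rm "$entry/$property"
}

# Try it against a scratch directory rather than a real archive:
scratch=$(mktemp -d)
mkdir -p "$scratch/2012/3/26"
touch "$scratch/2012/3/26/meta-terrible-poem"

rmprop "$scratch/2012/3/26" meta-terrible-poem
```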
-
- <hr />
-
- <h1><a name=general-purpose-programmering href=#general-purpose-programmering>#</a> 5. general purpose programmering</h1>
-
- <p>I didn’t set out to write a book about programming, <em>as such</em>, but because
- programming and the command line are so inextricably linked, this text
- draws near the subject almost of its own accord.</p>
-
- <p>If you’re not terribly interested in programming, this chapter can easily
- enough be skipped. It’s more in the way of philosophical rambling than
- concrete instruction, and will be of most use to those with an existing
- background in writing code.</p>
-
- <p style="text-align:center;"> ✢</p>
-
- <p>If you’ve used computers for more than a few years, you’re probably viscerally
- aware that most software is fragile and most systems decay. In the time since
- I took my first tentative steps into the little world of a computer (a friend’s
- dad’s unidentifiable gaming machine, my own father’s blue monochrome Zenith
- laptop, the Apple II) the churn has been overwhelming. By now I’ve learned my
- way around vastly more software — operating systems, programming languages and
- development environments, games, editors, chat clients, mail systems — than I
- presently could use if I wanted to. Most of it has gone the way of some
- ancient civilization, surviving (if at all) only in faint, half-understood
- cultural echoes and occasional museum-piece displays. Every user of technology
- becomes, in time, a refugee from an irretrievably recent past.</p>
-
- <p>And yet, despite all this, the shell endures. Most of the ideas in this book
- are older than I am. Most of them could have been applied in 1994 or
- thereabouts, when I first logged on to multiuser systems running AT&T Unix.
- Since the early 1990s, systems built on a fundamental substrate of Unix-like
- behavior and abstractions have proliferated wildly, becoming foundational at
- once to the modern web, the ecosystem of free and open software, and the
- technological dominance ca. 2014 of companies like Apple, Google, and Facebook.</p>
-
- <p>Why is this, exactly?</p>
-
- <p style="text-align:center;"> ✣</p>
-
- <p>As I’ve said (and hopefully shown), the commands you write in your shell
- are essentially little programs. Like other programs, they can be stored
- for later use and recombined with other commands, creating new uses for
- your ideas.</p>
-
- <p>It would be hard to say that there’s any <em>one</em> reason command line environments
- remain so vital after decades of evolution and hard-won refinement in computer
- interfaces, but it seems like this combinatory nature is somewhere near the
- heart of it. The command line often lacks the polish of other interfaces we
- depend on, but in exchange it offers a richness and freedom of expression
- rarely seen elsewhere, and invites its users to build upon its basic
- facilities.</p>
-
- <p>What is it that makes last chapter’s <code>addprop</code> preferable to the more specific
- <code>markpoem</code>? Let’s look at an alternative implementation of <code>markpoem</code>:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ cat simple_markpoem
- #!/bin/bash
-
- addprop "$1" meta-ok-poem
- </code></pre>
-
- <!-- end -->
-
-
- <p>Is this script trivial? Absolutely. It’s so trivial that it barely seems to
- exist, because I already wrote <code>addprop</code> to do all the heavy lifting and play
- well with others, freeing us to imagine new uses for its central idea without
- worrying about the implementation details.</p>
-
- <p>Unlike <code>markpoem</code>, <code>addprop</code> doesn’t know anything about poetry. All it knows
- about, in fact, is putting a file (or three) in a particular place. And this
- is in keeping with a basic insight of Unix: Pieces of software that do one
- very simple thing generalize well. Good command line tools are like a hex
- wrench, a hammer, a utility knife: They embody knowledge of turning, of
- striking, of cutting — and with this kind of knowledge at hand, the user can
- change the world even though no individual tool is made with complete knowledge
- of the world as a whole. There’s a lot of power in the accumulation of small
- competencies.</p>
-
- <p>Of course, if your code is only good at one thing, to be of any use, it has to
- talk to code that’s good at other things. There’s another basic insight in the
- Unix tradition: Tools should be composable. All those little programs have to
- share some assumptions, have to speak some kind of trade language, in order to
- combine usefully. Which is how we’ve arrived at standard IO, pipelines,
- filesystems, and text as a lowest-common-denominator medium of exchange. If
- you think about most of these things, they have some very rough edges, but they
- give otherwise simple tools ways to communicate without becoming
- super-complicated along the way.</p>
-
- <p style="text-align:center;"> ✤</p>
-
- <p>What is the command line?</p>
-
- <p>The command line is an environment of tool use.</p>
-
- <p>So are kitchens, workshops, libraries, and programming languages.</p>
-
- <p style="text-align:center;"> ✥</p>
-
- <p>Here’s a confession: I don’t like writing shell scripts very much, and I
- can’t blame anyone else for feeling the same way.</p>
-
- <p>That doesn’t mean you shouldn’t <em>know</em> about them, or that you shouldn’t
- <em>write</em> them. I write little ones all the time, and the ability to puzzle
- through other people’s scripts comes in handy. Oftentimes, the best, most
- tasteful way to automate something is to build a script out of the commonly
- available commands. The standard tools are already there on millions of
- machines. Many of them have been pretty well understood for a generation, and
- most will probably be around for a generation or three to come. They do neat
- stuff. Scripts let you build on ideas you’ve already worked out, and give
- repeatable operations a memorable, user-friendly name. They encourage reuse of
- existing programs, and help express your ideas to people who’ll come after you.</p>
-
- <p>One of the reliable markers of powerful software is that it can be scripted: It
- extends to its users some of the same power that its authors used in creating
- it. Scriptable software is to some extent <em>living</em> software. It’s a book that
- you, the reader, get to help write.</p>
-
- <p>In all these ways, shell scripts are wonderful, a little bit magical, and
- quietly indispensable to the machinery of modern civilization.</p>
-
- <p>Unfortunately, in all the ways that a shell like Bash is weird, finicky, and
- covered in 40 years of incidental cruft, long-form Bash scripts are even worse.
- Bash is a useful glue language, particularly if you’re already comfortable
- wiring commands together. Syntactic and conceptual innovations like pipes are
- beautiful and necessary. What Bash is <em>not</em>, despite its power, is a very good
- general purpose programming language. It’s just not especially good at things
- like math, or complex data structures, or not looking like a punctuation-heavy
- variety of alphabet soup.</p>
-
- <p>It turns out that there’s a threshold of complexity beyond which life becomes
- easier if you switch from shell scripting to a more robust language. Just
- where this threshold is located varies a lot between users and problems, but I
- often think about switching languages before a script gets bigger than I can
- view on my screen all at once. <code>addprop</code> is a good example:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ wc -l ../script/addprop
- 41 ../script/addprop
- </code></pre>
-
- <!-- end -->
-
-
- <p>41 lines is a touch over what fits on one screen in the editor I usually use.
- If I were going to add much in the way of features, I’d think pretty hard about
- porting it to another language first.</p>
-
- <p>What’s cool is that if you know a language like C, Python, Perl, Ruby, PHP, or
- JavaScript, your code can participate in the shell environment as a first class
- citizen simply by respecting the conventions of standard IO, files, and command
- line arguments. Often, in order to create a useful utility, it’s only
- necessary to deal with <code>STDIN</code>, or operate on a particular sort of file, and
- most languages offer simple conventions for doing these things.</p>
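- 
- <p>The conventions themselves are easiest to show in shell, though the same shape carries over to any of those languages. Here’s a minimal sketch of a filter called <code>shout</code> (a made-up name). It reads any files named as arguments, falls back to standard input when there aren’t any, and writes to standard output:</p>

```shell
#!/bin/bash

# shout is a made-up filter that upper-cases its input. Like sort or wc,
# it reads the files named on the command line, or standard input if
# no files are named, and writes its results to standard output.
shout() {
  # cat takes care of the "files or STDIN" convention for us: with no
  # file arguments, it just copies standard input.
  cat -- "$@" | tr '[:lower:]' '[:upper:]'
}

# Used as the last stage of a pipeline:
printf 'one\ntwo\n' | shout

# Used with a file argument:
scratch=$(mktemp)
echo 'three' > "$scratch"
shout "$scratch"
```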
-
- <p style="text-align:center;"> *</p>
-
- <p>I think the shell can be taught and understood as a humane environment, despite
- all of its ugliness and complication, because it offers the materials of its
- own construction to its users, whatever their concerns. The writer, the
- philosopher, the scientist, the programmer: Files and text and pipes know
- little enough about these things, but in their very indifference to the
- specifics of any one complex purpose, they’re adaptable to the basic needs of
- many. Simple utilities which enact simple kinds of knowledge survive and
- recombine because there is a wisdom to be found in small things.</p>
-
- <p>Files and text know nothing about poetry, nothing in particular of the human
- soul. Neither do pen and ink, printing presses or codex books, but somehow we
- got Shakespeare and Montaigne.</p>
-
- <hr />
-
- <h1><a name=one-of-these-things-is-not-like-the-others href=#one-of-these-things-is-not-like-the-others>#</a> 6. one of these things is not like the others</h1>
-
- <p>If you’re the sort of person who took a few detours into the history of
- religion in college, you might be familiar with some of the ways people used to
- do textual comparison. When pen, paper, and typesetting were what scholars had
- to work with, they did some fairly sophisticated things in order to expose the
- relationships between multiple pieces of text.</p>
-
- <p style="text-align:center;"> <img src="images/throckmorton_small.jpg" height=320 width=470></p>
-
- <p>Here’s a book I got in college: <em>Gospel Parallels: A Comparison of the
- Synoptic Gospels</em>, Burton H. Throckmorton, Jr., Ed. It breaks up three books
- from the New Testament by the stories and themes that they contain, and shows
- the overlapping sections of each book that contain parallel texts. You can
- work your way through and see what parts only show up in one book, or in two
- but not the other, or in all three. Pages are arranged like so:</p>
-
- <pre>
- § JESUS DOES SOME STUFF
- ________________________________________________
- | MAT | MAR | LUK |
- |-----------------+--------------------+---------|
- | Stuff | | |
- | | Stuff | |
- | | Stuff | Stuff |
- | | Stuff | |
- | | Stuff | |
- | | | |
- </pre>
-
-
- <p>The way I understand it, a book like this one only scratches the surface of the
- field. Tools like this support a lot of theory about which books copied each
- other and how, and what other sources they might have copied that we’ve since
- lost.</p>
-
- <p>This is some <em>incredibly</em> dry material, even if you kind of dig thinking about
- the questions it addresses. It takes a special temperament to actually sit
- poring over fragmentary texts in ancient languages and do these painstaking
- comparisons. Even if you’re a writer or editor and work with a lot of
- revisions of a text, there’s a good chance you rarely do this kind of
- comparison on your own work, because that shit is <em>tedious</em>.</p>
-
- <h2><a name=diff href=#diff>#</a> diff</h2>
-
- <p>It turns out that academics aren’t the only people who need tools for comparing
- different versions of a text. Working programmers, in fact, need to do this
- <em>constantly</em>. Programmers are also happiest when putting off the <em>actual</em> task
- at hand to solve some incidental problem that cropped up along the way, so by
- now there are a lot of ways to say “here’s how this file is different from this
- file”, or “here’s how this file is different from itself a year ago”.</p>
-
- <p>Let’s look at a couple of shell scripts from an earlier chapter:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ cat ../script/okpoems
- #!/bin/bash
-
- # find all the marker files and get the name of
- # the directory containing each
- find ~/p1k3/archives -name 'meta-ok-poem' | xargs -n1 dirname
-
- exit 0
- </code></pre>
-
- <!-- end -->
-
-
-
-
- <!-- exec -->
-
-
- <pre><code>$ cat ../script/findprop
- #!/bin/bash
-
- if [ ! $1 ]
- then
- echo "usage: findprop <property>"
- exit
- fi
-
- # find all the marker files and get the name of
- # the directory containing each
- find ~/p1k3/archives -name $1 | xargs -n1 dirname
-
- exit 0
- </code></pre>
-
- <!-- end -->
-
-
- <p>It’s pretty obvious these are similar files, but do we know what <em>exactly</em>
- changed between them at a glance? It wouldn’t be hard to figure out, once. If
- you wanted to be really certain about it, you could print them out, set them
- side by side, and go over them with a highlighter.</p>
-
- <p>Now imagine doing that for a bunch of files, some of them hundreds or thousands
- of lines long. I’ve actually done that before, colored markers and all, but I
- didn’t feel smart while I was doing it. This is a job for software.</p>
-
- <!-- exec -->
-
-
- <pre><code>$ diff ../script/okpoems ../script/findprop
- 2a3,8
- > if [ ! $1 ]
- > then
- > echo "usage: findprop <property>"
- > exit
- > fi
- >
- 5c11
- < find ~/p1k3/archives -name 'meta-ok-poem' | xargs -n1 dirname
- ---
- > find ~/p1k3/archives -name $1 | xargs -n1 dirname
- </code></pre>
-
- <!-- end -->
-
-
- <p>That’s not the most human-friendly output, but it’s a little simpler than it
- seems at first glance. It’s basically just a way of describing the changes
- needed to turn <code>okpoems</code> into <code>findprop</code>. The string <code>2a3,8</code> can be read as
- “after line 2 of the original file, add lines 3 through 8 of the new file”. Lines with a <code>></code> in front of them are
- added. <code>5c11</code> can be read as “line 5 in the original file becomes line 11 in
- the new file”, and the <code><</code> line is replaced with the <code>></code> line. If you wanted,
- you could take a copy of the original file and apply these instructions by hand
- in your text editor, and you’d wind up with the new file.</p>
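- 
- <p>You don’t have to apply those instructions by hand, either. The standard <code>patch</code> utility reads <code>diff</code> output and makes the changes for you. A small sketch, using invented file names (<code>old</code>, <code>new</code>, <code>patched</code>, <code>changes.diff</code>) in a scratch directory, and assuming <code>patch</code> is installed, which it isn’t everywhere:</p>

```shell
#!/bin/bash

# Work in a throwaway directory with two small made-up files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'one' 'two' 'three' > old
printf '%s\n' 'one' 'two' 'two and a half' 'THREE' > new

# diff exits with status 1 when the files differ. That isn't an error
# here, so "|| true" keeps it from looking like one.
diff old new > changes.diff || true
cat changes.diff

# Apply the instructions to a copy of the original, if patch is available:
cp old patched
if command -v patch > /dev/null; then
  patch patched changes.diff

  # -s asks diff to say so explicitly when two files are identical.
  diff -s patched new
else
  echo "patch isn't installed here"
fi
```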
-
- <p>A lot of people (me included) prefer what’s known as a “unified” diff, because
- it’s easier to read and offers context for the changed lines. We can ask for
- one of these with <code>diff -u</code>:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ diff -u ../script/okpoems ../script/findprop
- --- ../script/okpoems 2014-04-19 00:08:03.321230818 -0600
- +++ ../script/findprop 2014-04-21 21:51:29.360846449 -0600
- @@ -1,7 +1,13 @@
- #!/bin/bash
-
- +if [ ! $1 ]
- +then
- + echo "usage: findprop <property>"
- + exit
- +fi
- +
- # find all the marker files and get the name of
- # the directory containing each
- -find ~/p1k3/archives -name 'meta-ok-poem' | xargs -n1 dirname
- +find ~/p1k3/archives -name $1 | xargs -n1 dirname
-
- exit 0
- </code></pre>
-
- <!-- end -->
-
-
- <p>That’s a little longer, and has some metadata we might not always care about,
- but if you look for lines starting with <code>+</code> and <code>-</code>, it’s easy to read as
- “added these, took away these”. This diff tells us at a glance that we added
- some lines to complain if we didn’t get a command line argument, and replaced
- <code>'meta-ok-poem'</code> in the <code>find</code> command with that argument. Since it shows us
- some context, we have a pretty good idea where those lines are in the file
- and what they’re for.</p>
-
- <p>What if we don’t care exactly <em>how</em> the files differ, but only whether they
- do?</p>
-
- <!-- exec -->
-
-
- <pre><code>$ diff -q ../script/okpoems ../script/findprop
- Files ../script/okpoems and ../script/findprop differ
- </code></pre>
-
- <!-- end -->
-
-
- <p>I use <code>diff</code> a lot in the course of my day job, because I spend a lot of time
- needing to know just how two programs differ. Just as importantly, I often
- need to know how (or whether!) the <em>output</em> of programs differs. As a concrete
- example, I want to make sure that <code>findprop meta-ok-poem</code> is really a suitable
- replacement for <code>okpoems</code>. Since I expect their output to be identical, I can
- do this:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ ../script/okpoems > okpoem_output
- </code></pre>
-
- <!-- end -->
-
-
-
-
- <!-- exec -->
-
-
- <pre><code>$ ../script/findprop meta-ok-poem > findprop_output
- </code></pre>
-
- <!-- end -->
-
-
-
-
- <!-- exec -->
-
-
- <pre><code>$ diff -s okpoem_output findprop_output
- Files okpoem_output and findprop_output are identical
- </code></pre>
-
- <!-- end -->
-
-
- <p>The <code>-s</code> just means that <code>diff</code> should explicitly tell us if files are the
- <strong>s</strong>ame. Otherwise, it’d output nothing at all, because there aren’t any
- differences.</p>
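- 
- <p>Saving output to files works fine, but Bash can skip that step with process substitution, and <code>diff</code>’s exit status (0 for no differences, 1 for some) makes it easy to use in a script. Here’s a sketch. <code>same_output</code> is a made-up helper, and the commands it compares are stand-ins for something like <code>okpoems</code> and <code>findprop meta-ok-poem</code>:</p>

```shell
#!/bin/bash

# same_output is a hypothetical helper: it runs two commands (each given
# as a single string) and succeeds only if they print exactly the same thing.
same_output() {
  # <( ... ) is Bash's process substitution: each command's output shows
  # up to diff as though it were a file. -q keeps diff from printing the
  # details, since all we want here is the exit status.
  diff -q <(bash -c "$1") <(bash -c "$2") > /dev/null
}

if same_output 'printf "b\na\n" | sort' 'printf "a\nb\n"'; then
  echo "same output"
else
  echo "different output"
fi
```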
-
- <p>As with many other tools, <code>diff</code> doesn’t very much care whether it’s looking at
- shell scripts or a list of filenames or what-have-you. If you read the man
- page, you’ll find some features geared towards people writing in C-like
- programming languages, but its real specialty is just text files with lines
- made out of characters, which works well for lots of code, but certainly could
- be applied to English prose.</p>
-
- <p>Since I have a couple of versions ready to hand, let’s apply this to a text
- with some well-known variations and a bit of a literary legacy. Here’s the
- first day of the Genesis creation narrative in a couple of English
- translations:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ cat genesis_nkj
- In the beginning God created the heavens and the earth. The earth was without
- form, and void; and darkness was on the face of the deep. And the Spirit of
- God was hovering over the face of the waters. Then God said, "Let there be
- light"; and there was light. And God saw the light, that it was good; and God
- divided the light from the darkness. God called the light Day, and the darkness
- He called Night. So the evening and the morning were the first day.
- </code></pre>
-
- <!-- end -->
-
-
-
-
- <!-- exec -->
-
-
- <pre><code>$ cat genesis_nrsv
- In the beginning when God created the heavens and the earth, the earth was a
- formless void and darkness covered the face of the deep, while a wind from
- God swept over the face of the waters. Then God said, "Let there be light";
- and there was light. And God saw that the light was good; and God separated
- the light from the darkness. God called the light Day, and the darkness he
- called Night. And there was evening and there was morning, the first day.
- </code></pre>
-
- <!-- end -->
-
-
- <p>What happens if we diff them?</p>
-
- <!-- exec -->
-
-
- <pre><code>$ diff -u genesis_nkj genesis_nrsv
- --- genesis_nkj 2014-05-11 16:28:29.692508461 -0600
- +++ genesis_nrsv 2014-05-11 16:28:29.744508459 -0600
- @@ -1,6 +1,6 @@
- -In the beginning God created the heavens and the earth. The earth was without
- -form, and void; and darkness was on the face of the deep. And the Spirit of
- -God was hovering over the face of the waters. Then God said, "Let there be
- -light"; and there was light. And God saw the light, that it was good; and God
- -divided the light from the darkness. God called the light Day, and the darkness
- -He called Night. So the evening and the morning were the first day.
- +In the beginning when God created the heavens and the earth, the earth was a
- +formless void and darkness covered the face of the deep, while a wind from
- +God swept over the face of the waters. Then God said, "Let there be light";
- +and there was light. And God saw that the light was good; and God separated
- +the light from the darkness. God called the light Day, and the darkness he
- +called Night. And there was evening and there was morning, the first day.
- </code></pre>
-
- <!-- end -->
-
-
- <p>Kind of useless, right? If a given line differs by so much as a character,
- it’s not the same line. This highlights the limitations of <code>diff</code> for comparing
- things that</p>
-
- <ul>
- <li>aren’t logically grouped by line</li>
- <li>aren’t easily thought of as versions of the same text with some lines changed</li>
- </ul>
-
-
- <p>We could edit the files into a more logically defined structure, like
- one-line-per-verse, and try again:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ diff -u genesis_nkj_by_verse genesis_nrsv_by_verse
- --- genesis_nkj_by_verse 2014-05-11 16:51:14.312457198 -0600
- +++ genesis_nrsv_by_verse 2014-05-11 16:53:02.484453134 -0600
- @@ -1,5 +1,5 @@
- -In the beginning God created the heavens and the earth.
- -The earth was without form, and void; and darkness was on the face of the deep. And the Spirit of God was hovering over the face of the waters.
- +In the beginning when God created the heavens and the earth,
- +the earth was a formless void and darkness covered the face of the deep, while a wind from God swept over the face of the waters.
- Then God said, "Let there be light"; and there was light.
- -And God saw the light, that it was good; and God divided the light from the darkness.
- -God called the light Day, and the darkness He called Night. So the evening and the morning were the first day.
- +And God saw that the light was good; and God separated the light from the darkness.
- +God called the light Day, and the darkness he called Night. And there was evening and there was morning, the first day.
- </code></pre>
-
- <!-- end -->
-
-
- <p>It might be a little more descriptive, but editing all that text just for a
- quick comparison felt suspiciously like work, and anyway the output still
- doesn’t seem very useful.</p>
-
- <h2><a name=wdiff href=#wdiff>#</a> wdiff</h2>
-
- <p>For cases like this, I’m fond of a tool called <code>wdiff</code>:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ wdiff genesis_nkj genesis_nrsv
- In the beginning {+when+} God created the heavens and the [-earth. The-] {+earth, the+} earth was [-without
- form, and void;-] {+a
- formless void+} and darkness [-was on-] {+covered+} the face of the [-deep. And the Spirit of-] {+deep, while a wind from+}
- God [-was hovering-] {+swept+} over the face of the waters. Then God said, "Let there be light";
- and there was light. And God saw [-the light,-] that [-it-] {+the light+} was good; and God
- [-divided-] {+separated+}
- the light from the darkness. God called the light Day, and the darkness
- [-He-] {+he+}
- called Night. [-So the-] {+And there was+} evening and [-the morning were-] {+there was morning,+} the first day.
- </code></pre>
-
- <!-- end -->
-
-
- <p>Deleted words are surrounded by <code>[- -]</code> and inserted ones by <code>{+ +}</code>. You can
- even ask it to spit out HTML tags for insertion and deletion…</p>
-
- <pre><code>$ wdiff -w '<del>' -x '</del>' -y '<ins>' -z '</ins>' genesis_nkj genesis_nrsv
- </code></pre>
-
- <p>…and come up with something your browser will render like this:</p>
-
- <blockquote>
- <p>In the beginning <ins>when</ins> God created the heavens and the <del>earth. The</del> <ins>earth, the</ins> earth was <del>without
- form, and void;</del> <ins>a
- formless void</ins> and darkness <del>was on</del> <ins>covered</ins> the face of the <del>deep. And the Spirit of</del> <ins>deep, while a wind from</ins>
- God <del>was hovering</del> <ins>swept</ins> over the face of the waters. Then God said, "Let there be light";
- and there was light. And God saw <del>the light,</del> that <del>it</del> <ins>the light</ins> was good; and God
- <del>divided</del> <ins>separated</ins>
- the light from the darkness. God called the light Day, and the darkness
- <del>He</del> <ins>he</ins>
- called Night. <del>So the</del> <ins>And there was</ins> evening and <del>the morning were</del> <ins>there was morning,</ins> the first day.</p>
- </blockquote>
-
-
- <p>Burton H. Throckmorton, Jr. this ain’t. Still, it has its uses.</p>
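- 
- <p>If <code>wdiff</code> isn’t installed, a rough approximation of the idea is to put
- every word on its own line and hand the result to plain <code>diff</code>. This is only
- a sketch: the <code>words</code> helper and the two sample files are made up for
- illustration, and the output is much harder on the eyes than what <code>wdiff</code>
- gives you.</p>

```shell
# Hypothetical helper: print each whitespace-separated word of a file
# on its own line.
words() {
  tr -s '[:space:]' '\n' < "$1"
}

# Two tiny sample files, invented for this example.
tmpdir=$(mktemp -d)
echo 'And God saw the light, that it was good' > "$tmpdir/old"
echo 'And God saw that the light was good' > "$tmpdir/new"

words "$tmpdir/old" > "$tmpdir/old.words"
words "$tmpdir/new" > "$tmpdir/new.words"

# diff exits non-zero when its inputs differ, so don't treat that as failure.
diff "$tmpdir/old.words" "$tmpdir/new.words" || true

rm -rf "$tmpdir"
```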
-
- <hr />
-
- <h1><a name=the-command-line-as-as-a-shared-world href=#the-command-line-as-as-a-shared-world>#</a> 7. the command line as a shared world</h1>
-
- <p>In an earlier chapter, I wrote:</p>
-
- <blockquote><p>You can think of the shell as a kind of environment you inhabit, in much
- the way your character inhabits an adventure game.</p></blockquote>
-
- <p>It turns out that sometimes there are other human inhabitants of this
- environment.</p>
-
- <p>Unix was built on a model known as “time-sharing”. This is an idea with a lot
- of history, but the very short version is that when computers were rare and
- expensive, it made sense for lots of people to be able to use them at once.
- This is part of the story of how ideas like e-mail and chat were born, well
- before networks took over the world: as ways for the many users of one
- computer to communicate on the same machine.</p>
-
- <p>Says Dennis Ritchie:</p>
-
- <blockquote><p>What we wanted to preserve was not just a good environment in which to do
- programming, but a system around which a fellowship could form. We knew from
- experience that the essence of communal computing, as supplied by
- remote-access, time-shared machines, is not just to type programs into a
- terminal instead of a keypunch, but to encourage close communication.</p></blockquote>
-
- <p>Times have changed, and while it’s mundane to use software that’s shared
- between many users, it’s not nearly as common as it once was for a bunch of us
- to be logged into the same computer all at once.</p>
-
- <p style="text-align:center;"> ★</p>
-
- <p>In the mid-1990s, when I was first exposed to Unix, it was by opening up a
- program called NCSA Telnet on one of the Macs at school and connecting to a
- server called mother.esu1.k12.ne.us.</p>
-
- <p>NCSA Telnet was a terminal, not unlike the kind that you use to open a shell on
- your own Linux computer, a piece of software that itself emulated actual,
- physical hardware from an earlier era. Hardware terminals were basically very
- simple computers with keyboards, screens, and just enough networking brains to
- talk to a <em>real</em> computer somewhere else. You’ll still come across these
- scattered around big institutional environments. The last time I looked over
- the shoulder of an airline check-in desk clerk, for example, I saw green
- monochrome text that was probably coming from an IBM mainframe somewhere
- far away.</p>
-
- <p>Part of what was exciting about being logged into a computer somewhere else
- was that you could <em>talk to people</em>.</p>
-
- <p style="text-align:center;"> ★</p>
-
- <p><em>{This chapter is a work in progress.}</em></p>
-
- <hr />
-
- <h1><a name=the-command-line-and-the-web href=#the-command-line-and-the-web>#</a> 8. the command line and the web</h1>
-
- <p>Web browsers are really complicated these days. They’re full of rendering
- engines, audio and video players, programming languages, development tools,
- databases — you name it, and there’s a fair chance it’s in there somewhere.
- The modern web browser is kitchen sink software, and to make matters worse, it
- is <em>totally surrounded</em> by technobabble. It can take <em>years</em> to come to terms
- with the ocean of words about web stuff and sort out the meaningful ones from
- the snake oil and bureaucratic mysticism.</p>
-
- <p>All of which can make the web itself seem like a really complicated landscape,
- and obscure the simplicity of its basic design, which is this:</p>
-
- <p>Some programs pass text around to one another.</p>
-
- <p>Which might sound familiar.</p>
-
- <p>The gist of it is that the web is made out of URLs, “Uniform Resource
- Locators”, which are paths to things. If you squint, these look kind of like
- paths to files on your filesystem. When you visit a URL in your browser, it
- asks a server for a certain path, and the server gives it back some text. When
- you click a button to submit a form, your browser sends some text to the server
- and waits to see what it says back. The text that gets passed around is
- (usually) written in a language with particular significance to web browsers,
- but if you look at it directly, it’s a format that humans can understand.</p>
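- 
- <p>To make the “server plus path” idea a little more concrete, here is a minimal
- sketch that pulls a URL apart using nothing but shell parameter expansion. The
- variable names are made up for this example, and it assumes a URL with both a
- scheme and a path; real URLs can also carry ports, query strings, and fragments
- that this ignores.</p>

```shell
# Example URL from this chapter.
url='https://p1k3.com/hello_world.html'

# Everything before "://" is the scheme.
scheme=${url%%://*}

# Strip the scheme, then split what's left at the first slash.
# (Assumes there is a path; a bare "https://p1k3.com" would confuse this.)
rest=${url#*://}
host=${rest%%/*}
path=/${rest#*/}

echo "scheme: $scheme"
echo "host:   $host"
echo "path:   $path"
```

- <p>The host names the server to ask, and the path is what gets asked for.</p>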
-
- <p>Let’s illustrate this. I’ve written a really simple web page that lives at
- <a href="http://p1k3.com/hello_world.html"><code>http://p1k3.com/hello_world.html</code></a>.</p>
-
- <pre><code>$ curl 'https://p1k3.com/hello_world.html'
- <html>
- <head>
- <title>hello, world</title>
- </head>
-
- <body>
- <h1>hi everybody</h1>
-
- <p>How are things?</p>
- </body>
- </html>
- </code></pre>
-
- <p><code>curl</code> is a program with lots and lots of features — it too is a little bit
- of a kitchen sink — but it has one core purpose, which is to grab things from
- URLs and spit them back out. It’s a little bit like <code>cat</code> for things that live
- on the web. Try the above command with just about any URL you can think of,
- and you’ll probably get <em>something</em> back. Let’s try this book:</p>
-
- <pre><code>$ curl 'https://p1k3.com/userland-book/' | head
- <!DOCTYPE html>
- <html lang=en>
- <head>
- <meta charset="utf-8">
- <title>userland: a book about the command line for humans</title>
- <link rel=stylesheet href="userland.css" />
- <script src="js/jquery.js" type="text/javascript"></script>
- </head>
-
- <body>
- </code></pre>
-
- <p><code>hello_world.html</code> and <code>userland-book</code> are both written in HyperText Markup
- Language. HTML is just text with a specific kind of structure. It’s been
- around for quite a while now, and has grown up a lot in 20 years, but at heart
- it still looks a lot <a href="http://info.cern.ch/hypertext/WWW/TheProject.html">like it did in 1991</a>.</p>
-
- <p>The basic idea is that the contents of a web page are marked up with tags.
- A tag looks like this:</p>
-
- <pre><code><title>hi!</title> -,
-    |    |           |
-    |    `- content  |
-    |                `- closing tag
-    `-opening tag
- </code></pre>
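- 
- <p>Because a tag is just text, ordinary text tools can pick it apart. A rough
- sketch, assuming the whole tag sits on a single line (<code>title_of</code> is a
- hypothetical helper, and real-world HTML usually calls for a proper parser):</p>

```shell
# Hypothetical helper: print whatever sits between the opening and
# closing title tags, for input where both are on the same line.
title_of() {
  sed -n 's:.*<title>\(.*\)</title>.*:\1:p'
}

printf '%s\n' '<title>hi!</title>' | title_of
```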
-
- <p>Sometimes you’ll see tags with what are known as “attributes”:</p>
-
- <pre><code><a href="https://p1k3.com/userland-book">userland</a>
- </code></pre>
-
- <p>This is how links are written in HTML. <code>href="..."</code> tells the browser where to
- go when the user clicks on “<a href="http://p1k3.com/userland-book">userland</a>”.</p>
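- 
- <p>Attributes are text too. As a sketch, this pulls the quoted <code>href</code> values
- out of whatever HTML arrives on standard input. <code>links_in</code> is a hypothetical
- helper; it assumes double-quoted attributes, so it would miss unquoted ones
- like <code>href=#contents</code>, and <code>grep -o</code> is a widely available extension
- rather than strict POSIX.</p>

```shell
# Hypothetical helper: list the values of double-quoted href attributes.
links_in() {
  # grep -o prints only the matching part of each line;
  # sed then trims the surrounding href="...".
  grep -o 'href="[^"]*"' | sed 's/^href="//; s/"$//'
}

printf '%s\n' '<a href="https://p1k3.com/userland-book">userland</a>' | links_in
```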
-
- <p>Tags are a way to describe not so much what something <em>looks like</em> as what
- something <em>means</em>. Browsers are, in large part, big collections of knowledge
- about the meanings of tags and ways to represent those meanings.</p>
-
- <p>While the browser you use day-to-day has (probably) a graphical interface and
- does all sorts of things impossible to render in a terminal, some of the
- earliest web browsers were entirely text-based, and text-mode browsers still
- exist. Lynx, which originated at the University of Kansas in the early 1990s,
- is still actively maintained:</p>
-
- <pre><code>$ lynx -dump 'http://p1k3.com/userland-book/' | head
- userland
- __________________________________________________________________
-
- [1]# a book about the command line for humans
-
- Late last year, [2]a side trip into text utilities got me thinking
- about how much my writing habits depend on the Linux command line. This
- struck me as a good hook for talking about the tools I use every day
- with an audience of mixed technical background.
- </code></pre>
-
- <p>If you invoke Lynx without any options, it’ll start up in interactive mode, and
- you can navigate between links with the arrow keys. <code>lynx -dump</code> spits a
- rendered version of a page to standard output, with links annotated in square
- brackets and printed as footnotes. Another useful option here is <code>-listonly</code>,
- which will print just the list of links contained within a page:</p>
-
- <pre><code>$ lynx -dump -listonly 'http://p1k3.com/userland-book/' | head
-
- References
-
- 2. http://p1k3.com/2013/8/4
- 3. http://p1k3.com/userland-book.git
- 4. https://github.com/brennen/userland-book
- 5. http://p1k3.com/userland-book/
- 6. https://twitter.com/brennen
- 9. http://p1k3.com/userland-book/#a-book-about-the-command-line-for-humans
- 10. http://p1k3.com/userland-book/#copying
- </code></pre>
-
- <p>An alternative to Lynx is w3m, which copes a little more gracefully with the
- complexities of modern web layout.</p>
-
- <pre><code>$ w3m -dump 'http://p1k3.com/userland-book/' | head
- userland
-
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-
- # a book about the command line for humans
-
- Late last year, a side trip into text utilities got me thinking about how much
- my writing habits depend on the Linux command line. This struck me as a good
- hook for talking about the tools I use every day with an audience of mixed
- technical background.
- </code></pre>
-
- <p>Neither of these tools can easily replace enormously capable applications like
- Chrome or Firefox, but they have their place in the toolbox, and help to
- demonstrate how the web is built (in part) on principles we’ve already seen at
- work.</p>
-
- <hr />
-
- <h1><a name=a-miscellany-of-tools-and-techniques href=#a-miscellany-of-tools-and-techniques>#</a> 9. a miscellany of tools and techniques</h1>
-
- <h2><a name=dict href=#dict>#</a> dict</h2>
-
- <p>Want to know the definition of a word, or find useful synonyms?</p>
-
- <pre><code>$ dict concatenate | head -10
- 4 definitions found
-
- From The Collaborative International Dictionary of English v.0.48 [gcide]:
-
- Concatenate \Con*cat"e*nate\ (k[o^]n*k[a^]t"[-e]*n[=a]t), v. t.
- [imp. & p. p. {Concatenated}; p. pr. & vb. n.
- {Concatenating}.] [L. concatenatus, p. p. of concatenare to
- concatenate. See {Catenate}.]
- To link together; to unite in a series or chain, as things
- depending on one another.
- </code></pre>
-
- <h2><a name=aspell href=#aspell>#</a> aspell</h2>
-
- <p>Need to interactively spell-check your presentation notes?</p>
-
- <pre><code>$ aspell check presentation
- </code></pre>
-
- <p>Just want a list of potentially-misspelled words in a given file?</p>
-
- <!-- exec -->
-
-
- <pre><code>$ aspell list < ../literary_environment/index.md | sort | uniq -ci | sort -nr | head -5
- 40 td
- 24 Veselka
- 17 Reuel
- 16 Brunner
- 15 Tiptree
- </code></pre>
-
- <!-- end -->
-
-
- <h2><a name=mostcommon href=#mostcommon>#</a> mostcommon</h2>
-
- <p>Something like that last sequence sure does seem to show up a lot in my work:
- Spit out the <em>n</em> most common lines in the input, one way or another. Here’s
- a little script to be less repetitive about it.</p>
-
- <!-- exec -->
-
-
- <pre><code>$ aspell list < ../literary_environment/index.md | ./mostcommon -i -n5
- 40 td
- 24 Veselka
- 17 Reuel
- 16 Brunner
- 15 Tiptree
- </code></pre>
-
- <!-- end -->
-
-
- <p>This turns out to be pretty simple:</p>
-
- <!-- exec -->
-
-
- <pre><code>$ cat ./mostcommon
- #!/usr/bin/env bash
- 
- # Optionally specify number of lines to show, defaulting to 10:
- TOSHOW=10
- CASEOPT=""
- 
- while getopts ":in:" opt; do
-   case $opt in
-     i)
-       CASEOPT="-i"
-       ;;
-     n)
-       TOSHOW=$OPTARG
-       ;;
-     \?)
-       echo "Invalid option: -$OPTARG" >&2
-       exit 1
-       ;;
-     :)
-       echo "Option -$OPTARG requires an argument." >&2
-       exit 1
-       ;;
-   esac
- done
- 
- # sort and then uniqify STDIN,
- # sort numerically on the first field,
- # chop off everything but $TOSHOW lines of input
- 
- sort < /dev/stdin | uniq -c $CASEOPT | sort -k1 -nr | head -$TOSHOW
- </code></pre>
-
- <!-- end -->
-
-
- <p>Notice, though, that it doesn’t handle opening files directly. If you wanted
- to find the most common lines in a file with it, you’d have to say something
- like <code>mostcommon < filename</code> in order to redirect the file to <code>mostcommon</code>’s
- input.</p>
-
- <p>Also notice that most of the script is boilerplate for handling a couple of
- options. The work is all done in a one-liner. Worth it? Maybe not, but an
- interesting exercise.</p>
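- 
- <p>One way to lift the no-files restriction, sketched here as a shell function
- rather than as a change to the script itself: once the options are parsed,
- shift them away and let <code>cat</code> read whatever file names remain. With no file
- names, <code>cat</code> falls back to standard input, so pipes keep working. The name
- <code>mostcommon_files</code> and the sample data are invented for illustration.</p>

```shell
#!/usr/bin/env bash
# Sketch of a mostcommon variant that also accepts file names.
mostcommon_files() {
  local toshow=10 caseopt="" opt OPTIND=1

  while getopts ":in:" opt; do
    case $opt in
      i) caseopt="-i" ;;
      n) toshow=$OPTARG ;;
      \?) echo "Invalid option: -$OPTARG" >&2; return 1 ;;
      :) echo "Option -$OPTARG requires an argument." >&2; return 1 ;;
    esac
  done

  # Drop the options getopts consumed, leaving only file names (if any).
  shift $((OPTIND - 1))

  # With no arguments, cat reads standard input.
  # $caseopt is left unquoted on purpose so that an empty value disappears.
  cat -- "$@" | sort | uniq -c $caseopt | sort -k1 -nr | head -n "$toshow"
}

# Demo with a throwaway file.
demo=$(mktemp)
printf 'b\na\nb\n' > "$demo"
mostcommon_files -n1 "$demo"
rm -f "$demo"
```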
-
- <h2><a name=cal-and-ncal href=#cal-and-ncal>#</a> cal and ncal</h2>
-
- <p>Want to know what the calendar looks like for this month?</p>
-
- <pre><code>$ cal
-      April 2014
- Su Mo Tu We Th Fr Sa
-        1  2  3  4  5
-  6  7  8  9 10 11 12
- 13 14 15 16 17 18 19
- 20 21 22 23 24 25 26
- 27 28 29 30
- </code></pre>
-
- <p>How about for September, 1950, in a more compact format?</p>
-
- <!-- exec -->
-
-
- <pre><code>$ ncal -m9 1950
-     September 1950
- Su     3 10 17 24
- Mo     4 11 18 25
- Tu     5 12 19 26
- We     6 13 20 27
- Th     7 14 21 28
- Fr  1  8 15 22 29
- Sa  2  9 16 23 30
- </code></pre>
-
- <!-- end -->
-
-
- <p>Need to know the date of Easter this year?</p>
-
- <!-- exec -->
-
-
- <pre><code>$ ncal -e
- April 20 2014
- </code></pre>
-
- <!-- end -->
-
-
- <h2><a name=seq href=#seq>#</a> seq</h2>
-
- <p>Need the numbers 1-5?</p>
-
- <!-- exec -->
-
-
- <pre><code>$ seq 1 5
- 1
- 2
- 3
- 4
- 5
- </code></pre>
-
- <!-- end -->
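- 
- <p>Need only the odd ones, or a counter for a loop? A small sketch (the
- <code>page-N.txt</code> names are invented for the example):</p>

```shell
# seq FIRST INCREMENT LAST: count from 1 to 9 in steps of 2.
seq 1 2 9

# Use the numbers to drive a loop.
for n in $(seq 1 3); do
  echo "page-$n.txt"
done
```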
-
-
- <h2><a name=shuf href=#shuf>#</a> shuf</h2>
-
- <p>Want to shuffle some lines?</p>
-
- <!-- exec -->
-
-
- <pre><code>$ seq 1 5 | shuf
- 2
- 1
- 4
- 3
- 5
- </code></pre>
-
- <!-- end -->
-
-
- <h2><a name=ptx href=#ptx>#</a> ptx</h2>
-
- <p>Want to make a <a href="http://en.wikipedia.org/wiki/Key_Word_in_Context">permuted index</a> of some phrase?</p>
-
- <!-- exec -->
-
-
- <pre><code>$ echo 'i like american music' | ptx
-                            i like   american music
-                                     i like american music
-                                 i   like american music
-                   i like american   music
- </code></pre>
-
- <!-- end -->
-
-
- <h2><a name=figlet href=#figlet>#</a> figlet</h2>
-
- <p>Need to make ASCII art of some giant letters?</p>
-
- <!-- exec -->
-
-
- <pre><code>$ figlet "R T F M"
-  ____    _____   _____   __  __
- |  _ \  |_   _| |  ___| |  \/  |
- | |_) |   | |   | |_    | |\/| |
- |  _ <    | |   |  _|   | |  | |
- |_| \_\   |_|   |_|     |_|  |_|
- </code></pre>
-
- <!-- end -->
-
-
- <h2><a name=cowsay href=#cowsay>#</a> cowsay</h2>
-
- <p>How about ASCII art of a <del>cow</del> dragon saying something?</p>
-
- <!-- exec -->
-
-
- <pre><code>$ cowsay -f dragon "RTFM, man"
-  ___________
- < RTFM, man >
-  -----------
-       \                    / \  //\
-        \    |\___/|      /   \//  \\
-             /0  0  \__  /    //  | \ \
-            /     /  \/_/    //   |  \  \
-            @_^_@'/   \/_   //    |   \   \
-            //_^_/     \/_ //     |    \    \
-         ( //) |        \///      |     \     \
-       ( / /) _|_ /   )  //       |      \     _\
-     ( // /) '/,_ _ _/  ( ; -.    |    _ _\.-~        .-~~~^-.
-   (( / / )) ,-{        _      `-.|.-~-.           .~         `.
-  (( // / ))  '/\      /                 ~-. _ .-~      .-~^-.  \
-  (( /// ))      `.   {            }                   /      \  \
-   (( / ))     .----~-.\        \-'                 .~         \  `. \^-.
-              ///.----..>        \             _ -~             `.  ^-`  ^-_
-                ///-._ _ _ _ _ _ _}^ - - - - ~                     ~-- ,.-~
-                                                                   /.-~
- </code></pre>
-
- <!-- end -->
-
-
- <hr />
-
- <h1><a name=endmatter href=#endmatter>#</a> endmatter</h1>
-
- <h2><a name=further-reading href=#further-reading>#</a> further reading</h2>
-
- <ul>
- <li><em>The Unix Programming Environment</em> - Brian W. Kernighan, Rob Pike</li>
- <li><a href="http://cm.bell-labs.com/cm/cs/who/dmr/hist.html">The Evolution of the Unix Time-sharing System</a> - Dennis M. Ritchie</li>
- <li><a href="https://www.youtube.com/watch?v=tc4ROCJYbm0">AT&T Archives: The UNIX Operating System</a> (YouTube)</li>
- <li><a href="https://medium.com/message/tilde-club-i-had-a-couple-drinks-and-woke-up-with-1-000-nerds-a8904f0a2ebf">I had a couple drinks and woke up with 1,000 nerds</a> - Paul Ford</li>
- </ul>
-
-
- <h2><a name=code href=#code>#</a> code</h2>
-
- <p>As of July 2018, source for this work can be found <a
- href="https://code.p1k3.com/gitea/brennen/userland-book">on code.p1k3.com</a>.
- I welcome feedback there, <a href="https://mastodon.social/brennen">on
- Mastodon</a>, or by mail to userland@p1k3.com.</p>
-
- <h2><a name=copying href=#copying>#</a> copying</h2>
-
- <p>This work is licensed under a
- <a rel="license" href="https://creativecommons.org/licenses/by-sa/4.0/">Creative
- Commons Attribution-ShareAlike 4.0 International License</a>.</p>
-
- <p><a rel="license" href="https://creativecommons.org/licenses/by-sa/4.0/">
- <img alt="Creative Commons License" src="images/by-sa-4.png" />
- </a></p>
-
- <hr />
- <script>
- $(document).ready(function () {
-   // ☜ ☝ ☞ ☟ ☆ ✠ ✡ ✢ ✣ ✤ ✥ ✦ ✧ ✩ ✪
-   var closed_sigil = 'show';
-   var open_sigil = 'hide';
- 
-   var togglesigil = function (elem) {
-     var sigil = $(elem).html();
-     if (sigil === closed_sigil) {
-       $(elem).html(open_sigil);
-     } else {
-       $(elem).html(closed_sigil);
-     }
-   };
- 
-   $(".details").each(function () {
-     var $this = $(this);
-     var $button = $('<button class=clicker-button>' + closed_sigil + '</button>');
-     var $details_full = $(this).find('.full');
- 
-     $button.click(function (e) {
-       e.preventDefault();
-       $details_full.toggle({
-         duration: 550
-       });
-       togglesigil(this);
-     });
- 
-     $(this).find('.clicker').append($button);
-     $button.show();
-   });
- 
-   $('.details .full').hide();
- });
- </script>
- </body>
- </html>
|