

								<!DOCTYPE html>

								<html lang=en>

								<head>

								  <meta charset="utf-8">

								  <title>userland: a book about the command line for humans</title>

								  <link rel=stylesheet href="userland.css" />

								  <link rel="alternate" type="application/atom+xml" title="changes" href="//p1k3.com/userland-book/feed.xml" />

								  <script src="js/jquery.js" type="text/javascript"></script>

								</head>


								<body>


								<h1 class=bigtitle>userland</h1>

								<hr />


								<h1><a name=a-book-about-the-command-line-for-humans href=#a-book-about-the-command-line-for-humans>#</a> a book about the command line for humans</h1>


								<p>In the fall of 2013, <a href="//p1k3.com/2013/8/4">thinking about</a> text utilities got

								me thinking in turn about how my writing habits depend on the Linux command

								line.  This seems like a good hook for explaining some tools I use every day,

								so now I&rsquo;m writing a short, haphazard book.</p>


								<p>This isn&rsquo;t a book about system administration, writing complex software, or

								becoming a wizard.  I am not a wizard, and I don&rsquo;t subscribe to the idea that

								wizardry is required to use these tools.  In fact, I barely know what I&rsquo;m doing

								most of the time.  I still get some stuff done.</p>


								<p>This is a work in progress.  It probably gets some stuff wrong.</p>


								<p>&ndash; bpb / <a href="https://p1k3.com">p1k3</a> / <a href="https://twitter.com/brennen">@brennen</a></p>


								<div class=details>

								  <h2 class=clicker><a name=contents href=#contents>#</a> contents</h2>

								  <div class=full>

								    <div class=contents><ul>

								<li><a href="#a-book-about-the-command-line-for-humans">a book about the command line for humans</a>


								<ul>

								<li><a href="#contents">contents</a></li>

								</ul>

								</li>

								<li><a href="#get-you-a-shell">0. get you a shell</a>


								<ul>

								<li><a href="#get-an-account-on-a-social-unix-server">get an account on a social unix server</a></li>

								<li><a href="#use-a-raspberry-pi-or-beaglebone">use a raspberry pi or beaglebone</a></li>

								<li><a href="#use-a-virtual-machine">use a virtual machine</a></li>

								</ul>

								</li>

								<li><a href="#the-command-line-as-literary-environment">1. the command line as literary environment</a>


								<ul>

								<li><a href="#terms-and-definitions">terms and definitions</a></li>

								<li><a href="#twisty-little-passages">twisty little passages</a></li>

								<li><a href="#cat">cat</a></li>

								<li><a href="#wildcards">wildcards</a></li>

								<li><a href="#sort">sort</a></li>

								<li><a href="#options">options</a></li>

								<li><a href="#uniq">uniq</a></li>

								<li><a href="#standard-IO">standard IO</a></li>

								<li><a href="#code-help-code-and-man-pages"><code>--help</code> and man pages</a></li>

								<li><a href="#wc">wc</a></li>

								<li><a href="#head-tail-and-cut">head, tail, and cut</a></li>

								<li><a href="#tab-separated-values">tab separated values</a></li>

								<li><a href="#finding-text-grep">finding text: grep</a></li>

								<li><a href="#now-you-have-n-problems">now you have n problems</a></li>

								</ul>

								</li>

								<li><a href="#a-literary-problem">2. a literary problem</a></li>

								<li><a href="#programmerthink">3. programmerthink</a></li>

								<li><a href="#script">4. script</a>


								<ul>

								<li><a href="#learn-you-an-editor">learn you an editor</a></li>

								<li><a href="#d-i-y-utilities">d.i.y. utilities</a></li>

								<li><a href="#heavy-lifting">heavy lifting</a></li>

								<li><a href="#generality">generality</a></li>

								</ul>

								</li>

								<li><a href="#general-purpose-programmering">5. general purpose programmering</a></li>

								<li><a href="#one-of-these-things-is-not-like-the-others">6. one of these things is not like the others</a>


								<ul>

								<li><a href="#diff">diff</a></li>

								<li><a href="#wdiff">wdiff</a></li>

								</ul>

								</li>

								<li><a href="#the-command-line-as-as-a-shared-world">7. the command line as a shared world</a></li>

								<li><a href="#the-command-line-and-the-web">8. the command line and the web</a></li>

								<li><a href="#a-miscellany-of-tools-and-techniques">9. a miscellany of tools and techniques</a>


								<ul>

								<li><a href="#dict">dict</a></li>

								<li><a href="#aspell">aspell</a></li>

								<li><a href="#mostcommon">mostcommon</a></li>

								<li><a href="#cal-and-ncal">cal and ncal</a></li>

								<li><a href="#seq">seq</a></li>

								<li><a href="#shuf">shuf</a></li>

								<li><a href="#ptx">ptx</a></li>

								<li><a href="#figlet">figlet</a></li>

								<li><a href="#cowsay">cowsay</a></li>

								</ul>

								</li>

								<li><a href="#endmatter">endmatter</a>


								<ul>

								<li><a href="#further-reading">further reading</a></li>

								<li><a href="#code">code</a></li>

								<li><a href="#copying">copying</a></li>

								</ul>

								</li>

								</ul>


								</div>

								  </div>

								</div>


								<hr />


								<h1><a name=get-you-a-shell href=#get-you-a-shell>#</a> 0. get you a shell</h1>


								<p>You don&rsquo;t have to have a shell at hand to get something out of this book.

								Still, as with most practical subjects, you&rsquo;ll learn more if you try things out

								as you go.  You shouldn&rsquo;t feel guilty about skipping this section.  It will

								always be here later if you need it.</p>


								<p>Not so long ago, it was common for schools and ISPs to hand out shell accounts

								on big shared systems.  People learned the command line as a side effect of

								reading their e-mail.</p>


								<p>That doesn&rsquo;t happen as often now, but in the meanwhile computers have become

								relatively cheap and free software is abundant.  If you&rsquo;re reading this on the

								web, you can probably get access to a shell.  Some options follow.</p>


								<h2><a name=get-an-account-on-a-social-unix-server href=#get-an-account-on-a-social-unix-server>#</a> get an account on a social unix server</h2>


								<p>Check out <a href="https://tilde.town/">tilde.town</a>:</p>


								<blockquote><p>tilde.town is an intentional digital community for making art, socializing, and

								learning. Unlike many online spaces, users interact with tilde.town through a

								direct connection instead of a web site. This means using a tool called ssh and

								other text based tools.</p></blockquote>


								<h2><a name=use-a-raspberry-pi-or-beaglebone href=#use-a-raspberry-pi-or-beaglebone>#</a> use a raspberry pi or beaglebone</h2>


								<p>Do you have a single-board computer lying around?  Perfect.  If you already

								run the standard Raspbian, Debian on a BeagleBone, or a similar-enough Linux,

								you don&rsquo;t need much else.  I wrote most of this text on a Raspberry Pi, and the

								example commands should all work there.</p>


								<h2><a name=use-a-virtual-machine href=#use-a-virtual-machine>#</a> use a virtual machine</h2>


								<p>A few options:</p>


								<ul>

								<li><a href="https://docs.vagrantup.com/v2/getting-started/index.html">Use Vagrant to spin up a machine in VirtualBox</a></li>

								<li><a href="https://www.digitalocean.com/community/tutorials/how-to-create-your-first-digitalocean-droplet-virtual-server">Use DigitalOcean to create a remotely-hosted VM running Linux</a></li>

								</ul>


								<hr />


								<h1><a name=the-command-line-as-literary-environment href=#the-command-line-as-literary-environment>#</a> 1. the command line as literary environment</h1>


								<p>There are a lot of ways to structure an introduction to the command line.  I&rsquo;m

								going to start with writing as a point of departure because, aside from web

								development, it&rsquo;s what I use a computer for most.  I want to shine a light on

								the humane potential of ideas that are usually understood as nerd trivia.

								Computers have utterly transformed the practice of writing within the space of

								my lifetime, but it seems to me that writers as a class miss out on many of the

								software tools and patterns taken as a given in more &ldquo;technical&rdquo; fields.</p>


								<p>Writing, particularly writing of any real scope or complexity, is very much a

								technical task.  It makes demands, both physical and psychological, of its

								practitioners.  As with woodworkers, graphic artists, and farmers, writers

								exhibit strong preferences in their tools, materials, and environment, and they

								do so because they&rsquo;re engaged in a physically and cognitively challenging task.</p>


								<p>My thesis is that the modern Linux command line is a pretty good environment

								for working with English prose and prosody, and that maybe this will illuminate

								the ways it could be useful in your own work with a computer, whatever that

								work happens to be.</p>


								<h2><a name=terms-and-definitions href=#terms-and-definitions>#</a> terms and definitions</h2>


								<p>What software are we actually talking about when we say &ldquo;the command line&rdquo;?</p>


								<p>For the purposes of this discussion, we&rsquo;re talking about an environment built

								on a very old paradigm called Unix.</p>


								<p style="text-align:center;"> <img src="images/jp_unix.jpg" height=320 width=470></p>


								<p>&hellip;except what classical Unix really looks like is this:</p>


								<p style="text-align:center;"> <img src="images/blinking.gif" width=470></p>


								<p>The Unix-like environment we&rsquo;re going to use isn&rsquo;t very classical, really.

								It&rsquo;s an operating system kernel called Linux, combined with a bunch of things

								written by other people (people in the GNU and Debian projects, and many

								others).  Purists will tell you that this isn&rsquo;t properly Unix at all.  In

								strict historical terms they&rsquo;re right, or at least a certain kind of right, but

								for the purposes of my cultural agenda I&rsquo;m going to ignore them right now.</p>


								<p style="text-align:center;"> <img src="images/debian.png"></p>


								<p>This is what&rsquo;s called a shell.  There are many different shells, but they

								pretty much all operate on the same idea:  You navigate a filesystem and run

								programs by typing commands.  Commands can be combined in various ways to make

								programs of their own, and in fact the way you use the computer is often just

								to write little programs that invoke other programs, turtles-all-the-way-down

								style.</p>


								<p>The standard shell these days is something called Bash, so we&rsquo;ll use Bash.

								It&rsquo;s what you&rsquo;ll most often see in the wild.  Like most shells, Bash is ugly

								and stupid in more ways than it is possible to easily summarize.  It&rsquo;s also an

								incredibly powerful and expressive piece of software.</p>


								<h2><a name=twisty-little-passages href=#twisty-little-passages>#</a> twisty little passages</h2>


								<p>Have you ever played a text-based adventure game or MUD, of the kind that

								describes a setting and takes commands for movement and so on?  Readers of a

								certain age and temperament might recognize the opening of Crowther &amp; Woods'

								<em>Adventure</em>, the great-granddaddy of text adventure games:</p>


								<pre><code>YOU ARE STANDING AT THE END OF A ROAD BEFORE A SMALL BRICK BUILDING.

								AROUND YOU IS A FOREST.  A SMALL STREAM FLOWS OUT OF THE BUILDING AND

								DOWN A GULLY.


								&gt; GO EAST


								YOU ARE INSIDE A BUILDING, A WELL HOUSE FOR A LARGE SPRING.


								THERE ARE SOME KEYS ON THE GROUND HERE.


								THERE IS A SHINY BRASS LAMP NEARBY.


								THERE IS FOOD HERE.


								THERE IS A BOTTLE OF WATER HERE.

								</code></pre>


								<p>You can think of the shell as a kind of environment you inhabit, in much the

								way your character inhabits an adventure game.  The difference is that instead

								of navigating around virtual rooms and hallways with commands like <code>LOOK</code> and

								<code>EAST</code>, you navigate between directories by typing commands like <code>ls</code> and <code>cd

								notes</code>:</p>


								<pre><code>$ ls

								code  Downloads  notes  p1k3  photos  scraps  userland-book

								$ cd notes

								$ ls

								notes.txt  sparkfun  TODO.txt

								</code></pre>


								<p><code>ls</code> lists files.  Some files are directories, which means they can contain

								other files, and you can step inside of them by typing <code>cd</code> (for <strong>c</strong>hange

								<strong>d</strong>irectory).</p>
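<p>If you&rsquo;d like to wander around without touching your own files, here is a minimal sketch that builds a throwaway directory tree first.  The directory and file names are made up for illustration, and <code>pwd</code> (print working directory) is a standard command this text hasn&rsquo;t otherwise introduced.</p>

```shell
# Work in a throwaway directory so nothing of yours gets touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a small tree to explore (hypothetical names).
mkdir -p notes/sparkfun
touch notes/notes.txt notes/TODO.txt

cd notes            # step inside the directory
here=$(pwd)         # pwd reports the directory you are currently in
listing=$(ls)       # ls lists what that directory contains
echo "$here"
echo "$listing"

cd ..               # .. always means "the directory above this one"
```

<p>Think of <code>pwd</code> as the shell&rsquo;s answer to asking where you are, and <code>cd ..</code> as walking back out the way you came.</p>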


								<p>In the Macintosh and Windows world, directories have been called

								&ldquo;folders&rdquo; for a long time now.  This isn&rsquo;t the <em>worst</em> metaphor for what&rsquo;s

								going on, and it&rsquo;s so pervasive by now that it&rsquo;s not worth fighting about.

								It&rsquo;s also not exactly a <em>great</em> metaphor, since computer filesystems aren&rsquo;t

								built very much like the filing cabinets of yore.  A directory acts a lot like

								a container of some sort, but it&rsquo;s an infinitely expandable one which may

								contain nested sub-spaces much larger than itself.  Directories are frequently

								like the TARDIS: Bigger on the inside.</p>


								<h2><a name=cat href=#cat>#</a> cat</h2>


								<p>When you&rsquo;re in the shell, you have many tools at your disposal - programs that

								can be used on many different files, or chained together with other programs.

								They tend to have weird, cryptic names, but a lot of them do very simple

								things.  Tasks that might be a menu item in a big program like Word, like

								counting the number of words in a document or finding a particular phrase, are

								often programs unto themselves.  We&rsquo;ll start with something even more basic

								than that.</p>


								<p>Suppose you have some files, and you&rsquo;re curious what&rsquo;s in them.  For example,

								suppose you&rsquo;ve got a list of authors you&rsquo;re planning to reference, and you just

								want to check its contents real quick-like.  This is where our friend <code>cat</code>

								comes in:</p>


								<!-- exec -->


								<pre><code>$ cat authors_sff

								Ursula K. Le Guin

								Jo Walton

								Pat Cadigan

								John Ronald Reuel Tolkien

								Vanessa Veselka

								James Tiptree, Jr.

								John Brunner

								</code></pre>


								<!-- end -->


								<p>&ldquo;Why,&rdquo; you might be asking, &ldquo;is the command to dump out the contents of a file

								to a screen called <code>cat</code>?  What do felines have to do with anything?&rdquo;</p>


								<p>It turns out that <code>cat</code> is actually short for &ldquo;catenate&rdquo;, which is a long

								word basically meaning &ldquo;stick things together&rdquo;.  In programming, we usually

								refer to sticking two bits of text together as &ldquo;string concatenation&rdquo;, probably

								because programmers like to feel like they&rsquo;re being very precise about very

								simple actions.</p>


								<p>Suppose you wanted to see the contents of a <em>set</em> of author lists:</p>


								<!-- exec -->


								<pre><code>$ cat authors_sff authors_contemporary_fic authors_nat_hist

								Ursula K. Le Guin

								Jo Walton

								Pat Cadigan

								John Ronald Reuel Tolkien

								Vanessa Veselka

								James Tiptree, Jr.

								John Brunner

								Eden Robinson

								Vanessa Veselka

								Miriam Toews

								Gwendolyn L. Waring

								</code></pre>


								<!-- end -->


								<h2><a name=wildcards href=#wildcards>#</a> wildcards</h2>


								<p>We&rsquo;re working with three filenames: <code>authors_sff</code>, <code>authors_contemporary_fic</code>,

								and <code>authors_nat_hist</code>.  That&rsquo;s an awful lot of typing every time we want to do

								something to all three files.  Fortunately, our shell offers a shorthand for

								&ldquo;all the files that start with <code>authors_</code>&rdquo;:</p>


								<!-- exec -->


								<pre><code>$ cat authors_*

								Eden Robinson

								Vanessa Veselka

								Miriam Toews

								Gwendolyn L. Waring

								Ursula K. Le Guin

								Jo Walton

								Pat Cadigan

								John Ronald Reuel Tolkien

								Vanessa Veselka

								James Tiptree, Jr.

								John Brunner

								</code></pre>


								<!-- end -->


								<p>In Bash-land, <code>*</code> basically means &ldquo;anything&rdquo;, and is known in the vernacular,

								somewhat poetically, as a &ldquo;wildcard&rdquo;.  You should always be careful with

								wildcards, especially if you&rsquo;re doing anything destructive.  They can and will

								surprise the unwary.  Still, once you&rsquo;re used to the idea, they will save you a

								lot of RSI.</p>
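<p>One low-risk way to be careful is to hand the wildcard to <code>echo</code> before handing it to anything destructive.  The sketch below assumes Bash with its default settings and uses empty stand-in files in a temporary directory.  It also suggests why <code>cat authors_*</code> printed the lists in a different order than the hand-typed command did: the shell expands the pattern into a sorted list of names before the program ever runs.</p>

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch authors_sff authors_contemporary_fic authors_nat_hist notes.txt

# The shell replaces authors_* with the matching names, in sorted order.
# echo just prints whatever arguments it was handed.
matches=$(echo authors_*)
echo "$matches"

# By default, a pattern that matches nothing is passed along unchanged.
nothing=$(echo zzz_*)
echo "$nothing"
```

<p>If the first <code>echo</code> prints more names than you expected, you&rsquo;ve just been saved from pointing the real command at them.</p>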


								<h2><a name=sort href=#sort>#</a> sort</h2>


								<p>There&rsquo;s a problem here.  Our author list is out of order, and thus confusing to

								reference.  Fortunately, since one of the most basic things you can do to a

								list is to sort it, someone else has already solved this problem for us.

								Here&rsquo;s a command that will give us some organization:</p>


								<!-- exec -->


								<pre><code>$ sort authors_*

								Eden Robinson

								Gwendolyn L. Waring

								James Tiptree, Jr.

								John Brunner

								John Ronald Reuel Tolkien

								Jo Walton

								Miriam Toews

								Pat Cadigan

								Ursula K. Le Guin

								Vanessa Veselka

								Vanessa Veselka

								</code></pre>


								<!-- end -->


								<p>Does it bother you that they aren&rsquo;t sorted by last name?  Me too.  As a partial

								solution, we can ask <code>sort</code> to start its sort <strong>k</strong>ey at the second &ldquo;field&rdquo; in

								each line (by default, sort treats whitespace as a division between fields):</p>


								<!-- exec -->


								<pre><code>$ sort -k2 authors_*

								John Brunner

								Pat Cadigan

								Ursula K. Le Guin

								Gwendolyn L. Waring

								Eden Robinson

								John Ronald Reuel Tolkien

								James Tiptree, Jr.

								Miriam Toews

								Vanessa Veselka

								Vanessa Veselka

								Jo Walton

								</code></pre>


								<!-- end -->


								<p>That&rsquo;s closer, right?  It sorted on &ldquo;Cadigan&rdquo; and &ldquo;Veselka&rdquo; instead of &ldquo;Pat&rdquo;

								and &ldquo;Vanessa&rdquo;.  (Of course, it&rsquo;s still far from perfect, because the

								second field in each line isn&rsquo;t necessarily the person&rsquo;s last name.)</p>
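<p>A detail worth knowing: <code>-k2</code> means the key <em>starts</em> at field 2 and runs to the end of the line, while <code>-k2,2</code> limits it to field 2 alone.  Here is a small sketch of the difference, using made-up rows rather than the author lists.  Setting <code>LC_ALL=C</code> is my own addition, there only so the ordering comes out the same on every machine.</p>

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'amber x 2\nblue x 1\n' > rows   # hypothetical sample data

# -k2: the key starts at field 2 and runs to the end of the line,
# so "x 1" sorts ahead of "x 2" and the blue row comes first.
to_end=$(LC_ALL=C sort -k2 rows)

# -k2,2: the key is field 2 only. Both keys are "x", so sort falls
# back to comparing whole lines and the amber row comes first.
field_only=$(LC_ALL=C sort -k2,2 rows)

echo "$to_end"
echo "$field_only"
```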


								<h2><a name=options href=#options>#</a> options</h2>


								<p>Above, when we wanted to ask <code>sort</code> to behave differently, we gave it what is

								known as an option.  Most programs with command-line interfaces will allow

								their behavior to be changed by adding various options.  Options usually

								(but not always!) look like <code>-o</code> or <code>--option</code>.</p>


								<p>For example, if we wanted to see just the unique lines, irrespective of case,

								for a file called colors:</p>


								<!-- exec -->


								<pre><code>$ cat colors

								RED

								blue

								red

								BLUE

								Green

								green

								GREEN

								</code></pre>


								<!-- end -->


								<p>We could write this:</p>


								<!-- exec -->


								<pre><code>$ sort -uf colors

								blue

								Green

								RED

								</code></pre>


								<!-- end -->


								<p>Here <code>-u</code> stands for <strong>u</strong>nique and <code>-f</code> stands for <strong>f</strong>old case, which means

								to treat upper- and lower-case letters as the same for comparison purposes.  You&rsquo;ll

								often see a group of short options following the <code>-</code> like this.</p>
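<p>To make the grouping concrete, here is a sketch showing three spellings of the same request.  It assumes GNU <code>sort</code>, which is where the long names <code>--unique</code> and <code>--ignore-case</code> come from; other implementations may not accept them.</p>

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'RED\nblue\nred\nBLUE\nGreen\ngreen\nGREEN\n' > colors

bundled=$(sort -uf colors)                       # short options, grouped
separate=$(sort -u -f colors)                    # short options, one at a time
long=$(sort --unique --ignore-case colors)       # long options (GNU sort)

# All three should print the same three lines.
echo "$bundled"
```

<p>The short forms are quicker to type.  The long ones are easier to read six months later in a script.</p>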


								<h2><a name=uniq href=#uniq>#</a> uniq</h2>


								<p>Did you notice how Vanessa Veselka shows up twice in our list of authors?

								That&rsquo;s useful if we want to remember that she&rsquo;s in more than one category, but

								it&rsquo;s redundant if we&rsquo;re just worried about membership in the overall set of

								authors.  We can make sure our list doesn&rsquo;t contain repeating lines by using

								<code>sort</code>, just like with that list of colors:</p>


								<!-- exec -->


								<pre><code>$ sort -u -k2 authors_*

								John Brunner

								Pat Cadigan

								Ursula K. Le Guin

								Gwendolyn L. Waring

								Eden Robinson

								John Ronald Reuel Tolkien

								James Tiptree, Jr.

								Miriam Toews

								Vanessa Veselka

								Jo Walton

								</code></pre>


								<!-- end -->


								<p>But there&rsquo;s another approach to this &mdash; <code>sort</code> is good at only displaying a line

								once, but suppose we wanted to see a count of how many different lists an

								author shows up on?  <code>sort</code> doesn&rsquo;t do that, but a command called <code>uniq</code> does,

								if you give it the option <code>-c</code> for <strong>c</strong>ount.</p>


								<p><code>uniq</code> moves through the lines in its input, and if it sees a line more than

								once in sequence, it will only print that line once.  If you have a bunch of

								files and you just want to see the unique lines across all of those files, you

								probably need to run them through <code>sort</code> first.  How do you do that?</p>


								<!-- exec -->


								<pre><code>$ sort authors_* | uniq -c

								      1 Eden Robinson

								      1 Gwendolyn L. Waring

								      1 James Tiptree, Jr.

								      1 John Brunner

								      1 John Ronald Reuel Tolkien

								      1 Jo Walton

								      1 Miriam Toews

								      1 Pat Cadigan

								      1 Ursula K. Le Guin

								      2 Vanessa Veselka

								</code></pre>


								<!-- end -->
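<p>The &ldquo;in sequence&rdquo; part is easy to trip over, so here is a small sketch of what happens when you skip the <code>sort</code>.  The file name and contents are invented for the example.</p>

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'red\nblue\nred\nred\n' > color_list   # hypothetical sample data

# uniq only collapses repeats that sit next to each other, so the
# first "red" survives separately from the pair at the end.
unsorted=$(uniq color_list)

# Sorting first puts every "red" on adjacent lines, so uniq catches them all.
sorted=$(sort color_list | uniq)

echo "$unsorted"
echo "$sorted"
```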


								<h2><a name=standard-IO href=#standard-IO>#</a> standard IO</h2>


								<p>The <code>|</code> is called a &ldquo;pipe&rdquo;.  In the command above, it tells your shell that

								instead of printing the output of <code>sort authors_*</code> right to your terminal, it

								should send it to <code>uniq -c</code>.</p>


								<p style="text-align:center;"> <img src="images/pipe.gif"></p>


								<p>Pipes are some of the most important magic in the shell.  When the people who

								built Unix in the first place give interviews about the stuff they remember

								from the early days, a lot of them reminisce about the invention of pipes and

								all of the new stuff it immediately made possible.</p>


								<p>Pipes help you control a thing called &ldquo;standard IO&rdquo;.  In the world of the

								command line, programs take <strong>i</strong>nput and produce <strong>o</strong>utput.  A pipe is a way

								to hook the output from one program to the input of another.</p>


								<p>Unlike a lot of the weirdly named things you&rsquo;ll encounter in software, the

								metaphor here is obvious and makes pretty good sense.  It even kind of looks

								like a physical pipe.</p>


								<p>What if, instead of sending the output of one program to the input of another,

								you&rsquo;d like to store it in a file for later use?</p>


								<p>Check it out:</p>


								<!-- exec -->


								<pre><code>$ sort authors_* | uniq &gt; ./all_authors

								</code></pre>


								<!-- end -->


								<!-- exec -->


								<pre><code>$ cat all_authors

								Eden Robinson

								Gwendolyn L. Waring

								James Tiptree, Jr.

								John Brunner

								John Ronald Reuel Tolkien

								Jo Walton

								Miriam Toews

								Pat Cadigan

								Ursula K. Le Guin

								Vanessa Veselka

								</code></pre>


								<!-- end -->


								<p>I like to think of the <code>&gt;</code> as looking like a little funnel.  It can be

								dangerous &mdash; you should always make sure that you&rsquo;re not going to clobber

								an existing file you actually want to keep.</p>
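<p>If the clobbering worries you, the shell has a safety catch.  This is a sketch assuming Bash, run in a temporary directory with a made-up file name: the <code>noclobber</code> option makes <code>&gt;</code> refuse to overwrite an existing file, and <code>&gt;|</code> overrides it when you really mean it.</p>

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo 'precious' > keep_me

set -o noclobber                    # from now on, > refuses to overwrite

# The braces are only there so the shell's complaint can be hidden;
# the redirect inside them is the one that fails.
if { echo 'oops' > keep_me; } 2>/dev/null; then
    result='overwritten'
else
    result='refused'
fi
after_refusal=$(cat keep_me)        # still the original contents

echo 'on purpose' >| keep_me        # >| overrides noclobber for one command
after_override=$(cat keep_me)

set +o noclobber                    # back to the default behavior
echo "$result / $after_refusal / $after_override"
```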


								<p>If you want to tack more stuff on to the end of an existing file, you can use

								<code>&gt;&gt;</code> instead.  To test that, let&rsquo;s use <code>echo</code>, which prints out whatever string

								you give it on a line by itself:</p>


								<!-- exec -->


								<pre><code>$ echo 'hello' &gt; hello_world

								</code></pre>


								<!-- end -->


								<!-- exec -->


								<pre><code>$ echo 'world' &gt;&gt; hello_world

								</code></pre>


								<!-- end -->


								<!-- exec -->


								<pre><code>$ cat hello_world

								hello

								world

								</code></pre>


								<!-- end -->


								<p>You can also take a file and pull it directly back into the input of a given

								program, which is a bit like a funnel going the other direction:</p>


								<!-- exec -->


								<pre><code>$ nl &lt; all_authors

								     1  Eden Robinson

								     2  Gwendolyn L. Waring

								     3  James Tiptree, Jr.

								     4  John Brunner

								     5  John Ronald Reuel Tolkien

								     6  Jo Walton

								     7  Miriam Toews

								     8  Pat Cadigan

								     9  Ursula K. Le Guin

								    10  Vanessa Veselka

								</code></pre>


								<!-- end -->


								<p><code>nl</code> is just a way to <strong>n</strong>umber <strong>l</strong>ines.  This command accomplishes pretty much

								the same thing as <code>cat all_authors | nl</code>, or <code>nl all_authors</code>.  You won&rsquo;t see

								it used as often as <code>|</code> and <code>&gt;</code>, since most utilities can read files on their

								own, but it can save you from typing <code>cat</code> quite so often.</p>
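<p>If you&rsquo;d rather check that claim than take my word for it, here is a sketch that runs all three forms against a two-line stand-in file and compares the results.</p>

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Eden Robinson\nJo Walton\n' > all_authors   # shortened stand-in list

from_redirect=$(nl < all_authors)     # the shell feeds the file to nl's input
from_pipe=$(cat all_authors | nl)     # cat reads the file, the pipe feeds nl
from_argument=$(nl all_authors)       # nl opens the file itself

if [ "$from_redirect" = "$from_pipe" ] && [ "$from_pipe" = "$from_argument" ]; then
    verdict='same output'
else
    verdict='different output'
fi
echo "$verdict"
```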


								<p>We&rsquo;ll use these features liberally from here on out.</p>


								<h2><a name=code-help-code-and-man-pages href=#code-help-code-and-man-pages>#</a> <code>--help</code> and man pages</h2>


								<p>You can change the behavior of most tools by giving them different options.

								This is all well and good if you already know what options are available,

								but what if you don&rsquo;t?</p>


								<p>Often, you can ask the tool itself:</p>


								<pre><code>$ sort --help

								Usage: sort [OPTION]... [FILE]...

								  or:  sort [OPTION]... --files0-from=F

								Write sorted concatenation of all FILE(s) to standard output.


								Mandatory arguments to long options are mandatory for short options too.

								Ordering options:


								  -b, --ignore-leading-blanks  ignore leading blanks

								  -d, --dictionary-order      consider only blanks and alphanumeric characters

								  -f, --ignore-case           fold lower case to upper case characters

								  -g, --general-numeric-sort  compare according to general numerical value

								  -i, --ignore-nonprinting    consider only printable characters

								  -M, --month-sort            compare (unknown) &lt; 'JAN' &lt; ... &lt; 'DEC'

								  -h, --human-numeric-sort    compare human readable numbers (e.g., 2K 1G)

								  -n, --numeric-sort          compare according to string numerical value

								  -R, --random-sort           sort by random hash of keys

								      --random-source=FILE    get random bytes from FILE

								  -r, --reverse               reverse the result of comparisons

								</code></pre>


								<p>&hellip;and so on.  (It goes on for a while in this vein.)</p>


								<p>If that doesn&rsquo;t work, or doesn&rsquo;t provide enough info, the next thing to try is

								called a man page.  (&ldquo;man&rdquo; is short for &ldquo;manual&rdquo;.  It&rsquo;s sort of an unfortunate

								abbreviation.)</p>


								<pre><code>$ man sort


								SORT(1)                         User Commands                        SORT(1)


								NAME

								       sort - sort lines of text files


								SYNOPSIS

								       sort [OPTION]... [FILE]...

								       sort [OPTION]... --files0-from=F


								DESCRIPTION

								       Write sorted concatenation of all FILE(s) to standard output.

								</code></pre>


								<p>&hellip;and so on.  Manual pages vary in quality, and it can take a while to get

								used to reading them, but they&rsquo;re very often the best place to look for help.</p>


								<p>If you&rsquo;re not sure what <em>program</em> you want to use to solve a given problem, you

								might try searching all the man pages on the system for a keyword.  <code>man</code>

								itself has an option to let you do this - <code>man -k keyword</code> - but most systems

								also have a shortcut called <code>apropos</code>, which I like to use because it&rsquo;s easy to

								remember if you imagine yourself saying &ldquo;apropos of [some problem I have]&hellip;&rdquo;</p>


								<!-- exec -->


								<pre><code>$ apropos -s1 sort

								apt-sortpkgs (1)     - Utility to sort package index files

								bunzip2 (1)          - a block-sorting file compressor, v1.0.6

								bzip2 (1)            - a block-sorting file compressor, v1.0.6

								comm (1)             - compare two sorted files line by line

								sort (1)             - sort lines of text files

								tsort (1)            - perform topological sort

								</code></pre>


								<!-- end -->


								<p>It&rsquo;s useful to know that the manual represented by <code>man</code> has numbered sections

								for different kinds of manual pages.  Most of what the average user needs to

								know about lives in section 1, &ldquo;User Commands&rdquo;, so you&rsquo;ll often see the names

								of different tools written like <code>sort(1)</code> or <code>cat(1)</code>.  This can be a good way

								to make it clear in writing that you&rsquo;re talking about a specific piece of

								software rather than a verb or a small carnivorous mammal.  (I specified <code>-s1</code>

								for section 1 above just to cut down on clutter, though in practice I usually

								don&rsquo;t bother.)</p>


								<p>Like other literary traditions, Unix is littered with this sort of convention.

								This one just happens to date from a time when the manual was still a physical

								book.</p>


								<h2><a name=wc href=#wc>#</a> wc</h2>


								<p><code>wc</code> stands for <strong>w</strong>ord <strong>c</strong>ount.  It does about what you&rsquo;d expect - it

								counts the number of words in its input.</p>


								<pre><code>$ wc index.md

								  736  4117 24944 index.md

								</code></pre>


								<p>736 is the number of lines, 4117 the number of words, and 24944 the number of

								bytes (for plain ASCII text, the same as the number of characters) in the file I&rsquo;m writing right now.  I use this constantly.  Most

								obviously, it&rsquo;s a good way to get an idea of how much you&rsquo;ve written.  <code>wc</code> is

								the tool I used to track my progress the last time I tried National Novel

								Writing Month:</p>


								<pre><code>$ find ~/p1k3/archives/2010/11 -regextype egrep -regex '.*([0-9]+|index)' -type f | xargs wc -w | tail -1

								 6585 total

								</code></pre>


								<!-- exec -->


								<pre><code>$ cowsay 'embarrassing.'

								 _______________

								&lt; embarrassing. &gt;

								 ---------------

								        \   ^__^

								         \  (oo)\_______

								            (__)\       )\/\

								                ||----w |

								                ||     ||

								</code></pre>


								<!-- end -->


								<p>Anyway.  The less obvious thing about <code>wc</code> is that you can use it to count the

								output of other commands.  Want to know <em>how many</em> unique authors we have?</p>


								<!-- exec -->


								<pre><code>$ sort authors_* | uniq | wc -l

								10

								</code></pre>


								<!-- end -->


								<p>This kind of thing is trivial, but it comes in handy more often than you might

								think.</p>
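
								<p>To make the three numbers that plain <code>wc</code> prints a bit more concrete, here&rsquo;s a
								small sketch using a made-up two-line sample instead of a real file (the
								<code>sample</code> function is just a stand-in invented for this illustration).  The
								<code>-l</code>, <code>-w</code>, and <code>-c</code> options ask for lines, words, and bytes one at a
								time:</p>

								<pre><code># A hypothetical sample: two lines, three words, fourteen bytes.
								sample() { printf 'one two\nthree\n'; }

								sample | wc -l    # count lines
								sample | wc -w    # count words
								sample | wc -c    # count bytes (newlines included)
								</code></pre>

								<p>Some versions of <code>wc</code> pad these numbers with leading spaces, so the exact
								spacing of the output may differ from one system to the next.</p>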


								<h2><a name=head-tail-and-cut href=#head-tail-and-cut>#</a> head, tail, and cut</h2>


								<p>Remember our old pal <code>cat</code>, which just splats everything it&rsquo;s given back to

								standard output?</p>


								<p>Sometimes you&rsquo;ve got a piece of output that&rsquo;s more than you actually want to

								deal with at once.  Maybe you just want to glance at the first few lines in a

								file:</p>


								<!-- exec -->


								<pre><code>$ head -3 colors

								RED

								blue

								red

								</code></pre>


								<!-- end -->


								<p>&hellip;or maybe you want to see the last thing in a list:</p>


								<!-- exec -->


								<pre><code>$ sort colors | uniq -i | tail -1

								red

								</code></pre>


								<!-- end -->


								<p>&hellip;or maybe you&rsquo;re only interested in the first &ldquo;field&rdquo; in some list. You might

								use <code>cut</code>  here, asking it to treat spaces as delimiters between fields and

								return only the first field for each line of its input:</p>


								<!-- exec -->


								<pre><code>$ cut -d' ' -f1 ./authors_*

								Eden

								Vanessa

								Miriam

								Gwendolyn

								Ursula

								Jo

								Pat

								John

								Vanessa

								James

								John

								</code></pre>


								<!-- end -->


								<p>Suppose we&rsquo;re curious which first names occur most often on our

								author list.  Here&rsquo;s an approach, silly but effective, that combines a lot

								of what we&rsquo;ve discussed so far and looks like plenty of one-liners I wind up

								writing in real life:</p>


								<!-- exec -->


								<pre><code>$ cut -d' ' -f1 ./authors_* | sort | uniq -ci | sort -n | tail -3

								      1 Ursula

								      2 John

								      2 Vanessa

								</code></pre>


								<!-- end -->


								<p>Let&rsquo;s walk through this one step by step:</p>


								<p>First, we have <code>cut</code> extract the first field of each line in our author lists.</p>


								<pre><code>cut -d' ' -f1 ./authors_*

								</code></pre>


								<p>Then we sort these results</p>


								<pre><code>| sort

								</code></pre>


								<p>and pass them to <code>uniq</code>, asking it for a case-insensitive count of each

								repeated line</p>


								<pre><code>| uniq -ci

								</code></pre>


								<p>then sort again, numerically,</p>


								<pre><code>| sort -n

								</code></pre>


								<p>and finally, we chop off everything but the last three lines:</p>


								<pre><code>| tail -3

								</code></pre>
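
								<p>The first <code>sort</code> in that pipeline matters more than it might look:
								<code>uniq</code> only compares each line with the one right before it, so repeats have
								to be next to each other before they get counted together.  A small sketch with
								three made-up lines (the <code>names</code> function is invented for the example):</p>

								<pre><code># Hypothetical sample: John appears twice, but not on adjacent lines.
								names() { printf 'John\nVanessa\nJohn\n'; }

								names | uniq -c           # unsorted: the two Johns are apart, so each is counted separately
								names | sort | uniq -c    # sorted: the Johns are adjacent and counted together
								</code></pre>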


								<p>If you wanted to make sure to count an individual author&rsquo;s first name

								only once, even if that author appears more than once in the files,

								you could instead do:</p>


								<!-- exec -->


								<pre><code>$ sort -u ./authors_* | cut -d' ' -f1 | uniq -ci | sort -n | tail -3

								      1 Ursula

								      1 Vanessa

								      2 John

								</code></pre>


								<!-- end -->


								<h2><a name=tab-separated-values href=#tab-separated-values>#</a> tab separated values</h2>


								<p>Notice above how we had to tell <code>cut</code> that &ldquo;fields&rdquo; in <code>authors_*</code> are

								delimited by spaces?  It turns out that if you don&rsquo;t use <code>-d</code>, <code>cut</code> defaults

								to using tab characters for a delimiter.</p>
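
								<p>As a quick sketch of that default, here are two made-up lines with a tab
								between the fields (written as <code>\t</code> in the <code>printf</code> format; the
								<code>rows</code> function is invented for the example).  Because <code>cut</code> is
								splitting on tabs rather than spaces, a last name with a space in it stays in
								one piece:</p>

								<pre><code># Hypothetical sample: last name, a tab, then first name.
								rows() { printf 'Le Guin\tUrsula\nWalton\tJo\n'; }

								rows | cut -f1    # first field of each line: Le Guin, Walton
								rows | cut -f2    # second field of each line: Ursula, Jo
								</code></pre>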


								<p>Tab characters are sort of weird little animals.  You can&rsquo;t usually <em>see</em> them

								directly &mdash; they&rsquo;re like a space character that takes up more than one space

								when displayed.  By convention, one tab is usually rendered as 8 spaces, but

								it&rsquo;s up to the software that&rsquo;s displaying the character what it wants to do.</p>


								<p>(In fact, it&rsquo;s more complicated than that:  Tabs are often rendered as marking

								<em>tab stops</em>, which is a concept I remember from 7th grade typing classes, but

								haven&rsquo;t actually thought about in my day-to-day life for nearly 20 years.)</p>


								<p>Here&rsquo;s a version of our <code>all_authors</code> that&rsquo;s been rearranged so that the first

								field is the author&rsquo;s last name, the second is their first name, the third is

								their middle name or initial (if we know it) and the fourth is any suffix.

								Fields are separated by a single tab character:</p>


								<!-- exec -->


								<pre><code>$ cat all_authors.tsv

								Robinson    Eden

								Waring  Gwendolyn   L.

								Tiptree James       Jr.

								Brunner John

								Tolkien John    Ronald Reuel

								Walton  Jo

								Toews   Miriam

								Cadigan Pat

								Le Guin Ursula  K.

								Veselka Vanessa

								</code></pre>


								<!-- end -->


								<p>That looks kind of garbled, right?  In order to make it a little more obvious

								what&rsquo;s happening, let&rsquo;s use <code>cat -T</code>, which displays tab characters as <code>^I</code>:</p>


								<!-- exec -->


								<pre><code>$ cat -T all_authors.tsv

								Robinson^IEden

								Waring^IGwendolyn^IL.

								Tiptree^IJames^I^IJr.

								Brunner^IJohn

								Tolkien^IJohn^IRonald Reuel

								Walton^IJo

								Toews^IMiriam

								Cadigan^IPat

								Le Guin^IUrsula^IK.

								Veselka^IVanessa

								</code></pre>


								<!-- end -->


								<p>It looks odd when displayed because some names are at or nearly at 8 characters long.

								&ldquo;Robinson&rdquo;, at 8 characters, overshoots the first tab stop, so &ldquo;Eden&rdquo; gets indented

								further than other first names, and so on.</p>


								<p>Fortunately, in order to make this more human-readable, we can pass it through

								<code>expand</code>, which turns tabs into spaces, with tab stops a given number of columns apart (8 by default):</p>


								<!-- exec -->


								<pre><code>$ expand -t14 all_authors.tsv

								Robinson      Eden

								Waring        Gwendolyn     L.

								Tiptree       James                       Jr.

								Brunner       John

								Tolkien       John          Ronald Reuel

								Walton        Jo

								Toews         Miriam

								Cadigan       Pat

								Le Guin       Ursula        K.

								Veselka       Vanessa

								</code></pre>


								<!-- end -->
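
								<p>To watch tab stops at work on a smaller scale, here&rsquo;s a sketch with two
								made-up lines (<code>rows</code> is a stand-in function for the example).  With no
								options, <code>expand</code> puts a stop every 8 columns, so the 8-letter
								&ldquo;Robinson&rdquo; gets pushed on to the next stop while the 6-letter &ldquo;Waring&rdquo;
								doesn&rsquo;t.  A wider stop lines them up:</p>

								<pre><code># Hypothetical sample: two tab-separated lines.
								rows() { printf 'Robinson\tEden\nWaring\tGwendolyn\n'; }

								rows | expand         # stops every 8 columns: the first names do not line up
								rows | expand -t14    # stops every 14 columns: both first names start in the same column
								</code></pre>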


								<p>Now it&rsquo;s easy to sort by last name:</p>


								<!-- exec -->


								<pre><code>$ sort -k1 all_authors.tsv | expand -t14

								Brunner       John

								Cadigan       Pat

								Le Guin       Ursula        K.

								Robinson      Eden

								Tiptree       James                       Jr.

								Toews         Miriam

								Tolkien       John          Ronald Reuel

								Veselka       Vanessa

								Walton        Jo

								Waring        Gwendolyn     L.

								</code></pre>


								<!-- end -->


								<p>Or just extract middle names and initials:</p>


								<!-- exec -->


								<pre><code>$ cut -f3 all_authors.tsv


								L.


								Ronald Reuel


								K.

								</code></pre>


								<!-- end -->


								<p>It probably won&rsquo;t surprise you to learn that there&rsquo;s a corresponding <code>paste</code>

								command, which takes two or more files and stitches them together with tab

								characters.  Let&rsquo;s extract a couple of things from our author list and put them

								back together in a different order:</p>


								<!-- exec -->


								<pre><code>$ cut -f1 all_authors.tsv &gt; lastnames

								</code></pre>


								<!-- end -->


								<!-- exec -->


								<pre><code>$ cut -f2 all_authors.tsv &gt; firstnames

								</code></pre>


								<!-- end -->


								<!-- exec -->


								<pre><code>$ paste firstnames lastnames | sort -k2 | expand -t12

								John        Brunner

								Pat         Cadigan

								Ursula      Le Guin

								Eden        Robinson

								James       Tiptree

								Miriam      Toews

								John        Tolkien

								Vanessa     Veselka

								Jo          Walton

								Gwendolyn   Waring

								</code></pre>


								<!-- end -->


								<p>As these examples show, TSV is something very like a primitive spreadsheet:  A

								way to represent information in columns and rows.  In fact, it&rsquo;s a close cousin

								of CSV, which is often used as a lowest-common-denominator format for

								transferring spreadsheets, and which represents data something like this:</p>


								<pre><code>last,first,middle,suffix

								Tolkien,John,Ronald Reuel,

								Tiptree,James,,Jr.

								</code></pre>


								<p>The advantage of tabs is that they&rsquo;re supported by a bunch of the standard

								tools.  A disadvantage is that they&rsquo;re kind of ugly and can be weird to deal

								with, but they&rsquo;re useful anyway, and character-delimited rows are often a

								good-enough way to hack your way through problems that call for basic

								structure.</p>
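
								<p>For simple comma-separated data like the snippet above, <code>cut</code> can be
								pointed at commas with <code>-d</code>.  This is only a sketch: it assumes no field
								contains a comma of its own, which real CSV files allow (inside quotes) and
								which <code>cut</code> knows nothing about.  The <code>csv</code> function is a stand-in
								invented for the example:</p>

								<pre><code># The CSV snippet from above, produced by a hypothetical helper function.
								csv() { printf 'last,first,middle,suffix\nTolkien,John,Ronald Reuel,\nTiptree,James,,Jr.\n'; }

								csv | cut -d, -f2      # second field: first names (plus the header line)
								csv | cut -d, -f1,4    # first and fourth fields, still separated by a comma
								</code></pre>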


								<h2><a name=finding-text-grep href=#finding-text-grep>#</a> finding text: grep</h2>


								<p>After all those contortions, what if you actually just want to see <em>which lists</em>

								an individual author appears on?</p>


								<!-- exec -->


								<pre><code>$ grep 'Vanessa' ./authors_*

								./authors_contemporary_fic:Vanessa Veselka

								./authors_sff:Vanessa Veselka

								</code></pre>


								<!-- end -->


								<p><code>grep</code> takes a string to search for and, optionally, a list of files to search

								in.   If you don&rsquo;t specify files, it&rsquo;ll look through standard input instead:</p>


								<!-- exec -->


								<pre><code>$ cat ./authors_* | grep 'Vanessa'

								Vanessa Veselka

								Vanessa Veselka

								</code></pre>


								<!-- end -->


								<p>Most of the time, piping the output of <code>cat</code> to <code>grep</code> is considered silly,

								because <code>grep</code> knows how to find things in files on its own.  Many thousands of

								words have been written on this topic by leading lights of the nerd community.</p>


								<p>You&rsquo;ve probably noticed that this result doesn&rsquo;t contain filenames (and thus

								isn&rsquo;t very useful to us).  That&rsquo;s because all <code>grep</code> saw was the lines in the

								files, not the names of the files themselves.</p>


								<h2><a name=now-you-have-n-problems href=#now-you-have-n-problems>#</a> now you have n problems</h2>


								<p>To close out this introductory chapter, let&rsquo;s spend a little time on a topic

								that will likely vex, confound, and (occasionally) delight you for as long as

								you are acquainted with the command line.</p>


								<p>When I was talking about <code>grep</code> a moment ago, I fudged the details more than a

								little by saying that it expects a string to search for.  What <code>grep</code>

								<em>actually</em> expects is a <em>pattern</em>.  Moreover, it expects a specific kind of

								pattern, what&rsquo;s known as a <em>regular expression</em>, a cumbersome phrase frequently

								shortened to regex.</p>


								<p>There&rsquo;s a lot of theory about what makes up a regular expression.  Fortunately,

								very little of it matters to the short version that will let you get useful

								stuff done.  The short version is that a regex is like using wildcards in the

								shell to match groups of files, but for text in general and with more magic.</p>


								<!-- exec -->


								<pre><code>$ grep 'Jo.*' ./authors_*

								./authors_sff:Jo Walton

								./authors_sff:John Ronald Reuel Tolkien

								./authors_sff:John Brunner

								</code></pre>


								<!-- end -->


								<p>The pattern <code>Jo.*</code> says that we&rsquo;re looking for lines which contain a literal

								<code>Jo</code>, followed by any quantity (including none) of any character.  In a regex,

								<code>.</code> means &ldquo;any single character&rdquo; and <code>*</code> means &ldquo;any amount of the preceding thing&rdquo;.</p>
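
								<p>Here&rsquo;s a small sketch of how those two pieces behave, run against three
								made-up lines rather than the author files (<code>names</code> is a stand-in function
								for this example).  Since <code>.*</code> is allowed to match nothing at all,
								<code>Jo.*</code> ends up selecting the same lines as plain <code>Jo</code>:</p>

								<pre><code># Hypothetical sample lines.
								names() { printf 'Jo Walton\nJohn Brunner\nPat Cadigan\n'; }

								names | grep 'Jo'       # lines containing Jo
								names | grep 'Jo.*'     # the same lines, because .* can match nothing
								names | grep 'Jo.n'     # Jo, exactly one character, then n: only John Brunner
								names | grep 'Jo.*n'    # Jo, any run of characters, then n: both Jo lines
								</code></pre>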


								<p><code>.</code> and <code>*</code> are magical.  In the particular dialect of regexen understood

								by <code>grep</code>, other magical things include:</p>


								<table>

								    <tr><td><code>^</code>    </td>  <td>start of a line                     </td></tr>

								    <tr><td><code>$</code>    </td>  <td>end of a line                       </td></tr>

								    <tr><td><code>[abc]</code></td>  <td>one of a, b, or c                   </td></tr>

								    <tr><td><code>[a-z]</code></td>  <td>a character in the range a through z</td></tr>

								    <tr><td><code>[0-9]</code></td>  <td>a character in the range 0 through 9</td></tr>


								    <tr><td><code>+</code>    </td>  <td>one or more of the preceding thing  </td></tr>

								    <tr><td><code>?</code>    </td>  <td>0 or 1 of the preceding thing       </td></tr>

								    <tr><td><code>*</code>    </td>  <td>any number of the preceding thing   </td></tr>


								    <tr><td><code>(foo|bar)</code></td>  <td>"foo" or "bar"</td></tr>

								    <tr><td><code>(foo)?</code></td>     <td>optional "foo"</td></tr>

								</table>


								<p>It&rsquo;s actually a little more complicated than that:  By default, if you want to

								use a lot of the magical characters, you have to prefix them with <code>\</code>.  This is

								both ugly and confusing, so unless you&rsquo;re writing a very simple pattern, it&rsquo;s

								often easiest to call <code>grep -E</code>, for <strong>E</strong>xtended regular expressions, which

								means that lots of characters will have special meanings.</p>
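
								<p>A sketch of the difference, with made-up sample lines (<code>names</code> is a
								stand-in function for the example).  In the first command, plain <code>grep</code>
								reads the parentheses and question mark as ordinary characters and finds
								nothing.  With <code>-E</code>, the same pattern means &ldquo;Jo, optionally followed by
								hn, then a space&rdquo;:</p>

								<pre><code># Hypothetical sample lines.
								names() { printf 'Jo Walton\nJohn Brunner\nPat Cadigan\n'; }

								# Without -E, ( ) and ? are taken literally, so nothing matches.
								# (grep exits non-zero when it finds nothing, hence the "|| true".)
								names | grep '^Jo(hn)? ' || true

								# With -E they are magical: this matches Jo Walton and John Brunner.
								names | grep -E '^Jo(hn)? '
								</code></pre>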


								<p>Authors with 4-letter first names:</p>


								<!-- exec -->


								<pre><code>$ grep -iE '^[a-z]{4} ' ./authors_*

								./authors_contemporary_fic:Eden Robinson

								./authors_sff:John Ronald Reuel Tolkien

								./authors_sff:John Brunner

								</code></pre>


								<!-- end -->


								<p>A count of authors named John:</p>


								<!-- exec -->


								<pre><code>$ grep -c '^John ' ./all_authors

								2

								</code></pre>


								<!-- end -->


								<p>Lines in this file matching the words &ldquo;magic&rdquo; or &ldquo;magical&rdquo;:</p>


								<pre><code>$ grep -iE 'magic(al)?' ./index.md

								Pipes are some of the most important magic in the shell.  When the people who

								shell to match groups of files, but with more magic.

								`.` and `*` are magical.  In the particular dialect of regexen understood

								by `grep`, other magical things include:

								use a lot of the magical characters, you have to prefix them with `\`.  This is

								Lines in this file matching the words "magic" or "magical":

								    $ grep -iE 'magic(al)?' ./index.md

								</code></pre>


								<p>Find some &ldquo;-agic&rdquo; words in a big list of words:</p>


								<!-- exec -->


								<pre><code>$ grep -iE '(m|tr|pel)agic' /usr/share/dict/words

								magic

								magic's

								magical

								magically

								magician

								magician's

								magicians

								pelagic

								tragic

								tragically

								tragicomedies

								tragicomedy

								tragicomedy's

								</code></pre>


								<!-- end -->


								<p><code>grep</code> isn&rsquo;t the only - or even the most important - tool that makes use of

								regular expressions, but it&rsquo;s a good place to start because it&rsquo;s one of the

								fundamental building blocks for so many other operations.  Filtering lists of

								things, matching patterns within collections, and writing concise descriptions

								of how text should be transformed are at the heart of a practical approach to

								Unix-like systems.  Regexen turn out to be a seductively powerful way to do

								these things - so much so that they&rsquo;ve crept their way into text editors,

								databases, and full-featured programming languages.</p>


								<p>There&rsquo;s a dark side to all of this, for the truth about regular expressions is

								that they are ugly, inconsistent, brittle, and <em>incredibly</em> difficult to think

								clearly about.  They take years to master and reward the wielder with great

								power, but they are also a trap: a temptation towards the path of cleverness

								masquerading as wisdom.</p>


								<p style="text-align:center;"> ✑</p>


								<p>I&rsquo;ll be returning to this theme, but for the time being let&rsquo;s move on.  Now

								that we&rsquo;ve established, however haphazardly, some of the basics, let&rsquo;s consider

								their application to a real-world task.</p>


								<hr />


								<h1><a name=a-literary-problem href=#a-literary-problem>#</a> 2. a literary problem</h1>


								<p>The <a href="../literary_environment">previous chapter</a> introduced a bunch of tools

								using contrived examples.  Now we&rsquo;ll look at a real problem, and work through a

								solution by building on tools we&rsquo;ve already covered.</p>


								<p>So on to the problem:  I write poetry.</p>


								<p>{rimshot dot wav}</p>


								<p>Most of the poems I have written are not very good, but lately I&rsquo;ve been

								thinking that I&rsquo;d like to comb through the last ten years' worth and pull

								the least-embarrassing stuff into a single collection.</p>


								<p>I&rsquo;ve hinted at how the contents of my blog are stored as files, but let&rsquo;s take

								a look at the whole thing:</p>


								<pre><code>$ ls -F ~/p1k3/archives/

								1997/  2003/  2009/  bones/     meta/

								1998/  2004/  2010/  chapbook/  winfield/

								1999/  2005/  2011/  cli/       wip/

								2000/  2006/  2012/  colophon/

								2001/  2007/  2013/  europe/

								2002/  2008/  2014/  hack/

								</code></pre>


								<p>(<code>ls</code>, again, just lists files.  <code>-F</code> tells it to append a character that shows

								what type of file we&rsquo;re looking at, such as a trailing / for directories.

								<code>~</code> is a shorthand that means &ldquo;my home directory&rdquo;, which in this case is

								<code>/home/brennen</code>.)</p>


								<p>Each of the directories here holds other directories.  The ones for each year

								have sub-directories for the months of the year, which in turn contain files

								for the days.  The files are just little pieces of HTML and Markdown and some

								other stuff.  Many years ago, before I had much of an idea how to program, I

								wrote a script to glue them all together into a web page and serve them up to

								visitors.  This all sounds complicated, but all it really means is that if I

								want to write a blog entry, I just open a file and type some stuff.  Here&rsquo;s an

								example for March 1st:</p>


								<!-- exec -->


								<pre><code>$ cat ~/p1k3/archives/2014/3/1

								&lt;h1&gt;Saturday, March 1&lt;/h1&gt;


								&lt;markdown&gt;

								Sometimes I'm going along on a Saturday morning, still a little dazed from the

								night before, and I think something like "I should just go write a detailed

								analysis of hooded sweatshirts".  Mostly these thoughts don't survive contact

								with an actual keyboard.  It's almost certainly for the best.

								&lt;/markdown&gt;

								</code></pre>


								<!-- end -->


								<p>And here&rsquo;s an older one that contains a short poem:</p>


								<!-- took this one out of exec block 'cause later i

								     made a dir out of it... -->


								<pre><code>$ cat ~/p1k3/archives/2012/10/9

								&lt;h1&gt;tuesday, october 9&lt;/h1&gt;


								&lt;freeverse&gt;i am a stateful machine

								i exist in a manifold of consequence

								a clattering miscellany of impure functions

								and side effects&lt;/freeverse&gt;

								</code></pre>


								<p>Notice that <code>&lt;freeverse&gt;</code> bit?  It kind of looks like an HTML tag, but it&rsquo;s

								not.  What it actually does is tell my blog script that it should format the

								text it contains like a poem.  The specifics don&rsquo;t matter for our purposes

								(yet), but this convention is going to come in handy, because the first thing I

								want to do is get a list of all the entries that contain poems.</p>


								<p>Remember <code>grep</code>?</p>


								<pre><code>$ grep -ri '&lt;freeverse&gt;' ~/p1k3/archives &gt; ~/possible_poems

								</code></pre>


								<p>Let&rsquo;s step through this bit by bit:</p>


								<p>First, I&rsquo;m asking <code>grep</code> to search <strong>r</strong>ecursively, <strong>i</strong>gnoring case.

								&ldquo;Recursively&rdquo; just means that every time the program finds a directory, it

								should descend into that directory and search in any files there as well.</p>


								<pre><code>grep -ri

								</code></pre>


								<p>Next comes a pattern to search for.  It&rsquo;s in single quotes because the

								characters <code>&lt;</code> and <code>&gt;</code> have a special meaning to the shell, and here we need

								the shell to understand that it should treat them as literal angle brackets

								instead.</p>


								<pre><code>'&lt;freeverse&gt;'

								</code></pre>


								<p>This is the path I want to search:</p>


								<pre><code>~/p1k3/archives

								</code></pre>


								<p>Finally, because there are so many entries to search, I know the process will

								be slow and produce a large list, so I tell the shell to redirect the output to a file

								called <code>possible_poems</code> in my home directory:</p>


								<pre><code>&gt; ~/possible_poems

								</code></pre>


								<p>This is quite a few instances&hellip;</p>


								<pre><code>$ wc -l ~/possible_poems

								679 /home/brennen/possible_poems

								</code></pre>


								<p>&hellip;and it&rsquo;s also not super-pretty to look at:</p>


								<pre><code>$ head -5 ~/possible_poems

								/home/brennen/p1k3/archives/2011/10/14:&lt;freeverse&gt;i've got this friend has a real knack

								/home/brennen/p1k3/archives/2011/4/25:&lt;freeverse&gt;i can't claim to strive for it

								/home/brennen/p1k3/archives/2011/8/10:&lt;freeverse&gt;one diminishes or becomes greater

								/home/brennen/p1k3/archives/2011/8/12:&lt;freeverse&gt;

								/home/brennen/p1k3/archives/2011/1/1:&lt;freeverse&gt;six years on

								</code></pre>


								<p>Still, it&rsquo;s a decent start.  I can see paths to the files I have to check, and

								usually a first line.  Since I use a fancy text editor, I can just go down the

								list opening each file in a new window and copying the stuff I&rsquo;m interested in

								to a new file.</p>


								<p>This is good enough for government work, but what if instead of jumping around

								between hundreds of files, I&rsquo;d rather read everything in one file and just weed

								out the bad ones as I go?</p>


								<pre><code>$ cat `grep -ril '&lt;freeverse&gt;' ~/p1k3/archives` &gt; ~/possible_poems_full

								</code></pre>


								<p>This probably bears some explaining.  <code>grep</code> is still doing all the real work

								here.  The main difference from before is that <code>-l</code> tells grep to just list any

								files it finds which contain a match.</p>


								<pre><code>`grep -ril '&lt;freeverse&gt;' ~/p1k3/archives`

								</code></pre>


								<p>Notice those backticks around the grep command?  This part is a little

								trippier.  It turns out that if you put backticks around something in a

								command, it&rsquo;ll get executed and replaced with its output, which in turn becomes

								part of the larger command.  So what we&rsquo;re really saying is

								something like:</p>


								<pre><code>$ cat [all of the files in the blog directory with &lt;freeverse&gt; in them]

								</code></pre>
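
								<p>The same substitution can be seen on a much smaller scale.  In this sketch,
								<code>inner</code> is a made-up function standing in for the <code>grep</code> command; the
								shell runs it first and pastes its output into the <code>echo</code> command line.
								The <code>$(...)</code> form is another spelling of the same thing:</p>

								<pre><code># A hypothetical command that prints two words, one per line.
								inner() { printf 'one\ntwo\n'; }

								echo `inner`     # the shell runs inner first, then runs: echo one two
								echo $(inner)    # the same substitution, written with $( ) instead of backticks
								</code></pre>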


								<p>Did you catch that?  I just wrote a command that rewrote itself as a

								<em>different</em>, more specific command.  And it appears to have worked on the

								first try:</p>


								<pre><code>$ wc ~/possible_poems_full

								 17628  80980 528699 /home/brennen/possible_poems_full

								</code></pre>


								<p>Welcome to wizard school.</p>


								<hr />


								<h1><a name=programmerthink href=#programmerthink>#</a> 3. programmerthink</h1>


								<p>In the <a href="#a-literary-problem">preceding chapter</a>, I worked through accumulating

								a big piece of text from some other, smaller texts.  I started with a bunch of

								files and wound up with one big file called <code>possible_poems_full</code>.</p>


								<p>Let&rsquo;s talk for a minute about how programmers approach problems like this one.

								What I&rsquo;ve just done is sort of an old-school humanities take on things:

								Metaphorically speaking, I took a book off the shelf and hauled it down to the

								copy machine to xerox a bunch of pages, and now I&rsquo;m going to start in on them

								with a highlighter and some Post-Its or something.  A process like this will

								often trigger a cascade of questions in the programmer-mind:</p>


								<ul>

								<li>What if, halfway through the project, I realize my selection criteria were all

								wrong and have to backtrack?</li>

								<li>What if I discover corrections that also need to be made in the source documents?</li>

								<li>What if I want to access metadata, like the original location of a file?</li>

								<li>What if I want to quickly re-order the poems according to some new criteria?</li>

								<li>Why am I storing the same text in two different places?</li>

								</ul>


								<p>A unifying theme of these questions is that they could all be answered by

								involving a little more abstraction.</p>


								<p style="text-align:center;"> ★</p>


								<p>Some kinds of abstraction are so common in the physical world that we can

								forget they&rsquo;re part of a sophisticated technology.  For example, a good deal of

								bicycle maintenance can be accomplished with a cheap multi-tool containing a

								few different sizes of hex wrench and a couple of screwdrivers.</p>


								<p>A hex wrench or screwdriver doesn&rsquo;t really know anything about bicycles.  All

								it <em>really</em> knows about is fitting into a space and allowing torque to be

								applied.  Standardized fasteners and adjustment mechanisms on a bicycle ensure

								that the work can be done anywhere, by anyone with a certain set of tools.

								Standard tools mean that if you can work on a particular bike, you can work on

								<em>most</em> bikes, and even on things that aren&rsquo;t bikes at all, but were designed by

								people with the same abstractions in mind.</p>


								<p>The relationship between a wrench, a bolt, and the purpose of a bolt is a lot

								like something we call <em>indirection</em> in software.  Programs like <code>grep</code> or

								<code>cat</code> don&rsquo;t really know anything about poetry.  All they <em>really</em> know about is

								finding lines of text in input, or sticking inputs together.  Files, lines, and

								text are like standardized fasteners that allow a user who can work on one kind

								of data (be it poetry, a list of authors, the source code of a program) to use

								the same tools for other problems and other data.</p>


								<p style="text-align:center;"> ★</p>


								<p>When I first started writing stuff on the web, I edited a page &mdash; a single HTML

								file &mdash; by hand.  When the entries on my nascent blog got old, I manually

								cut-and-pasted them to archive files with names like <code>old_main97.html</code>, which

								held all of the stuff I&rsquo;d written in 1997.</p>


								<p>I&rsquo;m not holding this up as an example of youthful folly.  In fact, it worked

								fine, and just having a single, static file that you can open in any text

								editor has turned out to be a <em>lot</em> more future-proof than the sophisticated

								blogging software people were starting to write at the time.</p>


								<p>And yet.  Something about this habit nagged at my developing programmer mind

								after a few years.  It was just a little bit too manual and repetitive, a

								little bit silly to have to write things like a table of contents by hand, or

								move entries around by copy-and-pasting them to different files.  Since I knew

								the date for each entry, and wanted to make them navigable on that basis, why

								not define a directory structure for the years and months, and then write a

								file to hold each day?  That way, all I&rsquo;d have to do is concatenate the files

								in one directory to display any given month:</p>


								<pre><code>$ cat ~/p1k3/archives/2014/1/* | head -10

								&lt;h1&gt;Sunday, January 12&lt;/h1&gt;


								&lt;h2&gt;the one casey is waiting for&lt;/h2&gt;


								&lt;freeverse&gt;

								after a while

								the thing about drinking

								is that it just feeds

								what you drink to kill

								and kills

								</code></pre>


								<p>I ultimately wound up writing a few thousand lines of Perl to do the actual

								work, but the essential idea of the thing is still little more than invoking

								<code>cat</code> on some stuff.</p>


								<p>I didn&rsquo;t know the word for it at the time, but what I was reaching for was a

								kind of indirection.  By putting blog posts in a specific directory layout, I

								was creating a simple model of the temporal structure that I considered their

								most important property.  Now, if I want to write commands that ask questions

								about my blog posts or re-combine them in certain ways, I can address my

								concerns to this model.  Maybe, for example, I want a rough idea how many words

								I&rsquo;ve written in blog posts so far in 2014:</p>


								<pre><code>$ find ~/p1k3/archives/2014/ -type f | xargs cat | wc -w

								6677

								</code></pre>
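
								<p>The same kind of question can be tried out on a throwaway copy of the
								layout.  This sketch builds a tiny, made-up archive in a temporary directory
								(the dates and the <code>archive</code> variable are invented for the example, and the
								entries are left empty) and then asks how many entries exist for one month and
								for the whole year:</p>

								<pre><code># Build a hypothetical archive: year/month/day, with empty files as entries.
								archive=$(mktemp -d)
								mkdir -p "$archive/2014/1" "$archive/2014/2"
								touch "$archive/2014/1/5" "$archive/2014/1/12" "$archive/2014/2/3"

								find "$archive/2014/1" -type f | wc -l    # entries in January
								find "$archive/2014" -type f | wc -l      # entries in the whole year

								# When finished, the scratch directory can be removed with: rm -r "$archive"
								</code></pre>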


								<p><code>xargs</code> is not the most intuitive command, but it&rsquo;s useful and common enough to

								explain here.  At the end of last chapter, when I said:</p>


								<pre><code>$ cat `grep -ril '&lt;freeverse&gt;' ~/p1k3/archives` &gt; ~/possible_poems_full

								</code></pre>


								<p>I could also have written this as:</p>


								<pre><code>$ grep -ril '&lt;freeverse&gt;' ~/p1k3/archives | xargs cat &gt; ~/possible_poems_full

								</code></pre>


								<p>What <code>xargs</code> does is take its input, which starts like:</p>


								<pre><code>/home/brennen/p1k3/archives/2002/10/16

								/home/brennen/p1k3/archives/2002/10/27

								/home/brennen/p1k3/archives/2002/10/10

								</code></pre>


								<p>&hellip;and run <code>cat</code> on all the things in it:</p>


								<pre><code>cat /home/brennen/p1k3/archives/2002/10/16 /home/brennen/p1k3/archives/2002/10/27 /home/brennen/p1k3/archives/2002/10/10 ...

								</code></pre>


								<p>It can be a better idea to use <code>xargs</code>, because while backticks are

								incredibly useful, they have some limitations.  If you&rsquo;re dealing with a very

								large list of files, for example, you might exceed the maximum allowed length

								for arguments to a command on your system.  <code>xargs</code> is smart enough to know

								that limit and run <code>cat</code> more than once if needed.</p>
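
<p>Here&rsquo;s a small sketch of that batching behavior.  The limit is forced artificially low with <code>-n 2</code>, and <code>echo</code> sits in front of <code>cat</code> so that each invocation gets printed instead of run.  The function name <code>show_batches</code> and the list of words are made up for illustration:</p>

<pre><code>#!/bin/bash

# show_batches is a hypothetical helper: it prints the cat commands that
# xargs would run if at most two arguments were allowed per command.
show_batches() {
  printf '%s\n' "$@" | xargs -n 2 echo cat
}

show_batches one two three four five
</code></pre>

<p>With five names and room for only two per command, this should print three separate <code>cat</code> lines.  The real limit is far larger, but the splitting works the same way.</p>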


								<p><code>xargs</code> is actually sort of a pain to think about, and will make you jump

								through some irritating hoops if you have spaces or other weirdness in your

								filenames, but I wind up using it quite a bit.</p>
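
<p>The usual way through those hoops, assuming your <code>find</code> and <code>xargs</code> support <code>-print0</code> and <code>-0</code> (the GNU and BSD versions do), is to separate names with NUL characters instead of whitespace.  A sketch, with hypothetical function names and a throwaway directory:</p>

<pre><code>#!/bin/bash

# Both functions are hypothetical helpers for illustration.
# Each prints the basename of every file under the directory given as $1.

# Plain xargs splits its input on whitespace, so a file called "a poem"
# arrives at basename as two separate arguments:
split_names() {
  find "$1" -type f | xargs -n1 basename | sort
}

# With NUL separators, each name arrives as a single argument:
whole_names() {
  find "$1" -type f -print0 | xargs -0 -n1 basename | sort
}

scratch=$(mktemp -d)
touch "$scratch/a poem" "$scratch/plain"

echo "without -0:"
split_names "$scratch"
echo "with -0:"
whole_names "$scratch"

rm -r "$scratch"
</code></pre>

<p>The first listing should come out as three names, because <code>a poem</code> gets chopped in half.  The second should show the two files that actually exist.</p>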


								<p>Maybe I want to see a table of contents:</p>


								<!-- exec -->


								<pre><code>$ find ~/p1k3/archives/2014/ -type d | xargs ls -v | head -10

								/home/brennen/p1k3/archives/2014/:

								1

								2

								3

								4


								/home/brennen/p1k3/archives/2014/1:

								5

								12

								14

								</code></pre>


								<!-- end -->


								<p>Or find the subtitles I used in 2012:</p>


								<!-- exec -->


								<pre><code>$ find ~/p1k3/archives/2012/ -type f | xargs perl -ne 'print "$1\n" if m{&lt;h2&gt;(.*?)&lt;/h2&gt;}'

								pursuit

								fragment

								this poem again

								i'll do better next time

								timebinding animals

								more observations on gear nerdery &amp;amp; utility fetishism

								thrift

								A miracle, in fact, means work

								&lt;em&gt;technical notes for late october&lt;/em&gt;, or &lt;em&gt;it gets dork out earlier these days&lt;/em&gt;

								radio

								light enough to travel

								12:06am

								"figures like Heinlein and Gingrich"

								</code></pre>


								<!-- end -->


								<p>The crucial thing about this is that the filesystem <em>itself</em> is just like <code>cat</code>

								and <code>grep</code>:  It doesn&rsquo;t know anything about blogs (or poetry), and it&rsquo;s

								basically indifferent to the actual <em>structure</em> of a file like

								<code>~/p1k3/archives/2014/1/12</code>.  What the filesystem knows is that there are files

								with certain names in certain places.  It need not know anything about the

								<em>meaning</em> of those names in order to be useful; in fact, it&rsquo;s best if it stays

								agnostic about the question, for this enables us to assign our own meaning to a

								structure and manipulate that structure with standard tools.</p>


								<p style="text-align:center;"> ★</p>


								<p>Back to the problem at hand:  I have this collection of files, and I know how

								to extract the ones that contain poems.  My goal is to see all the poems and

								collect the subset of them that I still find worthwhile.  Just knowing how to

								grep and then edit a big file solves my problem, in a basic sort of way.  And

								yet: Something about this nags at my mind.  I find that, just as I can already

								use standard tools and the filesystem to ask questions about all of my blog

								posts in a given year or month, I would like to be able to ask questions about

								the set of interesting poems.</p>


								<p>If I want the freedom to execute many different sorts of commands against this

								set of poems, it begins to seem that I need a model.</p>


								<p>When programmers talk about models, they often mean something that people in

								the sciences would recognize:  We find ways to represent the arrangement of

								facts so that we can think about them.  A structured representation of things

								often means that we can <em>change</em> those things, or at least derive new

								understanding of them.</p>


								<p style="text-align:center;"> ★</p>


								<p>At this point in the narrative, I could pretend that my next step is

								immediately obvious, but in fact it&rsquo;s not.  I spend a couple of days thinking

								off and on about how to proceed, scribbling notes during bus rides and while

								drinking beers at the pizza joint down the street.  I assess and discard ideas

								which fall into a handful of broad approaches:</p>


								<ul>

								<li>Store blog entries in a relational database system which would allow me to

								associate them with data like &ldquo;this entry is in a collection called &lsquo;ok

								poems&rsquo;&rdquo;.</li>

								<li>Selectively build up a file containing the list of files with ok poems, and use

								it to do other tasks.</li>

								<li>Define a format for metadata that lives within entry files.</li>

								<li>Turn each interesting file into a directory of its own which contains a file

								with the original text and another file with metadata.</li>

								</ul>


								<p>I discard the relational database idea immediately:  I like working with files,

								and I don&rsquo;t feel like abandoning a model that&rsquo;s served me well for my entire

								adult life.</p>


								<p>Building up an index file to point at the other files I&rsquo;m working with has a

								certain appeal.  I&rsquo;m already most of the way there with the <code>grep</code> output in

								<code>potential_poems</code>. It would be easy to write shell commands to add, remove,

								sort, and search entries.  Still, it doesn&rsquo;t feel like a very satisfying

								solution unto itself.  I&rsquo;d like to know that an entry is part of the collection

								just by looking at the entry, without having to cross-reference it to a list

								somewhere else.</p>


								<p>What about putting some meaningful text in the file itself?  I thought about

								a bunch of different ways to do this, some of them really complicated, and

								eventually arrived at this:</p>


								<pre><code>&lt;!-- collection: ok-poems --&gt;

								</code></pre>


								<p>The <code>&lt;!-- --&gt;</code> bits are how you define a comment in HTML, which means that

								neither my blog code nor web browsers nor my text editor have to know anything

								about the format, but I can easily find files with certain values.  Check it:</p>


								<pre><code>$ find ~/p1k3/archives -type f | xargs perl -ne 'print "$ARGV: $1 -&gt; $2\n" if m{&lt;!-- ([a-z]+): (.*?) --&gt;};'

								/home/brennen/p1k3/archives/2014/2/9: collection -&gt; ok-poems

								</code></pre>


								<p>That&rsquo;s an ugly one-liner, and I haven&rsquo;t explained half of what it does, but the

								comment format actually seems pretty workable for this.  It&rsquo;s a little tacky to

								look at, but it&rsquo;s simple and searchable.</p>


								<p>Before we settle, though, let&rsquo;s turn to the notion of making each entry into a

								directory that can contain some structured metadata in a separate file.

								Imagine something like:</p>


								<pre><code>$ ls ~/p1k3/archives/2013/2/9

								index  Meta

								</code></pre>


								<p>Here I use the name &ldquo;index&rdquo; for the main part of the entry because it&rsquo;s a

								convention of web sites for the top-level page in a directory to be called

								something like <code>index.html</code>.  As it happens, my blog software already supports

								this kind of file layout for entries which contain multiple parts, image files,

								and so forth.</p>


								<pre><code>$ head ~/p1k3/archives/2013/2/9/index

								&lt;h1&gt;saturday, february 9&lt;/h1&gt;


								&lt;freeverse&gt;

								midwinter midafternoon; depressed as hell

								sitting in a huge cabin in the rich-people mountains

								writing a sprawl, pages, of melancholic midlife bullshit


								outside the snow gives way to broken clouds and the

								clear unyielding light of the high country sun fills


								$ cat ~/p1k3/archives/2013/2/9/Meta

								collection: ok-poems

								</code></pre>


								<p>It would then be easy to <code>find</code> files called <code>Meta</code> and grep them for

								<code>collection: ok-poems</code>.</p>
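
<p>A minimal sketch of that search, assuming the <code>Meta</code> layout imagined above.  <code>find_collection</code> is a made-up name, and <code>-print0</code> and <code>-0</code> are GNU and BSD extensions rather than something every system is guaranteed to have:</p>

<pre><code>#!/bin/bash

# find_collection (hypothetical): list every Meta file under the archive
# root $1 that contains exactly the line "collection: $2".
find_collection() {
  find "$1" -type f -name Meta -print0 \
    | xargs -0 grep -l -x -F "collection: $2"
}
</code></pre>

<p>Saying <code>find_collection ~/p1k3/archives ok-poems</code> would then print the path of each matching <code>Meta</code> file, one per line.</p>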


								<p>What if I put metadata right in the filename itself, and dispense with the grep

								altogether?</p>


								<pre><code>$ ls ~/p1k3/archives/2013/2/9

								index  meta-ok-poem


								$ find ~/p1k3/archives -name 'meta-ok-poem'

								/home/brennen/p1k3/archives/2013/2/9/meta-ok-poem

								</code></pre>


								<p>There&rsquo;s a lot to like about this.  For one thing, it&rsquo;s immediately visible in a

								directory listing.  For another, it doesn&rsquo;t require searching through thousands

								of lines of text to extract a specific string.  If a directory has a

								<code>meta-ok-poem</code> in it, I can be pretty sure that it will contain an interesting

								<code>index</code>.</p>


								<p>What are the downsides?  Well, it requires transforming lots of text files into

								directories-containing-files.  I might automate that process, but it&rsquo;s still a

								little tedious and it makes the layout of the entry archive more complicated

								overall.  There&rsquo;s a cost to doing things this way.  It lets me extend my

								existing model of a blog entry to include arbitrary metadata, but it also adds

								steps to writing or finding blog entries.</p>


								<p>Abstractions usually cost you something.  Is this one worth the hassle?

								Sometimes the best way to answer that question is to start writing code that

								handles a given abstraction.</p>


								<hr />


								<h1><a name=script href=#script>#</a> 4. script</h1>


								<p>Back in chapter 1, I said that &ldquo;the way you use the computer is often just to write

								little programs that invoke other programs&rdquo;.  In fact, we&rsquo;ve already gone over a

								bunch of these.  Grepping through the text of a previous chapter should pull

								up some good examples:</p>


								<!-- exec -->


								<pre><code>$ grep -E '\$ [a-z]+.*\| ' ../literary_environment/index.md

								    $ sort authors_* | uniq -c

								    $ sort authors_* | uniq &gt; ./all_authors

								    $ find ~/p1k3/archives/2010/11 -regextype egrep -regex '.*([0-9]+|index)' -type f | xargs wc -w | tail -1

								    $ sort authors_* | uniq | wc -l

								    $ sort colors | uniq -i | tail -1

								    $ cut -d' ' -f1 ./authors_* | sort | uniq -ci | sort -n | tail -3

								    $ sort -u ./authors_* | cut -d' ' -f1 | uniq -ci | sort -n | tail -3

								    $ sort -k1 all_authors.tsv | expand -t14

								    $ paste firstnames lastnames | sort -k2 | expand -t12

								    $ cat ./authors_* | grep 'Vanessa'

								</code></pre>


								<!-- end -->


								<p>None of these one-liners do all that much, but they all take input of one sort

								or another and apply one or more transformations to it.  They&rsquo;re little formal

								sentences describing how to make one thing into another, which is as good a

								definition of programming as most.  Or at least this is a good way to describe

								programming-in-the-small.  (A lot of the programs we use day-to-day are more

								like essays, novels, or interminable Fantasy series where every character you

								like dies horribly than they are like individual sentences.)</p>


								<p>One-liners like these are all well and good when you&rsquo;re staring at a terminal,

								trying to figure something out - but what about when you&rsquo;ve already figured it out and

								you want to repeat it in the future?</p>


								<p>It turns out that Bash has you covered.  Since shell commands are just text,

								they can live in a text file as easily as they can be typed.</p>


								<h2><a name=learn-you-an-editor href=#learn-you-an-editor>#</a> learn you an editor</h2>


								<p>We&rsquo;ve skirted the topic so far, but now that we&rsquo;re talking about writing out

								text files in earnest, you&rsquo;re going to want a text editor.</p>


								<p>My editor is where I spend most of my time that isn&rsquo;t in a web browser, because

								it&rsquo;s where I write both code and prose.  It turns out that the features which

								make a good code editor overlap a lot with the ones that make a good editor of

								English sentences.</p>


								<p>So what should you use?  Well, there have been other contenders in recent

								years, but in truth nothing comes close to dethroning the Great Old Ones of

								text editing.  Emacs is a creature both primal and sophisticated, like an

								avatar of some interstellar civilization that evolved long before multicellular

								life existed on earth and seeded the galaxy with incomprehensible artefacts and

								colossal engineering projects.  Vim is like a lovable chainsaw-studded robot

								with the most elegant keyboard interface in history secretly emblazoned on its

								shining diamond heart.</p>


								<p>It&rsquo;s worth the time it takes to learn one of the serious editors, but there are

								easier places to start.  Nano, for example, is easy to pick up, and should be

								available on most systems.  To start it, just say:</p>


								<pre><code>$ nano file

								</code></pre>


								<p>You should see something like this:</p>


								<p style="text-align:center;"> <img src="images/nano.png" alt="nano" /></p>


								<p>Arrow keys will move your cursor around, and typing stuff will make it appear

								in the file.  This is pretty much like every other editor you&rsquo;ve ever used.  If

								you haven&rsquo;t used Nano before, that stuff along the bottom of the terminal is a

								reference to the most commonly used commands.  <code>^</code> is a convention for &ldquo;Ctrl&rdquo;,

								so <code>^O</code> means Ctrl-o (the case of the letter doesn&rsquo;t actually matter), which

								will save the file you&rsquo;re working on.  Ctrl-x will quit, which is probably the

								first important thing to know about any given editor.</p>


								<h2><a name=d-i-y-utilities href=#d-i-y-utilities>#</a> d.i.y. utilities</h2>


								<p>So back to putting commands in text files.  Here&rsquo;s a file I just created in

								my editor:</p>


								<!-- exec -->


								<pre><code>$ cat okpoems

								#!/bin/bash


								# find all the marker files and get the name of

								# the directory containing each

								find ~/p1k3/archives -name 'meta-ok-poem' | xargs -n1 dirname


								exit 0

								</code></pre>


								<!-- end -->


								<p>This is known as a script.  There are a handful of things to notice here.

								First, there&rsquo;s this fragment:</p>


								<pre><code>#!/bin/bash

								</code></pre>


								<p>The <code>#!</code> right at the beginning, followed by the path to a program, is a

								special sequence that lets the kernel know what program should be used to

								interpret the contents of the file.  <code>/bin/bash</code> is the path on the filesystem

								where Bash itself lives.  You might see this referred to as a shebang or a hash

								bang.</p>


								<p>Lines that start with a <code>#</code> are comments, used to describe the code to a human

								reader.  The <code>exit 0</code> tells Bash that the currently running script should exit

								with a status of 0, which basically means &ldquo;nothing went wrong&rdquo;.</p>
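
<p>You can watch exit statuses go by for yourself, since the shell keeps the most recent one in the special variable <code>$?</code>.  A small sketch (<code>report_status</code> is a hypothetical helper, not a standard command):</p>

<pre><code>#!/bin/bash

# report_status (hypothetical): run whatever command it is given, then
# print the exit status that command left behind in $?.
report_status() {
  local status=0
  "$@" || status=$?
  echo "exit status: $status"
}

report_status true                 # true always succeeds
report_status false                # false always fails
report_status bash -c 'exit 64'    # a script can pick its own number
</code></pre>

<p>Under Bash this should report statuses of 0, 1, and 64, in that order.</p>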


								<p>If you examine the directory listing for <code>okpoems</code>, you&rsquo;ll see something

								important:</p>


								<!-- exec -->


								<pre><code>$ ls -l okpoems

								-rwxrwxr-x 1 brennen brennen 163 Apr 19 00:08 okpoems

								</code></pre>


								<!-- end -->


								<p>That looks pretty cryptic.  For the moment, just remember that those little

								<code>x</code>s in the first bit mean that the file has been marked e<strong>x</strong>ecutable.  We

								accomplish this by saying something like:</p>


								<pre><code>$ chmod +x ./okpoems

								</code></pre>


								<p>Once that&rsquo;s done, it and the shebang line in combination mean that typing

								<code>./okpoems</code> will have the same effect as typing <code>bash okpoems</code>:</p>


								<!-- exec -->


								<pre><code>$ ./okpoems

								/home/brennen/p1k3/archives/2013/2/9

								/home/brennen/p1k3/archives/2012/3/17

								/home/brennen/p1k3/archives/2012/3/26

								</code></pre>


								<!-- end -->


								<h2><a name=heavy-lifting href=#heavy-lifting>#</a> heavy lifting</h2>


								<p><code>okpoems</code> demonstrates the basics, but it doesn&rsquo;t do very much.  Here&rsquo;s

								a script with a little more substance to it:</p>


								<!-- exec -->


								<pre><code>$ cat markpoem

								#!/bin/bash


								# $1 is the first parameter to our script

								POEM=$1


								# Complain and exit if we weren't given a path:

								if [ ! $POEM ]; then

								  echo 'usage: markpoem &lt;path&gt;'


								  # Confusingly, an exit status of 0 means to the shell that everything went

								  # fine, while any other number means that something went wrong.

								  exit 64

								fi


								if [ ! -e $POEM ]; then

								  echo "$POEM not found"

								  exit 66

								fi


								echo "marking $POEM an ok poem"


								POEM_BASENAME=$(basename $POEM)


								# If the target is a plain file instead of a directory, make it into

								# a directory and move the content into $POEM/index:

								if [ -f $POEM ]; then

								  echo "making $POEM into a directory, moving content to"

								  echo "  $POEM/index"

								  TEMPFILE="/tmp/$POEM_BASENAME.$(date +%s.%N)"

								  mv $POEM $TEMPFILE

								  mkdir $POEM

								  mv $TEMPFILE $POEM/index

								fi


								if [ -d $POEM ]; then

								  # touch(1) will either create the file or update its timestamp:

								  touch $POEM/meta-ok-poem

								else

								  echo "something broke - why isn't $POEM a directory?"

								  file $POEM

								fi


								# Signal that all is copacetic:

								echo kthxbai

								exit 0

								</code></pre>


								<!-- end -->


								<p>Both of these scripts are imperfect, but they were quick to write, they&rsquo;re made

								out of standard commands, and I don&rsquo;t yet hate myself for them:  All signs that

								I&rsquo;m not totally on the wrong track with the <code>meta-ok-poem</code> abstraction, and

								could live with it as part of an ongoing writing project.  <code>okpoems</code> and

								<code>markpoem</code> would also be easy to use with custom keybindings in my editor.  In

								a few more lines of code, I can build a system to wade through the list of

								candidate files and quickly mark the interesting ones.</p>


								<h2><a name=generality href=#generality>#</a> generality</h2>


								<p>So what&rsquo;s lacking here?  Well, probably a bunch of things, feature-wise.  I can

								imagine writing a script to unmark a poem, for example.  That said, there&rsquo;s one

								really glaring problem.  &ldquo;Ok poem&rdquo; is only one kind of property a blog entry

								might possess.  Suppose I wanted a way to express that a poem is terrible?</p>


								<p>It turns out I already know how to add properties to an entry.  If I generalize

								just a little, the tools become much more flexible.</p>


								<!-- exec -->


								<pre><code>$ ./addprop /home/brennen/p1k3/archives/2012/3/26 meta-terrible-poem

								marking /home/brennen/p1k3/archives/2012/3/26 with meta-terrible-poem

								kthxbai

								</code></pre>


								<!-- end -->


								<!-- exec -->


								<pre><code>$ ./findprop meta-terrible-poem

								/home/brennen/p1k3/archives/2012/3/26

								</code></pre>


								<!-- end -->


								<p><code>addprop</code> is only a little different from <code>markpoem</code>.  It takes two parameters

								instead of one - the target entry and a property to add.</p>


								<!-- exec -->


								<pre><code>$ cat addprop

								#!/bin/bash


								ENTRY=$1

								PROPERTY=$2


								# Complain and exit if we weren't given a path and a property:

								if [[ ! $ENTRY || ! $PROPERTY ]]; then

								  echo "usage: addprop &lt;path&gt; &lt;property&gt;"

								  exit 64

								fi


								if [ ! -e $ENTRY ]; then

								  echo "$ENTRY not found"

								  exit 66

								fi


								echo "marking $ENTRY with $PROPERTY"


								# If the target is a plain file instead of a directory, make it into

								# a directory and move the content into $ENTRY/index:

								if [ -f $ENTRY ]; then

								  echo "making $ENTRY into a directory, moving content to"

								  echo "  $ENTRY/index"


								  # Get a safe temporary file:

								  TEMPFILE=`mktemp`


								  mv $ENTRY $TEMPFILE

								  mkdir $ENTRY

								  mv $TEMPFILE $ENTRY/index

								fi


								if [ -d $ENTRY ]; then

								  touch $ENTRY/$PROPERTY

								else

								  echo "something broke - why isn't $ENTRY a directory?"

								  file $ENTRY

								fi


								echo kthxbai

								exit 0

								</code></pre>


								<!-- end -->


								<p>Meanwhile, <code>findprop</code> is more or less <code>okpoems</code>, but with a parameter for the

								property to find:</p>


								<!-- exec -->


								<pre><code>$ cat findprop

								#!/bin/bash


								if [ ! $1 ]

								then

								  echo "usage: findprop &lt;property&gt;"

								  exit 64

								fi


								# find all the marker files and get the name of

								# the directory containing each

								find ~/p1k3/archives -name "$1" | xargs -n1 dirname


								exit 0

								</code></pre>


								<!-- end -->


								<p>These scripts aren&rsquo;t much more complicated than their poem-specific

								counterparts, but now they can be used to solve problems I haven&rsquo;t even thought

								of yet, and included in other scripts that need their functionality.</p>
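
<p>The unmarking script imagined at the start of this section, for instance, might be sketched like so.  <code>rmprop</code> is a hypothetical name; it&rsquo;s written here as a shell function so that it can be tried out at a prompt, and it assumes the marker-file layout that <code>addprop</code> creates:</p>

<pre><code>#!/bin/bash

# rmprop (hypothetical): remove a property marker from an entry directory.
#   $1 is the entry directory, $2 is the name of the property
rmprop() {
  local entry=$1
  local property=$2

  # Complain if we weren't given a path and a property:
  if [[ ! $entry || ! $property ]]; then
    echo "usage: rmprop PATH PROPERTY"
    return 64
  fi

  # Complain if the marker file isn't there to begin with:
  if [ ! -e "$entry/$property" ]; then
    echo "$entry is not marked with $property"
    return 66
  fi

  echo "removing $property from $entry"
  rm "$entry/$property"
}
</code></pre>

<p>It deliberately leaves the entry as a directory with an <code>index</code> in it, even when no markers remain.  Turning it back into a plain file would be a separate decision.</p>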


								<hr />


								<h1><a name=general-purpose-programmering href=#general-purpose-programmering>#</a> 5. general purpose programmering</h1>


								<p>I didn&rsquo;t set out to write a book about programming, <em>as such</em>, but because

								programming and the command line are so inextricably linked, this text

								draws near the subject almost of its own accord.</p>


								<p>If you&rsquo;re not terribly interested in programming, this chapter can easily

								enough be skipped.  It&rsquo;s more in the way of philosophical rambling than

								concrete instruction, and will be of most use to those with an existing

								background in writing code.</p>


								<p style="text-align:center;"> ✢</p>


								<p>If you&rsquo;ve used computers for more than a few years, you&rsquo;re probably viscerally

								aware that most software is fragile and most systems decay.  In the time since

								I took my first tentative steps into the little world of a computer (a friend&rsquo;s

								dad&rsquo;s unidentifiable gaming machine, my own father&rsquo;s blue monochrome Zenith

								laptop, the Apple II) the churn has been overwhelming.  By now I&rsquo;ve learned my

								way around vastly more software &mdash; operating systems, programming languages and

								development environments, games, editors, chat clients, mail systems &mdash; than I

								presently could use if I wanted to.  Most of it has gone the way of some

								ancient civilization, surviving (if at all) only in faint, half-understood

								cultural echoes and occasional museum-piece displays.  Every user of technology

								becomes, in time, a refugee from an irretrievably recent past.</p>


								<p>And yet, despite all this, the shell endures.  Most of the ideas in this book

								are older than I am.  Most of them could have been applied in 1994 or

								thereabouts, when I first logged on to multiuser systems running AT&amp;T Unix.

								Since the early 1990s, systems built on a fundamental substrate of Unix-like

								behavior and abstractions have proliferated wildly, becoming foundational at

								once to the modern web, the ecosystem of free and open software, and the

								technological dominance ca. 2014 of companies like Apple, Google, and Facebook.</p>


								<p>Why is this, exactly?</p>


								<p style="text-align:center;"> ✣</p>


								<p>As I&rsquo;ve said (and hopefully shown), the commands you write in your shell

								are essentially little programs.  Like other programs, they can be stored

								for later use and recombined with other commands, creating new uses for

								your ideas.</p>


								<p>It would be hard to say that there&rsquo;s any <em>one</em> reason command line environments

								remain so vital after decades of evolution and hard-won refinement in computer

								interfaces, but it seems like this combinatory nature is somewhere near the

								heart of it.  The command line often lacks the polish of other interfaces we

								depend on, but in exchange it offers a richness and freedom of expression

								rarely seen elsewhere, and invites its users to build upon its basic

								facilities.</p>


								<p>What is it that makes last chapter&rsquo;s <code>addprop</code> preferable to the more specific

								<code>markpoem</code>?  Let&rsquo;s look at an alternative implementation of <code>markpoem</code>:</p>


								<!-- exec -->


								<pre><code>$ cat simple_markpoem

								#!/bin/bash


								addprop $1 meta-ok-poem

								</code></pre>


								<!-- end -->


								<p>Is this script trivial?  Absolutely.  It&rsquo;s so trivial that it barely seems to

								exist, because I already wrote <code>addprop</code> to do all the heavy lifting and play

								well with others, freeing us to imagine new uses for its central idea without

								worrying about the implementation details.</p>


								<p>Unlike <code>markpoem</code>, <code>addprop</code> doesn&rsquo;t know anything about poetry.  All it knows

								about, in fact, is putting a file (or three) in a particular place.  And this

								is in keeping with a basic insight of Unix:  Pieces of software that do one

								very simple thing generalize well.  Good command line tools are like a hex

								wrench, a hammer, a utility knife:  They embody knowledge of turning, of

								striking, of cutting &mdash; and with this kind of knowledge at hand, the user can

								change the world even though no individual tool is made with complete knowledge

								of the world as a whole.  There&rsquo;s a lot of power in the accumulation of small

								competencies.</p>


								<p>Of course, if your code is only good at one thing, to be of any use, it has to

								talk to code that&rsquo;s good at other things.  There&rsquo;s another basic insight in the

								Unix tradition:  Tools should be composable.  All those little programs have to

								share some assumptions, have to speak some kind of trade language, in order to

								combine usefully.  Which is how we&rsquo;ve arrived at standard IO, pipelines,

								filesystems, and text as a lowest-common-denominator medium of exchange.  If

								you think about most of these things, they have some very rough edges, but they

								give otherwise simple tools ways to communicate without becoming

								super-complicated along the way.</p>


								<p style="text-align:center;"> ✤</p>


								<p>What is the command line?</p>


								<p>The command line is an environment of tool use.</p>


								<p>So are kitchens, workshops, libraries, and programming languages.</p>


								<p style="text-align:center;"> ✥</p>


								<p>Here&rsquo;s a confession:  I don&rsquo;t like writing shell scripts very much, and I

								can&rsquo;t blame anyone else for feeling the same way.</p>


								<p>That doesn&rsquo;t mean you shouldn&rsquo;t <em>know</em> about them, or that you shouldn&rsquo;t

								<em>write</em> them.  I write little ones all the time, and the ability to puzzle

								through other people&rsquo;s scripts comes in handy.  Oftentimes, the best, most

								tasteful way to automate something is to build a script out of the commonly

								available commands.  The standard tools are already there on millions of

								machines.  Many of them have been pretty well understood for a generation, and

								most will probably be around for a generation or three to come.  They do neat

								stuff.  Scripts let you build on ideas you&rsquo;ve already worked out, and give

								repeatable operations a memorable, user-friendly name.  They encourage reuse of

								existing programs, and help express your ideas to people who&rsquo;ll come after you.</p>


								<p>One of the reliable markers of powerful software is that it can be scripted: It

								extends to its users some of the same power that its authors used in creating

								it.  Scriptable software is to some extent <em>living</em> software.  It&rsquo;s a book that

								you, the reader, get to help write.</p>


								<p>In all these ways, shell scripts are wonderful, a little bit magical, and

								quietly indispensable to the machinery of modern civilization.</p>


								<p>Unfortunately, in all the ways that a shell like Bash is weird, finicky, and

								covered in 40 years of incidental cruft, long-form Bash scripts are even worse.

								Bash is a useful glue language, particularly if you&rsquo;re already comfortable

								wiring commands together.  Syntactic and conceptual innovations like pipes are

								beautiful and necessary.  What Bash is <em>not</em>, despite its power, is a very good

								general purpose programming language.  It&rsquo;s just not especially good at things

								like math, or complex data structures, or not looking like a punctuation-heavy

								variety of alphabet soup.</p>


								<p>It turns out that there&rsquo;s a threshold of complexity beyond which life becomes

								easier if you switch from shell scripting to a more robust language.  Just

								where this threshold is located varies a lot between users and problems, but I

								often think about switching languages before a script gets bigger than I can

								view on my screen all at once.  <code>addprop</code> is a good example:</p>


								<!-- exec -->


								<pre><code>$ wc -l ../script/addprop

								41 ../script/addprop

								</code></pre>


								<!-- end -->


								<p>41 lines is a touch over what fits on one screen in the editor I usually use.

								If I were going to add much in the way of features, I&rsquo;d think pretty hard about

								porting it to another language first.</p>


								<p>What&rsquo;s cool is that if you know a language like C, Python, Perl, Ruby, PHP, or

								JavaScript, your code can participate in the shell environment as a first class

								citizen simply by respecting the conventions of standard IO, files, and command

								line arguments.  Often, in order to create a useful utility, it&rsquo;s only

								necessary to deal with <code>STDIN</code>, or operate on a particular sort of file, and

								most languages offer simple conventions for doing these things.</p>
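
<p>Whatever the language, the shape tends to be the same: read lines from standard input, write results to standard output, and leave a meaningful exit status behind.  To stay consistent with the rest of this book, the sketch below is in Bash, and <code>only_dirs</code> is a made-up filter.  A Perl or Python version would follow the same outline using its own standard input handle:</p>

<pre><code>#!/bin/bash

# only_dirs (hypothetical): read one path per line on standard input and
# print only the ones that are directories.
only_dirs() {
  local path
  while IFS= read -r path; do
    if [ -d "$path" ]; then
      printf '%s\n' "$path"
    fi
  done
}

printf '%s\n' / /no/such/place | only_dirs
</code></pre>

<p>Because it only cares about lines of text, a filter like this could sit on the receiving end of last chapter&rsquo;s <code>findprop</code>, or of any other command that prints paths.</p>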


								<p style="text-align:center;"> *</p>


								<p>I think the shell can be taught and understood as a humane environment, despite

								all of its ugliness and complication, because it offers the materials of its

								own construction to its users, whatever their concerns.  The writer, the

								philosopher, the scientist, the programmer:  Files and text and pipes know

								little enough about these things, but in their very indifference to the

								specifics of any one complex purpose, they&rsquo;re adaptable to the basic needs of

								many.  Simple utilities which enact simple kinds of knowledge survive and

								recombine because there is a wisdom to be found in small things.</p>


								<p>Files and text know nothing about poetry, nothing in particular of the human

								soul.  Neither do pen and ink, printing presses or codex books, but somehow we

								got Shakespeare and Montaigne.</p>


								<hr />


								<h1><a name=one-of-these-things-is-not-like-the-others href=#one-of-these-things-is-not-like-the-others>#</a> 6. one of these things is not like the others</h1>


								<p>If you&rsquo;re the sort of person who took a few detours into the history of

								religion in college, you might be familiar with some of the ways people used to

								do textual comparison.  When pen, paper, and typesetting were what scholars had

								to work with, they did some fairly sophisticated things in order to expose the

								relationships between multiple pieces of text.</p>


								<p style="text-align:center;"> <img src="images/throckmorton_small.jpg" height=320 width=470></p>


								<p>Here&rsquo;s a book I got in college:  <em>Gospel Parallels: A Comparison of the

								Synoptic Gospels</em>, Burton H. Throckmorton, Jr., Ed.  It breaks up three books

								from the New Testament by the stories and themes that they contain, and shows

								the overlapping sections of each book that contain parallel texts.  You can

								work your way through and see what parts only show up in one book, or in two

								but not the other, or in all three.  Pages are arranged like so:</p>


								<pre>

								                 § JESUS DOES SOME STUFF

								     ________________________________________________

								    |  MAT            |    MAR             |  LUK    |

								    |-----------------+--------------------+---------|

								    | Stuff           |                    |         |

								    |                 | Stuff              |         |

								    |                 | Stuff              | Stuff   |

								    |                 | Stuff              |         |

								    |                 | Stuff              |         |

								    |                 |                    |         |

								</pre>


								<p>The way I understand it, a book like this one only scratches the surface of the

								field.  Tools like this support a lot of theory about which books copied each

								other and how, and what other sources they might have copied that we&rsquo;ve since

								lost.</p>


								<p>This is some <em>incredibly</em> dry material, even if you kind of dig thinking about

								the questions it addresses.  It takes a special temperament to actually sit

								poring over fragmentary texts in ancient languages and do these painstaking

								comparisons.  Even if you&rsquo;re a writer or editor and work with a lot of

								revisions of a text, there&rsquo;s a good chance you rarely do this kind of

								comparison on your own work, because that shit is <em>tedious</em>.</p>


								<h2><a name=diff href=#diff>#</a> diff</h2>


								<p>It turns out that academics aren&rsquo;t the only people who need tools for comparing

								different versions of a text.  Working programmers, in fact, need to do this

								<em>constantly</em>.  Programmers are also happiest when putting off the <em>actual</em> task

								at hand to solve some incidental problem that cropped up along the way, so by

								now there are a lot of ways to say &ldquo;here&rsquo;s how this file is different from this

								file&rdquo;, or &ldquo;here&rsquo;s how this file is different from itself a year ago&rdquo;.</p>


								<p>Let&rsquo;s look at a couple of shell scripts from an earlier chapter:</p>


								<!-- exec -->


								<pre><code>$ cat ../script/okpoems

								#!/bin/bash


								# find all the marker files and get the name of

								# the directory containing each

								find ~/p1k3/archives -name 'meta-ok-poem' | xargs -n1 dirname


								exit 0

								</code></pre>


								<!-- end -->


								<!-- exec -->


								<pre><code>$ cat ../script/findprop

								#!/bin/bash


								if [ ! $1 ]

								then

								  echo "usage: findprop &lt;property&gt;"

								  exit

								fi


								# find all the marker files and get the name of

								# the directory containing each

								find ~/p1k3/archives -name $1 | xargs -n1 dirname


								exit 0

								</code></pre>


								<!-- end -->


								<p>It&rsquo;s pretty obvious these are similar files, but can we tell at a glance

								<em>exactly</em> what changed between them?  It wouldn&rsquo;t be hard to figure out, once.  If

								you wanted to be really certain about it, you could print them out, set them

								side by side, and go over them with a highlighter.</p>


								<p>Now imagine doing that for a bunch of files, some of them hundreds or thousands

								of lines long.  I&rsquo;ve actually done that before, colored markers and all, but I

								didn&rsquo;t feel smart while I was doing it.  This is a job for software.</p>


								<!-- exec -->


								<pre><code>$ diff ../script/okpoems ../script/findprop

								2a3,8

								&gt; if [ ! $1 ]

								&gt; then

								&gt;   echo "usage: findprop &lt;property&gt;"

								&gt;   exit

								&gt; fi

								&gt;

								5c11

								&lt; find ~/p1k3/archives -name 'meta-ok-poem' | xargs -n1 dirname

								---

								&gt; find ~/p1k3/archives -name $1 | xargs -n1 dirname

								</code></pre>


								<!-- end -->


								<p>That&rsquo;s not the most human-friendly output, but it&rsquo;s a little simpler than it

								seems at first glance.  It&rsquo;s basically just a way of describing the changes

								needed to turn <code>okpoems</code> into <code>findprop</code>.  The string <code>2a3,8</code> can be read as

								&ldquo;after line 2 of the first file, add lines 3 through 8 of the second&rdquo;.  Lines with a

								<code>&gt;</code> in front of them come from the second file.  <code>5c11</code> can be read as &ldquo;line 5 in the

								original file changes into line 11 in the new file&rdquo;, and the <code>&lt;</code> line is replaced with

								the <code>&gt;</code> line.  If you wanted, you could take a copy of the original file and apply

								these instructions by hand in your text editor, and you&rsquo;d wind up with the new file.</p>
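
								<p>To get a feel for that notation, here&rsquo;s a small sketch that builds two throwaway files and diffs them.  The temporary directory and the file names <code>old</code> and <code>new</code> are made up for the example, and nothing here touches the scripts above.</p>

```shell
#!/bin/sh
# A throwaway demonstration: build two tiny files in a temporary
# directory and compare them. The file names are invented.
demo_dir=$(mktemp -d) || exit 1

printf 'one\ntwo\nthree\n'     > "$demo_dir/old"
printf 'one\n2\nthree\nfour\n' > "$demo_dir/new"

# diff exits with status 1 when the files differ, which is expected
# here, so "|| true" keeps that from looking like a failure.
normal_diff=$(cd "$demo_dir" && diff old new) || true

printf '%s\n' "$normal_diff"
rm -rf "$demo_dir"
```

								<p>The result should contain two instructions: <code>2c2</code>, which swaps <code>two</code> for <code>2</code>, and <code>3a4</code>, which adds <code>four</code> after line 3 of the old file.</p>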


								<p>A lot of people (me included) prefer what&rsquo;s known as a &ldquo;unified&rdquo; diff, because

								it&rsquo;s easier to read and offers context for the changed lines.  We can ask for

								one of these with <code>diff -u</code>:</p>


								<!-- exec -->


								<pre><code>$ diff -u ../script/okpoems ../script/findprop

								--- ../script/okpoems   2014-04-19 00:08:03.321230818 -0600

								+++ ../script/findprop  2014-04-21 21:51:29.360846449 -0600

								@@ -1,7 +1,13 @@

								 #!/bin/bash


								+if [ ! $1 ]

								+then

								+  echo "usage: findprop &lt;property&gt;"

								+  exit

								+fi

								+

								 # find all the marker files and get the name of

								 # the directory containing each

								-find ~/p1k3/archives -name 'meta-ok-poem' | xargs -n1 dirname

								+find ~/p1k3/archives -name $1 | xargs -n1 dirname


								 exit 0

								</code></pre>


								<!-- end -->


								<p>That&rsquo;s a little longer, and has some metadata we might not always care about,

								but if you look for lines starting with <code>+</code> and <code>-</code>, it&rsquo;s easy to read as

								&ldquo;added these, took away these&rdquo;.  This diff tells us at a glance that we added

								some lines to complain if we didn&rsquo;t get a command line argument, and replaced

								<code>'meta-ok-poem'</code> in the <code>find</code> command with that argument.  Since it shows us

								some context, we have a pretty good idea where those lines are in the file

								and what they&rsquo;re for.</p>


								<p>What if we don&rsquo;t care exactly <em>how</em> the files differ, but only whether they

								do?</p>


								<!-- exec -->


								<pre><code>$ diff -q ../script/okpoems ../script/findprop

								Files ../script/okpoems and ../script/findprop differ

								</code></pre>


								<!-- end -->


								<p>I use <code>diff</code> a lot in the course of my day job, because I spend a lot of time

								needing to know just how two programs differ.  Just as importantly, I often

								need to know how (or whether!) the <em>output</em> of programs differs.  As a concrete

								example, I want to make sure that <code>findprop meta-ok-poem</code> is really a suitable

								replacement for <code>okpoems</code>.  Since I expect their output to be identical, I can

								do this:</p>


								<!-- exec -->


								<pre><code>$ ../script/okpoems &gt; okpoem_output

								</code></pre>


								<!-- end -->


								<!-- exec -->


								<pre><code>$ ../script/findprop meta-ok-poem &gt; findprop_output

								</code></pre>


								<!-- end -->


								<!-- exec -->


								<pre><code>$ diff -s okpoem_output findprop_output

								Files okpoem_output and findprop_output are identical

								</code></pre>


								<!-- end -->


								<p>The <code>-s</code> just means that <code>diff</code> should explicitly tell us if files are the

								<strong>s</strong>ame.  Otherwise, it&rsquo;d output nothing at all, because there aren&rsquo;t any

								differences.</p>
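
								<p>In a script, <code>diff</code>&rsquo;s exit status is often handier than its messages: it exits with 0 when the inputs are identical, 1 when they differ, and something larger when it runs into trouble.  Here&rsquo;s a rough sketch of a helper built on that.  The name <code>outputs_match</code> is invented for illustration, and it assumes <code>mktemp</code> is available on your system.</p>

```shell
#!/bin/sh
# outputs_match is a hypothetical helper, not part of diff itself.
# It runs two command strings and succeeds only if they print the same thing.
outputs_match() {
  first_output=$(mktemp) || return 2
  sh -c "$1" > "$first_output"

  # "-" tells diff to read the second "file" from standard input.
  if sh -c "$2" | diff -q "$first_output" - > /dev/null
  then
    status=0
  else
    status=$?
  fi

  rm -f "$first_output"
  return "$status"
}
```

								<p>With that defined, something like <code>outputs_match '../script/okpoems' '../script/findprop meta-ok-poem' &amp;&amp; echo same</code> would do the comparison without leaving output files lying around.</p>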


								<p>As with many other tools, <code>diff</code> doesn&rsquo;t much care whether it&rsquo;s looking at

								shell scripts or a list of filenames or what-have-you.  If you read the man

								page, you&rsquo;ll find some features geared towards people writing C-like

								programming languages, but its real specialty is just text files with lines

								made out of characters.  That works well for lots of code, and it can certainly

								be applied to English prose.</p>


								<p>Since I have a couple of versions ready to hand, let&rsquo;s apply this to a text

								with some well-known variations and a bit of a literary legacy.  Here&rsquo;s the

								first day of the Genesis creation narrative in a couple of English

								translations:</p>


								<!-- exec -->


								<pre><code>$ cat genesis_nkj

								In the beginning God created the heavens and the earth.  The earth was without

								form, and void; and darkness was on the face of the deep.  And the Spirit of

								God was hovering over the face of the waters.  Then God said, "Let there be

								light"; and there was light.  And God saw the light, that it was good; and God

								divided the light from the darkness.  God called the light Day, and the darkness

								He called Night.  So the evening and the morning were the first day.

								</code></pre>


								<!-- end -->


								<!-- exec -->


								<pre><code>$ cat genesis_nrsv

								In the beginning when God created the heavens and the earth, the earth was a

								formless void and darkness covered the face of the deep, while a wind from

								God swept over the face of the waters.  Then God said, "Let there be light";

								and there was light.  And God saw that the light was good; and God separated

								the light from the darkness.  God called the light Day, and the darkness he

								called Night.  And there was evening and there was morning, the first day.

								</code></pre>


								<!-- end -->


								<p>What happens if we diff them?</p>


								<!-- exec -->


								<pre><code>$ diff -u genesis_nkj genesis_nrsv

								--- genesis_nkj 2014-05-11 16:28:29.692508461 -0600

								+++ genesis_nrsv    2014-05-11 16:28:29.744508459 -0600

								@@ -1,6 +1,6 @@

								-In the beginning God created the heavens and the earth.  The earth was without

								-form, and void; and darkness was on the face of the deep.  And the Spirit of

								-God was hovering over the face of the waters.  Then God said, "Let there be

								-light"; and there was light.  And God saw the light, that it was good; and God

								-divided the light from the darkness.  God called the light Day, and the darkness

								-He called Night.  So the evening and the morning were the first day.

								+In the beginning when God created the heavens and the earth, the earth was a

								+formless void and darkness covered the face of the deep, while a wind from

								+God swept over the face of the waters.  Then God said, "Let there be light";

								+and there was light.  And God saw that the light was good; and God separated

								+the light from the darkness.  God called the light Day, and the darkness he

								+called Night.  And there was evening and there was morning, the first day.

								</code></pre>


								<!-- end -->


								<p>Kind of useless, right?  If a given line differs by so much as a character,

								it&rsquo;s not the same line.  This highlights the limitations of <code>diff</code> for comparing

								things that</p>


								<ul>

								<li>aren&rsquo;t logically grouped by line</li>

								<li>aren&rsquo;t easily thought of as versions of the same text with some lines changed</li>

								</ul>


								<p>We could edit the files into a more logically defined structure, like

								one-line-per-verse, and try again:</p>


								<!-- exec -->


								<pre><code>$ diff -u genesis_nkj_by_verse genesis_nrsv_by_verse

								--- genesis_nkj_by_verse    2014-05-11 16:51:14.312457198 -0600

								+++ genesis_nrsv_by_verse   2014-05-11 16:53:02.484453134 -0600

								@@ -1,5 +1,5 @@

								-In the beginning God created the heavens and the earth.

								-The earth was without form, and void; and darkness was on the face of the deep.  And the Spirit of God was hovering over the face of the waters.

								+In the beginning when God created the heavens and the earth,

								+the earth was a formless void and darkness covered the face of the deep, while a wind from God swept over the face of the waters.

								 Then God said, "Let there be light"; and there was light.

								-And God saw the light, that it was good; and God divided the light from the darkness.

								-God called the light Day, and the darkness He called Night.  So the evening and the morning were the first day.

								+And God saw that the light was good; and God separated the light from the darkness.

								+God called the light Day, and the darkness he called Night.  And there was evening and there was morning, the first day.

								</code></pre>


								<!-- end -->


								<p>It might be a little more descriptive, but editing all that text just for a

								quick comparison felt suspiciously like work, and anyway the output still

								doesn&rsquo;t seem very useful.</p>
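
								<p>One rough way to skip the hand-editing, assuming you only care about words and not about where the lines break, is to put every word on its own line before comparing.  A sketch, with a made-up helper called <code>one_word_per_line</code> and two invented sample files:</p>

```shell
#!/bin/sh
# one_word_per_line is a hypothetical helper: it squeezes every run of
# whitespace in a file down to a single newline.
one_word_per_line() {
  tr -s '[:space:]' '\n' < "$1"
}

# Two invented sample files with the same words broken across lines
# differently, except for one changed word.
demo_dir=$(mktemp -d) || exit 1
printf 'the quick brown\nfox jumps\n' > "$demo_dir/a"
printf 'the slow brown fox\njumps\n'  > "$demo_dir/b"

one_word_per_line "$demo_dir/a" > "$demo_dir/a_words"
one_word_per_line "$demo_dir/b" > "$demo_dir/b_words"

# The line breaks no longer matter; only the changed word shows up.
word_diff=$(diff "$demo_dir/a_words" "$demo_dir/b_words") || true
printf '%s\n' "$word_diff"
rm -rf "$demo_dir"
```

								<p>The only change reported should be <code>quick</code> becoming <code>slow</code>, even though the two files break their lines in different places.  That&rsquo;s readable for one changed word, but it gets noisy fast on anything longer.</p>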


								<h2><a name=wdiff href=#wdiff>#</a> wdiff</h2>


								<p>For cases like this, I&rsquo;m fond of a tool called <code>wdiff</code>:</p>


								<!-- exec -->


								<pre><code>$ wdiff genesis_nkj genesis_nrsv

								In the beginning {+when+} God created the heavens and the [-earth.  The-] {+earth, the+} earth was [-without

								form, and void;-] {+a

								formless void+} and darkness [-was on-] {+covered+} the face of the [-deep.  And the Spirit of-] {+deep, while a wind from+}

								God [-was hovering-] {+swept+} over the face of the waters.  Then God said, "Let there be light";

								and there was light.  And God saw [-the light,-] that [-it-] {+the light+} was good; and God

								[-divided-] {+separated+}

								the light from the darkness.  God called the light Day, and the darkness

								[-He-] {+he+}

								called Night.  [-So the-]  {+And there was+} evening and [-the morning were-] {+there was morning,+} the first day.

								</code></pre>


								<!-- end -->


								<p>Deleted words are surrounded by <code>[- -]</code> and inserted ones by <code>{+ +}</code>.  You can

								even ask it to spit out HTML tags for insertion and deletion&hellip;</p>


								<pre><code>$ wdiff -w '&lt;del&gt;' -x '&lt;/del&gt;' -y '&lt;ins&gt;' -z '&lt;/ins&gt;' genesis_nkj genesis_nrsv

								</code></pre>


								<p>&hellip;and come up with something your browser will render like this:</p>


								<blockquote>

								<p>In the beginning <ins>when</ins> God created the heavens and the <del>earth.  The</del> <ins>earth, the</ins> earth was <del>without

								form, and void;</del> <ins>a

								formless void</ins> and darkness <del>was on</del> <ins>covered</ins> the face of the <del>deep.  And the Spirit of</del> <ins>deep, while a wind from</ins>

								God <del>was hovering</del> <ins>swept</ins> over the face of the waters.  Then God said, "Let there be light";

								and there was light.  And God saw <del>the light,</del> that <del>it</del> <ins>the light</ins> was good; and God

								<del>divided</del> <ins>separated</ins>

								the light from the darkness.  God called the light Day, and the darkness

								<del>He</del> <ins>he</ins>

								called Night.  <del>So the</del>  <ins>And there was</ins> evening and <del>the morning were</del> <ins>there was morning,</ins> the first day.</p>

								</blockquote>


								<p>Burton H. Throckmorton, Jr. this ain&rsquo;t.  Still, it has its uses.</p>


								<hr />


								<h1><a name=the-command-line-as-as-a-shared-world href=#the-command-line-as-as-a-shared-world>#</a> 7. the command line as a shared world</h1>


								<p>In an earlier chapter, I wrote:</p>


								<blockquote><p>You can think of the shell as a kind of environment you inhabit, in much

								the way your character inhabits an adventure game.</p></blockquote>


								<p>It turns out that sometimes there are other human inhabitants of this

								environment.</p>


								<p>Unix was built on a model known as &ldquo;time-sharing&rdquo;.  This is an idea with a lot

								of history, but the very short version is that when computers were rare and

								expensive, it made sense for lots of people to be able to use them at once.

								This is part of the story of how ideas like e-mail and chat were born, well

								before networks took over the world:  as ways for the many users of one

								computer to communicate on the same machine.</p>


								<p>Says Dennis Ritchie:</p>


								<blockquote><p>What we wanted to preserve was not just a good environment in which to do

								programming, but a system around which a fellowship could form. We knew from

								experience that the essence of communal computing, as supplied by

								remote-access, time-shared machines, is not just to type programs into a

								terminal instead of a keypunch, but to encourage close communication.</p></blockquote>


								<p>Times have changed, and while it&rsquo;s mundane to use software that&rsquo;s shared

								between many users, it&rsquo;s not nearly as common as it once was for a bunch of us

								to be logged into the same computer all at once.</p>


								<p style="text-align:center;"> ★</p>


								<p>In the mid-1990s, when I was first exposed to Unix, it was by opening up a

								program called NCSA Telnet on one of the Macs at school and connecting to a

								server called mother.esu1.k12.ne.us.</p>


								<p>NCSA Telnet was a terminal, not unlike the kind that you use to open a shell on

								your own Linux computer, a piece of software that itself emulated actual,

								physical hardware from an earlier era.  Hardware terminals were basically very

								simple computers with keyboards, screens, and just enough networking brains to

								talk to a <em>real</em> computer somewhere else.  You&rsquo;ll still come across these

								scattered around big institutional environments.  The last time I looked over

								the shoulder of an airline check-in desk clerk, for example, I saw green

								monochrome text that was probably coming from an IBM mainframe somewhere

								far away.</p>


								<p>Part of what was exciting about being logged into a computer somewhere else

								was that you could <em>talk to people</em>.</p>


								<p style="text-align:center;"> ★</p>


								<p><em>{This chapter is a work in progress.}</em></p>


								<hr />


								<h1><a name=the-command-line-and-the-web href=#the-command-line-and-the-web>#</a> 8. the command line and the web</h1>


								<p>Web browsers are really complicated these days.  They&rsquo;re full of rendering

								engines, audio and video players, programming languages, development tools,

								databases &mdash; you name it, and there&rsquo;s a fair chance it&rsquo;s in there somewhere.

								The modern web browser is kitchen sink software, and to make matters worse, it

								is <em>totally surrounded</em> by technobabble.  It can take <em>years</em> to come to terms

								with the ocean of words about web stuff and sort out the meaningful ones from

								the snake oil and bureaucratic mysticism.</p>


								<p>All of which can make the web itself seem like a really complicated landscape,

								and obscure the simplicity of its basic design, which is this:</p>


								<p>Some programs pass text around to one another.</p>


								<p>Which might sound familiar.</p>


								<p>The gist of it is that the web is made out of URLs, &ldquo;Uniform Resource

								Locators&rdquo;, which are paths to things.  If you squint, these look kind of like

								paths to files on your filesystem.  When you visit a URL in your browser, it

								asks a server for a certain path, and the server gives it back some text.  When

								you click a button to submit a form, your browser sends some text to the server

								and waits to see what it says back.  The text that gets passed around is

								(usually) written in a language with particular significance to web browsers,

								but if you look at it directly, it&rsquo;s a format that humans can understand.</p>


								<p>Let&rsquo;s illustrate this.  I&rsquo;ve written a really simple web page that lives at

								<a href="http://p1k3.com/hello_world.html"><code>http://p1k3.com/hello_world.html</code></a>.</p>


								<pre><code>$ curl 'https://p1k3.com/hello_world.html'

								&lt;html&gt;

								  &lt;head&gt;

								    &lt;title&gt;hello, world&lt;/title&gt;

								  &lt;/head&gt;


								  &lt;body&gt;

								    &lt;h1&gt;hi everybody&lt;/h1&gt;


								    &lt;p&gt;How are things?&lt;/p&gt;

								  &lt;/body&gt;

								&lt;/html&gt;

								</code></pre>


								<p><code>curl</code> is a program with lots and lots of features &mdash; it too is a little bit

								of a kitchen sink &mdash; but it has one core purpose, which is to grab things from

								URLs and spit them back out.  It&rsquo;s a little bit like <code>cat</code> for things that live

								on the web.  Try the above command with just about any URL you can think of,

								and you&rsquo;ll probably get <em>something</em> back.  Let&rsquo;s try this book:</p>


								<pre><code>$ curl 'https://p1k3.com/userland-book/' | head

								&lt;!DOCTYPE html&gt;

								&lt;html lang=en&gt;

								&lt;head&gt;

								  &lt;meta charset="utf-8"&gt;

								  &lt;title&gt;userland: a book about the command line for humans&lt;/title&gt;

								  &lt;link rel=stylesheet href="userland.css" /&gt;

								  &lt;script src="js/jquery.js" type="text/javascript"&gt;&lt;/script&gt;

								&lt;/head&gt;


								&lt;body&gt;

								</code></pre>
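
								<p>The request that a program like <code>curl</code> sends is itself just a few lines of text.  As a sketch, here&rsquo;s roughly what a minimal HTTP/1.1 request for the first page above looks like.  The <code>http_request</code> function is hypothetical and only prints the text.  Real clients send more headers than this, and an <code>https</code> URL wraps the whole exchange in encryption first.</p>

```shell
#!/bin/sh
# http_request is a hypothetical helper that only *prints* the text of a
# bare-bones HTTP/1.1 request; it doesn't send anything over the network.
http_request() {
  path=$1
  host=$2
  # Each line ends in a carriage return plus a newline, and an empty
  # line marks the end of the request.
  printf 'GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "$path" "$host"
}

http_request /hello_world.html p1k3.com
```

								<p>To see what <code>curl</code> really sends, run it with <code>-v</code>, which prints the outgoing request on lines that start with <code>&gt;</code>.</p>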


								<p><code>hello_world.html</code> and <code>userland-book</code> are both written in HyperText Markup

								Language.  HTML is just text with a specific kind of structure.  It&rsquo;s been

								around for quite a while now, and has grown up a lot in 20 years, but at heart

								it still looks a lot <a href="http://info.cern.ch/hypertext/WWW/TheProject.html">like it did in 1991</a>.</p>


								<p>The basic idea is that the contents of a web page are marked up with tags.

								A pair of tags wrapped around some content looks like this:</p>


								<pre><code>&lt;title&gt;hi!&lt;/title&gt; -,

								 |     |            |

								 |     `- content   |

								 |                  `- closing tag

								 `-opening tag

								</code></pre>


								<p>Sometimes you&rsquo;ll see tags with what are known as &ldquo;attributes&rdquo;:</p>


								<pre><code>&lt;a href="https://p1k3.com/userland-book"&gt;userland&lt;/a&gt;

								</code></pre>


								<p>This is how links are written in HTML.  <code>href="..."</code> tells the browser where to

								go when the user clicks on &ldquo;<a href="http://p1k3.com/userland-book">userland</a>&rdquo;.</p>


								<p>Tags are a way to describe not so much what something <em>looks like</em> as what

								something <em>means</em>.  Browsers are, in large part, big collections of knowledge

								about the meanings of tags and ways to represent those meanings.</p>
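
								<p>Because tags are plain text, ordinary text tools can get at them, at least crudely.  This sketch pulls the contents of a <code>&lt;title&gt;</code> tag out of a page with <code>sed</code>.  <code>page_title</code> is a made-up name, and it assumes the opening and closing tags share a line, so treat it as a toy rather than a real HTML parser.</p>

```shell
#!/bin/sh
# page_title is a hypothetical helper. It assumes the opening and closing
# title tags sit on the same line, which real pages don't promise.
page_title() {
  # Print only the lines where the substitution matched, keeping
  # just the text captured between the two tags.
  sed -n 's:.*<title>\(.*\)</title>.*:\1:p'
}

printf '<html>\n  <head>\n    <title>hello, world</title>\n  </head>\n</html>\n' | page_title
```

								<p>Fed a live page, that might look like <code>curl -s 'https://p1k3.com/hello_world.html' | page_title</code>, where <code>-s</code> keeps <code>curl</code> from printing its progress meter.</p>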


								<p>While the browser you use day-to-day has (probably) a graphical interface and

								does all sorts of things impossible to render in a terminal, some of the

								earliest web browsers were entirely text-based, and text-mode browsers still

								exist.  Lynx, which originated at the University of Kansas in the early 1990s,

								is still actively maintained:</p>


								<pre><code>$ lynx -dump 'http://p1k3.com/userland-book/' | head

								                                    userland

								     __________________________________________________________________


								                 [1]# a book about the command line for humans


								   Late last year, [2]a side trip into text utilities got me thinking

								   about how much my writing habits depend on the Linux command line. This

								   struck me as a good hook for talking about the tools I use every day

								   with an audience of mixed technical background.

								</code></pre>


								<p>If you invoke Lynx without any options, it&rsquo;ll start up in interactive mode, and

								you can navigate between links with the arrow keys.  <code>lynx -dump</code> spits a

								rendered version of a page to standard output, with links annotated in square

								brackets and printed as footnotes.  Another useful option here is <code>-listonly</code>,

								which will print just the list of links contained within a page:</p>


								<pre><code>$ lynx -dump -listonly 'http://p1k3.com/userland-book/' | head


								References


								   2. http://p1k3.com/2013/8/4

								   3. http://p1k3.com/userland-book.git

								   4. https://github.com/brennen/userland-book

								   5. http://p1k3.com/userland-book/

								   6. https://twitter.com/brennen

								   9. http://p1k3.com/userland-book/#a-book-about-the-command-line-for-humans

								  10. http://p1k3.com/userland-book/#copying

								</code></pre>


								<p>An alternative to Lynx is w3m, which copes a little more gracefully with the

								complexities of modern web layout.</p>


								<pre><code>$ w3m -dump 'http://p1k3.com/userland-book/' | head

								userland


								━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━


								# a book about the command line for humans


								Late last year, a side trip into text utilities got me thinking about how much

								my writing habits depend on the Linux command line. This struck me as a good

								hook for talking about the tools I use every day with an audience of mixed

								technical background.

								</code></pre>


								<p>Neither of these tools can easily replace enormously capable applications like

								Chrome or Firefox, but they have their place in the toolbox, and help to

								demonstrate how the web is built (in part) on principles we&rsquo;ve already seen at

								work.</p>


								<hr />


								<h1><a name=a-miscellany-of-tools-and-techniques href=#a-miscellany-of-tools-and-techniques>#</a> 9. a miscellany of tools and techniques</h1>


								<h2><a name=dict href=#dict>#</a> dict</h2>


								<p>Want to know the definition of a word, or find useful synonyms?</p>


								<pre><code>$ dict concatenate | head -10

								4 definitions found


								From The Collaborative International Dictionary of English v.0.48 [gcide]:


								  Concatenate \Con*cat"e*nate\ (k[o^]n*k[a^]t"[-e]*n[=a]t), v. t.

								     [imp. &amp; p. p. {Concatenated}; p. pr. &amp; vb. n.

								     {Concatenating}.] [L. concatenatus, p. p. of concatenare to

								     concatenate. See {Catenate}.]

								     To link together; to unite in a series or chain, as things

								     depending on one another.

								</code></pre>


								<h2><a name=aspell href=#aspell>#</a> aspell</h2>


								<p>Need to interactively spell-check your presentation notes?</p>


								<pre><code>$ aspell check presentation

								</code></pre>


								<p>Just want a list of potentially-misspelled words in a given file?</p>


								<!-- exec -->


								<pre><code>$ aspell list &lt; ../literary_environment/index.md | sort | uniq -ci | sort -nr | head -5

								     40 td

								     24 Veselka

								     17 Reuel

								     16 Brunner

								     15 Tiptree

								</code></pre>


								<!-- end -->


								<h2><a name=mostcommon href=#mostcommon>#</a> mostcommon</h2>


								<p>Something like that last sequence sure does seem to show up a lot in my work:

								Spit out the <em>n</em> most common lines in the input, one way or another.   Here&rsquo;s

								a little script to be less repetitive about it.</p>


								<!-- exec -->


								<pre><code>$ aspell list &lt; ../literary_environment/index.md | ./mostcommon -i -n5

								     40 td

								     24 Veselka

								     17 Reuel

								     16 Brunner

								     15 Tiptree

								</code></pre>


								<!-- end -->


								<p>This turns out to be pretty simple:</p>


								<!-- exec -->


								<pre><code>$ cat ./mostcommon

								#!/usr/bin/env bash


								# Optionally specify number of lines to show, defaulting to 10:

								TOSHOW=10

								CASEOPT=""


								while getopts ":in:" opt; do

								  case $opt in

								    i)

								      CASEOPT="-i"

								      ;;

								    n)

								      TOSHOW=$OPTARG

								      ;;

								    \?)

								      echo "Invalid option: -$OPTARG" &gt;&amp;2

								      exit 1

								      ;;

								    :)

								      echo "Option -$OPTARG requires an argument." &gt;&amp;2

								      exit 1

								      ;;

								  esac

								done


								# sort and then uniqify STDIN,

								# sort numerically on the first field,

								# chop off everything but $TOSHOW lines of input


								sort &lt; /dev/stdin | uniq -c $CASEOPT | sort -k1 -nr | head -$TOSHOW

								</code></pre>


								<!-- end -->


								<p>Notice, though, that it doesn&rsquo;t handle opening files directly.  If you wanted

								to find the most common lines in a file with it, you&rsquo;d have to say something

								like <code>mostcommon &lt; filename</code> in order to redirect the file to <code>mostcommon</code>&rsquo;s

								input.</p>
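
								<p>If that bothered you, one possible fix is to let <code>cat</code> do the file handling.  This is only a sketch, written as a function with the invented name <code>mostcommon_files</code> so it can be tried without touching the real script.  It leans on the fact that <code>cat</code> reads standard input when it isn&rsquo;t given any filenames.</p>

```shell
#!/bin/sh
# A hypothetical variation on mostcommon, written as a shell function so it
# can be tried out without changing the real script.
mostcommon_files() {
  toshow=10
  caseopt=""
  OPTIND=1   # start option parsing fresh each time the function is called

  while getopts ":in:" opt; do
    case $opt in
      i) caseopt="-i" ;;
      n) toshow=$OPTARG ;;
      \?) echo "Invalid option: -$OPTARG" >&2; return 1 ;;
      :) echo "Option -$OPTARG requires an argument." >&2; return 1 ;;
    esac
  done

  # Throw away the options getopts already handled; whatever is left
  # in "$@" is treated as a list of filenames.
  shift $((OPTIND - 1))

  # With no filenames, cat copies standard input, so pipes still work.
  cat -- "$@" | sort | uniq -c $caseopt | sort -k1,1nr | head -n "$toshow"
}
```

								<p>Both <code>mostcommon_files -n5 filename</code> and <code>some_command | mostcommon_files -n5</code> should then behave the way you&rsquo;d hope.</p>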


								<p>Also notice that most of the script is boilerplate for handling a couple of

								options.  The work is all done in a one-liner.  Worth it?  Maybe not, but an

								interesting exercise.</p>


								<h2><a name=cal-and-ncal href=#cal-and-ncal>#</a> cal and ncal</h2>


								<p>Want to know what the calendar looks like for this month?</p>


								<pre><code>$ cal

								     April 2014

								Su Mo Tu We Th Fr Sa

								       1  2  3  4  5

								 6  7  8  9 10 11 12

								13 14 15 16 17 18 19

								20 21 22 23 24 25 26

								27 28 29 30

								</code></pre>


								<p>How about for September, 1950, in a more compact format?</p>


								<!-- exec -->


								<pre><code>$ ncal -m9 1950

								    September 1950

								Su     3 10 17 24

								Mo     4 11 18 25

								Tu     5 12 19 26

								We     6 13 20 27

								Th     7 14 21 28

								Fr  1  8 15 22 29

								Sa  2  9 16 23 30

								</code></pre>


								<!-- end -->


								<p>Need to know the date of Easter this year?</p>


								<!-- exec -->


								<pre><code>$ ncal -e

								April 20 2014

								</code></pre>


								<!-- end -->


								<h2><a name=seq href=#seq>#</a> seq</h2>


								<p>Need the numbers 1-5?</p>


								<!-- exec -->


								<pre><code>$ seq 1 5

								1

								2

								3

								4

								5

								</code></pre>


								<!-- end -->
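
<p>A few variations that might come in handy, assuming the <code>seq</code> from GNU coreutils (other versions may differ): an increment between the first and last numbers, <code>-w</code> for equal-width padding, and <code>-s</code> for a separator other than a newline.</p>

```shell
# Count from 0 to 20 in steps of 5 (first, increment, last):
seq 0 5 20

# Pad with leading zeros so every number has the same width:
seq -w 8 10

# Join the numbers with a different separator instead of newlines:
seq -s ', ' 1 5
```

<p>The padded form is useful for things like numbered filenames that should sort correctly.</p>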


								<h2><a name=shuf href=#shuf>#</a> shuf</h2>


								<p>Want to shuffle some lines?</p>


								<!-- exec -->


								<pre><code>$ seq 1 5 | shuf

								2

								1

								4

								3

								5

								</code></pre>


								<!-- end -->
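
<p>Shuffling everything isn&rsquo;t the only trick.  A short sketch, again assuming GNU <code>shuf</code>: <code>-n</code> limits how many lines come out, <code>-e</code> shuffles the arguments themselves, and <code>-i</code> generates a range so you can skip <code>seq</code>.  The output is random, so yours will differ from run to run.</p>

```shell
# Pick one line at random from the input:
seq 1 5 | shuf -n 1

# Shuffle the arguments instead of lines of input:
shuf -e heads tails

# Draw three distinct numbers from the range 1 to 10:
shuf -i 1-10 -n 3
```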


								<h2><a name=ptx href=#ptx>#</a> ptx</h2>


								<p>Want to make a <a href="http://en.wikipedia.org/wiki/Key_Word_in_Context">permuted index</a> of some phrase?</p>


								<!-- exec -->


								<pre><code>$ echo 'i like american music' | ptx

								                              i like   american music

								                                       i like american music

								                                   i   like american music

								                     i like american   music

								</code></pre>


								<!-- end -->


								<h2><a name=figlet href=#figlet>#</a> figlet</h2>


								<p>Need to make ASCII art of some giant letters?</p>


								<!-- exec -->


								<pre><code>$ figlet "R T F M"

								 ____    _____   _____   __  __

								|  _ \  |_   _| |  ___| |  \/  |

								| |_) |   | |   | |_    | |\/| |

								|  _ &lt;    | |   |  _|   | |  | |

								|_| \_\   |_|   |_|     |_|  |_|

								</code></pre>


								<!-- end -->


								<h2><a name=cowsay href=#cowsay>#</a> cowsay</h2>


								<p>How about ASCII art of a <del>cow</del> dragon saying something?</p>


								<!-- exec -->


								<pre><code>$ cowsay -f dragon "RTFM, man"

								 ___________

								&lt; RTFM, man &gt;

								 -----------

								      \                    / \  //\

								       \    |\___/|      /   \//  \\

								            /0  0  \__  /    //  | \ \

								           /     /  \/_/    //   |  \  \

								           @_^_@'/   \/_   //    |   \   \

								           //_^_/     \/_ //     |    \    \

								        ( //) |        \///      |     \     \

								      ( / /) _|_ /   )  //       |      \     _\

								    ( // /) '/,_ _ _/  ( ; -.    |    _ _\.-~        .-~~~^-.

								  (( / / )) ,-{        _      `-.|.-~-.           .~         `.

								 (( // / ))  '/\      /                 ~-. _ .-~      .-~^-.  \

								 (( /// ))      `.   {            }                   /      \  \

								  (( / ))     .----~-.\        \-'                 .~         \  `. \^-.

								             ///.----..&gt;        \             _ -~             `.  ^-`  ^-_

								               ///-._ _ _ _ _ _ _}^ - - - - ~                     ~-- ,.-~

								                                                                  /.-~

								</code></pre>


								<!-- end -->


								<hr />


								<h1><a name=endmatter href=#endmatter>#</a> endmatter</h1>


								<h2><a name=further-reading href=#further-reading>#</a> further reading</h2>


								<ul>

								<li><em>The Unix Programming Environment</em> - Brian W. Kernighan, Rob Pike</li>

								<li><a href="http://cm.bell-labs.com/cm/cs/who/dmr/hist.html">The Evolution of the Unix Time-sharing System</a> - Dennis M. Ritchie</li>

								<li><a href="https://www.youtube.com/watch?v=tc4ROCJYbm0">AT&amp;T Archives: The UNIX Operating System</a> (YouTube)</li>

								<li><a href="https://medium.com/message/tilde-club-i-had-a-couple-drinks-and-woke-up-with-1-000-nerds-a8904f0a2ebf">I had a couple drinks and woke up with 1,000 nerds</a> - Paul Ford</li>

								</ul>


								<h2><a name=code href=#code>#</a> code</h2>


								<p>As of July 2018, source for this work can be found <a

								href="https://code.p1k3.com/gitea/brennen/userland-book">on code.p1k3.com</a>.

								I welcome feedback there, <a href="https://mastodon.social/brennen">on

								Mastodon</a>, or by mail to userland@p1k3.com.</p>


								<h2><a name=copying href=#copying>#</a> copying</h2>


								<p>This work is licensed under a

								<a rel="license" href="https://creativecommons.org/licenses/by-sa/4.0/">Creative

								Commons Attribution-ShareAlike 4.0 International License</a>.</p>


								<p><a rel="license" href="https://creativecommons.org/licenses/by-sa/4.0/">

								<img alt="Creative Commons License" src="images/by-sa-4.png" />

								</a></p>


								<hr />

								<script>

								$(document).ready(function () {

								  // ☜ ☝ ☞ ☟ ☆ ✠ ✡ ✢ ✣ ✤ ✥ ✦ ✧ ✩ ✪

								  var closed_sigil = 'show';

								  var open_sigil = 'hide';


								  var togglesigil = function (elem) {

								    var sigil = $(elem).html();

								    if (sigil === closed_sigil) {

								      $(elem).html(open_sigil);

								    } else {

								      $(elem).html(closed_sigil);

								    }

								  };


								  $(".details").each(function () {

								    var $this = $(this);

								    var $button = $('<button class=clicker-button>' + closed_sigil + '</button>');

								    var $details_full = $(this).find('.full');


								    $button.click(function (e) {

								      e.preventDefault();

								      $details_full.toggle({

								        duration: 550

								      });

								      togglesigil(this);

								    });


								    $(this).find('.clicker').append($button);

								    $button.show();

								  });


								  $('.details .full').hide();

								});

								</script>

								</body>

								</html>