|
<!DOCTYPE html>
|
|
<html lang=en>
|
|
<head>
|
|
<meta charset="utf-8">
|
|
<title>userland: a book about the command line for humans</title>
|
|
<link rel=stylesheet href="userland.css" />
|
|
<link rel="alternate" type="application/atom+xml" title="changes" href="//p1k3.com/userland-book/feed.xml" />
|
|
<script src="js/jquery.js" type="text/javascript"></script>
|
|
</head>
|
|
|
|
<body>
|
|
|
|
<h1 class=bigtitle>userland</h1>
|
|
<hr />
|
|
|
|
<h1><a name=a-book-about-the-command-line-for-humans href=#a-book-about-the-command-line-for-humans>#</a> a book about the command line for humans</h1>
|
|
|
|
<p>In the fall of 2013, <a href="//p1k3.com/2013/8/4">thinking about</a> text utilities got
|
|
me thinking in turn about how my writing habits depend on the Linux command
|
|
line. This seems like a good hook for explaining some tools I use every day,
|
|
so now I’m writing a short, haphazard book.</p>
|
|
|
|
<p>This isn’t a book about system administration, writing complex software, or
|
|
becoming a wizard. I am not a wizard, and I don’t subscribe to the idea that
|
|
wizardry is required to use these tools. In fact, I barely know what I’m doing
|
|
most of the time. I still get some stuff done.</p>
|
|
|
|
<p>This is a work in progress. It probably gets some stuff wrong.</p>
|
|
|
|
<p>– bpb / <a href="https://p1k3.com">p1k3</a> / <a href="https://twitter.com/brennen">@brennen</a></p>
|
|
|
|
<div class=details>
|
|
<h2 class=clicker><a name=contents href=#contents>#</a> contents</h2>
|
|
<div class=full>
|
|
<div class=contents><ul>
|
|
<li><a href="#a-book-about-the-command-line-for-humans">a book about the command line for humans</a>
|
|
|
|
<ul>
|
|
<li><a href="#contents">contents</a></li>
|
|
</ul>
|
|
</li>
|
|
<li><a href="#get-you-a-shell">0. get you a shell</a>
|
|
|
|
<ul>
|
|
<li><a href="#get-an-account-on-a-social-unix-server">get an account on a social unix server</a></li>
|
|
<li><a href="#use-a-raspberry-pi-or-beaglebone">use a raspberry pi or beaglebone</a></li>
|
|
<li><a href="#use-a-virtual-machine">use a virtual machine</a></li>
|
|
</ul>
|
|
</li>
|
|
<li><a href="#the-command-line-as-literary-environment">1. the command line as literary environment</a>
|
|
|
|
<ul>
|
|
<li><a href="#terms-and-definitions">terms and definitions</a></li>
|
|
<li><a href="#twisty-little-passages">twisty little passages</a></li>
|
|
<li><a href="#cat">cat</a></li>
|
|
<li><a href="#wildcards">wildcards</a></li>
|
|
<li><a href="#sort">sort</a></li>
|
|
<li><a href="#options">options</a></li>
|
|
<li><a href="#uniq">uniq</a></li>
|
|
<li><a href="#standard-IO">standard IO</a></li>
|
|
<li><a href="#code-help-code-and-man-pages"><code>--help</code> and man pages</a></li>
|
|
<li><a href="#wc">wc</a></li>
|
|
<li><a href="#head-tail-and-cut">head, tail, and cut</a></li>
|
|
<li><a href="#tab-separated-values">tab separated values</a></li>
|
|
<li><a href="#finding-text-grep">finding text: grep</a></li>
|
|
<li><a href="#now-you-have-n-problems">now you have n problems</a></li>
|
|
</ul>
|
|
</li>
|
|
<li><a href="#a-literary-problem">2. a literary problem</a></li>
|
|
<li><a href="#programmerthink">3. programmerthink</a></li>
|
|
<li><a href="#script">4. script</a>
|
|
|
|
<ul>
|
|
<li><a href="#learn-you-an-editor">learn you an editor</a></li>
|
|
<li><a href="#d-i-y-utilities">d.i.y. utilities</a></li>
|
|
<li><a href="#heavy-lifting">heavy lifting</a></li>
|
|
<li><a href="#generality">generality</a></li>
|
|
</ul>
|
|
</li>
|
|
<li><a href="#general-purpose-programmering">5. general purpose programmering</a></li>
|
|
<li><a href="#one-of-these-things-is-not-like-the-others">6. one of these things is not like the others</a>
|
|
|
|
<ul>
|
|
<li><a href="#diff">diff</a></li>
|
|
<li><a href="#wdiff">wdiff</a></li>
|
|
</ul>
|
|
</li>
|
|
<li><a href="#the-command-line-as-as-a-shared-world">7. the command line as a shared world</a></li>
|
|
<li><a href="#the-command-line-and-the-web">8. the command line and the web</a></li>
|
|
<li><a href="#a-miscellany-of-tools-and-techniques">9. a miscellany of tools and techniques</a>
|
|
|
|
<ul>
|
|
<li><a href="#dict">dict</a></li>
|
|
<li><a href="#aspell">aspell</a></li>
|
|
<li><a href="#mostcommon">mostcommon</a></li>
|
|
<li><a href="#cal-and-ncal">cal and ncal</a></li>
|
|
<li><a href="#seq">seq</a></li>
|
|
<li><a href="#shuf">shuf</a></li>
|
|
<li><a href="#ptx">ptx</a></li>
|
|
<li><a href="#figlet">figlet</a></li>
|
|
<li><a href="#cowsay">cowsay</a></li>
|
|
</ul>
|
|
</li>
|
|
<li><a href="#endmatter">endmatter</a>
|
|
|
|
<ul>
|
|
<li><a href="#further-reading">further reading</a></li>
|
|
<li><a href="#code">code</a></li>
|
|
<li><a href="#copying">copying</a></li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
|
|
</div>
|
|
</div>
|
|
</div>
|
|
|
|
|
|
<hr />
|
|
|
|
<h1><a name=get-you-a-shell href=#get-you-a-shell>#</a> 0. get you a shell</h1>
|
|
|
|
<p>You don’t have to have a shell at hand to get something out of this book.
|
|
Still, as with most practical subjects, you’ll learn more if you try things out
|
|
as you go. You shouldn’t feel guilty about skipping this section. It will
|
|
always be here later if you need it.</p>
|
|
|
|
<p>Not so long ago, it was common for schools and ISPs to hand out shell accounts
|
|
on big shared systems. People learned the command line as a side effect of
|
|
reading their e-mail.</p>
|
|
|
|
<p>That doesn’t happen as often now, but in the meanwhile computers have become
|
|
relatively cheap and free software is abundant. If you’re reading this on the
|
|
web, you can probably get access to a shell. Some options follow.</p>
|
|
|
|
<h2><a name=get-an-account-on-a-social-unix-server href=#get-an-account-on-a-social-unix-server>#</a> get an account on a social unix server</h2>
|
|
|
|
<p>Check out <a href="https://tilde.town/">tilde.town</a>:</p>
|
|
|
|
<blockquote><p>tilde.town is an intentional digital community for making art, socializing, and
|
|
learning. Unlike many online spaces, users interact with tilde.town through a
|
|
direct connection instead of a web site. This means using a tool called ssh and
|
|
other text based tools.</p></blockquote>
|
|
|
|
<h2><a name=use-a-raspberry-pi-or-beaglebone href=#use-a-raspberry-pi-or-beaglebone>#</a> use a raspberry pi or beaglebone</h2>
|
|
|
|
<p>Do you have a single-board computer lying around? Perfect. If you already
|
|
run the standard Raspbian, Debian on a BeagleBone, or a similar-enough Linux,
|
|
you don’t need much else. I wrote most of this text on a Raspberry Pi, and the
|
|
example commands should all work there.</p>
|
|
|
|
<h2><a name=use-a-virtual-machine href=#use-a-virtual-machine>#</a> use a virtual machine</h2>
|
|
|
|
<p>A few options:</p>
|
|
|
|
<ul>
|
|
<li><a href="https://docs.vagrantup.com/v2/getting-started/index.html">Use Vagrant to spin up a machine in Virtualbox</a></li>
|
|
<li><a href="https://www.digitalocean.com/community/tutorials/how-to-create-your-first-digitalocean-droplet-virtual-server">Use DigitalOcean to create a remotely-hosted VM running Linux</a></li>
|
|
</ul>
|
|
|
|
|
|
<hr />
|
|
|
|
<h1><a name=the-command-line-as-literary-environment href=#the-command-line-as-literary-environment>#</a> 1. the command line as literary environment</h1>
|
|
|
|
<p>There are a lot of ways to structure an introduction to the command line. I’m
|
|
going to start with writing as a point of departure because, aside from web
|
|
development, it’s what I use a computer for most. I want to shine a light on
|
|
the humane potential of ideas that are usually understood as nerd trivia.
|
|
Computers have utterly transformed the practice of writing within the space of
|
|
my lifetime, but it seems to me that writers as a class miss out on many of the
|
|
software tools and patterns taken as a given in more “technical” fields.</p>
|
|
|
|
<p>Writing, particularly writing of any real scope or complexity, is very much a
|
|
technical task. It makes demands, both physical and psychological, of its
|
|
practitioners. As with woodworkers, graphic artists, and farmers, writers
|
|
exhibit strong preferences in their tools, materials, and environment, and they
|
|
do so because they’re engaged in a physically and cognitively challenging task.</p>
|
|
|
|
<p>My thesis is that the modern Linux command line is a pretty good environment
|
|
for working with English prose and prosody, and that maybe this will illuminate
|
|
the ways it could be useful in your own work with a computer, whatever that
|
|
work happens to be.</p>
|
|
|
|
<h2><a name=terms-and-definitions href=#terms-and-definitions>#</a> terms and definitions</h2>
|
|
|
|
<p>What software are we actually talking about when we say “the command line”?</p>
|
|
|
|
<p>For the purposes of this discussion, we’re talking about an environment built
|
|
on a very old paradigm called Unix.</p>
|
|
|
|
<p style="text-align:center;"> <img src="images/jp_unix.jpg" height=320 width=470></p>
|
|
|
|
<p>…except what classical Unix really looks like is this:</p>
|
|
|
|
<p style="text-align:center;"> <img src="images/blinking.gif" width=470></p>
|
|
|
|
<p>The Unix-like environment we’re going to use isn’t very classical, really.
|
|
It’s an operating system kernel called Linux, combined with a bunch of things
|
|
written by other people (people in the GNU and Debian projects, and many
|
|
others). Purists will tell you that this isn’t properly Unix at all. In
|
|
strict historical terms they’re right, or at least a certain kind of right, but
|
|
for the purposes of my cultural agenda I’m going to ignore them right now.</p>
|
|
|
|
<p style="text-align:center;"> <img src="images/debian.png"></p>
|
|
|
|
<p>This is what’s called a shell. There are many different shells, but they
|
|
pretty much all operate on the same idea: You navigate a filesystem and run
|
|
programs by typing commands. Commands can be combined in various ways to make
|
|
programs of their own, and in fact the way you use the computer is often just
|
|
to write little programs that invoke other programs, turtles-all-the-way-down
|
|
style.</p>
|
|
|
|
<p>The standard shell these days is something called Bash, so we’ll use Bash.
|
|
It’s what you’ll most often see in the wild. Like most shells, Bash is ugly
|
|
and stupid in more ways than it is possible to easily summarize. It’s also an
|
|
incredibly powerful and expressive piece of software.</p>
|
|
|
|
<h2><a name=twisty-little-passages href=#twisty-little-passages>#</a> twisty little passages</h2>
|
|
|
|
<p>Have you ever played a text-based adventure game or MUD, of the kind that
|
|
describes a setting and takes commands for movement and so on? Readers of a
|
|
certain age and temperament might recognize the opening of Crowther &amp; Woods’
|
|
<em>Adventure</em>, the great-granddaddy of text adventure games:</p>
|
|
|
|
<pre><code>YOU ARE STANDING AT THE END OF A ROAD BEFORE A SMALL BRICK BUILDING.
|
|
AROUND YOU IS A FOREST. A SMALL STREAM FLOWS OUT OF THE BUILDING AND
|
|
DOWN A GULLY.
|
|
|
|
> GO EAST
|
|
|
|
YOU ARE INSIDE A BUILDING, A WELL HOUSE FOR A LARGE SPRING.
|
|
|
|
THERE ARE SOME KEYS ON THE GROUND HERE.
|
|
|
|
THERE IS A SHINY BRASS LAMP NEARBY.
|
|
|
|
THERE IS FOOD HERE.
|
|
|
|
THERE IS A BOTTLE OF WATER HERE.
|
|
</code></pre>
|
|
|
|
<p>You can think of the shell as a kind of environment you inhabit, in much the
|
|
way your character inhabits an adventure game. The difference is that instead
|
|
of navigating around virtual rooms and hallways with commands like <code>LOOK</code> and
|
|
<code>EAST</code>, you navigate between directories by typing commands like <code>ls</code> and <code>cd
|
|
notes</code>:</p>
|
|
|
|
<pre><code>$ ls
|
|
code Downloads notes p1k3 photos scraps userland-book
|
|
$ cd notes
|
|
$ ls
|
|
notes.txt sparkfun TODO.txt
|
|
</code></pre>
|
|
|
|
<p><code>ls</code> lists files. Some files are directories, which means they can contain
|
|
other files, and you can step inside of them by typing <code>cd</code> (for <strong>c</strong>hange
|
|
<strong>d</strong>irectory).</p>
|
|
|
|
<p>In the Macintosh and Windows world, directories have been called
|
|
“folders” for a long time now. This isn’t the <em>worst</em> metaphor for what’s
|
|
going on, and it’s so pervasive by now that it’s not worth fighting about.
|
|
It’s also not exactly a <em>great</em> metaphor, since computer filesystems aren’t
|
|
built very much like the filing cabinets of yore. A directory acts a lot like
|
|
a container of some sort, but it’s an infinitely expandable one which may
|
|
contain nested sub-spaces much larger than itself. Directories are frequently
|
|
like the TARDIS: Bigger on the inside.</p>
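<p>If you’d like to wander around without risking anything, here’s a minimal
sketch that builds a throwaway directory to explore. The names <code>scratch</code>,
<code>notes</code>, and <code>sparkfun</code> are only placeholders, and <code>mktemp</code>,
<code>mkdir</code>, <code>touch</code>, and <code>pwd</code> are commands this chapter
hasn’t introduced:</p>

```shell
# Work in a throwaway directory so nothing real gets touched.
scratch=$(mktemp -d)
cd "$scratch"

# Make a directory with another directory and a file inside it.
mkdir -p notes/sparkfun
touch notes/TODO.txt

ls          # shows just "notes"
cd notes    # step inside
ls          # shows the two things we just made
pwd         # prints the full path of the directory you are standing in

cd ..       # ".." is the directory one level up
listing=$(ls)
echo "$listing"
```

<p>Nothing here depends on the particular names. The point is just that
<code>cd</code> changes where you’re standing, and <code>ls</code> reports what’s there.</p>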
|
|
|
|
<h2><a name=cat href=#cat>#</a> cat</h2>
|
|
|
|
<p>When you’re in the shell, you have many tools at your disposal - programs that
|
|
can be used on many different files, or chained together with other programs.
|
|
They tend to have weird, cryptic names, but a lot of them do very simple
|
|
things. Tasks that might be a menu item in a big program like Word, like
|
|
counting the number of words in a document or finding a particular phrase, are
|
|
often programs unto themselves. We’ll start with something even more basic
|
|
than that.</p>
|
|
|
|
<p>Suppose you have some files, and you’re curious what’s in them. For example,
|
|
suppose you’ve got a list of authors you’re planning to reference, and you just
|
|
want to check its contents real quick-like. This is where our friend <code>cat</code>
|
|
comes in:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cat authors_sff
|
|
Ursula K. Le Guin
|
|
Jo Walton
|
|
Pat Cadigan
|
|
John Ronald Reuel Tolkien
|
|
Vanessa Veselka
|
|
James Tiptree, Jr.
|
|
John Brunner
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>“Why,” you might be asking, “is the command to dump out the contents of a file
|
|
to a screen called <code>cat</code>? What do felines have to do with anything?”</p>
|
|
|
|
<p>It turns out that <code>cat</code> is actually short for “catenate”, which is a long
|
|
word basically meaning “stick things together”. In programming, we usually
|
|
refer to sticking two bits of text together as “string concatenation”, probably
|
|
because programmers like to feel like they’re being very precise about very
|
|
simple actions.</p>
|
|
|
|
<p>Suppose you wanted to see the contents of a <em>set</em> of author lists:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cat authors_sff authors_contemporary_fic authors_nat_hist
|
|
Ursula K. Le Guin
|
|
Jo Walton
|
|
Pat Cadigan
|
|
John Ronald Reuel Tolkien
|
|
Vanessa Veselka
|
|
James Tiptree, Jr.
|
|
John Brunner
|
|
Eden Robinson
|
|
Vanessa Veselka
|
|
Miriam Toews
|
|
Gwendolyn L. Waring
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<h2><a name=wildcards href=#wildcards>#</a> wildcards</h2>
|
|
|
|
<p>We’re working with three filenames: <code>authors_sff</code>, <code>authors_contemporary_fic</code>,
|
|
and <code>authors_nat_hist</code>. That’s an awful lot of typing every time we want to do
|
|
something to all three files. Fortunately, our shell offers a shorthand for
|
|
“all the files that start with <code>authors_</code>”:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cat authors_*
|
|
Eden Robinson
|
|
Vanessa Veselka
|
|
Miriam Toews
|
|
Gwendolyn L. Waring
|
|
Ursula K. Le Guin
|
|
Jo Walton
|
|
Pat Cadigan
|
|
John Ronald Reuel Tolkien
|
|
Vanessa Veselka
|
|
James Tiptree, Jr.
|
|
John Brunner
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>In Bash-land, <code>*</code> basically means “anything”, and is known in the vernacular,
|
|
somewhat poetically, as a “wildcard”. You should always be careful with
|
|
wildcards, especially if you’re doing anything destructive. They can and will
|
|
surprise the unwary. Still, once you’re used to the idea, they will save you a
|
|
lot of RSI.</p>
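<p>One cautious habit, sketched here under the assumption that you’re in a
scratch directory with made-up filenames: hand the wildcard to <code>echo</code>
first. <code>echo</code> just prints whatever arguments it receives, so it shows
which names the pattern turns into before you aim anything destructive at them.</p>

```shell
# Set up a scratch directory with a few hypothetical files.
scratch=$(mktemp -d)
cd "$scratch"
touch authors_sff authors_nat_hist authors.bak notes

# The shell expands the pattern before the command runs, so echo
# receives (and prints) the matching filenames.
matched=$(echo authors_*)
echo "$matched"

# A slightly looser pattern also picks up authors.bak.
echo authors*
```

<p>If the list that comes back isn’t the one you expected, you’ve found out
cheaply.</p>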
|
|
|
|
<h2><a name=sort href=#sort>#</a> sort</h2>
|
|
|
|
<p>There’s a problem here. Our author list is out of order, and thus confusing to
|
|
reference. Fortunately, since one of the most basic things you can do to a
|
|
list is to sort it, someone else has already solved this problem for us.
|
|
Here’s a command that will give us some organization:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ sort authors_*
|
|
Eden Robinson
|
|
Gwendolyn L. Waring
|
|
James Tiptree, Jr.
|
|
John Brunner
|
|
John Ronald Reuel Tolkien
|
|
Jo Walton
|
|
Miriam Toews
|
|
Pat Cadigan
|
|
Ursula K. Le Guin
|
|
Vanessa Veselka
|
|
Vanessa Veselka
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>Does it bother you that they aren’t sorted by last name? Me too. As a partial
|
|
solution, we can ask <code>sort</code> to use the second “field” in each line as its sort
|
|
<strong>k</strong>ey (by default, sort treats whitespace as a division between fields):</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ sort -k2 authors_*
|
|
John Brunner
|
|
Pat Cadigan
|
|
Ursula K. Le Guin
|
|
Gwendolyn L. Waring
|
|
Eden Robinson
|
|
John Ronald Reuel Tolkien
|
|
James Tiptree, Jr.
|
|
Miriam Toews
|
|
Vanessa Veselka
|
|
Vanessa Veselka
|
|
Jo Walton
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>That’s closer, right? It sorted on “Cadigan” and “Veselka” instead of “Pat”
|
|
and “Vanessa”. (Of course, it’s still far from perfect, because the
|
|
second field in each line isn’t necessarily the person’s last name.)</p>
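<p>One detail worth knowing: <code>-k2</code> on its own means “from the second
field to the end of the line”, while <code>-k2,2</code> means “the second field and
nothing else”. Here’s a small sketch of the difference. The two names are
invented for the example, the <code>printf … |</code> part just feeds them to
<code>sort</code>, and <code>LC_ALL=C</code> is there to pin down the sorting rules so
the result doesn’t depend on your system’s language settings.</p>

```shell
# Two made-up lines that share the same second field.
people='a Smith Zoe
x Smith Amy'

# Key runs from field 2 to the end of the line, so "Amy" beats "Zoe".
to_end_of_line=$(printf '%s\n' "$people" | LC_ALL=C sort -k2)

# Key is field 2 only; the tie on "Smith" is broken by comparing whole lines.
second_field_only=$(printf '%s\n' "$people" | LC_ALL=C sort -k2,2)

echo "$to_end_of_line"
echo "$second_field_only"
```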
|
|
|
|
<h2><a name=options href=#options>#</a> options</h2>
|
|
|
|
<p>Above, when we wanted to ask <code>sort</code> to behave differently, we gave it what is
|
|
known as an option. Most programs with command-line interfaces will allow
|
|
their behavior to be changed by adding various options. Options usually
|
|
(but not always!) look like <code>-o</code> or <code>--option</code>.</p>
|
|
|
|
<p>For example, if we wanted to see just the unique lines, irrespective of case,
|
|
for a file called colors:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cat colors
|
|
RED
|
|
blue
|
|
red
|
|
BLUE
|
|
Green
|
|
green
|
|
GREEN
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>We could write this:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ sort -uf colors
|
|
blue
|
|
Green
|
|
RED
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>Here <code>-u</code> stands for <strong>u</strong>nique and <code>-f</code> stands for <strong>f</strong>old case, which means
|
|
to treat upper- and lower-case letters as the same for comparison purposes. You’ll
|
|
often see a group of short options following the <code>-</code> like this.</p>
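<p>As a rough illustration, the three spellings below should all ask for the
same thing. This assumes the GNU version of <code>sort</code> for the long
<code>--unique</code> and <code>--ignore-case</code> forms, and the sample data is a
shortened, made-up list rather than the <code>colors</code> file above.</p>

```shell
# A short, hypothetical list with case-only duplicates.
colors='RED
blue
red
BLUE'

grouped=$(printf '%s\n' "$colors" | sort -uf)                      # short options, grouped
separate=$(printf '%s\n' "$colors" | sort -u -f)                   # short options, one at a time
long=$(printf '%s\n' "$colors" | sort --unique --ignore-case)      # long options (GNU sort)

echo "$grouped"
```

<p>Long options take more typing, but they’re easier to read later, which
matters once you start saving commands in files.</p>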
|
|
|
|
<h2><a name=uniq href=#uniq>#</a> uniq</h2>
|
|
|
|
<p>Did you notice how Vanessa Veselka shows up twice in our list of authors?
|
|
That’s useful if we want to remember that she’s in more than one category, but
|
|
it’s redundant if we’re just worried about membership in the overall set of
|
|
authors. We can make sure our list doesn’t contain repeating lines by using
|
|
<code>sort</code>, just like with that list of colors:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ sort -u -k2 authors_*
|
|
John Brunner
|
|
Pat Cadigan
|
|
Ursula K. Le Guin
|
|
Gwendolyn L. Waring
|
|
Eden Robinson
|
|
John Ronald Reuel Tolkien
|
|
James Tiptree, Jr.
|
|
Miriam Toews
|
|
Vanessa Veselka
|
|
Jo Walton
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>But there’s another approach to this — <code>sort</code> is good at only displaying a line
|
|
once, but suppose we wanted to see a count of how many different lists an
|
|
author shows up on? <code>sort</code> doesn’t do that, but a command called <code>uniq</code> does,
|
|
if you give it the option <code>-c</code> for <strong>c</strong>ount.</p>
|
|
|
|
<p><code>uniq</code> moves through the lines in its input, and if it sees a line more than
|
|
once in sequence, it will only print that line once. If you have a bunch of
|
|
files and you just want to see the unique lines across all of those files, you
|
|
probably need to run them through <code>sort</code> first. How do you do that?</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ sort authors_* | uniq -c
|
|
1 Eden Robinson
|
|
1 Gwendolyn L. Waring
|
|
1 James Tiptree, Jr.
|
|
1 John Brunner
|
|
1 John Ronald Reuel Tolkien
|
|
1 Jo Walton
|
|
1 Miriam Toews
|
|
1 Pat Cadigan
|
|
1 Ursula K. Le Guin
|
|
2 Vanessa Veselka
|
|
</code></pre>
|
|
|
|
<!-- end -->
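<p>To see why the <code>sort</code> matters, here’s a small sketch with a made-up
list where the repeats aren’t all next to each other. <code>uniq</code> only
collapses lines that are adjacent, so the unsorted version counts “red” in two
separate groups:</p>

```shell
# Hypothetical input: "red" appears three times, but not all in a row.
lines='red
blue
red
red'

# Without sorting: three groups (red once, blue once, red twice).
unsorted=$(printf '%s\n' "$lines" | uniq -c)

# With sorting first: two groups (blue once, red three times).
sorted=$(printf '%s\n' "$lines" | sort | uniq -c)

echo "$unsorted"
echo "$sorted"
```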
|
|
|
|
|
|
<h2><a name=standard-IO href=#standard-IO>#</a> standard IO</h2>
|
|
|
|
<p>The <code>|</code> is called a “pipe”. In the command above, it tells your shell that
|
|
instead of printing the output of <code>sort authors_*</code> right to your terminal, it
|
|
should send it to <code>uniq -c</code>.</p>
|
|
|
|
<p style="text-align:center;"> <img src="images/pipe.gif"></p>
|
|
|
|
<p>Pipes are some of the most important magic in the shell. When the people who
|
|
built Unix in the first place give interviews about the stuff they remember
|
|
from the early days, a lot of them reminisce about the invention of pipes and
|
|
all of the new stuff it immediately made possible.</p>
|
|
|
|
<p>Pipes help you control a thing called “standard IO”. In the world of the
|
|
command line, programs take <strong>i</strong>nput and produce <strong>o</strong>utput. A pipe is a way
|
|
to hook the output from one program to the input of another.</p>
|
|
|
|
<p>Unlike a lot of the weirdly named things you’ll encounter in software, the
|
|
metaphor here is obvious and makes pretty good sense. It even kind of looks
|
|
like a physical pipe.</p>
|
|
|
|
<p>What if, instead of sending the output of one program to the input of another,
|
|
you’d like to store it in a file for later use?</p>
|
|
|
|
<p>Check it out:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ sort authors_* | uniq > ./all_authors
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cat all_authors
|
|
Eden Robinson
|
|
Gwendolyn L. Waring
|
|
James Tiptree, Jr.
|
|
John Brunner
|
|
John Ronald Reuel Tolkien
|
|
Jo Walton
|
|
Miriam Toews
|
|
Pat Cadigan
|
|
Ursula K. Le Guin
|
|
Vanessa Veselka
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>I like to think of the <code>></code> as looking like a little funnel. It can be
|
|
dangerous — you should always make sure that you’re not going to clobber
|
|
an existing file you actually want to keep.</p>
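<p>If that worries you, Bash has a safety catch. This is a sketch rather than
a recommendation, and neither <code>set -o noclobber</code> nor <code>>|</code> appears
elsewhere in this chapter. The filename <code>keep_me</code> is invented for the
example.</p>

```shell
# Work in a scratch directory with a file we care about.
scratch=$(mktemp -d)
cd "$scratch"
echo 'precious' > keep_me

# Ask the shell to refuse to overwrite existing files with >.
set -o noclobber

# This redirect now fails (the shell prints an error), and the file survives.
echo 'oops' > keep_me || echo 'the shell refused to overwrite keep_me'
after_refusal=$(cat keep_me)

# >| says "yes, I really mean it" for a single command.
echo 'on purpose' >| keep_me
contents=$(cat keep_me)

# Turn the safety catch back off.
set +o noclobber
```

<p>Whether you want this on all the time is a matter of taste. It only
guards against <code>></code>, not against every way of destroying a file.</p>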
|
|
|
|
<p>If you want to tack more stuff on to the end of an existing file, you can use
|
|
<code>>></code> instead. To test that, let’s use <code>echo</code>, which prints out whatever string
|
|
you give it on a line by itself:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ echo 'hello' > hello_world
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ echo 'world' >> hello_world
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cat hello_world
|
|
hello
|
|
world
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>You can also take a file and pull it directly back into the input of a given
|
|
program, which is a bit like a funnel going the other direction:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ nl < all_authors
|
|
1 Eden Robinson
|
|
2 Gwendolyn L. Waring
|
|
3 James Tiptree, Jr.
|
|
4 John Brunner
|
|
5 John Ronald Reuel Tolkien
|
|
6 Jo Walton
|
|
7 Miriam Toews
|
|
8 Pat Cadigan
|
|
9 Ursula K. Le Guin
|
|
10 Vanessa Veselka
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p><code>nl</code> is just a way to <strong>n</strong>umber <strong>l</strong>ines. This command accomplishes pretty much
|
|
the same thing as <code>cat all_authors | nl</code>, or <code>nl all_authors</code>. You won’t see
|
|
it used as often as <code>|</code> and <code>></code>, since most utilities can read files on their
|
|
own, but it can save you from typing <code>cat</code> quite so often.</p>
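<p>Here’s a quick sketch comparing the three spellings on a small made-up
file (the name <code>letters</code> is just for illustration). For <code>nl</code>, all
three should produce the same numbered lines:</p>

```shell
# Make a tiny file to number.
scratch=$(mktemp -d)
printf 'a\nb\nc\n' > "$scratch/letters"

from_redirect=$(nl < "$scratch/letters")      # the shell opens the file for nl
from_pipe=$(cat "$scratch/letters" | nl)      # cat reads the file, nl reads cat's output
from_argument=$(nl "$scratch/letters")        # nl opens the file itself

echo "$from_redirect"
```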
|
|
|
|
<p>We’ll use these features liberally from here on out.</p>
|
|
|
|
<h2><a name=code-help-code-and-man-pages href=#code-help-code-and-man-pages>#</a> <code>--help</code> and man pages</h2>
|
|
|
|
<p>You can change the behavior of most tools by giving them different options.
|
|
This is all well and good if you already know what options are available,
|
|
but what if you don’t?</p>
|
|
|
|
<p>Often, you can ask the tool itself:</p>
|
|
|
|
<pre><code>$ sort --help
|
|
Usage: sort [OPTION]... [FILE]...
|
|
or: sort [OPTION]... --files0-from=F
|
|
Write sorted concatenation of all FILE(s) to standard output.
|
|
|
|
Mandatory arguments to long options are mandatory for short options too.
|
|
Ordering options:
|
|
|
|
-b, --ignore-leading-blanks ignore leading blanks
|
|
-d, --dictionary-order consider only blanks and alphanumeric characters
|
|
-f, --ignore-case fold lower case to upper case characters
|
|
-g, --general-numeric-sort compare according to general numerical value
|
|
-i, --ignore-nonprinting consider only printable characters
|
|
-M, --month-sort compare (unknown) < 'JAN' < ... < 'DEC'
|
|
-h, --human-numeric-sort compare human readable numbers (e.g., 2K 1G)
|
|
-n, --numeric-sort compare according to string numerical value
|
|
-R, --random-sort sort by random hash of keys
|
|
--random-source=FILE get random bytes from FILE
|
|
-r, --reverse reverse the result of comparisons
|
|
</code></pre>
|
|
|
|
<p>…and so on. (It goes on for a while in this vein.)</p>
|
|
|
|
<p>If that doesn’t work, or doesn’t provide enough info, the next thing to try is
|
|
called a man page. (“man” is short for “manual”. It’s sort of an unfortunate
|
|
abbreviation.)</p>
|
|
|
|
<pre><code>$ man sort
|
|
|
|
SORT(1) User Commands SORT(1)
|
|
|
|
|
|
|
|
NAME
|
|
sort - sort lines of text files
|
|
|
|
SYNOPSIS
|
|
sort [OPTION]... [FILE]...
|
|
sort [OPTION]... --files0-from=F
|
|
|
|
DESCRIPTION
|
|
Write sorted concatenation of all FILE(s) to standard output.
|
|
</code></pre>
|
|
|
|
<p>…and so on. Manual pages vary in quality, and it can take a while to get
|
|
used to reading them, but they’re very often the best place to look for help.</p>
|
|
|
|
<p>If you’re not sure what <em>program</em> you want to use to solve a given problem, you
|
|
might try searching all the man pages on the system for a keyword. <code>man</code>
|
|
itself has an option to let you do this - <code>man -k keyword</code> - but most systems
|
|
also have a shortcut called <code>apropos</code>, which I like to use because it’s easy to
|
|
remember if you imagine yourself saying “apropos of [some problem I have]…”</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ apropos -s1 sort
|
|
apt-sortpkgs (1) - Utility to sort package index files
|
|
bunzip2 (1) - a block-sorting file compressor, v1.0.6
|
|
bzip2 (1) - a block-sorting file compressor, v1.0.6
|
|
comm (1) - compare two sorted files line by line
|
|
sort (1) - sort lines of text files
|
|
tsort (1) - perform topological sort
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>It’s useful to know that the manual represented by <code>man</code> has numbered sections
|
|
for different kinds of manual pages. Most of what the average user needs to
|
|
know about lives in section 1, “User Commands”, so you’ll often see the names
|
|
of different tools written like <code>sort(1)</code> or <code>cat(1)</code>. This can be a good way
|
|
to make it clear in writing that you’re talking about a specific piece of
|
|
software rather than a verb or a small carnivorous mammal. (I specified <code>-s1</code>
|
|
for section 1 above just to cut down on clutter, though in practice I usually
|
|
don’t bother.)</p>
|
|
|
|
<p>Like other literary traditions, Unix is littered with this sort of convention.
|
|
This one just happens to date from a time when the manual was still a physical
|
|
book.</p>
|
|
|
|
<h2><a name=wc href=#wc>#</a> wc</h2>
|
|
|
|
<p><code>wc</code> stands for <strong>w</strong>ord <strong>c</strong>ount. It does about what you’d expect - it
|
|
counts the number of words in its input.</p>
|
|
|
|
<pre><code>$ wc index.md
|
|
736 4117 24944 index.md
|
|
</code></pre>
|
|
|
|
<p>736 is the number of lines, 4117 the number of words, and 24944 the number of
|
|
characters (strictly speaking, bytes) in the file I’m writing right now. I use this constantly. Most
|
|
obviously, it’s a good way to get an idea of how much you’ve written. <code>wc</code> is
|
|
the tool I used to track my progress the last time I tried National Novel
|
|
Writing Month:</p>
|
|
|
|
<pre><code>$ find ~/p1k3/archives/2010/11 -regextype egrep -regex '.*([0-9]+|index)' -type f | xargs wc -w | tail -1
|
|
6585 total
|
|
</code></pre>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cowsay 'embarrassing.'
|
|
_______________
|
|
< embarrassing. >
|
|
---------------
|
|
\ ^__^
|
|
\ (oo)\_______
|
|
(__)\ )\/\
|
|
||----w |
|
|
|| ||
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>Anyway. The less obvious thing about <code>wc</code> is that you can use it to count the
|
|
output of other commands. Want to know <em>how many</em> unique authors we have?</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ sort authors_* | uniq | wc -l
|
|
10
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>This kind of thing is trivial, but it comes in handy more often than you might
|
|
think.</p>
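<p>You can also ask <code>wc</code> for one number at a time. A minimal sketch,
using a few made-up lines of text instead of a real file: <code>-l</code> counts
lines, <code>-w</code> counts words, and <code>-c</code> counts bytes.</p>

```shell
# Three hypothetical lines of text.
text='the quick brown fox
jumps over
the lazy dog'

line_count=$(printf '%s\n' "$text" | wc -l)   # 3 lines
word_count=$(printf '%s\n' "$text" | wc -w)   # 9 words
byte_count=$(printf '%s\n' "$text" | wc -c)   # 44 bytes, counting the newlines

echo "$line_count lines, $word_count words, $byte_count bytes"
```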
|
|
|
|
<h2><a name=head-tail-and-cut href=#head-tail-and-cut>#</a> head, tail, and cut</h2>
|
|
|
|
<p>Remember our old pal <code>cat</code>, which just splats everything it’s given back to
|
|
standard output?</p>
|
|
|
|
<p>Sometimes you’ve got a piece of output that’s more than you actually want to
|
|
deal with at once. Maybe you just want to glance at the first few lines in a
|
|
file:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ head -3 colors
|
|
RED
|
|
blue
|
|
red
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>…or maybe you want to see the last thing in a list:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ sort colors | uniq -i | tail -1
|
|
red
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>…or maybe you’re only interested in the first “field” in some list. You might
|
|
use <code>cut</code> here, asking it to treat spaces as delimiters between fields and
|
|
return only the first field for each line of its input:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cut -d' ' -f1 ./authors_*
|
|
Eden
|
|
Vanessa
|
|
Miriam
|
|
Gwendolyn
|
|
Ursula
|
|
Jo
|
|
Pat
|
|
John
|
|
Vanessa
|
|
James
|
|
John
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>Suppose we’re curious what the few most commonly occurring first names on our
|
|
author list are? Here’s an approach, silly but effective, that combines a lot
|
|
of what we’ve discussed so far and looks like plenty of one-liners I wind up
|
|
writing in real life:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cut -d' ' -f1 ./authors_* | sort | uniq -ci | sort -n | tail -3
|
|
1 Ursula
|
|
2 John
|
|
2 Vanessa
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>Let’s walk through this one step by step:</p>
|
|
|
|
<p>First, we have <code>cut</code> extract the first field of each line in our author lists.</p>
|
|
|
|
<pre><code>cut -d' ' -f1 ./authors_*
|
|
</code></pre>
|
|
|
|
<p>Then we sort these results</p>
|
|
|
|
<pre><code>| sort
|
|
</code></pre>
|
|
|
|
<p>and pass them to <code>uniq</code>, asking it for a case-insensitive count of each
|
|
repeated line</p>
|
|
|
|
<pre><code>| uniq -ci
|
|
</code></pre>
|
|
|
|
<p>then sort again, numerically,</p>
|
|
|
|
<pre><code>| sort -n
|
|
</code></pre>
|
|
|
|
<p>and finally, we chop off everything but the last three lines:</p>
|
|
|
|
<pre><code>| tail -3
|
|
</code></pre>
|
|
|
|
<p>If you wanted to make sure to count an individual author’s first name
|
|
only once, even if that author appears more than once in the files,
|
|
you could instead do:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ sort -u ./authors_* | cut -d' ' -f1 | uniq -ci | sort -n | tail -3
|
|
1 Ursula
|
|
1 Vanessa
|
|
2 John
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<h2><a name=tab-separated-values href=#tab-separated-values>#</a> tab separated values</h2>
|
|
|
|
<p>Notice above how we had to tell <code>cut</code> that “fields” in <code>authors_*</code> are
|
|
delimited by spaces? It turns out that if you don’t use <code>-d</code>, <code>cut</code> defaults
|
|
to using tab characters for a delimiter.</p>
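<p>A quick way to see that default in action is to feed <code>cut</code> a throwaway line.
This is just a sketch with invented input: <code>printf</code> turns <code>\t</code> into a real tab
character, so the first command needs no <code>-d</code> at all, while the second still has
to be told about spaces:</p>

<pre><code>$ printf 'Tolkien\tJohn\n' | cut -f2
John
$ printf 'Jo Walton\n' | cut -d' ' -f2
Walton
</code></pre>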
|
|
|
|
<p>Tab characters are sort of weird little animals. You can’t usually <em>see</em> them
|
|
directly — they’re like a space character that takes up more than one space
|
|
when displayed. By convention, one tab is usually rendered as 8 spaces, but
|
|
it’s up to the software that’s displaying the character what it wants to do.</p>
|
|
|
|
<p>(In fact, it’s more complicated than that: Tabs are often rendered as marking
|
|
<em>tab stops</em>, which is a concept I remember from 7th grade typing classes, but
|
|
haven’t actually thought about in my day-to-day life for nearly 20 years.)</p>
|
|
|
|
<p>Here’s a version of our <code>all_authors</code> that’s been rearranged so that the first
|
|
field is the author’s last name, the second is their first name, the third is
|
|
their middle name or initial (if we know it) and the fourth is any suffix.
|
|
Fields are separated by a single tab character:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cat all_authors.tsv
|
|
Robinson Eden
|
|
Waring Gwendolyn L.
|
|
Tiptree James Jr.
|
|
Brunner John
|
|
Tolkien John Ronald Reuel
|
|
Walton Jo
|
|
Toews Miriam
|
|
Cadigan Pat
|
|
Le Guin Ursula K.
|
|
Veselka Vanessa
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>That looks kind of garbled, right? In order to make it a little more obvious
|
|
what’s happening, let’s use <code>cat -T</code>, which displays tab characters as <code>^I</code>:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cat -T all_authors.tsv
|
|
Robinson^IEden
|
|
Waring^IGwendolyn^IL.
|
|
Tiptree^IJames^I^IJr.
|
|
Brunner^IJohn
|
|
Tolkien^IJohn^IRonald Reuel
|
|
Walton^IJo
|
|
Toews^IMiriam
|
|
Cadigan^IPat
|
|
Le Guin^IUrsula^IK.
|
|
Veselka^IVanessa
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>It looks odd when displayed because some names are at or nearly at 8 characters long.
|
|
“Robinson”, at 8 characters, overshoots the first tab stop, so “Eden” gets indented
|
|
further than other first names, and so on.</p>
|
|
|
|
<p>Fortunately, in order to make this more human-readable, we can pass it through
<code>expand</code>, which replaces each tab with enough spaces to reach the next tab stop
(every 8 columns by default; <code>-t14</code> puts a stop every 14 columns instead):</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ expand -t14 all_authors.tsv
|
|
Robinson Eden
|
|
Waring Gwendolyn L.
|
|
Tiptree James Jr.
|
|
Brunner John
|
|
Tolkien John Ronald Reuel
|
|
Walton Jo
|
|
Toews Miriam
|
|
Cadigan Pat
|
|
Le Guin Ursula K.
|
|
Veselka Vanessa
|
|
</code></pre>
|
|
|
|
<!-- end -->
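<p>To see what tab stops do to the spacing, it can help to try <code>expand</code> on a
couple of hand-made lines. This is only a sketch with input invented for the
purpose. With the default stops every 8 columns, “Robinson” has already used
up the first stop, so its tab jumps all the way to column 16, while the tab
after “Le Guin” only needs one space to reach column 8:</p>

<pre><code>$ printf 'Robinson\tEden\nLe Guin\tUrsula\n' | expand
Robinson        Eden
Le Guin Ursula
$ printf 'Robinson\tEden\nLe Guin\tUrsula\n' | expand -t14
Robinson      Eden
Le Guin       Ursula
</code></pre>

<p>With stops every 14 columns, both last names fit before the first stop, and
the second column lines up.</p>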
|
|
|
|
|
|
<p>Now it’s easy to sort by last name:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ sort -k1 all_authors.tsv | expand -t14
|
|
Brunner John
|
|
Cadigan Pat
|
|
Le Guin Ursula K.
|
|
Robinson Eden
|
|
Tiptree James Jr.
|
|
Toews Miriam
|
|
Tolkien John Ronald Reuel
|
|
Veselka Vanessa
|
|
Walton Jo
|
|
Waring Gwendolyn L.
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>Or just extract middle names and initials:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cut -f3 all_authors.tsv
|
|
|
|
L.
|
|
|
|
|
|
Ronald Reuel
|
|
|
|
|
|
|
|
K.
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>It probably won’t surprise you to learn that there’s a corresponding <code>paste</code>
|
|
command, which takes two or more files and stitches them together with tab
|
|
characters. Let’s extract a couple of things from our author list and put them
|
|
back together in a different order:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cut -f1 all_authors.tsv > lastnames
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cut -f2 all_authors.tsv > firstnames
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ paste firstnames lastnames | sort -k2 | expand -t12
|
|
John Brunner
|
|
Pat Cadigan
|
|
Ursula Le Guin
|
|
Eden Robinson
|
|
James Tiptree
|
|
Miriam Toews
|
|
John Tolkien
|
|
Vanessa Veselka
|
|
Jo Walton
|
|
Gwendolyn Waring
|
|
</code></pre>
|
|
|
|
<!-- end -->
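<p>One caveat: left to itself, <code>sort</code> splits fields at runs of blanks (spaces or
tabs), so in a file like <code>all_authors.tsv</code> a last name such as “Le Guin” would
count as two fields. One way around that, sketched here with made-up input, is
to tell <code>sort</code> that tabs are the separator using <code>-t</code>. The <code>"$(printf '\t')"</code>
bit is just one way of handing it a literal tab character:</p>

<pre><code>$ printf 'Le Guin\tUrsula\nCadigan\tPat\nWalton\tJo\n' | sort -t "$(printf '\t')" -k2,2 | cut -f1
Walton
Cadigan
Le Guin
</code></pre>

<p><code>-k2,2</code> means “sort on the second field, and only the second field”, so the
last names come out ordered by first name: Jo, Pat, Ursula.</p>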
|
|
|
|
|
|
<p>As these examples show, TSV is something very like a primitive spreadsheet: A
|
|
way to represent information in columns and rows. In fact, it’s a close cousin
|
|
of CSV, which is often used as a lowest-common-denominator format for
|
|
transferring spreadsheets, and which represents data something like this:</p>
|
|
|
|
<pre><code>last,first,middle,suffix
|
|
Tolkien,John,Ronald Reuel,
|
|
Tiptree,James,,Jr.
|
|
</code></pre>
|
|
|
|
<p>The advantage of tabs is that they’re supported by a bunch of the standard
|
|
tools. A disadvantage is that they’re kind of ugly and can be weird to deal
|
|
with, but they’re useful anyway, and character-delimited rows are often a
|
|
good-enough way to hack your way through problems that call for basic
|
|
structure.</p>
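<p>For simple data, hopping between the two formats can be a one-liner. As a
rough sketch, and assuming no field contains a comma or a quote (real CSV has
quoting rules that this ignores entirely), <code>tr</code> can swap every tab for a
comma, and <code>cut</code> can be pointed at commas with <code>-d</code>:</p>

<pre><code>$ printf 'Tiptree\tJames\t\tJr.\n' | tr '\t' ','
Tiptree,James,,Jr.
$ printf 'Tiptree,James,,Jr.\n' | cut -d, -f4
Jr.
</code></pre>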
|
|
|
|
<h2><a name=finding-text-grep href=#finding-text-grep>#</a> finding text: grep</h2>
|
|
|
|
<p>After all those contortions, what if you actually just want to see <em>which lists</em>
|
|
an individual author appears on?</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ grep 'Vanessa' ./authors_*
|
|
./authors_contemporary_fic:Vanessa Veselka
|
|
./authors_sff:Vanessa Veselka
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p><code>grep</code> takes a string to search for and, optionally, a list of files to search
|
|
in. If you don’t specify files, it’ll look through standard input instead:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cat ./authors_* | grep 'Vanessa'
|
|
Vanessa Veselka
|
|
Vanessa Veselka
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>Most of the time, piping the output of <code>cat</code> to <code>grep</code> is considered silly,
|
|
because <code>grep</code> knows how to find things in files on its own. Many thousands of
|
|
words have been written on this topic by leading lights of the nerd community.</p>
|
|
|
|
<p>You’ve probably noticed that this result doesn’t contain filenames (and thus
|
|
isn’t very useful to us). That’s because all <code>grep</code> saw was the lines in the
|
|
files, not the names of the files themselves.</p>
|
|
|
|
<h2><a name=now-you-have-n-problems href=#now-you-have-n-problems>#</a> now you have n problems</h2>
|
|
|
|
<p>To close out this introductory chapter, let’s spend a little time on a topic
|
|
that will likely vex, confound, and (occasionally) delight you for as long as
|
|
you are acquainted with the command line.</p>
|
|
|
|
<p>When I was talking about <code>grep</code> a moment ago, I fudged the details more than a
|
|
little by saying that it expects a string to search for. What <code>grep</code>
|
|
<em>actually</em> expects is a <em>pattern</em>. Moreover, it expects a specific kind of
|
|
pattern, what’s known as a <em>regular expression</em>, a cumbersome phrase frequently
|
|
shortened to regex.</p>
|
|
|
|
<p>There’s a lot of theory about what makes up a regular expression. Fortunately,
|
|
very little of it matters to the short version that will let you get useful
|
|
stuff done. The short version is that a regex is like using wildcards in the
|
|
shell to match groups of files, but for text in general and with more magic.</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ grep 'Jo.*' ./authors_*
|
|
./authors_sff:Jo Walton
|
|
./authors_sff:John Ronald Reuel Tolkien
|
|
./authors_sff:John Brunner
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>The pattern <code>Jo.*</code> says that we’re looking for lines which contain a literal
|
|
<code>Jo</code>, followed by any quantity (including none) of any character. In a regex,
|
|
<code>.</code> means “any single character” and <code>*</code> means “any amount of the preceding thing”.</p>
|
|
|
|
<p><code>.</code> and <code>*</code> are magical. In the particular dialect of regexen understood
|
|
by <code>grep</code>, other magical things include:</p>
|
|
|
|
<table>
|
|
<tr><td><code>^</code> </td> <td>start of a line </td></tr>
|
|
<tr><td><code>$</code> </td> <td>end of a line </td></tr>
|
|
<tr><td><code>[abc]</code></td> <td>one of a, b, or c </td></tr>
|
|
<tr><td><code>[a-z]</code></td> <td>a character in the range a through z</td></tr>
|
|
<tr><td><code>[0-9]</code></td> <td>a character in the range 0 through 9</td></tr>
|
|
|
|
<tr><td><code>+</code> </td> <td>one or more of the preceding thing </td></tr>
|
|
<tr><td><code>?</code> </td> <td>0 or 1 of the preceding thing </td></tr>
|
|
<tr><td><code>*</code> </td> <td>any number of the preceding thing </td></tr>
|
|
|
|
<tr><td><code>(foo|bar)</code></td> <td>"foo" or "bar"</td></tr>
|
|
<tr><td><code>(foo)?</code></td> <td>optional "foo"</td></tr>
|
|
</table>
|
|
|
|
|
|
<p>It’s actually a little more complicated than that: By default, if you want to
|
|
use a lot of the magical characters, you have to prefix them with <code>\</code>. This is
|
|
both ugly and confusing, so unless you’re writing a very simple pattern, it’s
|
|
often easiest to call <code>grep -E</code>, for <strong>E</strong>xtended regular expressions, which
|
|
means that characters like <code>+</code>, <code>?</code>, <code>|</code>, and parentheses are special without a backslash.</p>
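<p>A few small experiments make the difference concrete. This is a sketch with
invented input; <code>grep -c</code> prints a count of matching lines instead of the
lines themselves. Without <code>-E</code>, the parentheses are ordinary characters, so
the first pattern only matches a literal “magic(al)”. With <code>-E</code>, they group
“al” as part of the pattern. The last command uses <code>{4}</code>, which means
“exactly four of the preceding thing”:</p>

<pre><code>$ printf 'magic\nmagical\n' | grep -c 'magic(al)'
0
$ printf 'magic\nmagical\n' | grep -cE 'magic(al)'
1
$ printf 'Eden Robinson\nJo Walton\n' | grep -E '^[A-Za-z]{4} '
Eden Robinson
</code></pre>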
|
|
|
|
<p>Authors with 4-letter first names:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ grep -iE '^[a-z]{4} ' ./authors_*
|
|
./authors_contemporary_fic:Eden Robinson
|
|
./authors_sff:John Ronald Reuel Tolkien
|
|
./authors_sff:John Brunner
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>A count of authors named John:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ grep -c '^John ' ./all_authors
|
|
2
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>Lines in this file matching the words “magic” or “magical”:</p>
|
|
|
|
<pre><code>$ grep -iE 'magic(al)?' ./index.md
|
|
Pipes are some of the most important magic in the shell. When the people who
|
|
shell to match groups of files, but with more magic.
|
|
`.` and `*` are magical. In the particular dialect of regexen understood
|
|
by `grep`, other magical things include:
|
|
use a lot of the magical characters, you have to prefix them with `\`. This is
|
|
Lines in this file matching the words "magic" or "magical":
|
|
$ grep -iE 'magic(al)?' ./index.md
|
|
</code></pre>
|
|
|
|
<p>Find some “-agic” words in a big list of words:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ grep -iE '(m|tr|pel)agic' /usr/share/dict/words
|
|
magic
|
|
magic's
|
|
magical
|
|
magically
|
|
magician
|
|
magician's
|
|
magicians
|
|
pelagic
|
|
tragic
|
|
tragically
|
|
tragicomedies
|
|
tragicomedy
|
|
tragicomedy's
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p><code>grep</code> isn’t the only - or even the most important - tool that makes use of
|
|
regular expressions, but it’s a good place to start because it’s one of the
|
|
fundamental building blocks for so many other operations. Filtering lists of
|
|
things, matching patterns within collections, and writing concise descriptions
|
|
of how text should be transformed are at the heart of a practical approach to
|
|
Unix-like systems. Regexen turn out to be a seductively powerful way to do
|
|
these things - so much so that they’ve crept their way into text editors,
|
|
databases, and full-featured programming languages.</p>
|
|
|
|
<p>There’s a dark side to all of this, for the truth about regular expressions is
|
|
that they are ugly, inconsistent, brittle, and <em>incredibly</em> difficult to think
|
|
clearly about. They take years to master and reward the wielder with great
|
|
power, but they are also a trap: a temptation towards the path of cleverness
|
|
masquerading as wisdom.</p>
|
|
|
|
<p style="text-align:center;"> ✑</p>
|
|
|
|
<p>I’ll be returning to this theme, but for the time being let’s move on. Now
|
|
that we’ve established, however haphazardly, some of the basics, let’s consider
|
|
their application to a real-world task.</p>
|
|
|
|
<hr />
|
|
|
|
<h1><a name=a-literary-problem href=#a-literary-problem>#</a> 2. a literary problem</h1>
|
|
|
|
<p>The <a href="../literary_environment">previous chapter</a> introduced a bunch of tools
|
|
using contrived examples. Now we’ll look at a real problem, and work through a
|
|
solution by building on tools we’ve already covered.</p>
|
|
|
|
<p>So on to the problem: I write poetry.</p>
|
|
|
|
<p>{rimshot dot wav}</p>
|
|
|
|
<p>Most of the poems I have written are not very good, but lately I’ve been
|
|
thinking that I’d like to comb through the last ten years' worth and pull
|
|
the least-embarrassing stuff into a single collection.</p>
|
|
|
|
<p>I’ve hinted at how the contents of my blog are stored as files, but let’s take
|
|
a look at the whole thing:</p>
|
|
|
|
<pre><code>$ ls -F ~/p1k3/archives/
|
|
1997/ 2003/ 2009/ bones/ meta/
|
|
1998/ 2004/ 2010/ chapbook/ winfield/
|
|
1999/ 2005/ 2011/ cli/ wip/
|
|
2000/ 2006/ 2012/ colophon/
|
|
2001/ 2007/ 2013/ europe/
|
|
2002/ 2008/ 2014/ hack/
|
|
</code></pre>
|
|
|
|
<p>(<code>ls</code>, again, just lists files. <code>-F</code> tells it to append a character that shows
|
|
us what type of file we’re looking at, such as a trailing / for directories.
|
|
<code>~</code> is a shorthand that means “my home directory”, which in this case is
|
|
<code>/home/brennen</code>.)</p>
|
|
|
|
<p>Each of the directories here holds other directories. The ones for each year
|
|
have sub-directories for the months of the year, which in turn contain files
|
|
for the days. The files are just little pieces of HTML and Markdown and some
|
|
other stuff. Many years ago, before I had much of an idea how to program, I
|
|
wrote a script to glue them all together into a web page and serve them up to
|
|
visitors. This all sounds complicated, but all it really means is that if I
|
|
want to write a blog entry, I just open a file and type some stuff. Here’s an
|
|
example for March 1st:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cat ~/p1k3/archives/2014/3/1
|
|
<h1>Saturday, March 1</h1>
|
|
|
|
<markdown>
|
|
Sometimes I'm going along on a Saturday morning, still a little dazed from the
|
|
night before, and I think something like "I should just go write a detailed
|
|
analysis of hooded sweatshirts". Mostly these thoughts don't survive contact
|
|
with an actual keyboard. It's almost certainly for the best.
|
|
</markdown>
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>And here’s an older one that contains a short poem:</p>
|
|
|
|
<!-- took this one out of exec block 'cause later i
|
|
made a dir out of it... -->
|
|
|
|
|
|
<pre><code>$ cat ~/p1k3/archives/2012/10/9
|
|
<h1>tuesday, october 9</h1>
|
|
|
|
<freeverse>i am a stateful machine
|
|
i exist in a manifold of consequence
|
|
a clattering miscellany of impure functions
|
|
and side effects</freeverse>
|
|
</code></pre>
|
|
|
|
<p>Notice that <code><freeverse></code> bit? It kind of looks like an HTML tag, but it’s
|
|
not. What it actually does is tell my blog script that it should format the
|
|
text it contains like a poem. The specifics don’t matter for our purposes
|
|
(yet), but this convention is going to come in handy, because the first thing I
|
|
want to do is get a list of all the entries that contain poems.</p>
|
|
|
|
<p>Remember <code>grep</code>?</p>
|
|
|
|
<pre><code>$ grep -ri '<freeverse>' ~/p1k3/archives > ~/possible_poems
|
|
</code></pre>
|
|
|
|
<p>Let’s step through this bit by bit:</p>
|
|
|
|
<p>First, I’m asking <code>grep</code> to search <strong>r</strong>ecursively, <strong>i</strong>gnoring case.
|
|
“Recursively” just means that every time the program finds a directory, it
|
|
should descend into that directory and search in any files there as well.</p>
|
|
|
|
<pre><code>grep -ri
|
|
</code></pre>
|
|
|
|
<p>Next comes a pattern to search for. It’s in single quotes because the
|
|
characters <code><</code> and <code>></code> have a special meaning to the shell, and here we need
|
|
the shell to understand that it should treat them as literal angle brackets
|
|
instead.</p>
|
|
|
|
<pre><code>'<freeverse>'
|
|
</code></pre>
|
|
|
|
<p>This is the path I want to search:</p>
|
|
|
|
<pre><code>~/p1k3/archives
|
|
</code></pre>
|
|
|
|
<p>Finally, because there are so many entries to search, I know the process will
|
|
be slow and produce a large list, so I tell the shell to redirect the output to a file
|
|
called <code>possible_poems</code> in my home directory:</p>
|
|
|
|
<pre><code>> ~/possible_poems
|
|
</code></pre>
|
|
|
|
<p>This is quite a few instances…</p>
|
|
|
|
<pre><code>$ wc -l ~/possible_poems
|
|
679 /home/brennen/possible_poems
|
|
</code></pre>
|
|
|
|
<p>…and it’s also not super-pretty to look at:</p>
|
|
|
|
<pre><code>$ head -5 ~/possible_poems
|
|
/home/brennen/p1k3/archives/2011/10/14:<freeverse>i've got this friend has a real knack
|
|
/home/brennen/p1k3/archives/2011/4/25:<freeverse>i can't claim to strive for it
|
|
/home/brennen/p1k3/archives/2011/8/10:<freeverse>one diminishes or becomes greater
|
|
/home/brennen/p1k3/archives/2011/8/12:<freeverse>
|
|
/home/brennen/p1k3/archives/2011/1/1:<freeverse>six years on
|
|
</code></pre>
|
|
|
|
<p>Still, it’s a decent start. I can see paths to the files I have to check, and
|
|
usually a first line. Since I use a fancy text editor, I can just go down the
|
|
list opening each file in a new window and copying the stuff I’m interested in
|
|
to a new file.</p>
|
|
|
|
<p>This is good enough for government work, but what if instead of jumping around
|
|
between hundreds of files, I’d rather read everything in one file and just weed
|
|
out the bad ones as I go?</p>
|
|
|
|
<pre><code>$ cat `grep -ril '<freeverse>' ~/p1k3/archives` > ~/possible_poems_full
|
|
</code></pre>
|
|
|
|
<p>This probably bears some explaining. <code>grep</code> is still doing all the real work
|
|
here. The main difference from before is that <code>-l</code> tells grep to just list any
|
|
files it finds which contain a match.</p>
|
|
|
|
<pre><code>`grep -ril '<freeverse>' ~/p1k3/archives`
|
|
</code></pre>
|
|
|
|
<p>Notice those backticks around the grep command? This part is a little
|
|
trippier. It turns out that if you put backticks around something in a
|
|
command, it’ll get executed and replaced with its result, which in turn gets
|
|
executed as part of the larger command. So what we’re really saying is
|
|
something like:</p>
|
|
|
|
<pre><code>$ cat [all of the files in the blog directory with <freeverse> in them]
|
|
</code></pre>
|
|
|
|
<p>Did you catch that? I just wrote a command that rewrote itself as a
|
|
<em>different</em>, more specific command. And it appears to have worked on the
|
|
first try:</p>
|
|
|
|
<pre><code>$ wc ~/possible_poems_full
|
|
17628 80980 528699 /home/brennen/possible_poems_full
|
|
</code></pre>
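<p>Backticks also have a more modern spelling, <code>$(...)</code>, which does the same
thing and is easier to read and to nest. Here’s a sketch of the same trick in
a scratch directory made with <code>mktemp -d</code>, using two invented files called
<code>a</code> and <code>b</code>, only one of which mentions “freeverse”:</p>

<pre><code>$ cd "$(mktemp -d)"
$ printf 'freeverse one\n' > a
$ printf 'plain prose\n' > b
$ cat $(grep -l 'freeverse' a b)
freeverse one
</code></pre>

<p>By the time <code>cat</code> runs, the shell has already replaced the <code>$(...)</code> part with
the output of <code>grep -l</code>, which here is just <code>a</code>.</p>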
|
|
|
|
<p>Welcome to wizard school.</p>
|
|
|
|
<hr />
|
|
|
|
<h1><a name=programmerthink href=#programmerthink>#</a> 3. programmerthink</h1>
|
|
|
|
<p>In the <a href="#a-literary-problem">preceding chapter</a>, I worked through accumulating
|
|
a big piece of text from some other, smaller texts. I started with a bunch of
|
|
files and wound up with one big file called <code>possible_poems_full</code>.</p>
|
|
|
|
<p>Let’s talk for a minute about how programmers approach problems like this one.
|
|
What I’ve just done is sort of an old-school humanities take on things:
|
|
Metaphorically speaking, I took a book off the shelf and hauled it down to the
|
|
copy machine to xerox a bunch of pages, and now I’m going to start in on them
|
|
with a highlighter and some Post-Its or something. A process like this will
|
|
often trigger a cascade of questions in the programmer-mind:</p>
|
|
|
|
<ul>
|
|
<li>What if, halfway through the project, I realize my selection criteria were all
|
|
wrong and have to backtrack?</li>
|
|
<li>What if I discover corrections that also need to be made in the source documents?</li>
|
|
<li>What if I want to access metadata, like the original location of a file?</li>
|
|
<li>What if I want to quickly re-order the poems according to some new criteria?</li>
|
|
<li>Why am I storing the same text in two different places?</li>
|
|
</ul>
|
|
|
|
|
|
<p>A unifying theme of these questions is that they could all be answered by
|
|
involving a little more abstraction.</p>
|
|
|
|
<p style="text-align:center;"> ★</p>
|
|
|
|
<p>Some kinds of abstraction are so common in the physical world that we can
|
|
forget they’re part of a sophisticated technology. For example, a good deal of
|
|
bicycle maintenance can be accomplished with a cheap multi-tool containing a
|
|
few different sizes of hex wrench and a couple of screwdrivers.</p>
|
|
|
|
<p>A hex wrench or screwdriver doesn’t really know anything about bicycles. All
|
|
it <em>really</em> knows about is fitting into a space and allowing torque to be
|
|
applied. Standardized fasteners and adjustment mechanisms on a bicycle ensure
|
|
that the work can be done anywhere, by anyone with a certain set of tools.
|
|
Standard tools mean that if you can work on a particular bike, you can work on
|
|
<em>most</em> bikes, and even on things that aren’t bikes at all, but were designed by
|
|
people with the same abstractions in mind.</p>
|
|
|
|
<p>The relationship between a wrench, a bolt, and the purpose of a bolt is a lot
|
|
like something we call <em>indirection</em> in software. Programs like <code>grep</code> or
|
|
<code>cat</code> don’t really know anything about poetry. All they <em>really</em> know about is
|
|
finding lines of text in input, or sticking inputs together. Files, lines, and
|
|
text are like standardized fasteners that allow a user who can work on one kind
|
|
of data (be it poetry, a list of authors, the source code of a program) to use
|
|
the same tools for other problems and other data.</p>
|
|
|
|
<p style="text-align:center;"> ★</p>
|
|
|
|
<p>When I first started writing stuff on the web, I edited a page — a single HTML
|
|
file — by hand. When the entries on my nascent blog got old, I manually
|
|
cut-and-pasted them to archive files with names like <code>old_main97.html</code>, which
|
|
held all of the stuff I’d written in 1997.</p>
|
|
|
|
<p>I’m not holding this up as an example of youthful folly. In fact, it worked
|
|
fine, and just having a single, static file that you can open in any text
|
|
editor has turned out to be a <em>lot</em> more future-proof than the sophisticated
|
|
blogging software people were starting to write at the time.</p>
|
|
|
|
<p>And yet. Something about this habit nagged at my developing programmer mind
|
|
after a few years. It was just a little bit too manual and repetitive, a
|
|
little bit silly to have to write things like a table of contents by hand, or
|
|
move entries around by copy-and-pasting them to different files. Since I knew
|
|
the date for each entry, and wanted to make them navigable on that basis, why
|
|
not define a directory structure for the years and months, and then write a
|
|
file to hold each day? That way, all I’d have to do is concatenate the files
|
|
in one directory to display any given month:</p>
|
|
|
|
<pre><code>$ cat ~/p1k3/archives/2014/1/* | head -10
|
|
<h1>Sunday, January 12</h1>
|
|
|
|
<h2>the one casey is waiting for</h2>
|
|
|
|
<freeverse>
|
|
after a while
|
|
the thing about drinking
|
|
is that it just feeds
|
|
what you drink to kill
|
|
and kills
|
|
</code></pre>
|
|
|
|
<p>I ultimately wound up writing a few thousand lines of Perl to do the actual
|
|
work, but the essential idea of the thing is still little more than invoking
|
|
<code>cat</code> on some stuff.</p>
|
|
|
|
<p>I didn’t know the word for it at the time, but what I was reaching for was a
|
|
kind of indirection. By putting blog posts in a specific directory layout, I
|
|
was creating a simple model of the temporal structure that I considered their
|
|
most important property. Now, if I want to write commands that ask questions
|
|
about my blog posts or re-combine them in certain ways, I can address my
|
|
concerns to this model. Maybe, for example, I want a rough idea how many words
|
|
I’ve written in blog posts so far in 2014:</p>
|
|
|
|
<pre><code>$ find ~/p1k3/archives/2014/ -type f | xargs cat | wc -w
|
|
6677
|
|
</code></pre>
|
|
|
|
<p><code>xargs</code> is not the most intuitive command, but it’s useful and common enough to
|
|
explain here. At the end of last chapter, when I said:</p>
|
|
|
|
<pre><code>$ cat `grep -ril '<freeverse>' ~/p1k3/archives` > ~/possible_poems_full
|
|
</code></pre>
|
|
|
|
<p>I could also have written this as:</p>
|
|
|
|
<pre><code>$ grep -ril '<freeverse>' ~/p1k3/archives | xargs cat > ~/possible_poems_full
|
|
</code></pre>
|
|
|
|
<p>What this does is take its input, which starts like:</p>
|
|
|
|
<pre><code>/home/brennen/p1k3/archives/2002/10/16
|
|
/home/brennen/p1k3/archives/2002/10/27
|
|
/home/brennen/p1k3/archives/2002/10/10
|
|
</code></pre>
|
|
|
|
<p>…and run <code>cat</code> on all the things in it:</p>
|
|
|
|
<pre><code>cat /home/brennen/p1k3/archives/2002/10/16 /home/brennen/p1k3/archives/2002/10/27 /home/brennen/p1k3/archives/2002/10/10 ...
|
|
</code></pre>
|
|
|
|
<p>It can be a better idea to use <code>xargs</code>, because while backticks are
|
|
incredibly useful, they have some limitations. If you’re dealing with a very
|
|
large list of files, for example, you might exceed the maximum allowed length
|
|
for arguments to a command on your system. <code>xargs</code> is smart enough to know
|
|
that limit and run <code>cat</code> more than once if needed.</p>
|
|
|
|
<p><code>xargs</code> is actually sort of a pain to think about, and will make you jump
|
|
through some irritating hoops if you have spaces or other weirdness in your
|
|
filenames, but I wind up using it quite a bit.</p>
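<p>For the spaces-in-filenames problem, a common approach is to have <code>find</code>
separate names with a null character instead of a newline, and to tell <code>xargs</code>
to expect that. The sketch below assumes the GNU or BSD versions of these
tools, which both understand <code>-print0</code> and <code>-0</code>; the scratch directory and
the file name <code>my poem</code> are invented for the example:</p>

<pre><code>$ cd "$(mktemp -d)"
$ printf 'one two\n' > 'my poem'
$ find . -type f -print0 | xargs -0 cat
one two
</code></pre>

<p>Without <code>-print0</code> and <code>-0</code>, <code>xargs</code> would split the name at the space and ask
<code>cat</code> for two files that don’t exist.</p>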
|
|
|
|
<p>Maybe I want to see a table of contents:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ find ~/p1k3/archives/2014/ -type d | xargs ls -v | head -10
|
|
/home/brennen/p1k3/archives/2014/:
|
|
1
|
|
2
|
|
3
|
|
4
|
|
|
|
/home/brennen/p1k3/archives/2014/1:
|
|
5
|
|
12
|
|
14
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>Or find the subtitles I used in 2012:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ find ~/p1k3/archives/2012/ -type f | xargs perl -ne 'print "$1\n" if m{<h2>(.*?)</h2>}'
|
|
pursuit
|
|
fragment
|
|
this poem again
|
|
i'll do better next time
|
|
timebinding animals
|
|
more observations on gear nerdery &amp; utility fetishism
|
|
thrift
|
|
A miracle, in fact, means work
|
|
<em>technical notes for late october</em>, or <em>it gets dork out earlier these days</em>
|
|
radio
|
|
light enough to travel
|
|
12:06am
|
|
"figures like Heinlein and Gingrich"
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>The crucial thing about this is that the filesystem <em>itself</em> is just like <code>cat</code>
|
|
and <code>grep</code>: It doesn’t know anything about blogs (or poetry), and it’s
|
|
basically indifferent to the actual <em>structure</em> of a file like
|
|
<code>~/p1k3/archives/2014/1/12</code>. What the filesystem knows is that there are files
|
|
with certain names in certain places. It need not know anything about the
|
|
<em>meaning</em> of those names in order to be useful; in fact, it’s best if it stays
|
|
agnostic about the question, for this enables us to assign our own meaning to a
|
|
structure and manipulate that structure with standard tools.</p>
|
|
|
|
<p style="text-align:center;"> ★</p>
|
|
|
|
<p>Back to the problem at hand: I have this collection of files, and I know how
|
|
to extract the ones that contain poems. My goal is to see all the poems and
|
|
collect the subset of them that I still find worthwhile. Just knowing how to
|
|
grep and then edit a big file solves my problem, in a basic sort of way. And
|
|
yet: Something about this nags at my mind. I find that, just as I can already
|
|
use standard tools and the filesystem to ask questions about all of my blog
|
|
posts in a given year or month, I would like to be able to ask questions about
|
|
the set of interesting poems.</p>
|
|
|
|
<p>If I want the freedom to execute many different sorts of commands against this
|
|
set of poems, it begins to seem that I need a model.</p>
|
|
|
|
<p>When programmers talk about models, they often mean something that people in
|
|
the sciences would recognize: We find ways to represent the arrangement of
|
|
facts so that we can think about them. A structured representation of things
|
|
often means that we can <em>change</em> those things, or at least derive new
|
|
understanding of them.</p>
|
|
|
|
<p style="text-align:center;"> ★</p>
|
|
|
|
<p>At this point in the narrative, I could pretend that my next step is
|
|
immediately obvious, but in fact it’s not. I spend a couple of days thinking
|
|
off and on about how to proceed, scribbling notes during bus rides and while
|
|
drinking beers at the pizza joint down the street. I assess and discard ideas
|
|
which fall into a handful of broad approaches:</p>
|
|
|
|
<ul>
|
|
<li>Store blog entries in a relational database system which would allow me to
|
|
associate them with data like “this entry is in a collection called ‘ok
|
|
poems’”.</li>
|
|
<li>Selectively build up a file containing the list of files with ok poems, and use
|
|
it to do other tasks.</li>
|
|
<li>Define a format for metadata that lives within entry files.</li>
|
|
<li>Turn each interesting file into a directory of its own which contains a file
|
|
with the original text and another file with metadata.</li>
|
|
</ul>
|
|
|
|
|
|
<p>I discard the relational database idea immediately: I like working with files,
|
|
and I don’t feel like abandoning a model that’s served me well for my entire
|
|
adult life.</p>
|
|
|
|
<p>Building up an index file to point at the other files I’m working with has a
|
|
certain appeal. I’m already most of the way there with the <code>grep</code> output in
|
|
<code>possible_poems</code>. It would be easy to write shell commands to add, remove,
|
|
sort, and search entries. Still, it doesn’t feel like a very satisfying
|
|
solution unto itself. I’d like to know that an entry is part of the collection
|
|
just by looking at the entry, without having to cross-reference it to a list
|
|
somewhere else.</p>
|
|
|
|
<p>What about putting some meaningful text in the file itself? I thought about
|
|
a bunch of different ways to do this, some of them really complicated, and
|
|
eventually arrived at this:</p>
|
|
|
|
<pre><code><!-- collection: ok-poems -->
|
|
</code></pre>
|
|
|
|
<p>The <code><!-- --></code> bits are how you define a comment in HTML, which means that
|
|
neither my blog code nor web browsers nor my text editor have to know anything
|
|
about the format, but I can easily find files with certain values. Check it:</p>
|
|
|
|
<pre><code>$ find ~/p1k3/archives -type f | xargs perl -ne 'print "$ARGV: $1 -> $2\n" if m{<!-- ([a-z]+): (.*?) -->};'
|
|
/home/brennen/p1k3/archives/2014/2/9: collection -> ok-poems
|
|
</code></pre>
|
|
|
|
<p>That’s an ugly one-liner, and I haven’t explained half of what it does, but the
|
|
comment format actually seems pretty workable for this. It’s a little tacky to
|
|
look at, but it’s simple and searchable.</p>
|
|
|
|
<p>Before we settle, though, let’s turn to the notion of making each entry into a
|
|
directory that can contain some structured metadata in a separate file.
|
|
Imagine something like:</p>
|
|
|
|
<pre><code>$ ls ~/p1k3/archives/2013/2/9
|
|
index Meta
|
|
</code></pre>
|
|
|
|
<p>Here I use the name “index” for the main part of the entry because it’s a
|
|
convention of web sites for the top-level page in a directory to be called
|
|
something like <code>index.html</code>. As it happens, my blog software already supports
|
|
this kind of file layout for entries which contain multiple parts, image files,
|
|
and so forth.</p>
|
|
|
|
<pre><code>$ head ~/p1k3/archives/2013/2/9/index
|
|
&lt;h1&gt;saturday, february 9&lt;/h1&gt;
|
|
|
|
&lt;freeverse&gt;
|
|
midwinter midafternoon; depressed as hell
|
|
sitting in a huge cabin in the rich-people mountains
|
|
writing a sprawl, pages, of melancholic midlife bullshit
|
|
|
|
outside the snow gives way to broken clouds and the
|
|
clear unyielding light of the high country sun fills
|
|
|
|
$ cat ~/p1k3/archives/2013/2/9/Meta
|
|
collection: ok-poems
|
|
</code></pre>
|
|
|
|
<p>It would then be easy to <code>find</code> files called <code>Meta</code> and grep them for
|
|
<code>collection: ok-poems</code>.</p>
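<p>A minimal sketch of that search, assuming the hypothetical <code>Meta</code> layout above. The <code>find_collection</code> function and the throwaway demo archive are made up for illustration and are not part of my blog software:</p>

```shell
#!/bin/bash

# find_collection ARCHIVE NAME: print each entry directory whose Meta file
# contains the exact line "collection: NAME". (Hypothetical helper.)
find_collection() {
  find "$1" -type f -name Meta -exec grep -lx -- "collection: $2" {} + \
    | while IFS= read -r meta; do dirname -- "$meta"; done
}

# Demo against a throwaway archive rather than a real one:
archive=$(mktemp -d)
mkdir -p "$archive/2013/2/9" "$archive/2012/3/17"
echo 'collection: ok-poems' > "$archive/2013/2/9/Meta"
echo 'collection: drafts'   > "$archive/2012/3/17/Meta"

find_collection "$archive" ok-poems
```

<p>Here <code>grep -l</code> prints only the names of matching files, and <code>-x</code> insists that the whole line match, so a collection called <code>ok-poems-2</code> wouldn’t sneak in.</p>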
|
|
|
|
<p>What if I put metadata right in the filename itself, and dispense with the grep
|
|
altogether?</p>
|
|
|
|
<pre><code>$ ls ~/p1k3/archives/2013/2/9
|
|
index meta-ok-poem
|
|
|
|
$ find ~/p1k3/archives -name 'meta-ok-poem'
|
|
/home/brennen/p1k3/archives/2013/2/9/meta-ok-poem
|
|
</code></pre>
|
|
|
|
<p>There’s a lot to like about this. For one thing, it’s immediately visible in a
|
|
directory listing. For another, it doesn’t require searching through thousands
|
|
of lines of text to extract a specific string. If a directory has a
|
|
<code>meta-ok-poem</code> in it, I can be pretty sure that it will contain an interesting
|
|
<code>index</code>.</p>
|
|
|
|
<p>What are the downsides? Well, it requires transforming lots of text files into
|
|
directories-containing-files. I might automate that process, but it’s still a
|
|
little tedious and it makes the layout of the entry archive more complicated
|
|
overall. There’s a cost to doing things this way. It lets me extend my
|
|
existing model of a blog entry to include arbitrary metadata, but it also adds
|
|
steps to writing or finding blog entries.</p>
|
|
|
|
<p>Abstractions usually cost you something. Is this one worth the hassle?
|
|
Sometimes the best way to answer that question is to start writing code that
|
|
handles a given abstraction.</p>
|
|
|
|
<hr />
|
|
|
|
<h1><a name=script href=#script>#</a> 4. script</h1>
|
|
|
|
<p>Back in chapter 1, I said that “the way you use the computer is often just to write
|
|
little programs that invoke other programs”. In fact, we’ve already gone over a
|
|
bunch of these. Grepping through the text of a previous chapter should pull
|
|
up some good examples:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ grep -E '\$ [a-z]+.*\| ' ../literary_environment/index.md
|
|
$ sort authors_* | uniq -c
|
|
$ sort authors_* | uniq > ./all_authors
|
|
$ find ~/p1k3/archives/2010/11 -regextype egrep -regex '.*([0-9]+|index)' -type f | xargs wc -w | tail -1
|
|
$ sort authors_* | uniq | wc -l
|
|
$ sort colors | uniq -i | tail -1
|
|
$ cut -d' ' -f1 ./authors_* | sort | uniq -ci | sort -n | tail -3
|
|
$ sort -u ./authors_* | cut -d' ' -f1 | uniq -ci | sort -n | tail -3
|
|
$ sort -k1 all_authors.tsv | expand -t14
|
|
$ paste firstnames lastnames | sort -k2 | expand -t12
|
|
$ cat ./authors_* | grep 'Vanessa'
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>None of these one-liners do all that much, but they all take input of one sort
|
|
or another and apply one or more transformations to it. They’re little formal
|
|
sentences describing how to make one thing into another, which is as good a
|
|
definition of programming as most. Or at least this is a good way to describe
|
|
programming-in-the-small. (A lot of the programs we use day-to-day are more
|
|
like essays, novels, or interminable Fantasy series where every character you
|
|
like dies horribly than they are like individual sentences.)</p>
|
|
|
|
<p>One-liners like these are all well and good when you’re staring at a terminal,
|
|
trying to figure something out - but what about when you’ve already figured it out and
|
|
you want to repeat it in the future?</p>
|
|
|
|
<p>It turns out that Bash has you covered. Since shell commands are just text,
|
|
they can live in a text file as easily as they can be typed.</p>
|
|
|
|
<h2><a name=learn-you-an-editor href=#learn-you-an-editor>#</a> learn you an editor</h2>
|
|
|
|
<p>We’ve skirted the topic so far, but now that we’re talking about writing out
|
|
text files in earnest, you’re going to want a text editor.</p>
|
|
|
|
<p>My editor is where I spend most of my time that isn’t in a web browser, because
|
|
it’s where I write both code and prose. It turns out that the features which
|
|
make a good code editor overlap a lot with the ones that make a good editor of
|
|
English sentences.</p>
|
|
|
|
<p>So what should you use? Well, there have been other contenders in recent
|
|
years, but in truth nothing comes close to dethroning the Great Old Ones of
|
|
text editing. Emacs is a creature both primal and sophisticated, like an
|
|
avatar of some interstellar civilization that evolved long before multicellular
|
|
life existed on earth and seeded the galaxy with incomprehensible artefacts and
|
|
colossal engineering projects. Vim is like a lovable chainsaw-studded robot
|
|
with the most elegant keyboard interface in history secretly emblazoned on its
|
|
shining diamond heart.</p>
|
|
|
|
<p>It’s worth the time it takes to learn one of the serious editors, but there are
|
|
easier places to start. Nano, for example, is easy to pick up, and should be
|
|
available on most systems. To start it, just say:</p>
|
|
|
|
<pre><code>$ nano file
|
|
</code></pre>
|
|
|
|
<p>You should see something like this:</p>
|
|
|
|
<p style="text-align:center;"> <img src="images/nano.png" alt="nano" /></p>
|
|
|
|
<p>Arrow keys will move your cursor around, and typing stuff will make it appear
|
|
in the file. This is pretty much like every other editor you’ve ever used. If
|
|
you haven’t used Nano before, that stuff along the bottom of the terminal is a
|
|
reference to the most commonly used commands. <code>^</code> is a convention for “Ctrl”,
|
|
so <code>^O</code> means Ctrl-o (the case of the letter doesn’t actually matter), which
|
|
will save the file you’re working on. Ctrl-x will quit, which is probably the
|
|
first important thing to know about any given editor.</p>
|
|
|
|
<h2><a name=d-i-y-utilities href=#d-i-y-utilities>#</a> d.i.y. utilities</h2>
|
|
|
|
<p>So back to putting commands in text files. Here’s a file I just created in
|
|
my editor:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cat okpoems
|
|
#!/bin/bash
|
|
|
|
# find all the marker files and get the name of
|
|
# the directory containing each
|
|
find ~/p1k3/archives -name 'meta-ok-poem' | xargs -n1 dirname
|
|
|
|
exit 0
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>This is known as a script. There are a handful of things to notice here.
|
|
First, there’s this fragment:</p>
|
|
|
|
<pre><code>#!/bin/bash
|
|
</code></pre>
|
|
|
|
<p>The <code>#!</code> right at the beginning, followed by the path to a program, is a
|
|
special sequence that lets the kernel know what program should be used to
|
|
interpret the contents of the file. <code>/bin/bash</code> is the path on the filesystem
|
|
where Bash itself lives. You might see this referred to as a shebang or a hash
|
|
bang.</p>
|
|
|
|
<p>Lines that start with a <code>#</code> are comments, used to describe the code to a human
|
|
reader. The <code>exit 0</code> tells Bash that the currently running script should exit
|
|
with a status of 0, which basically means “nothing went wrong”.</p>
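<p>To see an exit status for yourself, check <code>$?</code>, a shell variable that holds the status of the most recent command. A short sketch using the standard <code>true</code> and <code>false</code> utilities, with a subshell standing in for a script (the 64 is just an arbitrary non-zero status picked for the demonstration):</p>

```shell
#!/bin/bash

true
echo "true exited with $?"      # prints: true exited with 0

false
echo "false exited with $?"     # prints: false exited with 1

# A subshell stands in for a whole script here.
(exit 64)
echo "subshell exited with $?"  # prints: subshell exited with 64

# if, && and || all branch on that same status:
if true; then echo "0 counts as success"; fi
false || echo "anything else counts as failure"
```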
|
|
|
|
<p>If you examine the directory listing for <code>okpoems</code>, you’ll see something
|
|
important:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ ls -l okpoems
|
|
-rwxrwxr-x 1 brennen brennen 163 Apr 19 00:08 okpoems
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>That looks pretty cryptic. For the moment, just remember that those little
|
|
<code>x</code>s in the first bit mean that the file has been marked e<strong>x</strong>ecutable. We
|
|
accomplish this by saying something like:</p>
|
|
|
|
<pre><code>$ chmod +x ./okpoems
|
|
</code></pre>
|
|
|
|
<p>Once that’s done, the executable bit and the shebang line in combination mean that typing
|
|
<code>./okpoems</code> will have the same effect as typing <code>bash okpoems</code>:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ ./okpoems
|
|
/home/brennen/p1k3/archives/2013/2/9
|
|
/home/brennen/p1k3/archives/2012/3/17
|
|
/home/brennen/p1k3/archives/2012/3/26
|
|
</code></pre>
|
|
|
|
<!-- end -->
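<p>If you’d like to try the whole cycle without touching real files, here’s a sketch that writes a tiny script into a temporary directory, marks it executable, and runs it both ways. The <code>hello</code> script and its message are invented for the demonstration:</p>

```shell
#!/bin/bash

dir=$(mktemp -d)
cd "$dir" || exit 1

# Write a two-line script: a shebang, then one command.
printf '%s\n' '#!/bin/bash' 'echo "hello from a script"' > hello

# Without the executable bit, running it directly should fail:
./hello 2>/dev/null || echo "not executable yet"

chmod +x ./hello

# Now both forms do the same thing:
./hello
bash hello
```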
|
|
|
|
|
|
<h2><a name=heavy-lifting href=#heavy-lifting>#</a> heavy lifting</h2>
|
|
|
|
<p><code>okpoems</code> demonstrates the basics, but it doesn’t do very much. Here’s
|
|
a script with a little more substance to it:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cat markpoem
|
|
#!/bin/bash

# $1 is the first parameter to our script
POEM=$1

# Complain and exit if we weren't given a path:
if [ -z "$POEM" ]; then
  echo 'usage: markpoem &lt;path&gt;'

  # Confusingly, an exit status of 0 means to the shell that everything went
  # fine, while any other number means that something went wrong.
  exit 64
fi

if [ ! -e "$POEM" ]; then
  echo "$POEM not found"
  exit 66
fi

echo "marking $POEM an ok poem"

POEM_BASENAME=$(basename "$POEM")

# If the target is a plain file instead of a directory, make it into
# a directory and move the content into $POEM/index:
if [ -f "$POEM" ]; then
  echo "making $POEM into a directory, moving content to"
  echo "  $POEM/index"
  TEMPFILE="/tmp/$POEM_BASENAME.$(date +%s.%N)"
  mv "$POEM" "$TEMPFILE"
  mkdir "$POEM"
  mv "$TEMPFILE" "$POEM/index"
fi

if [ -d "$POEM" ]; then
  # touch(1) will either create the file or update its timestamp:
  touch "$POEM/meta-ok-poem"
else
  echo "something broke - why isn't $POEM a directory?"
  file "$POEM"
fi

# Signal that all is copacetic:
echo kthxbai
exit 0
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>Both of these scripts are imperfect, but they were quick to write, they’re made
|
|
out of standard commands, and I don’t yet hate myself for them: All signs that
|
|
I’m not totally on the wrong track with the <code>meta-ok-poem</code> abstraction, and
|
|
could live with it as part of an ongoing writing project. <code>okpoems</code> and
|
|
<code>markpoem</code> would also be easy to use with custom keybindings in my editor. In
|
|
a few more lines of code, I can build a system to wade through the list of
|
|
candidate files and quickly mark the interesting ones.</p>
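<p>Here’s a rough sketch of what that wading-through might look like. It assumes a list file with one candidate path per line; the <code>review_candidates</code> name is made up, and the second argument is whatever command should do the marking (<code>./markpoem</code>, say):</p>

```shell
#!/bin/bash

# review_candidates LIST COMMAND: for each path listed in LIST, show the
# first few lines, ask a yes/no question, and run COMMAND on the path if
# the answer is "y". (A hypothetical helper, not one of the book's scripts.)
review_candidates() {
  # Candidate paths arrive on file descriptor 3, which leaves standard
  # input free for the answers.
  while IFS= read -r path <&3; do
    head -n 5 -- "$path"
    printf 'mark %s? [y/N] ' "$path"
    IFS= read -r answer || break
    if [ "$answer" = y ]; then
      "$2" "$path"
    fi
  done 3< "$1"
}
```

<p>Something like <code>review_candidates potential_poems ./markpoem</code> would then step through the list one entry at a time. Anything other than a plain <code>y</code> skips the entry.</p>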
|
|
|
|
<h2><a name=generality href=#generality>#</a> generality</h2>
|
|
|
|
<p>So what’s lacking here? Well, probably a bunch of things, feature-wise. I can
|
|
imagine writing a script to unmark a poem, for example. That said, there’s one
|
|
really glaring problem. “Ok poem” is only one kind of property a blog entry
|
|
might possess. Suppose I wanted a way to express that a poem is terrible?</p>
|
|
|
|
<p>It turns out I already know how to add properties to an entry. If I generalize
|
|
just a little, the tools become much more flexible.</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ ./addprop /home/brennen/p1k3/archives/2012/3/26 meta-terrible-poem
|
|
marking /home/brennen/p1k3/archives/2012/3/26 with meta-terrible-poem
|
|
kthxbai
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ ./findprop meta-terrible-poem
|
|
/home/brennen/p1k3/archives/2012/3/26
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p><code>addprop</code> is only a little different from <code>markpoem</code>. It takes two parameters
|
|
instead of one - the target entry and a property to add.</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cat addprop
|
|
#!/bin/bash

ENTRY=$1
PROPERTY=$2

# Complain and exit if we weren't given a path and a property:
if [[ ! $ENTRY || ! $PROPERTY ]]; then
  echo "usage: addprop &lt;path&gt; &lt;property&gt;"
  exit 64
fi

if [ ! -e "$ENTRY" ]; then
  echo "$ENTRY not found"
  exit 66
fi

echo "marking $ENTRY with $PROPERTY"

# If the target is a plain file instead of a directory, make it into
# a directory and move the content into $ENTRY/index:
if [ -f "$ENTRY" ]; then
  echo "making $ENTRY into a directory, moving content to"
  echo "  $ENTRY/index"

  # Get a safe temporary file:
  TEMPFILE=$(mktemp)

  mv "$ENTRY" "$TEMPFILE"
  mkdir "$ENTRY"
  mv "$TEMPFILE" "$ENTRY/index"
fi

if [ -d "$ENTRY" ]; then
  touch "$ENTRY/$PROPERTY"
else
  echo "something broke - why isn't $ENTRY a directory?"
  file "$ENTRY"
fi

echo kthxbai
exit 0
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>Meanwhile, <code>findprop</code> is more or less <code>okpoems</code>, but with a parameter for the
|
|
property to find:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cat findprop
|
|
#!/bin/bash
|
|
|
|
if [ ! $1 ]
|
|
then
|
|
echo "usage: findprop &lt;property&gt;"
|
|
exit
|
|
fi
|
|
|
|
# find all the marker files and get the name of
|
|
# the directory containing each
|
|
find ~/p1k3/archives -name $1 | xargs -n1 dirname
|
|
|
|
exit 0
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>These scripts aren’t much more complicated than their poem-specific
|
|
counterparts, but now they can be used to solve problems I haven’t even thought
|
|
of yet, and included in other scripts that need their functionality.</p>
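<p>The unmarking script I imagined earlier falls out of the same generalization. What follows is only a sketch: <code>delprop</code> doesn’t exist anywhere but here, and it’s written as a function so it can be pasted into a shell and tried. It removes a marker file and leaves everything else alone (it doesn’t turn a directory back into a plain file):</p>

```shell
#!/bin/bash

# delprop ENTRY PROPERTY: a hypothetical counterpart to addprop. Removes
# the marker file PROPERTY from the entry directory ENTRY.
delprop() {
  entry=$1
  property=$2

  # Complain and return a "usage error" status if an argument is missing:
  if [ -z "$entry" ] || [ -z "$property" ]; then
    echo "usage: delprop <path> <property>" >&2
    return 64
  fi

  # Nothing to do if the marker isn't there:
  if [ ! -e "$entry/$property" ]; then
    echo "$entry has no $property" >&2
    return 66
  fi

  rm -- "$entry/$property"
  echo "removed $property from $entry"
}
```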
|
|
|
|
<hr />
|
|
|
|
<h1><a name=general-purpose-programmering href=#general-purpose-programmering>#</a> 5. general purpose programmering</h1>
|
|
|
|
<p>I didn’t set out to write a book about programming, <em>as such</em>, but because
|
|
programming and the command line are so inextricably linked, this text
|
|
draws near the subject almost of its own accord.</p>
|
|
|
|
<p>If you’re not terribly interested in programming, this chapter can easily
|
|
enough be skipped. It’s more in the way of philosophical rambling than
|
|
concrete instruction, and will be of most use to those with an existing
|
|
background in writing code.</p>
|
|
|
|
<p style="text-align:center;"> ✢</p>
|
|
|
|
<p>If you’ve used computers for more than a few years, you’re probably viscerally
|
|
aware that most software is fragile and most systems decay. In the time since
|
|
I took my first tentative steps into the little world of a computer (a friend’s
|
|
dad’s unidentifiable gaming machine, my own father’s blue monochrome Zenith
|
|
laptop, the Apple II) the churn has been overwhelming. By now I’ve learned my
|
|
way around vastly more software — operating systems, programming languages and
|
|
development environments, games, editors, chat clients, mail systems — than I
|
|
presently could use if I wanted to. Most of it has gone the way of some
|
|
ancient civilization, surviving (if at all) only in faint, half-understood
|
|
cultural echoes and occasional museum-piece displays. Every user of technology
|
|
becomes, in time, a refugee from an irretrievably recent past.</p>
|
|
|
|
<p>And yet, despite all this, the shell endures. Most of the ideas in this book
|
|
are older than I am. Most of them could have been applied in 1994 or
|
|
thereabouts, when I first logged on to multiuser systems running AT&T Unix.
|
|
Since the early 1990s, systems built on a fundamental substrate of Unix-like
|
|
behavior and abstractions have proliferated wildly, becoming foundational at
|
|
once to the modern web, the ecosystem of free and open software, and the
|
|
technological dominance ca. 2014 of companies like Apple, Google, and Facebook.</p>
|
|
|
|
<p>Why is this, exactly?</p>
|
|
|
|
<p style="text-align:center;"> ✣</p>
|
|
|
|
<p>As I’ve said (and hopefully shown), the commands you write in your shell
|
|
are essentially little programs. Like other programs, they can be stored
|
|
for later use and recombined with other commands, creating new uses for
|
|
your ideas.</p>
|
|
|
|
<p>It would be hard to say that there’s any <em>one</em> reason command line environments
|
|
remain so vital after decades of evolution and hard-won refinement in computer
|
|
interfaces, but it seems like this combinatory nature is somewhere near the
|
|
heart of it. The command line often lacks the polish of other interfaces we
|
|
depend on, but in exchange it offers a richness and freedom of expression
|
|
rarely seen elsewhere, and invites its users to build upon its basic
|
|
facilities.</p>
|
|
|
|
<p>What is it that makes last chapter’s <code>addprop</code> preferable to the more specific
|
|
<code>markpoem</code>? Let’s look at an alternative implementation of <code>markpoem</code>:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cat simple_markpoem
|
|
#!/bin/bash
|
|
|
|
addprop "$1" meta-ok-poem
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>Is this script trivial? Absolutely. It’s so trivial that it barely seems to
|
|
exist, because I already wrote <code>addprop</code> to do all the heavy lifting and play
|
|
well with others, freeing us to imagine new uses for its central idea without
|
|
worrying about the implementation details.</p>
|
|
|
|
<p>Unlike <code>markpoem</code>, <code>addprop</code> doesn’t know anything about poetry. All it knows
|
|
about, in fact, is putting a file (or three) in a particular place. And this
|
|
is in keeping with a basic insight of Unix: Pieces of software that do one
|
|
very simple thing generalize well. Good command line tools are like a hex
|
|
wrench, a hammer, a utility knife: They embody knowledge of turning, of
|
|
striking, of cutting — and with this kind of knowledge at hand, the user can
|
|
change the world even though no individual tool is made with complete knowledge
|
|
of the world as a whole. There’s a lot of power in the accumulation of small
|
|
competencies.</p>
|
|
|
|
<p>Of course, if your code is only good at one thing, to be of any use, it has to
|
|
talk to code that’s good at other things. There’s another basic insight in the
|
|
Unix tradition: Tools should be composable. All those little programs have to
|
|
share some assumptions, have to speak some kind of trade language, in order to
|
|
combine usefully. Which is how we’ve arrived at standard IO, pipelines,
|
|
filesystems, and text as a lowest-common-denominator medium of exchange. If
|
|
you think about most of these things, they have some very rough edges, but they
|
|
give otherwise simple tools ways to communicate without becoming
|
|
super-complicated along the way.</p>
|
|
|
|
<p style="text-align:center;"> ✤</p>
|
|
|
|
<p>What is the command line?</p>
|
|
|
|
<p>The command line is an environment of tool use.</p>
|
|
|
|
<p>So are kitchens, workshops, libraries, and programming languages.</p>
|
|
|
|
<p style="text-align:center;"> ✥</p>
|
|
|
|
<p>Here’s a confession: I don’t like writing shell scripts very much, and I
|
|
can’t blame anyone else for feeling the same way.</p>
|
|
|
|
<p>That doesn’t mean you shouldn’t <em>know</em> about them, or that you shouldn’t
|
|
<em>write</em> them. I write little ones all the time, and the ability to puzzle
|
|
through other people’s scripts comes in handy. Oftentimes, the best, most
|
|
tasteful way to automate something is to build a script out of the commonly
|
|
available commands. The standard tools are already there on millions of
|
|
machines. Many of them have been pretty well understood for a generation, and
|
|
most will probably be around for a generation or three to come. They do neat
|
|
stuff. Scripts let you build on ideas you’ve already worked out, and give
|
|
repeatable operations a memorable, user-friendly name. They encourage reuse of
|
|
existing programs, and help express your ideas to people who’ll come after you.</p>
|
|
|
|
<p>One of the reliable markers of powerful software is that it can be scripted: It
|
|
extends to its users some of the same power that its authors used in creating
|
|
it. Scriptable software is to some extent <em>living</em> software. It’s a book that
|
|
you, the reader, get to help write.</p>
|
|
|
|
<p>In all these ways, shell scripts are wonderful, a little bit magical, and
|
|
quietly indispensable to the machinery of modern civilization.</p>
|
|
|
|
<p>Unfortunately, in all the ways that a shell like Bash is weird, finicky, and
|
|
covered in 40 years of incidental cruft, long-form Bash scripts are even worse.
|
|
Bash is a useful glue language, particularly if you’re already comfortable
|
|
wiring commands together. Syntactic and conceptual innovations like pipes are
|
|
beautiful and necessary. What Bash is <em>not</em>, despite its power, is a very good
|
|
general purpose programming language. It’s just not especially good at things
|
|
like math, or complex data structures, or not looking like a punctuation-heavy
|
|
variety of alphabet soup.</p>
|
|
|
|
<p>It turns out that there’s a threshold of complexity beyond which life becomes
|
|
easier if you switch from shell scripting to a more robust language. Just
|
|
where this threshold is located varies a lot between users and problems, but I
|
|
often think about switching languages before a script gets bigger than I can
|
|
view on my screen all at once. <code>addprop</code> is a good example:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ wc -l ../script/addprop
|
|
41 ../script/addprop
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>41 lines is a touch over what fits on one screen in the editor I usually use.
|
|
If I were going to add much in the way of features, I’d think pretty hard about
|
|
porting it to another language first.</p>
|
|
|
|
<p>What’s cool is that if you know a language like C, Python, Perl, Ruby, PHP, or
|
|
JavaScript, your code can participate in the shell environment as a first class
|
|
citizen simply by respecting the conventions of standard IO, files, and command
|
|
line arguments. Often, in order to create a useful utility, it’s only
|
|
necessary to deal with <code>STDIN</code>, or operate on a particular sort of file, and
|
|
most languages offer simple conventions for doing these things.</p>
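<p>To keep this book’s examples in one language, here’s what those conventions look like sketched in Bash; the same shape carries over to Python, Perl, or whatever you prefer. <code>longest_line</code> is a made-up utility: it reads the files named as arguments, or standard input when there aren’t any, and writes its answer to standard output.</p>

```shell
#!/bin/bash

# longest_line [FILE...]: print the longest line of the named files, or of
# standard input when no files are given. (A made-up utility.)
longest_line() {
  # cat already follows the convention: files if named, else standard input.
  cat -- "$@" \
    | awk 'length($0) > max { max = length($0); line = $0 } END { if (NR) print line }'
}

# It works at the end of a pipeline...
printf 'a\nbcd\nef\n' | longest_line        # prints: bcd

# ...or on a file named as an argument:
tmp=$(mktemp)
printf 'short\nmuch longer\n' > "$tmp"
longest_line "$tmp"                         # prints: much longer
```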
|
|
|
|
<p style="text-align:center;"> *</p>
|
|
|
|
<p>I think the shell can be taught and understood as a humane environment, despite
|
|
all of its ugliness and complication, because it offers the materials of its
|
|
own construction to its users, whatever their concerns. The writer, the
|
|
philosopher, the scientist, the programmer: Files and text and pipes know
|
|
little enough about these things, but in their very indifference to the
|
|
specifics of any one complex purpose, they’re adaptable to the basic needs of
|
|
many. Simple utilities which enact simple kinds of knowledge survive and
|
|
recombine because there is a wisdom to be found in small things.</p>
|
|
|
|
<p>Files and text know nothing about poetry, nothing in particular of the human
|
|
soul. Neither do pen and ink, printing presses or codex books, but somehow we
|
|
got Shakespeare and Montaigne.</p>
|
|
|
|
<hr />
|
|
|
|
<h1><a name=one-of-these-things-is-not-like-the-others href=#one-of-these-things-is-not-like-the-others>#</a> 6. one of these things is not like the others</h1>
|
|
|
|
<p>If you’re the sort of person who took a few detours into the history of
|
|
religion in college, you might be familiar with some of the ways people used to
|
|
do textual comparison. When pen, paper, and typesetting were what scholars had
|
|
to work with, they did some fairly sophisticated things in order to expose the
|
|
relationships between multiple pieces of text.</p>
|
|
|
|
<p style="text-align:center;"> <img src="images/throckmorton_small.jpg" height=320 width=470></p>
|
|
|
|
<p>Here’s a book I got in college: <em>Gospel Parallels: A Comparison of the
|
|
Synoptic Gospels</em>, Burton H. Throckmorton, Jr., Ed. It breaks up three books
|
|
from the New Testament by the stories and themes that they contain, and shows
|
|
the overlapping sections of each book that contain parallel texts. You can
|
|
work your way through and see what parts only show up in one book, or in two
|
|
but not the other, or in all three. Pages are arranged like so:</p>
|
|
|
|
<pre>
|
|
§ JESUS DOES SOME STUFF
|
|
________________________________________________
|
|
| MAT | MAR | LUK |
|
|
|-----------------+--------------------+---------|
|
|
| Stuff | | |
|
|
| | Stuff | |
|
|
| | Stuff | Stuff |
|
|
| | Stuff | |
|
|
| | Stuff | |
|
|
| | | |
|
|
</pre>
|
|
|
|
|
|
<p>The way I understand it, a book like this one only scratches the surface of the
|
|
field. Tools like this support a lot of theory about which books copied each
|
|
other and how, and what other sources they might have copied that we’ve since
|
|
lost.</p>
|
|
|
|
<p>This is some <em>incredibly</em> dry material, even if you kind of dig thinking about
|
|
the questions it addresses. It takes a special temperament to actually sit
|
|
poring over fragmentary texts in ancient languages and do these painstaking
|
|
comparisons. Even if you’re a writer or editor and work with a lot of
|
|
revisions of a text, there’s a good chance you rarely do this kind of
|
|
comparison on your own work, because that shit is <em>tedious</em>.</p>
|
|
|
|
<h2><a name=diff href=#diff>#</a> diff</h2>
|
|
|
|
<p>It turns out that academics aren’t the only people who need tools for comparing
|
|
different versions of a text. Working programmers, in fact, need to do this
|
|
<em>constantly</em>. Programmers are also happiest when putting off the <em>actual</em> task
|
|
at hand to solve some incidental problem that cropped up along the way, so by
|
|
now there are a lot of ways to say “here’s how this file is different from this
|
|
file”, or “here’s how this file is different from itself a year ago”.</p>
|
|
|
|
<p>Let’s look at a couple of shell scripts from an earlier chapter:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cat ../script/okpoems
|
|
#!/bin/bash
|
|
|
|
# find all the marker files and get the name of
|
|
# the directory containing each
|
|
find ~/p1k3/archives -name 'meta-ok-poem' | xargs -n1 dirname
|
|
|
|
exit 0
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cat ../script/findprop
|
|
#!/bin/bash
|
|
|
|
if [ ! $1 ]
|
|
then
|
|
echo "usage: findprop &lt;property&gt;"
|
|
exit
|
|
fi
|
|
|
|
# find all the marker files and get the name of
|
|
# the directory containing each
|
|
find ~/p1k3/archives -name $1 | xargs -n1 dirname
|
|
|
|
exit 0
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>It’s pretty obvious these are similar files, but do we know what <em>exactly</em>
|
|
changed between them at a glance? It wouldn’t be hard to figure out, once. If
|
|
you wanted to be really certain about it, you could print them out, set them
|
|
side by side, and go over them with a highlighter.</p>
|
|
|
|
<p>Now imagine doing that for a bunch of files, some of them hundreds or thousands
|
|
of lines long. I’ve actually done that before, colored markers and all, but I
|
|
didn’t feel smart while I was doing it. This is a job for software.</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ diff ../script/okpoems ../script/findprop
|
|
2a3,8
|
|
> if [ ! $1 ]
|
|
> then
|
|
> echo "usage: findprop &lt;property&gt;"
|
|
> exit
|
|
> fi
|
|
>
|
|
5c11
|
|
< find ~/p1k3/archives -name 'meta-ok-poem' | xargs -n1 dirname
|
|
---
|
|
> find ~/p1k3/archives -name $1 | xargs -n1 dirname
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>That’s not the most human-friendly output, but it’s a little simpler than it
|
|
seems at first glance. It’s basically just a way of describing the changes
|
|
needed to turn <code>okpoems</code> into <code>findprop</code>. The string <code>2a3,8</code> can be read as
|
|
“after line 2 of the old file, add lines 3 through 8 of the new file”. Lines with a <code>&gt;</code> in front of them are
|
|
added. <code>5c11</code> can be read as “line 5 in the original file becomes line 11 in
|
|
the new file”, and the <code>&lt;</code> line is replaced with the <code>&gt;</code> line. If you wanted,
|
|
you could take a copy of the original file and apply these instructions by hand
|
|
in your text editor, and you’d wind up with the new file.</p>
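<p>The notation is easier to absorb on a pair of files small enough to hold in your head. A sketch, using two throwaway files called <code>old</code> and <code>new</code> that exist only for this demonstration:</p>

```shell
#!/bin/bash

dir=$(mktemp -d)

printf '%s\n' one two three four       > "$dir/old"
printf '%s\n' one two extra three FOUR > "$dir/new"

# diff exits with status 1 when the files differ, so "|| true" keeps a
# strict shell (set -e) from treating the difference as a failure.
diff "$dir/old" "$dir/new" || true

# Expected output:
#   2a3
#   > extra
#   4c5
#   < four
#   ---
#   > FOUR
```

<p><code>2a3</code> says that after line 2 of <code>old</code>, line 3 of <code>new</code> was added; <code>4c5</code> says that line 4 of <code>old</code> changed into line 5 of <code>new</code>.</p>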
|
|
|
|
<p>A lot of people (me included) prefer what’s known as a “unified” diff, because
|
|
it’s easier to read and offers context for the changed lines. We can ask for
|
|
one of these with <code>diff -u</code>:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ diff -u ../script/okpoems ../script/findprop
|
|
--- ../script/okpoems 2014-04-19 00:08:03.321230818 -0600
|
|
+++ ../script/findprop 2014-04-21 21:51:29.360846449 -0600
|
|
@@ -1,7 +1,13 @@
|
|
#!/bin/bash
|
|
|
|
+if [ ! $1 ]
|
|
+then
|
|
+ echo "usage: findprop &lt;property&gt;"
|
|
+ exit
|
|
+fi
|
|
+
|
|
# find all the marker files and get the name of
|
|
# the directory containing each
|
|
-find ~/p1k3/archives -name 'meta-ok-poem' | xargs -n1 dirname
|
|
+find ~/p1k3/archives -name $1 | xargs -n1 dirname
|
|
|
|
exit 0
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>That’s a little longer, and has some metadata we might not always care about,
|
|
but if you look for lines starting with <code>+</code> and <code>-</code>, it’s easy to read as
|
|
“added these, took away these”. This diff tells us at a glance that we added
|
|
some lines to complain if we didn’t get a command line argument, and replaced
|
|
<code>'meta-ok-poem'</code> in the <code>find</code> command with that argument. Since it shows us
|
|
some context, we have a pretty good idea where those lines are in the file
|
|
and what they’re for.</p>
|
|
|
|
<p>What if we don’t care exactly <em>how</em> the files differ, but only whether they
|
|
do?</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ diff -q ../script/okpoems ../script/findprop
|
|
Files ../script/okpoems and ../script/findprop differ
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>I use <code>diff</code> a lot in the course of my day job, because I spend a lot of time
|
|
needing to know just how two programs differ. Just as importantly, I often
|
|
need to know how (or whether!) the <em>output</em> of programs differs. As a concrete
|
|
example, I want to make sure that <code>findprop meta-ok-poem</code> is really a suitable
|
|
replacement for <code>okpoems</code>. Since I expect their output to be identical, I can
|
|
do this:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ ../script/okpoems > okpoem_output
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ ../script/findprop meta-ok-poem > findprop_output
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ diff -s okpoem_output findprop_output
|
|
Files okpoem_output and findprop_output are identical
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>The <code>-s</code> just means that <code>diff</code> should explicitly tell us if files are the
|
|
<strong>s</strong>ame. Otherwise, it’d output nothing at all, because there aren’t any
|
|
differences.</p>
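<p>Inside a script, you usually don’t want to read <code>diff</code>’s report at all, just to act on the answer. That’s what its exit status is for: 0 when the files match, 1 when they differ, 2 when something went wrong. A sketch, where <code>same_output</code> is a made-up helper and the two files are stand-ins for the real output captured above:</p>

```shell
#!/bin/bash

# same_output A B: succeed if the two files have identical contents.
# -q suppresses the line-by-line report; we only want the exit status.
same_output() {
  diff -q -- "$1" "$2" > /dev/null
}

# Stand-in data, not real script output:
dir=$(mktemp -d)
printf '%s\n' /archives/2013/2/9 /archives/2012/3/17 > "$dir/okpoem_output"
printf '%s\n' /archives/2013/2/9 /archives/2012/3/17 > "$dir/findprop_output"

if same_output "$dir/okpoem_output" "$dir/findprop_output"; then
  echo "safe to replace"
else
  echo "outputs differ"
fi
```

<p>In Bash you can also skip the temporary files with process substitution, as in <code>diff &lt;(../script/okpoems) &lt;(../script/findprop meta-ok-poem)</code>, which hands <code>diff</code> the output of each command as though it were a file.</p>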
|
|
|
|
<p>As with many other tools, <code>diff</code> doesn’t very much care whether it’s looking at
|
|
shell scripts or a list of filenames or what-have-you. If you read the man
|
|
page, you’ll find some features geared towards people writing C-like
|
|
programming languages, but its real specialty is just text files with lines
|
|
made out of characters, which works well for lots of code, but certainly could
|
|
be applied to English prose.</p>
|
|
|
|
<p>Since I have a couple of versions ready to hand, let’s apply this to a text
|
|
with some well-known variations and a bit of a literary legacy. Here’s the
|
|
first day of the Genesis creation narrative in a couple of English
|
|
translations:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cat genesis_nkj
|
|
In the beginning God created the heavens and the earth. The earth was without
|
|
form, and void; and darkness was on the face of the deep. And the Spirit of
|
|
God was hovering over the face of the waters. Then God said, "Let there be
|
|
light"; and there was light. And God saw the light, that it was good; and God
|
|
divided the light from the darkness. God called the light Day, and the darkness
|
|
He called Night. So the evening and the morning were the first day.
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cat genesis_nrsv
|
|
In the beginning when God created the heavens and the earth, the earth was a
|
|
formless void and darkness covered the face of the deep, while a wind from
|
|
God swept over the face of the waters. Then God said, "Let there be light";
|
|
and there was light. And God saw that the light was good; and God separated
|
|
the light from the darkness. God called the light Day, and the darkness he
|
|
called Night. And there was evening and there was morning, the first day.
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>What happens if we diff them?</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ diff -u genesis_nkj genesis_nrsv
|
|
--- genesis_nkj 2014-05-11 16:28:29.692508461 -0600
|
|
+++ genesis_nrsv 2014-05-11 16:28:29.744508459 -0600
|
|
@@ -1,6 +1,6 @@
|
|
-In the beginning God created the heavens and the earth. The earth was without
|
|
-form, and void; and darkness was on the face of the deep. And the Spirit of
|
|
-God was hovering over the face of the waters. Then God said, "Let there be
|
|
-light"; and there was light. And God saw the light, that it was good; and God
|
|
-divided the light from the darkness. God called the light Day, and the darkness
|
|
-He called Night. So the evening and the morning were the first day.
|
|
+In the beginning when God created the heavens and the earth, the earth was a
|
|
+formless void and darkness covered the face of the deep, while a wind from
|
|
+God swept over the face of the waters. Then God said, "Let there be light";
|
|
+and there was light. And God saw that the light was good; and God separated
|
|
+the light from the darkness. God called the light Day, and the darkness he
|
|
+called Night. And there was evening and there was morning, the first day.
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>Kind of useless, right? If a given line differs by so much as a character,
|
|
it’s not the same line. This highlights the limitations of <code>diff</code> for comparing
|
|
things that</p>
|
|
|
|
<ul>
|
|
<li>aren’t logically grouped by line</li>
|
|
<li>aren’t easily thought of as versions of the same text with some lines changed</li>
|
|
</ul>
|
|
|
|
|
|
<p>We could edit the files into a more logically defined structure, like
|
|
one-line-per-verse, and try again:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ diff -u genesis_nkj_by_verse genesis_nrsv_by_verse
|
|
--- genesis_nkj_by_verse 2014-05-11 16:51:14.312457198 -0600
|
|
+++ genesis_nrsv_by_verse 2014-05-11 16:53:02.484453134 -0600
|
|
@@ -1,5 +1,5 @@
|
|
-In the beginning God created the heavens and the earth.
|
|
-The earth was without form, and void; and darkness was on the face of the deep. And the Spirit of God was hovering over the face of the waters.
|
|
+In the beginning when God created the heavens and the earth,
|
|
+the earth was a formless void and darkness covered the face of the deep, while a wind from God swept over the face of the waters.
|
|
Then God said, "Let there be light"; and there was light.
|
|
-And God saw the light, that it was good; and God divided the light from the darkness.
|
|
-God called the light Day, and the darkness He called Night. So the evening and the morning were the first day.
|
|
+And God saw that the light was good; and God separated the light from the darkness.
|
|
+God called the light Day, and the darkness he called Night. And there was evening and there was morning, the first day.
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>It might be a little more descriptive, but editing all that text just for a
|
|
quick comparison felt suspiciously like work, and anyway the output still
|
|
doesn’t seem very useful.</p>
|
|
|
|
<h2><a name=wdiff href=#wdiff>#</a> wdiff</h2>
|
|
|
|
<p>For cases like this, I’m fond of a tool called <code>wdiff</code>:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ wdiff genesis_nkj genesis_nrsv
|
|
In the beginning {+when+} God created the heavens and the [-earth. The-] {+earth, the+} earth was [-without
|
|
form, and void;-] {+a
|
|
formless void+} and darkness [-was on-] {+covered+} the face of the [-deep. And the Spirit of-] {+deep, while a wind from+}
|
|
God [-was hovering-] {+swept+} over the face of the waters. Then God said, "Let there be light";
|
|
and there was light. And God saw [-the light,-] that [-it-] {+the light+} was good; and God
|
|
[-divided-] {+separated+}
|
|
the light from the darkness. God called the light Day, and the darkness
|
|
[-He-] {+he+}
|
|
called Night. [-So the-] {+And there was+} evening and [-the morning were-] {+there was morning,+} the first day.
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>Deleted words are surrounded by <code>[- -]</code> and inserted ones by <code>{+ +}</code>. You can
|
|
even ask it to spit out HTML tags for insertion and deletion…</p>
|
|
|
|
<pre><code>$ wdiff -w '<del>' -x '</del>' -y '<ins>' -z '</ins>' genesis_nkj genesis_nrsv
|
|
</code></pre>
|
|
|
|
<p>…and come up with something your browser will render like this:</p>
|
|
|
|
<blockquote>
|
|
<p>In the beginning <ins>when</ins> God created the heavens and the <del>earth. The</del> <ins>earth, the</ins> earth was <del>without
|
|
form, and void;</del> <ins>a
|
|
formless void</ins> and darkness <del>was on</del> <ins>covered</ins> the face of the <del>deep. And the Spirit of</del> <ins>deep, while a wind from</ins>
|
|
God <del>was hovering</del> <ins>swept</ins> over the face of the waters. Then God said, "Let there be light";
|
|
and there was light. And God saw <del>the light,</del> that <del>it</del> <ins>the light</ins> was good; and God
|
|
<del>divided</del> <ins>separated</ins>
|
|
the light from the darkness. God called the light Day, and the darkness
|
|
<del>He</del> <ins>he</ins>
|
|
called Night. <del>So the</del> <ins>And there was</ins> evening and <del>the morning were</del> <ins>there was morning,</ins> the first day.</p>
|
|
</blockquote>
|
|
|
|
|
|
<p>Burton H. Throckmorton, Jr. this ain’t. Still, it has its uses.</p>
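<p>If <code>wdiff</code> isn’t installed, a much cruder version of the same idea can be sketched with <code>tr</code> and plain <code>diff</code>: put every word on its own line, then compare line by line. The <code>words</code> function and the two short sample files below are invented for illustration, and the output loses all of the surrounding context that <code>wdiff</code> preserves.</p>

```shell
# Hypothetical helper: print each whitespace-separated word of a file on its own line.
words() { tr -s '[:space:]' '\n' < "$1"; }

workdir=$(mktemp -d)
printf 'And God divided the light\nfrom the darkness.\n' > "$workdir/a"
printf 'And God separated\nthe light from the darkness.\n' > "$workdir/b"
words "$workdir/a" > "$workdir/a_words"
words "$workdir/b" > "$workdir/b_words"

# Keep only the changed lines: "<" marks words only in a, ">" words only in b.
changes=$(diff "$workdir/a_words" "$workdir/b_words" | grep '^[<>]')
echo "$changes"
rm -r "$workdir"
```

<p>Notice that the differing line breaks in the two samples no longer matter, since every word ends up on a line by itself.</p>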
|
|
|
|
<hr />
|
|
|
|
<h1><a name=the-command-line-as-as-a-shared-world href=#the-command-line-as-as-a-shared-world>#</a> 7. the command line as a shared world</h1>
|
|
|
|
<p>In an earlier chapter, I wrote:</p>
|
|
|
|
<blockquote><p>You can think of the shell as a kind of environment you inhabit, in much
|
|
the way your character inhabits an adventure game.</p></blockquote>
|
|
|
|
<p>It turns out that sometimes there are other human inhabitants of this
|
|
environment.</p>
|
|
|
|
<p>Unix was built on a model known as “time-sharing”. This is an idea with a lot
|
|
of history, but the very short version is that when computers were rare and
|
|
expensive, it made sense for lots of people to be able to use them at once.
|
|
This is part of the story of how ideas like e-mail and chat were originally
|
|
born, well before networks took over the world: as ways for the many users of
|
|
one computer to communicate on the same machine.</p>
|
|
|
|
<p>Says Dennis Ritchie:</p>
|
|
|
|
<blockquote><p>What we wanted to preserve was not just a good environment in which to do
|
|
programming, but a system around which a fellowship could form. We knew from
|
|
experience that the essence of communal computing, as supplied by
|
|
remote-access, time-shared machines, is not just to type programs into a
|
|
terminal instead of a keypunch, but to encourage close communication.</p></blockquote>
|
|
|
|
<p>Times have changed, and while it’s mundane to use software that’s shared
|
|
between many users, it’s not nearly as common as it once was for a bunch of us
|
|
to be logged into the same computer all at once.</p>
|
|
|
|
<p style="text-align:center;"> ★</p>
|
|
|
|
<p>In the mid 1990s, when I was first exposed to Unix, it was by opening up a
|
|
program called NCSA Telnet on one of the Macs at school and connecting to a
|
|
server called mother.esu1.k12.ne.us.</p>
|
|
|
|
<p>NCSA Telnet was a terminal, not unlike the kind that you use to open a shell on
|
|
your own Linux computer, a piece of software that itself emulated actual,
|
|
physical hardware from an earlier era. Hardware terminals were basically very
|
|
simple computers with keyboards, screens, and just enough networking brains to
|
|
talk to a <em>real</em> computer somewhere else. You’ll still come across these
|
|
scattered around big institutional environments. The last time I looked over
|
|
the shoulder of an airline check-in desk clerk, for example, I saw green
|
|
monochrome text that was probably coming from an IBM mainframe somewhere
|
|
far away.</p>
|
|
|
|
<p>Part of what was exciting about being logged into a computer somewhere else
|
|
was that you could <em>talk to people</em>.</p>
|
|
|
|
<p style="text-align:center;"> ★</p>
|
|
|
|
<p><em>{This chapter is a work in progress.}</em></p>
|
|
|
|
<hr />
|
|
|
|
<h1><a name=the-command-line-and-the-web href=#the-command-line-and-the-web>#</a> 8. the command line and the web</h1>
|
|
|
|
<p>Web browsers are really complicated these days. They’re full of rendering
|
|
engines, audio and video players, programming languages, development tools,
|
|
databases — you name it, and there’s a fair chance it’s in there somewhere.
|
|
The modern web browser is kitchen sink software, and to make matters worse, it
|
|
is <em>totally surrounded</em> by technobabble. It can take <em>years</em> to come to terms
|
|
with the ocean of words about web stuff and sort out the meaningful ones from
|
|
the snake oil and bureaucratic mysticism.</p>
|
|
|
|
<p>All of which can make the web itself seem like a really complicated landscape,
|
|
and obscure the simplicity of its basic design, which is this:</p>
|
|
|
|
<p>Some programs pass text around to one another.</p>
|
|
|
|
<p>Which might sound familiar.</p>
|
|
|
|
<p>The gist of it is that the web is made out of URLs, “Uniform Resource
|
|
Locators”, which are paths to things. If you squint, these look kind of like
|
|
paths to files on your filesystem. When you visit a URL in your browser, it
|
|
asks a server for a certain path, and the server gives it back some text. When
|
|
you click a button to submit a form, your browser sends some text to the server
|
|
and waits to see what it says back. The text that gets passed around is
|
|
(usually) written in a language with particular significance to web browsers,
|
|
but if you look at it directly, it’s a format that humans can understand.</p>
|
|
|
|
<p>Let’s illustrate this. I’ve written a really simple web page that lives at
|
|
<a href="http://p1k3.com/hello_world.html"><code>http://p1k3.com/hello_world.html</code></a>.</p>
|
|
|
|
<pre><code>$ curl 'https://p1k3.com/hello_world.html'
|
|
<html>
|
|
<head>
|
|
<title>hello, world</title>
|
|
</head>
|
|
|
|
<body>
|
|
<h1>hi everybody</h1>
|
|
|
|
<p>How are things?</p>
|
|
</body>
|
|
</html>
|
|
</code></pre>
|
|
|
|
<p><code>curl</code> is a program with lots and lots of features — it too is a little bit
|
|
of a kitchen sink — but it has one core purpose, which is to grab things from
|
|
URLs and spit them back out. It’s a little bit like <code>cat</code> for things that live
|
|
on the web. Try the above command with just about any URL you can think of,
|
|
and you’ll probably get <em>something</em> back. Let’s try this book:</p>
|
|
|
|
<pre><code>$ curl 'https://p1k3.com/userland-book/' | head
|
|
<!DOCTYPE html>
|
|
<html lang=en>
|
|
<head>
|
|
<meta charset="utf-8">
|
|
<title>userland: a book about the command line for humans</title>
|
|
<link rel=stylesheet href="userland.css" />
|
|
<script src="js/jquery.js" type="text/javascript"></script>
|
|
</head>
|
|
|
|
<body>
|
|
</code></pre>
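<p>The text that goes <em>to</em> the server is readable, too. As a rough sketch, here’s approximately what a minimal request for the hello world page looks like. The <code>http_get_request</code> function is made up for this example, and it assumes plain HTTP/1.1; real clients like <code>curl</code> send more headers than this.</p>

```shell
# Hypothetical helper: print the text of a minimal HTTP/1.1 GET request.
# HTTP ends each line with a carriage return and a newline (\r\n),
# and a blank line marks the end of the request headers.
http_get_request() {
    host=$1
    path=$2
    printf 'GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "$path" "$host"
}

# Show the request with the carriage returns stripped for readability.
request=$(http_get_request p1k3.com /hello_world.html)
printf '%s\n' "$request" | tr -d '\r'

# To actually send it, pipe it to a TCP client such as nc (assumed installed):
#   http_get_request p1k3.com /hello_world.html | nc p1k3.com 80
```

<p>A server that prefers HTTPS may well answer a request like this with a redirect instead of the page itself, but the answer will still be text.</p>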
|
|
|
|
<p><code>hello_world.html</code> and <code>userland-book</code> are both written in HyperText Markup
|
|
Language. HTML is just text with a specific kind of structure. It’s been
|
|
around for quite a while now, and has grown up a lot in 20 years, but at heart
|
|
it still looks a lot <a href="http://info.cern.ch/hypertext/WWW/TheProject.html">like it did in 1991</a>.</p>
|
|
|
|
<p>The basic idea is that the contents of a web page are marked up with tags.
|
|
A tag looks like this:</p>
|
|
|
|
<pre><code><title>hi!</title> -,
   |    |           |
   |    `- content  |
   |                `- closing tag
   `- opening tag
|
|
</code></pre>
|
|
|
|
<p>Sometimes you’ll see tags with what are known as “attributes”:</p>
|
|
|
|
<pre><code><a href="https://p1k3.com/userland-book">userland</a>
|
|
</code></pre>
|
|
|
|
<p>This is how links are written in HTML. <code>href="..."</code> tells the browser where to
|
|
go when the user clicks on “<a href="http://p1k3.com/userland-book">userland</a>”.</p>
|
|
|
|
<p>Tags are a way to describe not so much what something <em>looks like</em> as what
|
|
something <em>means</em>. Browsers are, in large part, big collections of knowledge
|
|
about the meanings of tags and ways to represent those meanings.</p>
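<p>Because HTML is just text, the usual text tools can get a rough grip on it. Here’s a sketch that pulls the <code>href</code> values out of a scrap of HTML with <code>grep</code> and <code>cut</code>. The sample markup is invented for the example, and <code>grep -o</code> is an extension found in the GNU and BSD versions rather than something every <code>grep</code> is guaranteed to have.</p>

```shell
# A made-up scrap of HTML to work on.
page='<p>See <a href="https://p1k3.com/userland-book">userland</a> and
<a href="https://p1k3.com/">p1k3</a>.</p>'

# grep -o prints only the part of each line that matches the pattern;
# cut then keeps whatever sits between the first pair of double quotes.
links=$(printf '%s\n' "$page" | grep -o 'href="[^"]*"' | cut -d'"' -f2)
echo "$links"
```

<p>This is a blunt instrument. It will miss attributes written without quotes, for instance, so anything serious should probably use a tool that actually understands HTML.</p>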
|
|
|
|
<p>While the browser you use day-to-day has (probably) a graphical interface and
|
|
does all sorts of things impossible to render in a terminal, some of the
|
|
earliest web browsers were entirely text-based, and text-mode browsers still
|
|
exist. Lynx, which originated at the University of Kansas in the early 1990s,
|
|
is still actively maintained:</p>
|
|
|
|
<pre><code>$ lynx -dump 'http://p1k3.com/userland-book/' | head
|
|
userland
|
|
__________________________________________________________________
|
|
|
|
[1]# a book about the command line for humans
|
|
|
|
Late last year, [2]a side trip into text utilities got me thinking
|
|
about how much my writing habits depend on the Linux command line. This
|
|
struck me as a good hook for talking about the tools I use every day
|
|
with an audience of mixed technical background.
|
|
</code></pre>
|
|
|
|
<p>If you invoke Lynx without any options, it’ll start up in interactive mode, and
|
|
you can navigate between links with the arrow keys. <code>lynx -dump</code> spits a
|
|
rendered version of a page to standard output, with links annotated in square
|
|
brackets and printed as footnotes. Another useful option here is <code>-listonly</code>,
|
|
which will print just the list of links contained within a page:</p>
|
|
|
|
<pre><code>$ lynx -dump -listonly 'http://p1k3.com/userland-book/' | head
|
|
|
|
References
|
|
|
|
2. http://p1k3.com/2013/8/4
|
|
3. http://p1k3.com/userland-book.git
|
|
4. https://github.com/brennen/userland-book
|
|
5. http://p1k3.com/userland-book/
|
|
6. https://twitter.com/brennen
|
|
9. http://p1k3.com/userland-book/#a-book-about-the-command-line-for-humans
|
|
10. http://p1k3.com/userland-book/#copying
|
|
</code></pre>
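<p>A list of links is just more line-oriented text, so it can be fed straight into other tools. As a sketch, here’s one way to count how many links point at each host. The <code>count_hosts</code> function is made up for this example, and the sample input is copied from the listing above so that it runs without Lynx installed.</p>

```shell
# Hypothetical helper: read "N. URL" lines on standard input and
# count how many URLs point at each host.
count_hosts() {
    awk '$1 ~ /^[0-9]+\.$/ { split($2, parts, "/"); print parts[3] }' |
        sort | uniq -c | sort -nr
}

# Sample lines from the listing above. With lynx installed, this could be:
#   lynx -dump -listonly 'http://p1k3.com/userland-book/' | count_hosts
hosts=$(printf '%s\n' \
    '   2. http://p1k3.com/2013/8/4' \
    '   4. https://github.com/brennen/userland-book' \
    '   5. http://p1k3.com/userland-book/' | count_hosts)
echo "$hosts"
```

<p>Splitting each URL on <code>/</code> puts the host in the third piece, since the second piece is the empty string between the two slashes after <code>http:</code>.</p>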
|
|
|
|
<p>An alternative to Lynx is w3m, which copes a little more gracefully with the
|
|
complexities of modern web layout.</p>
|
|
|
|
<pre><code>$ w3m -dump 'http://p1k3.com/userland-book/' | head
|
|
userland
|
|
|
|
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
|
|
# a book about the command line for humans
|
|
|
|
Late last year, a side trip into text utilities got me thinking about how much
|
|
my writing habits depend on the Linux command line. This struck me as a good
|
|
hook for talking about the tools I use every day with an audience of mixed
|
|
technical background.
|
|
</code></pre>
|
|
|
|
<p>Neither of these tools can easily replace enormously capable applications like
|
|
Chrome or Firefox, but they have their place in the toolbox, and help to
|
|
demonstrate how the web is built (in part) on principles we’ve already seen at
|
|
work.</p>
|
|
|
|
<hr />
|
|
|
|
<h1><a name=a-miscellany-of-tools-and-techniques href=#a-miscellany-of-tools-and-techniques>#</a> 9. a miscellany of tools and techniques</h1>
|
|
|
|
<h2><a name=dict href=#dict>#</a> dict</h2>
|
|
|
|
<p>Want to know the definition of a word, or find useful synonyms?</p>
|
|
|
|
<pre><code>$ dict concatenate | head -10
|
|
4 definitions found
|
|
|
|
From The Collaborative International Dictionary of English v.0.48 [gcide]:
|
|
|
|
Concatenate \Con*cat"e*nate\ (k[o^]n*k[a^]t"[-e]*n[=a]t), v. t.
|
|
[imp. & p. p. {Concatenated}; p. pr. & vb. n.
|
|
{Concatenating}.] [L. concatenatus, p. p. of concatenare to
|
|
concatenate. See {Catenate}.]
|
|
To link together; to unite in a series or chain, as things
|
|
depending on one another.
|
|
</code></pre>
|
|
|
|
<h2><a name=aspell href=#aspell>#</a> aspell</h2>
|
|
|
|
<p>Need to interactively spell-check your presentation notes?</p>
|
|
|
|
<pre><code>$ aspell check presentation
|
|
</code></pre>
|
|
|
|
<p>Just want a list of potentially-misspelled words in a given file?</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ aspell list < ../literary_environment/index.md | sort | uniq -ci | sort -nr | head -5
|
|
40 td
|
|
24 Veselka
|
|
17 Reuel
|
|
16 Brunner
|
|
15 Tiptree
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<h2><a name=mostcommon href=#mostcommon>#</a> mostcommon</h2>
|
|
|
|
<p>Something like that last sequence sure does seem to show up a lot in my work:
|
|
Spit out the <em>n</em> most common lines in the input, one way or another. Here’s
|
|
a little script to be less repetitive about it.</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ aspell list < ../literary_environment/index.md | ./mostcommon -i -n5
|
|
40 td
|
|
24 Veselka
|
|
17 Reuel
|
|
16 Brunner
|
|
15 Tiptree
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>This turns out to be pretty simple:</p>
|
|
|
|
<!-- exec -->
|
|
|
|
|
|
<pre><code>$ cat ./mostcommon
|
|
#!/usr/bin/env bash
|
|
|
|
# Optionally specify number of lines to show, defaulting to 10:
|
|
TOSHOW=10
|
|
CASEOPT=""
|
|
|
|
while getopts ":in:" opt; do
|
|
case $opt in
|
|
i)
|
|
CASEOPT="-i"
|
|
;;
|
|
n)
|
|
TOSHOW=$OPTARG
|
|
;;
|
|
\?)
|
|
echo "Invalid option: -$OPTARG" >&2
|
|
exit 1
|
|
;;
|
|
:)
|
|
echo "Option -$OPTARG requires an argument." >&2
|
|
exit 1
|
|
;;
|
|
esac
|
|
done
|
|
|
|
# sort and then uniqify STDIN,
|
|
# sort numerically on the first field,
|
|
# chop off everything but $TOSHOW lines of input
|
|
|
|
sort < /dev/stdin | uniq -c $CASEOPT | sort -k1 -nr | head -n "$TOSHOW"
|
|
</code></pre>
|
|
|
|
<!-- end -->
|
|
|
|
|
|
<p>Notice, though, that it doesn’t handle opening files directly. If you wanted
|
|
to find the most common lines in a file with it, you’d have to say something
|
|
like <code>mostcommon < filename</code> in order to redirect the file to <code>mostcommon</code>’s
|
|
input.</p>
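<p>Teaching it about filenames wouldn’t take much. Here’s a sketch of one way to do it, written as a shell function with the made-up name <code>mostcommon_files</code> rather than as a change to the script itself. It relies on the fact that <code>cat</code> reads standard input when it isn’t given any files. It also passes <code>-f</code> to <code>sort</code> whenever <code>-i</code> is given, on the assumption that in some locales a plain <code>sort</code> won’t put lines that differ only by case next to each other.</p>

```shell
# Hypothetical variant of the mostcommon pipeline, wrapped in a function.
# Usage: mostcommon_files [-i] [-n count] [file ...]
mostcommon_files() {
    toshow=10
    sortopt=""
    uniqopt=""
    OPTIND=1
    while getopts ":in:" opt; do
        case $opt in
            i) sortopt="-f"; uniqopt="-i" ;;
            n) toshow=$OPTARG ;;
            *) echo "usage: mostcommon_files [-i] [-n count] [file ...]" >&2
               return 1 ;;
        esac
    done
    # Drop the options getopts consumed, leaving only filenames in "$@".
    shift $((OPTIND - 1))

    # With no filenames, cat copies standard input, so pipes keep working.
    cat -- "$@" | sort $sortopt | uniq -c $uniqopt | sort -k1,1nr | head -n "$toshow"
}
```

<p>With this in place, <code>mostcommon_files -n5 filename</code> and <code>mostcommon_files -n5 &lt; filename</code> should both work, as should piping another command’s output into it.</p>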
|
|
|
|
<p>Also notice that most of the script is boilerplate for handling a couple of
|
|
options. The work is all done in a one-liner. Worth it? Maybe not, but an
|
|
interesting exercise.</p>
|
|
|
|
<h2><a name=cal-and-ncal href= |