Brennen Bearnes 4 years ago
parent
commit
b6961299e2

+ 3
- 5
footer.html

@@ -1,10 +1,8 @@
1 1
 <script>
2 2
 $(document).ready(function () {
3
-
4
-  // ☜ ☝ ☞ ☟
5
-  // ☆ ✠ ✡ ✢ ✣ ✤ ✥ ✦ ✧ ✩ ✪ 
6
-  var closed_sigil = '⇩';
7
-  var open_sigil = '⇧';
3
+  // ☜ ☝ ☞ ☟ ☆ ✠ ✡ ✢ ✣ ✤ ✥ ✦ ✧ ✩ ✪ 
4
+  var closed_sigil = 'show';
5
+  var open_sigil = 'hide';
8 6
 
9 7
   var togglesigil = function (elem) {
10 8
     var sigil = $(elem).html();

+ 212
- 28
index.html

@@ -76,8 +76,9 @@ mirror</a>, and welcome feedback there.</p>
76 76
 <li><a href="#code-help-code-and-man-pages"><code>&ndash;help</code> and man pages</a></li>
77 77
 <li><a href="#wc">wc</a></li>
78 78
 <li><a href="#head-tail-and-cut">head, tail, and cut</a></li>
79
+<li><a href="#tab-separated-values">tab separated values</a></li>
79 80
 <li><a href="#finding-text-grep">finding text: grep</a></li>
80
-<li><a href="#now-you-have-n-problems-regex-rabbit-holes">now you have n problems: regex + rabbit holes</a></li>
81
+<li><a href="#now-you-have-n-problems-regex-and-rabbit-holes">now you have n problems: regex and rabbit holes</a></li>
81 82
 </ul>
82 83
 </li>
83 84
 <li><a href="#a-literary-problem">2. a literary problem</a></li>
@@ -861,6 +862,190 @@ you could instead do:</p>
861 862
 <!-- end -->
862 863
 
863 864
 
865
+<h2><a name=tab-separated-values href=#tab-separated-values>#</a> tab separated values</h2>
866
+
867
+<p>Notice above how we had to tell <code>cut</code> that &ldquo;fields&rdquo; in <code>authors_*</code> are
868
+delimited by spaces?  It turns out that if you don&rsquo;t use <code>-d</code>, <code>cut</code> defaults
869
+to using tab characters for a delimiter.</p>
870
+
871
+<p>Tab characters are sort of weird little animals.  You can&rsquo;t usually <em>see</em> them
872
+directly &ndash; they&rsquo;re like a space character that takes up more than one space
873
+when displayed.  By convention, one tab is usually rendered as 8 spaces, but
874
+it&rsquo;s up to the software that&rsquo;s displaying the character what it wants to do.</p>
875
+
876
+<p>(In fact, it&rsquo;s more complicated than that:  Tabs are often rendered as marking
877
+<em>tab stops</em>, which is a concept I remember from 7th grade typing classes, but
878
+haven&rsquo;t actually thought about in my day-to-day life for nearly 20 years.)</p>
879
+
880
+<p>Here&rsquo;s a version of our <code>all_authors</code> that&rsquo;s been rearranged so that the first
881
+field is the author&rsquo;s last name, the second is their first name, the third is
882
+their middle name or initial (if we know it) and the fourth is any suffix.
883
+Fields are separated by a single tab character:</p>
884
+
885
+<!-- exec -->
886
+
887
+
888
+<pre><code>$ cat all_authors.tsv
889
+Robinson    Eden
890
+Waring  Gwendolyn   L.
891
+Tiptree James       Jr.
892
+Brunner John
893
+Tolkien John    Ronald Reuel
894
+Walton  Jo
895
+Toews   Miriam
896
+Cadigan Pat
897
+Le Guin Ursula  K.
898
+Veselka Vanessa
899
+</code></pre>
900
+
901
+<!-- end -->
902
+
903
+
904
+<p>That looks kind of garbled, right?  In order to make it a little more obvious
905
+what&rsquo;s happening, let&rsquo;s use <code>cat -T</code>, which displays tab characters as <code>^I</code>:</p>
906
+
907
+<!-- exec -->
908
+
909
+
910
+<pre><code>$ cat -T all_authors.tsv
911
+Robinson^IEden
912
+Waring^IGwendolyn^IL.
913
+Tiptree^IJames^I^IJr.
914
+Brunner^IJohn
915
+Tolkien^IJohn^IRonald Reuel
916
+Walton^IJo
917
+Toews^IMiriam
918
+Cadigan^IPat
919
+Le Guin^IUrsula^IK.
920
+Veselka^IVanessa
921
+</code></pre>
922
+
923
+<!-- end -->
924
+
925
+
926
+<p>It looks odd when displayed because some names are 8 characters long, or nearly so.
927
+&ldquo;Robinson&rdquo;, at exactly 8 characters, fills the space up to the first tab stop, so the tab after it jumps to the next stop and &ldquo;Eden&rdquo; gets indented
928
+further than other first names, and so on.</p>
929
+
930
+<p>Fortunately, in order to make this more human-readable, we can pass it through
931
+<code>expand</code>, which turns tabs into spaces up to the next tab stop (every 8 columns by default; <code>-t14</code> sets a stop every 14):</p>
932
+
933
+<!-- exec -->
934
+
935
+
936
+<pre><code>$ expand -t14 all_authors.tsv
937
+Robinson      Eden
938
+Waring        Gwendolyn     L.
939
+Tiptree       James                       Jr.
940
+Brunner       John
941
+Tolkien       John          Ronald Reuel
942
+Walton        Jo
943
+Toews         Miriam
944
+Cadigan       Pat
945
+Le Guin       Ursula        K.
946
+Veselka       Vanessa
947
+</code></pre>
948
+
949
+<!-- end -->
950
+
951
+
952
+<p>Now it&rsquo;s easy to sort by last name:</p>
953
+
954
+<!-- exec -->
955
+
956
+
957
+<pre><code>$ sort -k1 all_authors.tsv | expand -t14
958
+Brunner       John
959
+Cadigan       Pat
960
+Le Guin       Ursula        K.
961
+Robinson      Eden
962
+Tiptree       James                       Jr.
963
+Toews         Miriam
964
+Tolkien       John          Ronald Reuel
965
+Veselka       Vanessa
966
+Walton        Jo
967
+Waring        Gwendolyn     L.
968
+</code></pre>
969
+
970
+<!-- end -->
971
+
972
+
973
+<p>Or just extract middle names and initials:</p>
974
+
975
+<!-- exec -->
976
+
977
+
978
+<pre><code>$ cut -f3 all_authors.tsv | grep .
979
+L.
980
+Ronald Reuel
981
+K.
982
+</code></pre>
983
+
984
+<!-- end -->
985
+
986
+
987
+<p>It probably won&rsquo;t surprise you to learn that there&rsquo;s a corresponding <code>paste</code>
988
+command, which takes two or more files and stitches them together with tab
989
+characters.  Let&rsquo;s extract a couple of things from our author list and put them
990
+back together in a different order:</p>
991
+
992
+<!-- exec -->
993
+
994
+
995
+<pre><code>$ cut -f1 all_authors.tsv &gt; lastnames
996
+</code></pre>
997
+
998
+<!-- end -->
999
+
1000
+
1001
+
1002
+
1003
+<!-- exec -->
1004
+
1005
+
1006
+<pre><code>$ cut -f2 all_authors.tsv &gt; firstnames
1007
+</code></pre>
1008
+
1009
+<!-- end -->
1010
+
1011
+
1012
+
1013
+
1014
+<!-- exec -->
1015
+
1016
+
1017
+<pre><code>$ paste firstnames lastnames | sort -k2 | expand -t12
1018
+John        Brunner
1019
+Pat         Cadigan
1020
+Ursula      Le Guin
1021
+Eden        Robinson
1022
+James       Tiptree
1023
+Miriam      Toews
1024
+John        Tolkien
1025
+Vanessa     Veselka
1026
+Jo          Walton
1027
+Gwendolyn   Waring
1028
+</code></pre>
1029
+
1030
+<!-- end -->
1031
+
1032
+
1033
+<p>As these examples show, TSV is something very like a primitive spreadsheet:  A
1034
+way to represent information in columns and rows.  In fact, it&rsquo;s a close cousin
1035
+of CSV, which is often used as a lowest-common-denominator format for
1036
+transferring spreadsheets, and which represents data something like this:</p>
1037
+
1038
+<pre><code>last,first,middle,suffix
1039
+Tolkien,John,Ronald Reuel,
1040
+Tiptree,James,,Jr.
1041
+</code></pre>
1042
+
1043
+<p>The advantage of tabs is that they&rsquo;re supported by a bunch of the standard
1044
+tools.  A disadvantage is that they&rsquo;re kind of ugly and can be weird to deal
1045
+with, but they&rsquo;re useful anyway, and character-delimited rows are often a
1046
+good-enough way to hack your way through problems that call for basic
1047
+structure.</p>
1048
+
864 1049
 <h2><a name=finding-text-grep href=#finding-text-grep>#</a> finding text: grep</h2>
865 1050
 
866 1051
 <p>After all those contortions, what if you actually just want to see <em>which lists</em>
@@ -899,7 +1084,7 @@ words have been written on this topic by leading lights of the nerd community.</
899 1084
 isn&rsquo;t very useful to us).  That&rsquo;s because all <code>grep</code> saw was the lines in the
900 1085
 files, not the names of the files themselves.</p>
901 1086
 
902
-<h2><a name=now-you-have-n-problems-regex-rabbit-holes href=#now-you-have-n-problems-regex-rabbit-holes>#</a> now you have n problems: regex + rabbit holes</h2>
1087
+<h2><a name=now-you-have-n-problems-regex-and-rabbit-holes href=#now-you-have-n-problems-regex-and-rabbit-holes>#</a> now you have n problems: regex and rabbit holes</h2>
903 1088
 
904 1089
 <p>To close out this introductory chapter, let&rsquo;s spend a little time on a topic
905 1090
 that will likely vex, confound, and (occasionally) delight you for as long as
@@ -936,18 +1121,18 @@ shell to match groups of files, but for text in general and with more magic.</p>
936 1121
 by <code>grep</code>, other magical things include:</p>
937 1122
 
938 1123
 <table>
939
-  <tr><td><code>^</code>    </td>  <td>start of a line                        </td></tr>
940
-  <tr><td><code>$</code>    </td>  <td>end of a line                          </td></tr>
941
-  <tr><td><code>[abc]</code></td>  <td>one of a, b, or c                      </td></tr>
942
-  <tr><td><code>[a-z]</code></td>  <td>a character in the range a through z   </td></tr>
943
-  <tr><td><code>[0-9]</code></td>  <td>a character in the range 0 through 9   </td></tr>
944
-
945
-  <tr><td><code>+</code>    </td>  <td>one or more of the preceding thing     </td></tr>
946
-  <tr><td><code>?</code>    </td>  <td>0 or 1 of the preceding thing          </td></tr>
947
-  <tr><td><code>*</code>    </td>  <td>any number of the preceding thing      </td></tr>
948
-
949
-  <tr><td><code>(foo|bar)</code></td>  <td>"foo" or "bar"</td></tr>
950
-  <tr><td><code>(foo)?</code></td>     <td>optional "foo"</td></tr>
1124
+    <tr><td><code>^</code>    </td>  <td>start of a line                     </td></tr>
1125
+    <tr><td><code>$</code>    </td>  <td>end of a line                       </td></tr>
1126
+    <tr><td><code>[abc]</code></td>  <td>one of a, b, or c                   </td></tr>
1127
+    <tr><td><code>[a-z]</code></td>  <td>a character in the range a through z</td></tr>
1128
+    <tr><td><code>[0-9]</code></td>  <td>a character in the range 0 through 9</td></tr>
1129
+
1130
+    <tr><td><code>+</code>    </td>  <td>one or more of the preceding thing  </td></tr>
1131
+    <tr><td><code>?</code>    </td>  <td>0 or 1 of the preceding thing       </td></tr>
1132
+    <tr><td><code>*</code>    </td>  <td>any number of the preceding thing   </td></tr>
1133
+
1134
+    <tr><td><code>(foo|bar)</code></td>  <td>"foo" or "bar"</td></tr>
1135
+    <tr><td><code>(foo)?</code></td>     <td>optional "foo"</td></tr>
951 1136
 </table>
952 1137
 
953 1138
 
@@ -1549,6 +1734,9 @@ the same thing as `cat all_authors | nl`, or `nl all_authors`.  You won't see
1549 1734
     $ sort colors | uniq -i | tail -1
1550 1735
     $ cut -d' ' -f1 ./authors_* | sort | uniq -ci | sort -n | tail -3
1551 1736
     $ sort -u ./authors_* | cut -d' ' -f1 | uniq -ci | sort -n | tail -3
1737
+    $ sort -k1 all_authors.tsv | expand -t14
1738
+    $ cut -f3 all_authors.tsv | grep .
1739
+    $ paste firstnames lastnames | sort -k2 | expand -t12
1552 1740
     $ cat ./authors_* | grep 'Vanessa'
1553 1741
 </code></pre>
1554 1742
 
@@ -2447,11 +2635,9 @@ If you squint, these look kind of like paths to files on your filesystem.</p>
2447 2635
 <hr />
2448 2636
 <script>
2449 2637
 $(document).ready(function () {
2450
-
2451
-  // ☜ ☝ ☞ ☟
2452
-  // ☆ ✠ ✡ ✢ ✣ ✤ ✥ ✦ ✧ ✩ ✪ 
2453
-  var closed_sigil = '⇩';
2454
-  var open_sigil = '⇧';
2638
+  // ☜ ☝ ☞ ☟ ☆ ✠ ✡ ✢ ✣ ✤ ✥ ✦ ✧ ✩ ✪ 
2639
+  var closed_sigil = 'show';
2640
+  var open_sigil = 'hide';
2455 2641
 
2456 2642
   var togglesigil = function (elem) {
2457 2643
     var sigil = $(elem).html();
@@ -2462,20 +2648,18 @@ $(document).ready(function () {
2462 2648
     }
2463 2649
   };
2464 2650
 
2465
-  var togglebutton = function (e) {
2466
-    e.preventDefault();
2467
-    $details_full.toggle({
2468
-      duration: 550
2469
-    });
2470
-    togglesigil(this);
2471
-  };
2472
-
2473 2651
   $(".details").each(function () {
2474 2652
     var $this = $(this);
2475 2653
     var $button = $('<button class=clicker-button>' + closed_sigil + '</button>');
2476 2654
     var $details_full = $(this).find('.full');
2477 2655
 
2478
-    $button.click(togglebutton);
2656
+    $button.click(function (e) {
2657
+      e.preventDefault();
2658
+      $details_full.toggle({
2659
+        duration: 550
2660
+      });
2661
+      togglesigil(this);
2662
+    });
2479 2663
 
2480 2664
     $(this).find('.clicker').append($button);
2481 2665
     $button.show();

+ 10
- 0
literary_environment/all_authors.tsv

@@ -0,0 +1,10 @@
1
+Robinson	Eden
2
+Waring	Gwendolyn	L.
3
+Tiptree	James		Jr.
4
+Brunner	John
5
+Tolkien	John	Ronald Reuel
6
+Walton	Jo
7
+Toews	Miriam
8
+Cadigan	Pat
9
+Le Guin	Ursula	K.
10
+Veselka	Vanessa
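
A note on the `sort -k1` call used with this file: by default `sort` splits fields on blanks, and `-k1` starts the key at field 1 and runs to the end of the line, so it is effectively a whole-line sort. The sketch below shows one way to restrict the key to the first tab-delimited field, which matters once a last name such as "Le Guin" contains a space. It assumes a POSIX shell; `sample_authors.tsv` and the `tab` variable are hypothetical names, and the rows are copied from the file above.

```shell
# Hypothetical sample: four rows copied from all_authors.tsv.
printf 'Robinson\tEden\nTiptree\tJames\t\tJr.\nLe Guin\tUrsula\tK.\nBrunner\tJohn\n' > sample_authors.tsv

# A literal tab character, built portably (no bash-only $'\t' quoting).
tab="$(printf '\t')"

# -t sets the field separator to a tab; -k1,1 limits the sort key to
# field 1 only instead of "field 1 through end of line".
sort -t "$tab" -k1,1 sample_authors.tsv | cut -f1,2 | expand -t14
```

With only ten authors the difference from plain `sort -k1` is invisible, but the explicit `-t` and `-k1,1` keep the behaviour predictable if two rows ever share a last name.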

+ 10
- 0
literary_environment/firstnames

@@ -0,0 +1,10 @@
1
+Eden
2
+Gwendolyn
3
+James
4
+John
5
+John
6
+Jo
7
+Miriam
8
+Pat
9
+Ursula
10
+Vanessa

+ 170
- 14
literary_environment/index.md

@@ -670,6 +670,162 @@ you could instead do:
670 670
 
671 671
 <!-- end -->
672 672
 
673
+tab separated values
674
+--------------------
675
+
676
+Notice above how we had to tell `cut` that "fields" in `authors_*` are
677
+delimited by spaces?  It turns out that if you don't use `-d`, `cut` defaults
678
+to using tab characters for a delimiter.
679
+
680
+Tab characters are sort of weird little animals.  You can't usually _see_ them
681
+directly -- they're like a space character that takes up more than one space
682
+when displayed.  By convention, one tab is usually rendered as 8 spaces, but
683
+it's up to the software that's displaying the character what it wants to do.
684
+
685
+(In fact, it's more complicated than that:  Tabs are often rendered as marking
686
+_tab stops_, which is a concept I remember from 7th grade typing classes, but
687
+haven't actually thought about in my day-to-day life for nearly 20 years.)
688
+
689
+Here's a version of our `all_authors` that's been rearranged so that the first
690
+field is the author's last name, the second is their first name, the third is
691
+their middle name or initial (if we know it) and the fourth is any suffix.
692
+Fields are separated by a single tab character:
693
+
694
+<!-- exec -->
695
+
696
+    $ cat all_authors.tsv
697
+    Robinson	Eden
698
+    Waring	Gwendolyn	L.
699
+    Tiptree	James		Jr.
700
+    Brunner	John
701
+    Tolkien	John	Ronald Reuel
702
+    Walton	Jo
703
+    Toews	Miriam
704
+    Cadigan	Pat
705
+    Le Guin	Ursula	K.
706
+    Veselka	Vanessa
707
+
708
+<!-- end -->
709
+
710
+That looks kind of garbled, right?  In order to make it a little more obvious
711
+what's happening, let's use `cat -T`, which displays tab characters as `^I`:
712
+
713
+<!-- exec -->
714
+
715
+    $ cat -T all_authors.tsv
716
+    Robinson^IEden
717
+    Waring^IGwendolyn^IL.
718
+    Tiptree^IJames^I^IJr.
719
+    Brunner^IJohn
720
+    Tolkien^IJohn^IRonald Reuel
721
+    Walton^IJo
722
+    Toews^IMiriam
723
+    Cadigan^IPat
724
+    Le Guin^IUrsula^IK.
725
+    Veselka^IVanessa
726
+
727
+<!-- end -->
728
+
729
+It looks odd when displayed because some names are 8 characters long, or nearly so.
730
+"Robinson", at exactly 8 characters, fills the space up to the first tab stop, so the tab after it jumps to the next stop and "Eden" gets indented
731
+further than other first names, and so on.
732
+
733
+Fortunately, in order to make this more human-readable, we can pass it through
734
+`expand`, which turns tabs into spaces up to the next tab stop (every 8 columns by default; `-t14` sets a stop every 14):
735
+
736
+<!-- exec -->
737
+
738
+    $ expand -t14 all_authors.tsv
739
+    Robinson      Eden
740
+    Waring        Gwendolyn     L.
741
+    Tiptree       James                       Jr.
742
+    Brunner       John
743
+    Tolkien       John          Ronald Reuel
744
+    Walton        Jo
745
+    Toews         Miriam
746
+    Cadigan       Pat
747
+    Le Guin       Ursula        K.
748
+    Veselka       Vanessa
749
+
750
+<!-- end -->
751
+
752
+Now it's easy to sort by last name:
753
+
754
+<!-- exec -->
755
+
756
+    $ sort -k1 all_authors.tsv | expand -t14
757
+    Brunner       John
758
+    Cadigan       Pat
759
+    Le Guin       Ursula        K.
760
+    Robinson      Eden
761
+    Tiptree       James                       Jr.
762
+    Toews         Miriam
763
+    Tolkien       John          Ronald Reuel
764
+    Veselka       Vanessa
765
+    Walton        Jo
766
+    Waring        Gwendolyn     L.
767
+
768
+<!-- end -->
769
+
770
+Or just extract middle names and initials:
771
+
772
+<!-- exec -->
773
+
774
+    $ cut -f3 all_authors.tsv | grep .
775
+    L.
776
+    Ronald Reuel
777
+    K.
778
+
779
+<!-- end -->
780
+
781
+It probably won't surprise you to learn that there's a corresponding `paste`
782
+command, which takes two or more files and stitches them together with tab
783
+characters.  Let's extract a couple of things from our author list and put them
784
+back together in a different order:
785
+
786
+<!-- exec -->
787
+
788
+    $ cut -f1 all_authors.tsv > lastnames
789
+    
790
+<!-- end -->
791
+
792
+<!-- exec -->
793
+
794
+    $ cut -f2 all_authors.tsv > firstnames
795
+    
796
+<!-- end -->
797
+
798
+<!-- exec -->
799
+
800
+    $ paste firstnames lastnames | sort -k2 | expand -t12
801
+    John        Brunner
802
+    Pat         Cadigan
803
+    Ursula      Le Guin
804
+    Eden        Robinson
805
+    James       Tiptree
806
+    Miriam      Toews
807
+    John        Tolkien
808
+    Vanessa     Veselka
809
+    Jo          Walton
810
+    Gwendolyn   Waring
811
+
812
+<!-- end -->
813
+
814
+As these examples show, TSV is something very like a primitive spreadsheet:  A
815
+way to represent information in columns and rows.  In fact, it's a close cousin
816
+of CSV, which is often used as a lowest-common-denominator format for
817
+transferring spreadsheets, and which represents data something like this:
818
+
819
+    last,first,middle,suffix
820
+    Tolkien,John,Ronald Reuel,
821
+    Tiptree,James,,Jr.
822
+
823
+The advantage of tabs is that they're supported by a bunch of the standard
824
+tools.  A disadvantage is that they're kind of ugly and can be weird to deal
825
+with, but they're useful anyway, and character-delimited rows are often a
826
+good-enough way to hack your way through problems that call for basic
827
+structure.
828
+
673 829
 finding text: grep
674 830
 ------------------
675 831
 
@@ -703,8 +859,8 @@ You've probably noticed that this result doesn't contain filenames (and thus
703 859
 isn't very useful to us).  That's because all `grep` saw was the lines in the
704 860
 files, not the names of the files themselves.
705 861
 
706
-now you have n problems: regex + rabbit holes
707
----------------------------------------------
862
+now you have n problems: regex and rabbit holes
863
+-----------------------------------------------
708 864
 
709 865
 To close out this introductory chapter, let's spend a little time on a topic
710 866
 that will likely vex, confound, and (occasionally) delight you for as long as
@@ -738,18 +894,18 @@ The pattern `Jo.*` says that we're looking for lines which contain a literal
738 894
 by `grep`, other magical things include:
739 895
 
740 896
 <table>
741
-  <tr><td><code>^</code>    </td>  <td>start of a line                        </td></tr>
742
-  <tr><td><code>$</code>    </td>  <td>end of a line                          </td></tr>
743
-  <tr><td><code>[abc]</code></td>  <td>one of a, b, or c                      </td></tr>
744
-  <tr><td><code>[a-z]</code></td>  <td>a character in the range a through z   </td></tr>
745
-  <tr><td><code>[0-9]</code></td>  <td>a character in the range 0 through 9   </td></tr>
746
-
747
-  <tr><td><code>+</code>    </td>  <td>one or more of the preceding thing     </td></tr>
748
-  <tr><td><code>?</code>    </td>  <td>0 or 1 of the preceding thing          </td></tr>
749
-  <tr><td><code>*</code>    </td>  <td>any number of the preceding thing      </td></tr>
750
-
751
-  <tr><td><code>(foo|bar)</code></td>  <td>"foo" or "bar"</td></tr>
752
-  <tr><td><code>(foo)?</code></td>     <td>optional "foo"</td></tr>
897
+    <tr><td><code>^</code>    </td>  <td>start of a line                     </td></tr>
898
+    <tr><td><code>$</code>    </td>  <td>end of a line                       </td></tr>
899
+    <tr><td><code>[abc]</code></td>  <td>one of a, b, or c                   </td></tr>
900
+    <tr><td><code>[a-z]</code></td>  <td>a character in the range a through z</td></tr>
901
+    <tr><td><code>[0-9]</code></td>  <td>a character in the range 0 through 9</td></tr>
902
+
903
+    <tr><td><code>+</code>    </td>  <td>one or more of the preceding thing  </td></tr>
904
+    <tr><td><code>?</code>    </td>  <td>0 or 1 of the preceding thing       </td></tr>
905
+    <tr><td><code>*</code>    </td>  <td>any number of the preceding thing   </td></tr>
906
+
907
+    <tr><td><code>(foo|bar)</code></td>  <td>"foo" or "bar"</td></tr>
908
+    <tr><td><code>(foo)?</code></td>     <td>optional "foo"</td></tr>
753 909
 </table>
754 910
 
755 911
 It's actually a little more complicated than that:  By default, if you want to

+ 10
- 0
literary_environment/lastnames

@@ -0,0 +1,10 @@
1
+Robinson
2
+Waring
3
+Tiptree
4
+Brunner
5
+Tolkien
6
+Walton
7
+Toews
8
+Cadigan
9
+Le Guin
10
+Veselka
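
The `lastnames` and `firstnames` files in this commit are intermediate outputs of `cut`, later recombined with `paste`. As an alternative sketch that avoids the temporary files, `awk` can reorder tab-separated fields in one pass (`cut -f2,1` would not help here, since `cut` always emits fields in their input order). This assumes a POSIX `awk`; `sample_authors.tsv` and the `tab` variable are hypothetical names, and the rows are copied from `all_authors.tsv`.

```shell
# Hypothetical sample: three rows copied from all_authors.tsv.
printf 'Tolkien\tJohn\tRonald Reuel\nLe Guin\tUrsula\tK.\nWalton\tJo\n' > sample_authors.tsv

# A literal tab character, built portably.
tab="$(printf '\t')"

# Print first name then last name, tab-separated; sort on the last name
# (now field 2); then expand tabs so the columns line up for display.
awk -F '\t' -v OFS='\t' '{ print $2, $1 }' sample_authors.tsv |
  sort -t "$tab" -k2,2 |
  expand -t12
```

The `cut` and `paste` route in the text is still the better illustration of how the two commands mirror each other; this is only a shortcut for when the reordering is the goal.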