brennen/wrt: Almost-minimal filesystem based blog. - wrt


=pod

=head1 NAME

App::WRT - WRiting Tool, a static site/blog generator and related utilities

=for HTML <a href="https://travis-ci.org/brennen/wrt"><img src="https://travis-ci.org/brennen/wrt.svg?branch=master"></a>

=head1 SYNOPSIS

Using the commandline tools:

    $ mkdir project
    $ cd project
    $ wrt init         # set up some defaults
    $ wrt config       # dump configuration values
    $ wrt ls           # list entries
    $ wrt display new  # print HTML for new entries to stdout
    $ wrt render-all   # publish HTML to project/public/

Using App::WRT in library form:

    #!/usr/bin/env perl

    use App::WRT;
    my $w = App::WRT->new(
      entry_dir => 'archives',
      url_root  => '/',
      # etc.
    );
    print $w->display(@ARGV);

=head1 INSTALLING

It's possible but not likely this would run on a Perl as old as 5.10.0.  In
practice, I know that it works under 5.26.2.  It should be fine on any
reasonably modern Linux distribution, and may work on macOS or a BSD of your
choosing.  It's possible that it would run under the Windows Subsystem for
Linux, but it would definitely fail under vanilla Windows; it currently makes
too many assumptions about things like directory path separators and filesystem
semantics.

(Although I would like the code to be more robust across platforms, this is not
a problem I feel much urgency about solving at the moment, since I'm pretty
sure I am the only user of this software.  Patches would certainly be welcome.)

To install the latest development version from the main repo:

    $ git clone https://code.p1k3.com/gitea/brennen/wrt.git
    $ cd wrt
    $ perl Build.PL
    $ ./Build installdeps
    $ ./Build test
    $ ./Build install

To install the latest version released on CPAN:

    $ cpanm App::WRT

Or:

    $ cpan -i App::WRT

You will likely need to use C<sudo> or C<su> to get a systemwide install.

=head1 DESCRIPTION

This started life somewhere around 2001 as C<display.pl>, a CGI script to
concatenate fragments of handwritten HTML by date.  It has since accumulated
several of the usual weblog features (lightweight markup, feed generation,
embedded Perl, poetry tools, image galleries, and ill-advised dependencies),
but the basic idea hasn't changed that much.

The C<wrt> utility now generates static HTML files, instead of expecting to
run as a CGI script.  This is a better idea, for the most part.

By default, entries are stored in a simple directory tree under C<entry_dir>.

Like:

     archives/2001/1/1
     archives/2001/1/2/index
     archives/2001/1/2/sub_entry

Which will publish files like so:

     public/index.html
     public/all/index.html
     public/2001/index.html
     public/2001/1/index.html
     public/2001/1/1/index.html
     public/2001/1/2/index.html
     public/2001/1/2/sub_entry/index.html

Contents will be generated for each year and for the entire collection of dated
entries.  Month indices will consist of all entries for that month.  A
top-level index file will consist of the most recent month's entries.
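
As a rough sketch of the mapping above for individual entries: the helper
below is hypothetical (it is not part of App::WRT), and it assumes the default
C<publish_dir> of F<public> and entry paths given relative to C<entry_dir>.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of App::WRT: map an entry path (relative to
# entry_dir) to the file it would be published as under publish_dir.
sub publish_path {
    my ($entry, $publish_dir) = @_;
    $publish_dir //= 'public';

    # A directory's "index" file is the text of the directory entry itself.
    $entry =~ s{/index\z}{};

    return "$publish_dir/$entry/index.html";
}

print publish_path($_), "\n" for qw(2001/1/1 2001/1/2/index 2001/1/2/sub_entry);
```

This only covers individual entries; the year, month, C<all>, and top-level
indices are generated separately.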

It's possible (although not as flexible as it ought to be) to redefine the
directory layout.  (See C<$default{entry_map}> below.)

An entry may be either a plain UTF-8 text file, or a directory containing
several such files.  If it's a directory, a file named "index" will be treated
as the text of the entry, and all other lower-case filenames without extensions
will be treated as sub-entries or documents within that entry, and displayed
accordingly.  Links to certain other filetypes will be displayed as well.

Directories may be nested to an arbitrary depth, although it's probably not a
good idea to go very deep with the current display logic.
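
To illustrate how filenames inside an entry directory might be sorted into
these categories, here is a minimal sketch.  The two patterns are copied from
C<subentry_expr> and C<binfile_expr> in C<%default> (see CONFIGURATION); the
C<classify()> function and its return values are made up for this example and
are not how App::WRT is actually structured.

```perl
use strict;
use warnings;

# Patterns copied from %default as documented under CONFIGURATION.
my $subentry_expr = qr/^[0-9a-z_-]+(\.(tgz|zip|tar[.]gz|gz|txt))?$/;
my $binfile_expr  = qr/[.](tgz|zip|tar[.]gz|gz|txt|pdf)$/;

# Hypothetical classifier, for illustration only.
sub classify {
    my ($filename) = @_;
    return 'index'   if $filename eq 'index';          # text of the entry
    return 'ignored' unless $filename =~ $subentry_expr;
    return 'link'    if $filename =~ $binfile_expr;    # linked, not inlined
    return 'subentry';                                 # displayed inline
}

printf "%-16s %s\n", $_, classify($_)
    for qw(index sub_entry notes.txt archive.tar.gz README photo.jpg);
```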

A PNG or JPEG file with a name like

    2001/1/1.icon.png
    2001/1/1/index.icon.png
    2001/1/1/whatever.icon.png
    2001/1/1/whatever/index.icon.png

will be treated as an icon for the corresponding entry file.
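
The naming convention can be sketched as follows.  C<icon_candidates()> is a
hypothetical helper, not something App::WRT provides, and the list of
extensions is an assumption based on the PNG and JPEG types mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper: list the icon filenames that could accompany an entry.
# For a directory entry the icon sits next to its "index" file; for a flat
# file it sits next to the file itself.
sub icon_candidates {
    my ($entry_path, $is_dir) = @_;
    my $base = $is_dir ? "$entry_path/index" : $entry_path;
    return map { "$base.icon.$_" } qw(png jpg jpeg);
}

print "$_\n" for icon_candidates('2001/1/1', 0);
print "$_\n" for icon_candidates('2001/1/1/whatever', 1);
```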

=head2 MARKUP

Entries may consist of hand-written HTML (to be passed along without further
interpretation), a supported form of lightweight markup, or some combination
thereof. Actually, an entry may consist of any darn thing you please, as long
as Perl will agree that it is text, but presumably you're going to be feeding
this to a browser.

Header tags (<h1>, <h2>, etc.) will be used to display titles in feeds and
other places.

Other special markup is indicated by a variety of HTML-like container tags.

B<Embedded Perl> - evaluated and replaced by whatever value you return
(evaluated in a scalar context):

     <perl>my $dog = "Ralph."; return $dog;</perl>

This code is evaluated before any other processing is done, so you can return
any other markup understood by the script and have it handled appropriately.

B<Interpolated variables> - actually keys to the hash underlying the App::WRT
object, for the moment:

     <perl>$self->{title} = "About Ralph, My Dog"; return '';</perl>

     <p>The title is <em>${title}</em>.</p>

This is likely to change at some point, so don't build anything too elaborate
on it.

Embedded code and variables are intended only for use in the F<template> file,
where it's handy to drop in titles or conditionalize aspects of a layout. You
want to be careful with this sort of thing - it's useful in small doses, but
it's also a maintainability nightmare waiting to happen.

B<Includes> - replaced by the contents of the enclosed file path, from the
root of the current wrt project:

    <include>path/to/file</include>

This is a bit constraining, since it doesn't currently allow for files outside
of the current project, but is useful for including HTML generated by an
external script in a page.

B<Several forms of lightweight markup>:

     <markdown>John Gruber's Markdown, by way of
     Text::Markdown::Discount</markdown>

     <textile>Dean Allen's Textile, via Brad Choate's
     Text::Textile.</textile>

     <freeverse>An easy way to
     get properly broken lines
     plus -- em dashes --
     for poetry and such.</freeverse>

B<And a couple of shortcuts>:

     <image>filename.ext
     alt text, if any</image>

     <list>
     one list item

     another list item
     </list>

As it stands, freeverse, image, and list are not particularly robust.

=head2 TEMPLATES

A single template, specified by the C<template_dir> and C<template> config
values, is used to render all pages.  See F<example/templates/basic> for an
example, or run C<wrt init> in an empty directory and look at
F<templates/default>.

Here's a short example:

    <!DOCTYPE html>
    <html>
    <head>
      <meta charset="UTF-8">
      <title>${title_prefix} - ${title}</title>
    </head>

    <body>
    ${content}
    </body>

    </html>

Within templates, C<${foo}> will be replaced with the corresponding
configuration value.  C<${content}> will always be set to the content of the
current entry.
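
A minimal sketch of this kind of substitution, assuming a plain hash of
values.  The C<interpolate()> function and the sample values are hypothetical,
and replacing unknown or undefined keys with an empty string is an assumption
rather than documented behavior.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the configuration values.
my %values = (
    title_prefix => 'example.com',
    title        => 'About Ralph, My Dog',
    content      => '<p>Ralph is a good dog.</p>',
);

sub interpolate {
    my ($template, $values) = @_;
    # Replace ${name} with the value stored under that key; unknown or
    # undefined keys become the empty string.
    $template =~ s{\$\{(\w+)\}}{ defined $values->{$1} ? $values->{$1} : '' }ge;
    return $template;
}

print interpolate('<title>${title_prefix} - ${title}</title>', \%values), "\n";
```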

=head2 CONFIGURATION

Configuration is read from a F<wrt.json> in the directory where the C<wrt>
utility is invoked, or can (usually) be specified with the C<--config> option.

See F<example/wrt.json> for a sample configuration.

Under the hood, configuration is done by combining a hash called C<%default>
with values pulled out of the JSON file.  Most defaults can be overwritten
from the config file, but changing some would require writing Perl, since
they contain things like subroutine references.
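
The general idea of layering file values over defaults can be sketched like
this.  It uses the core C<JSON::PP> module (bundled with Perl since 5.14) and
an inline JSON string in place of a real F<wrt.json>; the actual merging code
in App::WRT may well differ.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# A few of the defaults listed in %default.
my %default = (
    entry_dir   => 'archives',
    publish_dir => 'public',
    url_root    => '/',
);

# Stand-in for the contents of a wrt.json file.
my $json = '{"url_root": "https://example.com/", "title_prefix": "example"}';

# In a hash list, later keys win, so file values override the defaults.
my %config = (%default, %{ decode_json($json) });

print "$_ = $config{$_}\n" for sort keys %config;
```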

=over

=item %default

Here's a verbatim copy of C<%default>, with some commentary about values.

    my %default = (
      root_dir       => '.',         # dir for wrt repository
      entry_dir      => 'archives',  # dir for entry files
      publish_dir    => 'public',    # dir to publish site to
      url_root       => "/",         # root URL for building links
      image_url_root => '',          # same for images
      template_dir   => 'templates', # dir for template files
      template       => 'default',   # template to use
      title          => '',          # current title (used in template)
      title_prefix   => '',          # a string to slap in front of titles
      stylesheet_url => undef,       # path to a CSS file (used in template)
      favicon_url    => undef,       # path to a favicon (used in template)
      feed_alias     => 'feed',      # what entry path should correspond to feed?
      feed_length    => 30,          # how many entries should there be in the feed?
      author         => undef,       # author name (used in template, feed)
      description    => undef,       # site description (used in template)
      content        => undef,       # place to stash content for templates
      embedded_perl  => 1,           # evaluate embedded <perl> tags?
      default_entry  => 'new',       # what to display if no entry specified
      cache_includes => 0,           # should included files be cached in memory?

      # A license string for site content:
      license        => 'public domain',

      # A string value to replace all pages with (useful for occasional
      # situations where every page of a site should serve some other
      # content in-place, like Net Neutrality protest blackouts):
      overlay        => undef,

      # What gets considered an entry _path_:
      entrypath_expr => qr/^ ([a-z0-9_\/-]+) $/x,

      # What gets considered a subentry file (slightly misleading
      # terminology here):
      subentry_expr => qr/^[0-9a-z_-]+(\.(tgz|zip|tar[.]gz|gz|txt))?$/,

      # We'll show links for these, but not display them inline:
      binfile_expr   => qr/[.](tgz|zip|tar[.]gz|gz|txt|pdf)$/,
    );

=item $default{entry_map}

A hashref which will dispatch entries matching various regexen to the
appropriate output methods. The default looks something like this:

    nnnn/[nn/nn/]doc_name - a document within a day.
    nnnn/nn/nn            - a specific day.
    nnnn/nn               - a month.
    nnnn                  - a year.
    doc_name              - a document in the root directory.

You can re-map things to an arbitrary archive layout.

Since the entry map is a hash, and handle() simply loops over its keys, there
is no guaranteed precedence of patterns. Be extremely careful that no entry
will match more than one pattern, or you will wind up with unexpected behavior.
A good way to ensure that this does not happen is to use patterns like:

    qr(
        ^           # start of string
        [0-9]{4}/   # year
        [0-9]{1,2}/ # month
        [0-9]{1,2}  # day
        $           # end of string
      )x

...always marking the start and end of the string explicitly.

This may eventually be rewritten to use an array so that the order can be
explicitly specified.

=item $default{entry_descriptions}

A hashref which contains a map of entry titles to entry descriptions.

=item $default{title_cache}

A hashref which contains a cache of entry titles, populated by the renderer.

=back
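
To illustrate the array-based alternative mentioned under
C<$default{entry_map}>, here is a sketch of an ordered dispatch table.
Everything in it is hypothetical: the patterns are simplified, and the handler
names are plain strings rather than the method references the real entry map
would need.

```perl
use strict;
use warnings;

# Hypothetical ordered map: [pattern, handler name] pairs, checked in order,
# so precedence is explicit even when patterns overlap.
my @entry_map = (
    [ qr{^ [0-9]{4} / [0-9]{1,2} / [0-9]{1,2} $ }x, 'day'   ],
    [ qr{^ [0-9]{4} / [0-9]{1,2} $ }x,              'month' ],
    [ qr{^ [0-9]{4} $ }x,                           'year'  ],
    [ qr{^ [a-z0-9_/-]+ $ }x,                       'doc'   ],
);

sub handler_for {
    my ($entry) = @_;
    for my $pair (@entry_map) {
        my ($pattern, $name) = @$pair;
        return $name if $entry =~ $pattern;
    }
    return;    # nothing matched
}

printf "%-20s %s\n", $_, handler_for($_) // '(no match)'
    for qw(2001 2001/1 2001/1/2 2001/1/2/sub_entry about);
```

Note that C<2001> matches both the year pattern and the catch-all document
pattern; with an array, the year pattern wins because it is listed first.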

=head2 METHODS AND INTERNALS

For no bigger than this thing is, the internals are convoluted.  (This is
because it's spaghetti code originally written in a now-archaic language by a
teenager who didn't know how to program.)

=over

=item new_from_file($config_file)

Takes a filename to pull JSON config data out of, and returns a new App::WRT
instance with the parameters set in that file.

=item new(%params)

Get a new WRT object with the specified parameters set.

=item display($entry1, $entry2, ...)

Return a string containing the given entries, which are in the form of
date/entry strings. If no parameters are given, default to the value of
C<default_entry>.

display() expands aliases ("new" and "all", for example) as necessary, collects
output from handle($entry), and wraps the whole thing in a template file.

If C<overlay> is set, will return the value of overlay regardless of options.
(This is useful for hackily replacing every page in a site with a single blob
of HTML, for example if you're participating in some sort of blackout or
something.)

=item handle($entry)

Return the text of an individual entry.

=item expand_alias($option)

Expands/converts 'all', 'new', and 'fulltext' to appropriate values.

Removes trailing slashes.

=item link_bar(@extra_links)

Returns a little context-sensitive navigation bar.

=item year($year)

List out the updates for a year.

=item month($month)

Prints the entries in a given month (nnnn/nn).

=item entry_stamped($entry, $level)

Wraps entry() + a datestamp in entry_markup().

=item entry_topic_list($entry)

Get a list of topics (by tag-* files) for the entry.  This hardcodes part of a
p1k3-specific thing which should be moved into wrt entirely.

=item entry($entry)

Returns the contents of a given entry. Calls dir_list and icon_markup.
Recursively calls itself.

=item get_sub_entries($entry_loc)

Returns "sub entries" based on the C<subentry_expr> regexp.

=item list_contents($entry, @entries)

Returns links (maybe with icons) for a set of sub-entries within an entry.

=item icon_markup($entry, $alt)

Check if an icon exists for a given entry; if so, return markup to include it.
Icons are PNG or JPEG image files following a specific naming convention:

  index.icon.[png|jp(e)g] for directories
  [filename].icon.[png|jp(e)g] for flat text files

Calls image_size, uses filename to determine type.

=item datestamp($entry)

Returns a nice html datestamp / breadcrumbs for a given entry.

=item fragment_slurp($file)

Read a text fragment, call line_parse() and eval_perl() to take care of
lightweight markup sections and interpret embedded code, and then return it as
a string. Takes one parameter, the name of the file.

=item root_locations($file)

Given a file/entry, return the appropriate concatenations with entry_dir and
url_root.

=item feed_print(@entries)

Return an Atom feed for the given list of entries.

Requires XML::Atom::SimpleFeed.

XML::Atom::SimpleFeed will give bogus results with input that's just a string
of octets (I think) if it contains characters outside of US-ASCII.  In order to
spit out clean UTF-8 output, we need to use Encode::decode() to flag entry
content as UTF-8 / represent it internally as a string of characters.  There's
a whole lot I don't really understand about how this is handled in Perl, and it
may be a locus of bugs elsewhere in wrt, but for now I'm just dealing with it
here.

Some references on that:

=over

=item * L<https://github.com/ap/XML-Atom-SimpleFeed/issues/2>

=item * L<https://rt.cpan.org/Public/Bug/Display.html?id=19722>

=item * L<https://cpanratings.perl.org/dist/XML-Atom-SimpleFeed>

=item * L<perlunitut>

=back

=back
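
The octets-versus-characters distinction mentioned under C<feed_print()> can
be sketched with the core C<Encode> module.  This is a generic illustration of
C<decode()> and C<encode()>, not the code wrt uses.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "cafe" with an acute accent on the e, as raw UTF-8 octets, the way it
# would arrive from a file read without a decoding layer.
my $octets = "caf\xC3\xA9";

# decode() turns the octets into a string of characters.
my $chars = decode('UTF-8', $octets);

printf "%d octets, %d characters\n", length($octets), length($chars);

# encode() goes the other way, for writing clean UTF-8 output.
my $roundtrip = encode('UTF-8', $chars);
print "round trip ok\n" if $roundtrip eq $octets;
```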

=head1 SEE ALSO

walawiki.org, Blosxom, rassmalog, Text::Textile, XML::Atom::SimpleFeed,
Image::Size, CGI::Fast, and about a gazillion static site generators.

=head1 AUTHOR

Copyright 2001-2017 Brennen Bearnes

=head1 LICENSE

    wrt is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.