Solaronite: Filename-encoded Metadata

Since naming my content manager and, more importantly, separating it out into its own egg, independent of the website-specific code of retroj.net, I have had a productive bout of work that progressed nicely but mostly laterally, meaning that I have changed designs without, generally, affecting appearances. Now I want to tackle a problem that is very much one of appearances: default index pages. To be clear about terminology, what is an index page, and what do I mean by a default index page? An index page is the default document in a directory; when a visitor visits that directory, its index page is served. A default index page is what gets served when the directory has no index of its own. So when a directory has no index, it falls back to a default, which may be the default for the particular subsection, for the top-level section, or for the whole site. These default indexes will actually be Scheme procedures in most cases, and they will produce a list of what is in the directory, maybe with excerpts, sorted by date.

So what I want to write about today is actually just one little piece of that puzzle: to be able to sort articles by date, the system needs to know the date of each article, and at present it does not. I discard the first two ideas that come to mind, which are the creation timestamps of the files and the timestamps of patches in git. Neither of these is workable, because those timestamps only represent some arbitrary point when an article was edited or added to the repository. I need to store the date of an article somewhere as metadata. This metadata might be stored in the file itself, or in an auxiliary file, but I want it to be as convenient to me, the blogger, as possible, which brings me to my topic. As I have written previously, it is an attractive prospect to store some metadata encoded directly into the filename.

To recap, the idea is to have the article files named with both a date and a shortname. So this article I am writing now would be called:

20120504--solaronite-filename-encoded-metadata.md

The double hyphen serves as a separator, allowing for parsing.
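To make the parsing concrete, here is a minimal sketch of how such a filename could be split at the double hyphen. It is not the content manager's actual code: the procedure names (find-double-hyphen, all-digits?, strip-extension, parse-article-filename) and the association-list result are assumptions made purely for illustration, and it handles only the fixed YYYYMMDD form shown above.

```scheme
;; Hypothetical helpers, not part of the content manager's API.

;; Index of the first "--" in str, or #f if there is none.
(define (find-double-hyphen str)
  (let ((len (string-length str)))
    (let loop ((i 0))
      (cond ((>= (+ i 1) len) #f)
            ((and (char=? (string-ref str i) #\-)
                  (char=? (string-ref str (+ i 1)) #\-))
             i)
            (else (loop (+ i 1)))))))

;; #t if every character of str is numeric.
(define (all-digits? str)
  (let loop ((i 0))
    (cond ((= i (string-length str)) #t)
          ((char-numeric? (string-ref str i)) (loop (+ i 1)))
          (else #f))))

;; Drop everything from the last "." onward, if there is one.
(define (strip-extension str)
  (let loop ((i (- (string-length str) 1)))
    (cond ((< i 0) str)
          ((char=? (string-ref str i) #\.) (substring str 0 i))
          (else (loop (- i 1))))))

;; Split "YYYYMMDD--short-title.ext" into an association list, or
;; return #f when the name does not follow that convention.
(define (parse-article-filename filename)
  (let ((sep (find-double-hyphen filename)))
    (and sep
         (= sep 8)
         (all-digits? (substring filename 0 8))
         (let ((rest (substring filename 10 (string-length filename))))
           (list (cons 'year  (string->number (substring filename 0 4)))
                 (cons 'month (string->number (substring filename 4 6)))
                 (cons 'day   (string->number (substring filename 6 8)))
                 (cons 'short-title (strip-extension rest)))))))
```

Under these assumptions, the filename above yields year 2012, month 5, day 4, and the short title solaronite-filename-encoded-metadata, while a name such as index.md yields #f.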

So I want to pull that date out of the filename and store that as metadata associated with the document. The content manager is already parsing filenames for the purpose of path translations as described in the blog post linked above. If that is where the information is being decoded, should we tap into the path translation procedure to also store metadata, or should we duplicate the step later on in the process of adding a file to the content tree?

Tapping into the path translation procedure would seem to make metadata collection available only to documents that go through path translation. On the other hand, duplicating the effort later on introduces both redundant computation and API complexity. These thoughts lead me to the idea of expanding upon the path translation procedure: make it responsible for both path translation and metadata collection; maybe give it a different name; add something to the API that allows the user to specify metadata collection without performing a path translation.

Here is an example of a path translation rule, showing a filename pattern associated with a translation to a URL path.

((translate-paths . ([(Y / m / Y m d "--" short-title) .
                      (Y / m / d / short-title)])))

What if I wanted to grab the metadata but not translate the path? Maybe something like this would do:

((translate-paths . ([(Y / m / Y m d "--" short-title)])))

Or the intent could be spelled out more explicitly:

((translate-paths . ([(Y / m / Y m d "--" short-title) .
                      no-translate])))
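Both spellings are easy to tell apart from an ordinary rule, because a rule is just a pair whose car is the pattern. The following sketch shows one way a rule could be inspected; rule-pattern and rule-translation are hypothetical names, not procedures of the content manager, and the sketch assumes each rule is read as plain list data.

```scheme
;; Hypothetical accessors for a single rule of the form
;;   (pattern . translation), (pattern), or (pattern . no-translate).

;; The filename pattern is always the car of the rule.
(define (rule-pattern rule)
  (car rule))

;; The translation target, or #f when the rule only collects metadata.
(define (rule-translation rule)
  (let ((target (cdr rule)))
    (cond ((null? target) #f)              ; bare (pattern)
          ((eq? target 'no-translate) #f)  ; explicit marker
          (else target))))
```

Note that a dotted pair whose cdr is a list reads as one longer list, so the translation of the first example rule comes back as (Y / m / d / short-title), while both of the proposed metadata-only forms give #f.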

And given this new purpose, maybe translate-paths is not the best name. Things to think about!

The next question after that is what representation of the date should be used in the metadata. Clearly, it will have to be a representation capable of sparse data. A simple timestamp will not do, because one cannot simply fill in blanks with defaults. I have had plans for several years to write a date library capable of handling such ambiguities; maybe this will be the impetus to finally do it.
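As a rough illustration of what "sparse" could mean here, the sketch below represents a date as a list of year, month, and day in which a field may be #f when it is not known, and compares two such dates only as far as the known fields allow. This is an assumption for the sake of example, not the planned date library; the names make-partial-date and partial-date-compare are hypothetical.

```scheme
;; A partial date is a list (year month day); a field is #f when unknown.
(define (make-partial-date year month day)
  (list year month day))

;; Compare two partial dates field by field, most significant first.
;; Returns 'earlier or 'later as soon as a known field differs, 'same
;; when all three fields are known and equal, and 'unknown when a
;; missing field is reached before the comparison is decided.
(define (partial-date-compare a b)
  (let loop ((a a) (b b))
    (cond ((null? a) 'same)
          ((or (not (car a)) (not (car b))) 'unknown)
          ((< (car a) (car b)) 'earlier)
          ((> (car a) (car b)) 'later)
          (else (loop (cdr a) (cdr b))))))
```

The point of returning 'unknown rather than guessing is that a date known only to the month is neither before nor after a particular day in that month; a sorting procedure would have to decide separately what to do with such cases.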

Followup

The feature is now implemented, and one thing I learned is that it will be necessary to have different metadata keys for files than for directories. Why? Index documents. An index document is not an independent content-entry, but is the resource of a directory content-entry. So that one content-entry's metadata has to cover both the directory and the file. The simplest way to do that, I think, is to use different keys, and simply avoid any potential name conflicts.
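One way to picture the separate keys is to prefix each key with its source before merging the two sets of metadata into one entry. The sketch below is only an illustration of that idea: the "directory-" and "file-" prefixes and the procedures prefix-keys and merge-entry-metadata are hypothetical, not the keys or API actually used.

```scheme
;; Hypothetical helper: return a copy of alist with prefix added to
;; the name of every key.
(define (prefix-keys prefix alist)
  (map (lambda (pair)
         (cons (string->symbol
                (string-append prefix (symbol->string (car pair))))
               (cdr pair)))
       alist))

;; Metadata for one directory content-entry whose resource is an index
;; document: both sources are kept, and the prefixes prevent clashes.
(define (merge-entry-metadata directory-meta file-meta)
  (append (prefix-keys "directory-" directory-meta)
          (prefix-keys "file-" file-meta)))
```

With this scheme, a year collected from the directory path and a year collected from the index document's filename would end up under directory-year and file-year respectively, so neither overwrites the other.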

This metadata stuff could get very complex.