My blog used to be hosted on Posterous, but as of this post it’s generated by a little bit of Emacs Lisp that turns Org mode files into Markdown files, which are then processed by Octopress to produce the site you see here. This has the usual benefits of static site generators (simple to deploy, easy to version in Git, fast, etc.), but I hope that using Org mode will also motivate me to experiment with literate programming via Babel.
An Octopress-flavored Markdown backend
There are already a few other ways to publish Org files as Markdown or Jekyll (or Octopress) blog posts, but despite my short list of requirements, I found them a bit lacking. In proper not-invented-here style, I set out to write some Emacs Lisp to export Org files to something Octopress could work with. I learned a bit about some new features of Org’s internals along the way, as well as about Emacs, since this is the largest project I’ve undertaken using Emacs Lisp.
Org has a new exporting system that’s not released with the latest stable build yet, but it is well documented and more flexible than the existing generic export library. Writing a simple backend that covered a small subset of Org’s features turned out to be pretty easy to do. If you have some need to process Org files that’s not supported out of the box, I’d recommend trying this approach. What follows are notes I took while working on this. Since this version of Org has not been released yet, you should of course refer to Org’s documentation to resolve any inconsistencies you may find. Extra caveats apply here since I’m a novice both with Org and with Emacs Lisp in general.
Since this new export system is not really released yet, you have to
clone the latest from Git. The new export functionality is defined in
ox.el. Of course, there are some backends packaged with Org already,
including one that exports to “ASCII” text which was extremely
helpful to use as an example.
A new backend is defined with a single function call that accepts two
arguments, a symbol and an alist. The symbol is just the name of your
new backend. The alist is what actually defines your backend. The
“keys” should be the types of Org syntax elements, and each “value”
should be a function that can export that type of element to your
chosen format.
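A minimal sketch of such a definition, assuming the function in question is `org-export-define-backend` as provided by recent Org releases (where it is a function taking quoted arguments). The backend name `my-octopress` and every `my-octopress-*` function name are hypothetical placeholders, and the set of element types is only a plausible bare minimum.

```elisp
(require 'ox)

;; Hypothetical backend name and transcoder names, for illustration only.
;; Each entry maps an Org element type to the function that exports it.
(org-export-define-backend 'my-octopress
  '((bold      . my-octopress-bold)
    (headline  . my-octopress-headline)
    (paragraph . my-octopress-paragraph)
    (section   . my-octopress-section)
    (src-block . my-octopress-src-block)
    (template  . my-octopress-template)))
```

The alist only names the functions, so they can be defined before or after this call, as long as they exist by the time a file is exported.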
Here I’ve defined a small backend that supports the bare minimum I needed for this blog. There are many more types of Org syntax that I’m not supporting, but since I’m a novice Org user, I figure when I discover a need for those, I’ll add them to the backend.
How Org parses and exports files
When Org exports a file using this backend, and it comes across an
element of the type “bold” for instance, it will call the function
associated with 'bold in the backend’s alist. That function should
take three arguments: the node of the abstract syntax tree for that
element, the “contents” of that node, and a plist with extra
information (called the “communication channel” in the Org docs). It
should return the result of exporting that node as a string.
When you publish an Org file, it’s first parsed into an abstract syntax tree, and then the export system calls these functions, in a bottom-up fashion starting with the leaves of the tree. Each function returns a string, and these strings are accumulated together until eventually the function for the root of the tree is called, and the entire document has been converted. This is probably best demonstrated with an example. Here’s a snippet of an Org file:
Org will parse this file into the following syntax tree (actually the real tree has much more data attached to each node, and is also recursive making it difficult to print):
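One way to look at such a tree yourself is to parse a string and strip the properties from every node. This is only a sketch: the helper names `my-simplify-tree` and `my-parse-org-string` and the sample input are made up for illustration, and it assumes a recent Org where `org-element-parse-buffer`, `org-element-type` and `org-element-contents` are available.

```elisp
(require 'org)
(require 'org-element)

(defun my-simplify-tree (node)
  "Reduce NODE to (TYPE CHILD...), dropping properties; strings stay strings."
  (if (stringp node)
      (substring-no-properties node)
    (cons (org-element-type node)
          (mapcar #'my-simplify-tree (org-element-contents node)))))

(defun my-parse-org-string (text)
  "Parse TEXT as Org syntax and return the simplified tree."
  (with-temp-buffer
    (insert text)
    (org-mode)
    (my-simplify-tree (org-element-parse-buffer))))

;; Hypothetical input: a headline "Bar" over a paragraph with one bold word.
(my-parse-org-string "* Bar\nSome *tele* text.\n")
```

Headline titles are stored as properties rather than as children, so they do not show up in the simplified tree; only the nesting of headline, section, paragraph and bold nodes does.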
Org will walk over this tree and call your backend functions. In this
case, the first function it will call is the one associated with
bold, since the node (bold "tele") is the first leaf. In my backend,
this function is a simple conversion from Org mode to Markdown: since
my backend is very bare-bones, I ignore the other arguments and just
wrap the contents string, which in this case will be "tele", in
asterisks. (Markdown doesn’t really have a concept of “bold” but
instead uses HTML “strong” and “emphasis” tags, which most browsers’
default CSS renders as bold and italic, respectively.)
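A transcoder along these lines might look like the following sketch. The name `my-octopress-bold` is hypothetical, and the choice of double asterisks (Markdown’s “strong”) is an assumption about what “wrap in asterisks” means here.

```elisp
(defun my-octopress-bold (_bold contents _info)
  "Transcode a bold object by wrapping CONTENTS in Markdown strong markers.
The AST node and the info plist are ignored."
  (format "**%s**" contents))

(my-octopress-bold nil "tele" nil)  ; => "**tele**"
```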
The next function that will be called is the one for the paragraph,
since it’s the next-lowest node in the syntax tree. The contents of
the paragraph will be a string containing the results of transcoding
the children of the paragraph, in this case two plain strings and a
bold string. While I think there are some subtleties around newlines,
the simplest way to deal with paragraphs is to just return the
contents unchanged. Of course, if we were writing a new HTML backend,
we would wrap the contents in <p> tags.
To be honest, paragraphs are actually a part of this system I’m a
little shaky on. From what I could determine by reading ox.el and
doing some experiments, there are a few syntax elements that you must
provide transcoder functions for. Paragraphs are one of them. The
element types headline and section, and the special type “template”,
are others that must be provided. The reason for this is that these
are intermediary nodes in the syntax tree, so if no transcoder is
provided for them, the results of the nodes beneath them will never be
accumulated.
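The bottom-up accumulation, and the reason a missing transcoder on an intermediary node loses everything beneath it, can be illustrated with a toy walker. This is not Org’s actual implementation: `toy-export`, the list-shaped tree and the handler alist are all simplifications invented for this sketch.

```elisp
(defun toy-export (node handlers)
  "Export NODE bottom-up using HANDLERS, an alist of (TYPE . FUNCTION).
NODE is either a string or a list of the form (TYPE CHILD...)."
  (if (stringp node)
      node
    (let ((handler (cdr (assq (car node) handlers)))
          ;; Children are exported first and their strings concatenated.
          (contents (mapconcat (lambda (child) (toy-export child handlers))
                               (cdr node) "")))
      ;; Without a handler for this type, the children's output is dropped.
      (if handler (funcall handler contents) ""))))

(defvar toy-tree '(section (paragraph "The " (bold "tele") "phone")))

(defvar toy-handlers
  `((bold      . ,(lambda (contents) (format "**%s**" contents)))
    (paragraph . identity)
    (section   . identity)))

(toy-export toy-tree toy-handlers)  ; => "The **tele**phone"
;; Removing the paragraph handler loses the whole paragraph:
(toy-export toy-tree (assq-delete-all 'paragraph (copy-alist toy-handlers)))  ; => ""
```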
Continuing our example parse, the section node will be transcoded
next, and in my case I’m using a function similar to the one for
paragraphs, which just returns the contents unchanged. The next node
after that will be the headline node for the headline “Bar” in the
original Org source.
Markdown has a similar syntax for headlines as Org, but uses pound or
hash symbols instead of asterisks. Here, we use the function
org-element-property to extract some properties from the AST node
headline. We need the raw value, which is the headline without
asterisks, and the level, which is the number of asterisks. In this
case, I convert all levels of headlines to Markdown headlines, but if
you were writing a “real” backend I think you would want to honor the
:headline-levels option for the project, and only convert headlines
of a certain level. Again, like the paragraph node, the contents are
the children of the headline, which includes everything under that
headline, so we must concatenate the Markdown-style headline string
with the contents so as not to lose the rest of the document.
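The headline conversion just described could be sketched as follows. The function name `my-octopress-headline` and the blank line placed between the heading and its contents are assumptions; `:raw-value` and `:level` are the headline properties read through `org-element-property`.

```elisp
(require 'org-element)

(defun my-octopress-headline (headline contents _info)
  "Transcode HEADLINE into a Markdown heading followed by CONTENTS."
  (let ((title (org-element-property :raw-value headline))
        (level (org-element-property :level headline)))
    ;; One hash per Org asterisk, then everything nested under the headline.
    (concat (make-string level ?#) " " title "\n\n" contents)))
```

Every level is converted here; a fuller backend would compare the level against the :headline-levels setting before deciding how to render it.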
The export process continues in this fashion until all the nodes are
transcoded and their strings accumulated. There’s a special AST node
type, template, which represents the root of the Org document. The
docs suggest using this to add a preamble and/or postamble to the
result. In my case, I wanted to output the YAML front matter used by
Jekyll to generate blog posts. The template transcoder function takes
only two arguments, the contents string and the info plist. The root
AST node is not passed into this function, I assume because the idea
is that you’ve already transcoded its children and there’s not really
any concrete Org syntax associated with it, so there’s nothing to do
with the root node but return the contents, wrapped in a pre- or
postamble.
This function is an example of using the “communication channel” which is the third argument of the other transcoder functions but in this case is the second. The info plist contains all the metadata about the document that’s defined in the Org “export options template”. It’s from this that we extract the title of the post and the date and add it to the YAML front matter.
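A sketch of a template transcoder in this spirit is shown below. The name `my-octopress-template` and the `layout: post` line are assumptions. In current Org the `:title` and `:date` entries of the info plist are parsed values rather than plain strings, so the sketch passes them through `org-export-data` to turn them into text.

```elisp
(require 'ox)

(defun my-octopress-template (contents info)
  "Wrap CONTENTS in YAML front matter built from the INFO plist."
  (let ((title (org-export-data (plist-get info :title) info))
        (date  (org-export-data (plist-get info :date) info)))
    (concat "---\n"
            "layout: post\n"
            (format "title: \"%s\"\n" title)
            (format "date: %s\n" date)
            "---\n\n"
            contents)))
```

A title containing double quotes would need escaping before being placed in the YAML; the sketch leaves that out.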
Source blocks are another area where I customized things to output something specific to Octopress. The GitHub-style “backtick” code blocks used by Octopress take optional language and name parameters, which are used for syntax highlighting and for adding captions to the source block itself. Similarly, Org supports passing the language to source blocks, and attaching a name to elements in general, so I used that to add these to the backtick code block if present. I also added a little hack to ignore the language if it was something not supported by Pygments.
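A source block transcoder with this behavior might be sketched as below. The function name, the variable `my-octopress-known-languages` and its short whitelist are all hypothetical stand-ins for a real list of Pygments lexers; `:language`, `:name` and `:value` are read from the node with `org-element-property`.

```elisp
(require 'org-element)

;; Hypothetical whitelist; the real list of Pygments lexers is much longer.
(defvar my-octopress-known-languages '("ruby" "python" "clojure" "sh")
  "Languages that are passed through to the code block header.")

(defun my-octopress-src-block (src-block _contents _info)
  "Transcode SRC-BLOCK into a backtick block with optional language and name."
  (let* ((lang  (org-element-property :language src-block))
         (name  (org-element-property :name src-block))
         (code  (org-element-property :value src-block))
         (fence (make-string 3 ?\`))
         ;; Keep the language only if it is on the whitelist.
         (header (delq nil (list (car (member lang my-octopress-known-languages))
                                 name))))
    (concat fence
            (if header (concat " " (mapconcat #'identity header " ")) "")
            "\n" code fence "\n")))
```

One caveat of this simple version: when a block has a name but no recognized language, the name ends up in the position where the language is expected.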
Tying it together
Having defined the new backend, all that’s left is a little boilerplate to make this backend available to Org projects:
My blog’s Org project alist is then, at the bare minimum:
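With a recent Org, the publishing boilerplate and a bare-minimum project entry could look roughly like this. Everything named here is an assumption made for the sketch: the backend `my-octopress`, the function `my-octopress-publish-to-markdown`, the project name and both directories. `org-publish-org-to` and `org-publish-project-alist` come from ox-publish.

```elisp
(require 'ox-publish)

(defun my-octopress-publish-to-markdown (plist filename pub-dir)
  "Publish FILENAME to PUB-DIR with the hypothetical `my-octopress' backend.
PLIST is the project's property list."
  (org-publish-org-to 'my-octopress filename ".markdown" plist pub-dir))

;; Hypothetical project name and directories; adjust to your own layout.
(add-to-list 'org-publish-project-alist
             '("blog"
               :base-directory "~/blog/org/"
               :publishing-directory "~/blog/source/_posts/"
               :publishing-function my-octopress-publish-to-markdown))
```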
To start a new blog post, I use this little helper to create a new Org file with the right naming convention and export template:
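A helper of this kind could be sketched as follows, assuming the naming convention in question is Jekyll’s `YYYY-MM-DD-title` pattern. The function names, the posts directory and the keywords inserted into the new file are all hypothetical.

```elisp
(defun my-octopress-post-filename (title &optional time)
  "Return a Jekyll-style file name, YYYY-MM-DD-slug.org, for TITLE.
TIME defaults to the current time."
  (let* ((slug (replace-regexp-in-string "[^a-z0-9]+" "-" (downcase title)))
         ;; Trim hyphens left over from leading or trailing punctuation.
         (slug (replace-regexp-in-string "\\`-+\\|-+\\'" "" slug)))
    (format "%s-%s.org" (format-time-string "%Y-%m-%d" time) slug)))

(defun my-octopress-new-post (title)
  "Visit a new Org file for a post called TITLE and insert export keywords."
  (interactive "sPost title: ")
  ;; Hypothetical posts directory.
  (find-file (expand-file-name (my-octopress-post-filename title) "~/blog/org/"))
  (when (= (buffer-size) 0)
    (insert (format "#+TITLE: %s\n#+DATE: %s\n\n"
                    title (format-time-string "%Y-%m-%d %H:%M")))))
```

Keeping the file name logic in its own function makes it easy to check without opening any buffers.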
While I’m working on a post, I can start the Octopress preview server
the normal way (rake preview) and then export the current Org file
with org-publish-current-file to preview the final output in a
browser.
In case it wasn’t obvious, this whole post was written using this system, and in fact the entire (albeit small) body of code is contained in the Org file for this post, using Babel’s noweb-style literate facilities. The source for this post is on GitHub, and contains some extra code not exported, like various helper functions and tests.
Working on this blog post while adding features and fixing bugs in the code was pretty interesting, even as a small taste of literate programming. I’ve got some ideas for literate posts I’d like to write, so I’m looking forward to experimenting with this and will probably add to the backend as I find new Org features.
Please let me know if you find any errors in the code. It’s definitely just an alpha MVP at this point, but if anyone’s interested in using it I would be pleased to hear from you, and to help if you run into problems.