Using org-roam has helped me organize my thoughts and jot down whatever comes to mind in the moment, freeing my feeble mind to care for what's most important in the day. As I've been using it to take more notes, I'd like some of those notes (like this one) to become blog posts.
I've found pandoc to be a really good way to export, mainly for reasons of simplicity. The only issue with using org-roam and pandoc together is that org-roam's internal links don't translate to pandoc html pages. That's where pandoc's filters come into the picture.
Exporting a single page
To test exporting a single page with custom css, I've saved bettermotherfuckingwebsite.com's css declarations in a file aptly called style.css and use pandoc to export a single page.
$ pandoc -f org -t html5 --css=style.css --standalone note.org -o note.html
And lo and behold, it already looks the way I want it to. But a proper website needs a header and a footer, so we create two files, header.html and footer.html, and add them to the final page.
$ pandoc -f org -t html5 --css=style.css --include-before-body=header.html --include-after-body=footer.html --standalone note.org -o note.html
And now we have each page following a proper website template with header, body, and footer.
Sprinkle of internal links
To include links properly, I'll be using pandoc's lua filters to look up the link in org-roam's sqlite database and modify pandoc's AST to replace the id:xxx link with a proper href.
Before I can decide what filter to write, I need to see pandoc's generated AST.
$ pandoc --standalone -t native note.org
which gives me the following output
Pandoc (Meta {unMeta = fromList [("title",MetaInlines [Str "The",Space,Str "Grand",Space,Str "Unified",Space,Str "Theory",Space,Str "of",Space,Str "Everything"])]})
[Header 1 ("setting-up-org-roam",[],[]) [Str "Setting",Space,Str "up",Space,Code ("",[],[]) "org-roam"]
...
What we're interested in is
Link ("",[],[]) [Str "school"] ("id:e0e3eed4-d1ec-4e76-9244-cfbf22ba5a6f","")
According to the module documentation and Text.Pandoc.Definition, this is a Link element with no attributes, link text of "school", and a target of "id:…".
-- Replace each link with its target rendered as plain text.
function Link(elem)
    return pandoc.Str(elem.target)
end
Switching gears to python
org-roam stores note references with IDs in an sqlite database that by default sits under $HOME/.emacs.d/org-roam.db. To access this from lua, I'd need an sqlite binding, which is not installed on many systems. Python has both json and sqlite as part of its batteries-included standard library, so I'll use that instead.
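To make the lookup concrete, here is a minimal sketch in python. It builds a throwaway in-memory database rather than touching the real org-roam.db. The nodes table with id, file, and title columns matches the query used later in this post (the real table may well have more columns), while the sample row, the file path, and the lookup_node helper are made up for illustration. The values are stored wrapped in double quotes, which is why the sketch adds and strips them.

```python
import sqlite3


def lookup_node(conn, node_id):
    """Return (file, title) for an org-roam ID, or None if it is unknown."""
    # Values in the table are wrapped in double quotes, so quote the ID
    # before comparing and strip the quotes from the result.
    row = conn.execute(
        "select file, title from nodes where id = ?", (f'"{node_id}"',)
    ).fetchone()
    if row is None:
        return None
    return tuple(value[1:-1] for value in row)


# Stand-in for org-roam.db: same table and column names, made-up row.
conn = sqlite3.connect(":memory:")
conn.execute("create table nodes (id text, file text, title text)")
conn.execute(
    "insert into nodes values (?, ?, ?)",
    ('"e0e3eed4-d1ec-4e76-9244-cfbf22ba5a6f"', '"/notes/school.org"', '"school"'),
)

print(lookup_node(conn, "e0e3eed4-d1ec-4e76-9244-cfbf22ba5a6f"))
print(lookup_node(conn, "no-such-id"))
```

Passing the ID as a query parameter instead of formatting it into the SQL string keeps odd characters in a link from breaking the query.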
We can use pandoc's json api and write a filter which parses, modifies, and prints json. But there's a better way! The pandocfilters and panflute modules are available for python and take care of the plumbing for us. They are also available on pypi, which means they can be installed easily with pip. I've chosen to work with panflute for no particular reason.
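For a sense of the plumbing those modules hide, here is a rough sketch of a filter written against pandoc's json directly. It assumes the json layout where every node is an object with a "t" (type) key and an optional "c" (contents) key, and where a Link's contents are [attributes, inline text, [url, title]]. walk and rewrite_link are hypothetical helper names, and the rewrite simply strips the id: prefix and appends .html as a placeholder for the real database lookup.

```python
import json


def rewrite_link(node):
    """Placeholder rewrite: turn an id:xxx target into xxx.html."""
    attr, inlines, (url, title) = node["c"]
    if url.startswith("id:"):
        node["c"] = [attr, inlines, [url[len("id:"):] + ".html", title]]


def walk(node):
    """Visit every node of the json tree in place, rewriting Link nodes."""
    if isinstance(node, list):
        for child in node:
            walk(child)
    elif isinstance(node, dict):
        if node.get("t") == "Link":
            rewrite_link(node)
        for child in node.values():
            walk(child)


# A single paragraph holding the link from the native output above.
sample = json.loads(
    '{"t": "Para", "c": [{"t": "Link", "c": [["", [], []],'
    ' [{"t": "Str", "c": "school"}],'
    ' ["id:e0e3eed4-d1ec-4e76-9244-cfbf22ba5a6f", ""]]}]}'
)
walk(sample)
print(json.dumps(sample["c"][0]["c"][2]))
```

A real filter would read the whole document from stdin with json.load, walk it, and write it back to stdout with json.dump.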
The filters can be used with the --filter argument.
$ pandoc -f org -t html5 --standalone --filter myfilter.py note.org -o note.html
So the final command will be
$ pandoc -f org -t html5 --css=style.css --include-before-body=header.html --include-after-body=footer.html --standalone --filter myfilter.py note.org -o note.html
Filtering effectively
I've named the filter sanitize_links.py.
#!/usr/bin/env python3
import os
import sqlite3
import urllib.parse

import panflute as pf

#### CHANGE THESE ####
ORG_ROAM_DB_PATH = "~/.emacs.d/org-roam.db"
#### END CHANGE ####

db = None


def sanitize_link(elem, doc):
    # Only touch links that point at an org-roam ID.
    if not isinstance(elem, pf.Link):
        return None
    if not elem.url.startswith("id:"):
        return None
    file_id = elem.url.split(":", 1)[1]

    # IDs are stored wrapped in double quotes, so quote the lookup value too.
    cur = db.cursor()
    cur.execute("select id, file, title from nodes where id = ?;", (f'"{file_id}"',))
    data = cur.fetchone()
    if data is None:
        # Unknown ID: leave the link untouched.
        return None

    # The file column is quoted as well; strip the quotes before using it.
    file_name = urllib.parse.quote(os.path.splitext(os.path.basename(data[1][1:-1]))[0])
    elem.url = f"{file_name}.html"
    return elem


def main(doc=None):
    return pf.run_filter(sanitize_link, doc=doc)


if __name__ == "__main__":
    # expanduser is needed because sqlite3 does not expand "~" itself.
    db = sqlite3.connect(os.path.expanduser(ORG_ROAM_DB_PATH))
    main()
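The least obvious line in the filter is the one that turns the database's file column into a URL. As a standalone sketch (the helper name note_url and the sample paths are mine, not part of the filter):

```python
import os
import urllib.parse


def note_url(quoted_path):
    """Map a quoted path from the nodes table to a relative .html URL."""
    path = quoted_path[1:-1]  # drop the surrounding double quotes
    stem = os.path.splitext(os.path.basename(path))[0]
    return urllib.parse.quote(stem) + ".html"  # percent-encode unsafe characters


print(note_url('"/home/me/notes/20210101120000-school.org"'))  # 20210101120000-school.html
print(note_url('"/home/me/notes/my note.org"'))                # my%20note.html
```

Because only the basename survives, this assumes all exported pages end up side by side in one directory.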
A note on versions!
I'm using Ubuntu 20.04 LTS, which means some of the packages are outdated. It appears older pandoc versions didn't have great error messages, making debugging difficult. Since I've updated pandoc with packages available on their release page, I've had better luck.
Worth noting the python3-pandocfilters package in the repos is also outdated, so using pip is recommended.
Publishing the right files
Some of my notes are to be published, but some I'd like to keep private. To do that, I have set up my notes to have a tag of "publish" for ones I want to, well, publish, by adding it to filetags.
#+filetags: publish
Then my build.sh script filters files that have a publish tag. Here's the entirety of my build.sh script. A Makefile would be more appropriate.
#!/bin/sh
CSS=org.css
mkdir -p html/
rm -f html/*
# List every org file whose #+filetags line mentions "publish".
grep -iRE '^#\+filetags:.*publish' --include='*.org' --files-with-matches . | while IFS= read -r note; do
    echo "processing ${note}"
    # Write html/<name>.html, matching the links produced by the filter.
    out="html/$(basename "$note" .org).html"
    pandoc -s -t html5 -f org --css="$CSS" --include-before-body=header.html --include-after-body=footer.html --filter fix_roam_links.py "$note" -o "$out"
done
index_file=$(grep -iR -l 'grand unified theory of everything' html | head -n 1)
echo "setting index file"
cp "$index_file" html/index.html
echo "copying $CSS"
cp "$CSS" html/
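The selection step is the part most worth testing in isolation. Here is a rough python equivalent of the grep pattern, assuming the tag lives on a #+filetags line anywhere in the file (is_published is a hypothetical helper, not something build.sh calls):

```python
import re

# Same idea as the grep pattern: a line starting with "#+filetags:" that
# mentions "publish" anywhere after it, matched case-insensitively.
PUBLISH_RE = re.compile(r"^#\+filetags:.*publish", re.IGNORECASE | re.MULTILINE)


def is_published(org_text):
    """Return True if the note's text carries a publish file tag."""
    return PUBLISH_RE.search(org_text) is not None


print(is_published("#+title: Draft\n#+filetags: private\n"))         # False
print(is_published("#+title: Post\n#+FILETAGS: :blog:publish:\n"))   # True
```

Like the grep pattern, this also matches tags that merely contain the word, such as "unpublished", so a stricter pattern may be worth it if such tags ever show up.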
The output files go to the html directory, and I publish by simply rsync'ing the files to my public directory. Here's the one-liner for upload.sh.
#!/bin/sh
rsync --progress html/* server:/srv/www/
Now it's time to add the publish tag to this file! With this setup, every time I add a new post, all I need to do is add a link to it to the homepage and run ./build.sh && ./upload.sh.
Footnotes
I change the title of my notes frequently, which means the filename and title go out of sync. To prevent this, I have come to appreciate having date and IDs as filenames. Here's a one-liner that converts the default filenames to "<date>-<id>.org" format.
for f in *.org; do mv "$f" "$(echo "$f" | grep -Po '^\d+')-$(grep -m1 ':ID:' "$f" | tr -s '\t ' ' ' | cut -d' ' -f2).org"; done
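The same renaming logic as a python sketch, for anyone who prefers to check it before touching files. It assumes the default filename starts with a run of digits (the timestamp) and that the note has an :ID: property line. new_name is a hypothetical helper and the sample values are made up. It only computes the new name and leaves the actual rename to the caller.

```python
import re


def new_name(old_name, org_text):
    """Build "<date>-<id>.org" from the old filename and the note's text."""
    date = re.match(r"\d+", old_name)
    node_id = re.search(r"^\s*:ID:\s+(\S+)", org_text, re.MULTILINE)
    if date is None or node_id is None:
        return None  # leave files that don't fit the pattern alone
    return f"{date.group(0)}-{node_id.group(1)}.org"


text = ":PROPERTIES:\n:ID:       e0e3eed4-d1ec-4e76-9244-cfbf22ba5a6f\n:END:\n#+title: school\n"
print(new_name("20210101120000-school.org", text))
```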
Having said that, my notes are now only named by date and time to make it easier for org-roam to generate filenames.