from a device highlight to a tiddler: a roadmap

 23rd January 2024 at 12:33am

This is a rough outline of everything that happens between highlighting a passage on my Kobo device and creating a tiddler from that highlight.

On the Kobo device

Highlighting a passage on the Kobo creates a record in the device's internal database. Later, when the device is connected to my computer via USB, it gets mounted and the database is copied to the computer.
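
The copy step could look roughly like the sketch below. It is not the actual script: the function name copy_kobo_database and both of its parameters are made up for illustration, and the database path .kobo/KoboReader.sqlite is an assumption about where the device keeps its database.

```python
import shutil
from pathlib import Path


def copy_kobo_database(mount_point: Path, destination_dir: Path) -> Path:
    """Copy the Kobo SQLite database from the mounted device to the computer."""
    # Assumption: the database lives at .kobo/KoboReader.sqlite on the device.
    source = mount_point / ".kobo" / "KoboReader.sqlite"
    if not source.is_file():
        raise FileNotFoundError(f"No Kobo database found at {source}")
    destination_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves the modification time of the original file.
    return Path(shutil.copy2(source, destination_dir / source.name))
```

Working on a copy means the script never touches the database on the device itself.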

The main screen of the interface. It shows all books with available highlights.


On the Python script (Textual interface)

Database work (utils/database.py)

The database is accessed, and the highlight metadata is retrieved via get_highlight_from_database(highlight_id). It returns the title of the book, the date of the highlight, the section where the highlight starts (in the .epub file) and the .epub file name.
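
A minimal sketch of such a lookup, using Python's sqlite3 module. The table highlights and all of its columns are a simplified, hypothetical schema invented for this example (the real Kobo database is laid out differently), and the extra connection parameter is also an assumption.

```python
import sqlite3
from typing import Optional

# Hypothetical, simplified schema used only for this illustration.
SCHEMA = """
CREATE TABLE highlights (
    highlight_id TEXT PRIMARY KEY,
    book_title   TEXT,
    date_created TEXT,
    section      TEXT,
    epub_file    TEXT,
    text         TEXT
)
"""


def get_highlight_from_database(
    connection: sqlite3.Connection, highlight_id: str
) -> Optional[dict]:
    """Return the metadata of one highlight, or None if the id is unknown."""
    # sqlite3.Row lets us turn the result into a dict keyed by column name.
    connection.row_factory = sqlite3.Row
    row = connection.execute(
        "SELECT book_title, date_created, section, epub_file, text "
        "FROM highlights WHERE highlight_id = ?",
        (highlight_id,),
    ).fetchone()
    return dict(row) if row is not None else None
```

The query uses a ? placeholder rather than string formatting, so the id is never interpolated into the SQL text.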

To generate the main screen in the interface, there is get_list_of_highlighted_books.

To get all highlights from a given book, there is get_all_highlights_of_book_from_database.
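
Both listing functions boil down to simple queries. The sketch below is only an approximation: it assumes a hypothetical highlights table with highlight_id, book_title and date_created columns, and a connection parameter that the real functions may not take.

```python
import sqlite3


def get_list_of_highlighted_books(connection: sqlite3.Connection) -> list:
    """Return (book_title, number_of_highlights) pairs, one per book."""
    rows = connection.execute(
        "SELECT book_title, COUNT(*) FROM highlights "
        "GROUP BY book_title ORDER BY book_title"
    ).fetchall()
    return [(title, count) for title, count in rows]


def get_all_highlights_of_book_from_database(
    connection: sqlite3.Connection, book_title: str
) -> list:
    """Return (highlight_id, date_created) pairs for one book, oldest first."""
    rows = connection.execute(
        "SELECT highlight_id, date_created FROM highlights "
        "WHERE book_title = ? ORDER BY date_created",
        (book_title,),
    ).fetchall()
    return [(highlight_id, date) for highlight_id, date in rows]
```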

After a book is chosen, all of its highlights are displayed in a scrollable list. Each entry shows the date the highlight was created, and a ✅ or ❌ depending on whether a tiddler has already been created from it.
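
Building the rows of that list could be sketched as follows. The Highlight class, the function format_highlight_rows and the idea of passing in a set of ids that already have a tiddler are all assumptions made for the example; how the real script tracks existing tiddlers is not shown here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Highlight:
    highlight_id: str
    date_created: str  # ISO-style date string, so plain sorting is chronological


def format_highlight_rows(
    highlights: Iterable[Highlight], ids_with_tiddler: Set[str]
) -> List[str]:
    """Return one display line per highlight, oldest first."""
    rows = []
    for highlight in sorted(highlights, key=lambda h: h.date_created):
        # Mark highlights that were already turned into a tiddler.
        mark = "✅" if highlight.highlight_id in ids_with_tiddler else "❌"
        rows.append(f"{highlight.date_created}  {mark}")
    return rows
```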


Parsing and manipulating .epub text (utils/highlight_handling.py)

Given the file name and section, the .epub file is accessed with get_full_context_from_highlight(filename, section), providing the context (soup) surrounding the highlight: it's the full HTML that the .epub section consists of.
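
Since an .epub file is a ZIP archive of (X)HTML files, reading one section takes only a few lines. This sketch sticks to the standard library and returns the raw HTML as a string instead of a parsed soup. It assumes that section is the path of the file inside the archive, and html_to_text is a hypothetical helper added for illustration.

```python
import zipfile
from html.parser import HTMLParser


def get_full_context_from_highlight(filename: str, section: str) -> str:
    """Return the raw HTML of one section inside an .epub file."""
    # Assumption: `section` is the path of the section file inside the archive.
    with zipfile.ZipFile(filename) as epub:
        return epub.read(section).decode("utf-8")


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Drop the tags and collapse all runs of whitespace to single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())
```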

Having the highlight_id and the surrounding context (soup), one can extend and contract the highlight as much as needed (for more context or more succinctness). First, get_highlight_context_from_id retrieves the smallest sentence or group of sentences that contains the original highlight. This solves many issues: if a highlight was incomplete, or spanned two pages on the device, it should appear complete now.

The next step is to match the highlight to a sentence (or group of sentences) in the soup: this is done in get_start_and_end_of_highlight (a function that will require more work later on). Both highlight and soup are split into sentences and stripped of whitespace (thus, from this point onward, most of the exquisite .epub formatting is gone).
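
The matching idea can be sketched like this. It is a simplified stand-in, not the real function: the sentence splitter is a naive regular expression, split_into_sentences is a hypothetical helper, and both arguments are assumed to be plain text.

```python
import re
from typing import List, Optional, Tuple


def split_into_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Normalise inner whitespace so that line breaks do not prevent a match.
    return [" ".join(part.split()) for part in parts if part.strip()]


def get_start_and_end_of_highlight(
    highlight: str, context: str
) -> Optional[Tuple[int, int]]:
    """Return the indices of the first and last context sentence
    that the highlight touches, or None if it cannot be located."""
    context_sentences = split_into_sentences(context)
    fragments = split_into_sentences(highlight)
    if not fragments:
        return None
    # The first fragment may begin mid-sentence, so look for a substring.
    start = next(
        (i for i, s in enumerate(context_sentences) if fragments[0] in s), None
    )
    if start is None:
        return None
    # The last fragment may end mid-sentence; search from `start` onwards.
    end = next(
        (
            i
            for i in range(start, len(context_sentences))
            if fragments[-1] in context_sentences[i]
        ),
        None,
    )
    if end is None:
        return None
    return start, end
```

Joining the context sentences from start to end then gives the complete sentences around the highlight, even when the original highlight began or ended mid-sentence.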

Further context manipulations are possible using expand_found_highlight. Most of this is done in the interface/single_highlight_screen.py. There are also some additional variables to allow for fine parsing of the quotes using two different cursors, at the beginning and end of the quote (and it was very fun to implement).

In this screen, some keybindings select more or less of the text. The selection can jump whole sentences backwards and forwards, and both the blue and the red cursor allow fine adjustments, one character at a time.
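
One way to model this behaviour is a small selection object that tracks which sentences are selected, plus how many characters each cursor trims from either end. The class HighlightSelection and all of its methods are hypothetical names for this sketch, and the actual screen may well work differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HighlightSelection:
    sentences: List[str]  # every sentence of the section
    start: int            # index of the first selected sentence
    end: int              # index of the last selected sentence (inclusive)
    start_trim: int = 0   # characters cut from the beginning (first cursor)
    end_trim: int = 0     # characters cut from the end (second cursor)

    def _joined(self) -> str:
        return " ".join(self.sentences[self.start : self.end + 1])

    def expand_backward(self) -> None:
        """Include one more sentence before the selection."""
        self.start = max(0, self.start - 1)
        self.start_trim = 0

    def expand_forward(self) -> None:
        """Include one more sentence after the selection."""
        self.end = min(len(self.sentences) - 1, self.end + 1)
        self.end_trim = 0

    def move_start_cursor(self, delta: int) -> None:
        """Move the first cursor by `delta` characters (positive = right)."""
        limit = len(self._joined()) - self.end_trim
        self.start_trim = min(max(0, self.start_trim + delta), limit)

    def move_end_cursor(self, delta: int) -> None:
        """Move the second cursor by `delta` characters (negative = left)."""
        limit = len(self._joined()) - self.start_trim
        self.end_trim = min(max(0, self.end_trim - delta), limit)

    def text(self) -> str:
        joined = self._joined()
        return joined[self.start_trim : len(joined) - self.end_trim]
```

Keeping sentence indices and character trims separate means a sentence jump can simply reset the trim on that side, while the other cursor stays where it was.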


Tiddler creation

This part is not so interesting. In the same panel, there are three input boxes for the tiddler data. When ready, pressing the button creates the .tid file. However, the file doesn't immediately appear in the TiddlyWiki instance I run; the instance must be restarted first. I set up systemd to restart my TiddlyWiki every hour, in case I did some work (this could obviously have been handled in many other ways).
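
Writing the .tid file itself is straightforward: a block of "field: value" lines, a blank line, then the tiddler text. The sketch below assumes the three inputs are a title, a list of tags and the text (which may not match the actual input boxes), and create_tiddler_file and format_tags are hypothetical names.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterable, Optional


def format_tags(tags: Iterable[str]) -> str:
    # TiddlyWiki wraps tags that contain spaces in double square brackets.
    return " ".join(f"[[{tag}]]" if " " in tag else tag for tag in tags)


def create_tiddler_file(
    tiddlers_dir: Path,
    title: str,
    tags: Iterable[str],
    text: str,
    now: Optional[datetime] = None,
) -> Path:
    """Write a .tid file and return its path."""
    now = now or datetime.now(timezone.utc)
    # TiddlyWiki timestamps have the form YYYYMMDDHHMMSSmmm, in UTC.
    stamp = now.strftime("%Y%m%d%H%M%S") + f"{now.microsecond // 1000:03d}"
    header = [
        f"created: {stamp}",
        f"modified: {stamp}",
        f"tags: {format_tags(tags)}",
        f"title: {title}",
        "type: text/vnd.tiddlywiki",
    ]
    # Assumption: replace characters that are awkward in file names.
    safe_name = "".join(c if c.isalnum() or c in " -_" else "_" for c in title)
    path = Path(tiddlers_dir) / f"{safe_name}.tid"
    # A blank line separates the header fields from the tiddler text.
    path.write_text("\n".join(header) + "\n\n" + text + "\n", encoding="utf-8")
    return path
```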