The fastest Djot parser ever?
Djot is a light markup syntax. It derives most of its features from commonmark, but it fixes a few things that make commonmark's syntax complex and difficult to parse efficiently. It is also much fuller-featured than commonmark, with support for definition lists, footnotes, tables, several new kinds of inline formatting (insert, delete, highlight, superscript, subscript), math, smart punctuation, attributes that can be applied to any element, and generic containers for block-level, inline-level, and raw content.
- From the Djot website
djot.rs is a pull parser for Djot written in pure Rust. It is built on three goals, one of which is to make djot.rs as fast as possible.
These goals are not necessarily compatible with each other. For example, the speed requirement led me to target only UTF-8 and to handle all text as raw bytes, which hurts legibility. However, I try to strike a balance where possible.
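To illustrate why handling UTF-8 text as raw bytes is safe, here is a minimal sketch. It is not djot.rs's actual code, and is_special and first_special are hypothetical names. Every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so a byte-by-byte search for ASCII markup characters can never stop in the middle of a character. The offsets it finds are therefore always valid places to slice the original string.

```rust
/// Hypothetical set of bytes that may start inline markup.
/// The exact set djot.rs uses is not shown here; this is illustrative.
fn is_special(b: u8) -> bool {
    matches!(b, b'*' | b'_' | b'`' | b'[' | b']' | b'{' | b'}' | b'\\')
}

/// Returns the byte offset of the first special byte, if any.
/// All bytes of multi-byte UTF-8 characters are >= 0x80, so an ASCII
/// byte found here is never part of a larger character, and the offset
/// is always a valid char boundary of the source &str.
fn first_special(input: &[u8]) -> Option<usize> {
    input.iter().position(|&b| is_special(b))
}

fn main() {
    let text = "héllo *wörld*";
    let pos = first_special(text.as_bytes()).unwrap();
    // Splitting the &str at a byte offset found by the byte scan is safe.
    let (plain, rest) = text.split_at(pos);
    println!("{plain:?} | {rest:?}");
}
```

The scan never decodes characters, which is where the speed comes from; the cost is that the code reasons about bytes rather than chars.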
djot.rs is not finished and is therefore not usable at the moment. If you want to help out, send me an email through the djot discussion mailing list.
djot.rs is written as a library that parses Djot into an iterator of markup events. You can combine this iterator with a writer to produce the desired output. djot.rs will ship with built-in writers for formats such as HTML, but you can build your own as well.
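As an illustration of the event-iterator-plus-writer design, here is a minimal sketch of a custom HTML writer. The Event and Container types and the push_html function are hypothetical stand-ins, not the real djot.rs API; the point is only that a writer is a function that consumes any iterator of events.

```rust
/// Hypothetical event type; djot.rs's real event enum may differ.
#[derive(Debug, Clone, PartialEq)]
enum Event<'s> {
    Start(Container),
    End(Container),
    Str(&'s str),
}

/// Hypothetical container kinds (only two, to keep the sketch short).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Container {
    Paragraph,
    Strong,
}

/// A minimal HTML writer: consumes any iterator of events and appends
/// markup to `out`, escaping &, < and > in text.
fn push_html<'s, I>(events: I, out: &mut String)
where
    I: IntoIterator<Item = Event<'s>>,
{
    for ev in events {
        match ev {
            Event::Start(Container::Paragraph) => out.push_str("<p>"),
            Event::End(Container::Paragraph) => out.push_str("</p>\n"),
            Event::Start(Container::Strong) => out.push_str("<strong>"),
            Event::End(Container::Strong) => out.push_str("</strong>"),
            Event::Str(s) => {
                for c in s.chars() {
                    match c {
                        '&' => out.push_str("&amp;"),
                        '<' => out.push_str("&lt;"),
                        '>' => out.push_str("&gt;"),
                        _ => out.push(c),
                    }
                }
            }
        }
    }
}

fn main() {
    // Events a parser might emit for the Djot input "Hello *world*".
    let events = vec![
        Event::Start(Container::Paragraph),
        Event::Str("Hello "),
        Event::Start(Container::Strong),
        Event::Str("world"),
        Event::End(Container::Strong),
        Event::End(Container::Paragraph),
    ];
    let mut html = String::new();
    push_html(events, &mut html);
    print!("{html}");
}
```

Because the writer accepts any IntoIterator, it works equally well on a parser that yields events lazily or on a pre-collected vector, as in main above.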
A CLI for djot.rs is hosted on GitHub. With it, you can convert Djot into the desired output format from the command line.
The parsing logic is organized into two modules:
- the lex module, which produces Djot-specific tokens from a byte stream;
- the parse module, which consumes these tokens and interprets them as Djot constructs.
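To make the lexer's role concrete, here is a minimal sketch of a byte-stream lexer written as an iterator. The Kind, Token and Lexer types are hypothetical and far simpler than the real lex module: it recognizes only *, _ and newlines, and groups every other byte into text runs. Tokens store only a length, so the lexer never copies or allocates.

```rust
/// Hypothetical token kinds; the real lex module's tokens are not shown here.
#[derive(Debug, PartialEq)]
enum Kind {
    Star,       // `*`
    Underscore, // `_`
    Newline,    // `\n`
    Text,       // a run of any other bytes
}

#[derive(Debug, PartialEq)]
struct Token {
    kind: Kind,
    len: usize, // length in bytes; the caller tracks the running offset
}

struct Lexer<'s> {
    src: &'s [u8],
    pos: usize,
}

impl<'s> Lexer<'s> {
    fn new(src: &'s [u8]) -> Self {
        Lexer { src, pos: 0 }
    }
}

impl<'s> Iterator for Lexer<'s> {
    type Item = Token;

    fn next(&mut self) -> Option<Token> {
        let start = self.pos;
        // Returns None at the end of input.
        let first = *self.src.get(start)?;
        let kind = match first {
            b'*' => Kind::Star,
            b'_' => Kind::Underscore,
            b'\n' => Kind::Newline,
            _ => Kind::Text,
        };
        self.pos += 1;
        if kind == Kind::Text {
            // Extend the text run until the next byte that forms its own token.
            while let Some(&b) = self.src.get(self.pos) {
                if matches!(b, b'*' | b'_' | b'\n') {
                    break;
                }
                self.pos += 1;
            }
        }
        Some(Token { kind, len: self.pos - start })
    }
}

fn main() {
    for tok in Lexer::new("a *b*\n".as_bytes()) {
        println!("{:?}", tok);
    }
}
```

A parser built on top of this would pull tokens one at a time and decide, with more context, whether a Star actually opens or closes strong emphasis.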
Both the lexer and the parser work as iterators to avoid allocation as much as possible. Two passes are still necessary, however, because the map of reference and footnote definitions must be built before parsing starts.
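The reason for the extra pass is that a reference can be used before it is defined. The following minimal sketch shows the idea; collect_refs and resolve are hypothetical helpers, and the definition syntax they handle is heavily simplified compared to real Djot (no attributes, no continuation lines). Footnote definitions such as [^n]: would also be picked up here under the label ^n; a real implementation would likely keep them in a separate map.

```rust
use std::collections::HashMap;

/// First pass (simplified): collect definitions of the form `[label]: url`
/// that occupy a whole line, borrowing label and URL from the source.
fn collect_refs(src: &str) -> HashMap<&str, &str> {
    let mut refs = HashMap::new();
    for line in src.lines() {
        if let Some(rest) = line.strip_prefix('[') {
            if let Some((label, url)) = rest.split_once("]:") {
                refs.insert(label, url.trim());
            }
        }
    }
    refs
}

/// Second pass helper: a reference like `[text][label]` can only be resolved
/// once every definition is known, since the definition may come later.
fn resolve<'a>(refs: &HashMap<&str, &'a str>, label: &str) -> Option<&'a str> {
    refs.get(label).copied()
}

fn main() {
    // The reference is used on the first line but defined at the end.
    let src = "See [the site][djot].\n\n[djot]: https://djot.net\n";
    let refs = collect_refs(src);
    println!("{:?}", resolve(&refs, "djot"));
}
```

A single forward pass would reach [the site][djot] before seeing its definition, which is why the map has to be filled first.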
All patches should be sent to the djot.rs development mailing list.