~kmaasrud/djot

Pull parser for Djot in Rust

#djot.rs

The fastest Djot parser ever?

Djot is a light markup syntax. It derives most of its features from CommonMark, but it fixes a few things that make CommonMark's syntax complex and difficult to parse efficiently. It is also much fuller-featured than CommonMark, with support for definition lists, footnotes, tables, several new kinds of inline formatting (insert, delete, highlight, superscript, subscript), math, smart punctuation, attributes that can be applied to any element, and generic containers for block-level, inline-level, and raw content.

djot.rs is a pull parser for Djot written in pure Rust. It is built on three principles:

  • Legibility: The library should be maintainable by veterans and newcomers alike. Idiomatic Rust is preferred.
  • Extensibility: The library should not be constrained to outputting HTML, but rather allow users to extend it with other output formats.
  • Speed: Djot was designed with parsing speed in mind. Let's honor this work by making djot.rs as fast as possible.

These principles are not always compatible with one another. For example, the speed requirement led me to target only UTF-8 and to handle all text as raw bytes, which hurts legibility. I try to strike a balance where possible.

NOTE: djot.rs is not finished and is therefore not yet usable. If you want to help out, send an email to the Djot discussion mailing list, djot-discuss.

#Usage

djot.rs is written as a library that parses Djot into an iterator of markup events. You combine this iterator with a writer to produce the desired output. djot.rs will ship with built-in writers for formats such as HTML, but you can build your own as well.
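
To make the event/writer split concrete, here is a minimal sketch of a writer that consumes an iterator of events and emits HTML. Since the library is unfinished, every name here (`Event`, `Tag`, `write_html`, `escape_html`) is a hypothetical stand-in and not the djot.rs API.

```rust
// Hypothetical event model for illustration only; not the djot.rs API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tag {
    Paragraph,
    Emphasis,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Event<'a> {
    Start(Tag),
    End(Tag),
    Text(&'a str),
}

impl Tag {
    /// Name of the HTML element this (assumed) tag maps to.
    fn html_name(self) -> &'static str {
        match self {
            Tag::Paragraph => "p",
            Tag::Emphasis => "em",
        }
    }
}

/// Escape the characters that are significant in HTML text content.
fn escape_html(text: &str, out: &mut String) {
    for c in text.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            _ => out.push(c),
        }
    }
}

/// A writer is just a consumer of events: it never sees the Djot source.
fn write_html<'a>(events: impl Iterator<Item = Event<'a>>) -> String {
    let mut out = String::new();
    for event in events {
        match event {
            Event::Start(tag) => {
                out.push('<');
                out.push_str(tag.html_name());
                out.push('>');
            }
            Event::End(tag) => {
                out.push_str("</");
                out.push_str(tag.html_name());
                out.push('>');
            }
            Event::Text(text) => escape_html(text, &mut out),
        }
    }
    out
}
```

Because the writer only depends on the event type, a writer for another output format would be a second function with the same signature, which is the kind of extensibility the principles above describe.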

A CLI for djot.rs is hosted on GitHub. With it, you can convert Djot into the desired output format from the command line.

#Parsing logic

The parsing logic is organized into these two modules:

  • The lex module, which produces Djot-specific tokens from a byte stream.
  • The parse module, which uses these tokens and interprets them into Djot constructs.
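
As a rough illustration of the first stage, the sketch below shows a lexer that walks a byte slice and yields tokens lazily through `Iterator`. The token set and the `Lexer` type are assumptions made for this example; the actual lex module may look quite different.

```rust
// Hypothetical token kinds; the real lex module's token set is an assumption here.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token<'a> {
    Star,
    Underscore,
    Backtick,
    Newline,
    /// A run of bytes with no special meaning, borrowed from the input.
    Text(&'a [u8]),
}

struct Lexer<'a> {
    input: &'a [u8],
    pos: usize,
}

impl<'a> Lexer<'a> {
    fn new(input: &'a [u8]) -> Self {
        Lexer { input, pos: 0 }
    }
}

/// Bytes that this toy lexer turns into their own tokens.
fn is_special(b: u8) -> bool {
    matches!(b, b'*' | b'_' | b'`' | b'\n')
}

impl<'a> Iterator for Lexer<'a> {
    type Item = Token<'a>;

    fn next(&mut self) -> Option<Token<'a>> {
        // Copy the shared reference out so tokens can borrow for the full 'a.
        let input: &'a [u8] = self.input;
        let rest = &input[self.pos..];
        let &first = rest.first()?;
        let (token, len) = match first {
            b'*' => (Token::Star, 1),
            b'_' => (Token::Underscore, 1),
            b'`' => (Token::Backtick, 1),
            b'\n' => (Token::Newline, 1),
            _ => {
                // Consume plain bytes up to the next special byte or end of input.
                let len = rest
                    .iter()
                    .position(|&b| is_special(b))
                    .unwrap_or(rest.len());
                (Token::Text(&rest[..len]), len)
            }
        };
        self.pos += len;
        Some(token)
    }
}
```

Text tokens are slices into the original input, so the lexer itself allocates nothing; a parser stage could take any `Iterator<Item = Token>` and be tested independently of the lexer.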

Both the lexer and the parser work as iterators to avoid allocating wherever possible. Two passes are still necessary, however, because a reference or footnote can be used before the line that defines it, so a map of references and footnotes has to be built before parsing can begin.
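
The need for a first pass can be sketched with a deliberately simplified example. It assumes reference definitions are single lines of the form `[label]: url` and ignores footnotes and multi-line definitions; `collect_references` and `resolve` are hypothetical helpers, not functions from djot.rs.

```rust
use std::collections::HashMap;

/// First pass: scan every line and record `[label]: url` definitions.
/// Simplified on purpose; real Djot definitions have more cases than this.
fn collect_references(input: &str) -> HashMap<&str, &str> {
    let mut refs = HashMap::new();
    for line in input.lines() {
        if let Some(rest) = line.strip_prefix('[') {
            if let Some((label, url)) = rest.split_once("]:") {
                refs.insert(label, url.trim());
            }
        }
    }
    refs
}

/// Second pass helper: look up a label, wherever in the document it was defined.
fn resolve<'a>(refs: &HashMap<&'a str, &'a str>, label: &str) -> Option<&'a str> {
    refs.get(label).copied()
}
```

For a document such as `"See [the site][site].\n\n[site]: https://example.com\n"`, the link appears before its definition, so a single forward pass could not resolve it at the point of use. With the map built up front, the second pass can remain a streaming iterator.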

#Contributing

All patches should be sent to the djot.rs development mailing list djot-dev.