~gagbo/R7.rs

A rust hosted r7rs scheme interpreter (eternal WIP status to learn PL)

b9af1ba Refactor: small scope codegen modules

~gagbo pushed to ~gagbo/rseven git

1 year, 6 months ago

835f617 Refactor: Use helper function in compile match

~gagbo pushed to ~gagbo/rseven git

1 year, 6 months ago

#R7.rs

builds.sr.ht status

A R7RS-small implementation in Rust.

#Acknowledgements

If anything useful comes out of this, albeit very unlikely, it will be 100% thanks to these person/resources:

#Launch

There is a small REPL you can simply access by running the default binary:

$ cargo run
user> ((lambda (x) (/ x 4)) 28)
7
user> ((lambda (x) (/ x 4)) 28.1)
7.025

#Parser

The old plan is left below to let everyone see my analysis struggle. Perfect is the enemy of good, and Pest definitely gives me good enough results and error recovery to work with. So instead I'll write my own AST type and a conversion function that walks through Pest pairs to rebuild a proper AST from there. This way the "Pest leaking" issue will disappear.

#Old plan

The parser is handwritten.

The first version was based on Pest, but I had multiple issues:

  • it doesn't handle cleanly teh quasiquation context-sensitive syntax
  • it is very hard to get a proper AST without all Pest strings attached, so Pest leaked into the core crate codebase.

At that point, I had the choice between handwriting the parser, and using a parser combinator library like nom or chumsky. I chose to use a parser combinator library.

At first I thought that handwriting the parser was the way to go because:

  • half the goal of this project is to learn about programming languages, and Robert Nystrom showed in Crafting Interpreters that writing a parser isn't a huuuge task (WHEN you have the specifications of your language, which r7rs-small scheme definitely has.)
  • Handwriting the parser means I have one less layer of documentation/maintenance to worry about.

After spending a few minutes writing the scanner by hand, it became obvious that the syntax of the reference document for the formal syntax and grammar is much easier translated with a parser combinator library rather than handwriting everything and redoing the rules myself. For example, token is supposed to match boolean before matching #( and #u8(, but at the same time there are rules in the middle, like number or string. So that means I need to split the logic that deals with # when it's encountered in a stream. Another small issue I have is that I had to wrap everything in a unicode_segmentation::Graphemes iterator to deal properly with Unicode: that might still be an issue with parser combinator libs though.

Granted, I lose the immediate payoff of having crazy good recovery and error handling, but that can be added in a later step.

#Memory model

For the time being, I plan on using rust-gc to obtain garbage collection on my values. At first I thought I could only wrap the runtime rust values in Gc<GcCell<>> and be done with it, but actually this doesn't support really well the continuations I want to implement; continuations has been the main roadblock that triggered the rewrite.

The current plan for memory management is to host the runtime stack on the garbage-collected heap, so that closures (using Upvalues like Lua and Lox in Crafting Interpreters) and continuations (by actually managing StackFrames manually on the garbage-collected heap) can be more easily done, even if performance is trash.

That would mean everything would still be Gc<GcCell<T>>, but T would not only be runtime values like before, but also stack and tracing-related things like:

  • Call Frames
  • Upvalues
  • Bytecode...

#Roadmap

This rewrite is almost from scratch, so the roadmap will move slowly

  • [x] Read standard
  • [ ] Implement and test the parser
  • [ ] Implement basic types
  • [ ] Implement REPL
  • [ ] Implement eval
  • [ ] Implement read
  • [ ] Implement (scheme base)

#TODOs

For each extra bit of compliance we go through, usually the parser (which started from the standard spec) gets enhanced to give better lexing information so that the core can make adjustments easily.

#Guarantee tail call optimization

This will probably need to be done with a trampoline or something

#Add abilities to bind Rust types to scheme

We want to have native types that are easy to manipulate. The proof of concept/motivating example will be adding a Rope-based structure to deal with long strings.

#Make a compiler

R7 will start as an interpreter only