~dvshkn/llama2.ha

Llama 2 inference in pure Hare (port of llama2.c)

#llama2.ha

This is a port of Andrej Karpathy's llama2.c project rewritten in Hare. It has basic feature parity with the original project. Advanced features like parallelization and quantization are not implemented.

#Build

  1. Install Hare: https://harelang.org/installation/
  2. Use the included Makefile to build the llama2ha binary:

         cd llama2.ha
         make

  3. Download a sample checkpoint such as stories15M.bin or stories110M.bin:

         wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
         wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories110M.bin

  4. Run llama2ha:

         ./llama2ha stories15M.bin -i "The travelers reached a fork in the road"

#Usage

Command-line flags are nearly identical to those of llama2.c.

Example: ./llama2ha model.bin -n 256 -i "There once was a magical hare"

Usage: ./llama2ha [-h]
         [-t <temperature>]
         [-p <top-p>]
         [-s <seed>]
         [-n <steps>]
         [-i <prompt>]
         [-z <tokenizer>]
         <checkpoint_file>

-h: print this help text
-t <temperature>: temperature in [0,inf] (default 1.0) : float
-p <top-p>: p value for top-p (nucleus) sampling in [0,1] (default 0.9) : float
-s <seed>: random seed (defaults to current unix time) : int
-n <steps>: number of steps to run for (default 256, 0 for max_seq_len) : int
-i <prompt>: input prompt : string
-z <tokenizer>: path to custom tokenizer (optional) : string

#Performance

Test System: Ryzen 5950X, 64GB RAM

Test Flags: -i "There once was a magical hare" -s 1 -t 0

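| Checkpoint      | llama2.ha  | llama2.ha (unsafe pointers) | llama2.c    |
|-----------------|------------|-----------------------------|-------------|
| stories15M.bin  | 79.1 tok/s | 94.7 tok/s                  | 117.2 tok/s |
| stories110M.bin | 11.0 tok/s | 13.4 tok/s                  | 14.9 tok/s  |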

Notes:

  • The default version of llama2.ha uses idiomatic Hare slices to access in-memory data such as the checkpoint weights and run state.

  • The unsafe pointers version of llama2.ha uses C-style pointers, which bypass the bounds checking that Hare performs on slice accesses. Its code is on the unsafe-pointers branch.
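To illustrate where the bounds-checking cost comes from, the C sketch below compares a hypothetical Slice type (a pointer plus a length, loosely mimicking a Hare slice) whose every element access is range-checked and aborts on failure, with plain pointer indexing that performs no checks. The dot product stands in for the matrix-vector loops that make up most of the work in a llama2.c-style forward pass. None of these names come from llama2.ha, and the actual overhead in Hare depends on its compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Hare slice: data pointer plus length. */
typedef struct {
    const float *data;
    size_t len;
} Slice;

/* Build a slice view over an existing buffer. */
static Slice make_slice(const float *data, size_t len) {
    assert(data != NULL || len == 0);
    Slice s = { data, len };
    return s;
}

/* Checked access: abort on an out-of-range index instead of reading past the buffer. */
static float slice_get(Slice s, size_t i) {
    if (i >= s.len) {
        fprintf(stderr, "index %zu out of bounds (len %zu)\n", i, s.len);
        abort();
    }
    return s.data[i];
}

/* Dot product where every element access is bounds-checked. */
static float dot_checked(Slice a, Slice b) {
    float sum = 0.0f;
    for (size_t i = 0; i < a.len; i++)
        sum += slice_get(a, i) * slice_get(b, i); /* aborts if b is shorter than a */
    return sum;
}

/* Same computation through raw pointers: no checks, caller must keep n in range. */
static float dot_unchecked(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

The unchecked version is only safe when the caller guarantees that n does not exceed either buffer's length; that guarantee is exactly what the per-access checks enforce at run time, in exchange for an extra comparison on every read.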

#Screenshot

[Screenshot: llama2.ha running in the terminal]

#License

AGPL