C lexer/parser/checker/compiler for Hare


A sprite in Scratch of a Hare, named "Hare-c"

#Stuff in this repo

  • c - the Hare module
  • cmd/cdecl - WIP-ish converter from C gibberish to readable English, and vice-versa. depends on madeline
  • cmd/hacc - C compiler. be prepared to run into aborts, lots of stuff still needs to be implemented. if you're developing stuff, you probably want to use different flags here for testing + debugging certain compilation phases:
    • -E: parse + pre-process the input file and unparse to stdout (useful for developing the parser)
    • -dump: check the input file and dump all top-level declarations to stdout (useful for developing the checker)
    • -emit-qbe: emit the qbe .ssa file (not implemented yet)
    • see -h for other supported flags
  • cmd/hareconv - converter between C headers and Hare modules
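
If you want something small to feed to hacc while poking at the pre-processor (for example with the -E flag described above), an input along these lines might be a reasonable starting point. It is only a sketch: the macro and function names are made up for illustration, nothing here comes from the repo's own tests, and hacc may still abort on parts of it. The expected values in run_checks are what standard C requires.

```c
#include <assert.h>

/* Object-like and function-like macros. */
#define BASE 40
#define ADD(a, b) ((a) + (b))

/* ## pastes its two neighbouring tokens into a single identifier. */
#define FIELD(n) field_##n

/* Variadic macro: __VA_ARGS__ stands for everything after `first`. */
#define SUM3(first, ...) sum3(first, __VA_ARGS__)

/* Conditional inclusion, using both spellings of `defined`. */
#if defined(BASE) && BASE > 100
#define LIMIT 1
#elif defined BASE
#define LIMIT 2
#else
#define LIMIT 3
#endif

static int sum3(int a, int b, int c) { return a + b + c; }

/* A macro never expands to itself, so the inner `twice` in the macro body
 * is left as a call to the function defined just before it. */
static int twice(int x) { return 2 * x; }
#define twice(x) (twice(x) + 1)

int add_demo(void) { return ADD(BASE, LIMIT); }
int paste_demo(void) { int FIELD(1) = 5; return field_1; }
int variadic_demo(void) { return SUM3(1, 2, 3); }
int recursive_demo(void) { return twice(3); }

/* A function-like macro name that isn't followed by `(` is not expanded,
 * so this refers to the function, not the macro. */
int no_paren_demo(void) { int (*fp)(int) = twice; return fp(3); }

/* Adjacent string literals are concatenated: 6 characters plus the NUL. */
int concat_demo(void) { return (int)sizeof("hare" "-c"); }

/* Runs every demo once; aborts via assert on a mismatch. */
void run_checks(void) {
	assert(add_demo() == 42);
	assert(paste_demo() == 5);
	assert(variadic_demo() == 6);
	assert(recursive_demo() == 7);
	assert(no_paren_demo() == 6);
	assert(concat_demo() == 7);
}
```

There is deliberately no main here, so the file can be used as a plain translation unit; call run_checks from a small driver if you want to compare against another compiler.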

#hi pls contribute

Send patches to ~sebsite/hare-c@lists.sr.ht (archive)

If you need help or have questions ping me on fedi or IRC or wherever :)


A checkmark means the thing is implemented; a T means the thing is properly tested. This checklist isn't yet exhaustive; I'm updating it as I go.

T🗸   comments
T🗸     block
T🗸     line
T🗸   backslashes at end of line
 🗸     must declare at least one identifier
         type specifiers
T🗸         signed, unsigned
T🗸           parse
 🗸         void
T🗸           parse
 🗸           incomplete type; cannot be completed
T🗸         char
T🗸           parse
T🗸         short
T🗸           parse
T🗸         int
T🗸           parse
T🗸         long
T🗸           parse
T🗸         float
T🗸           parse
T🗸         double
T🗸           parse
T🗸                 parse
T🗸                 parse
T🗸             declarations
T🗸               named
T🗸                 parse
T🗸                 check
T🗸               anonymous
T🗸                 parse
T🗸                 check
T🗸               definitions
T🗸                 parse
T🗸                 check
T🗸               forward
T🗸                 parse
T🗸                 check
 🗸             incomplete types
T🗸               final member may be incomplete array if other members are present
 🗸               everything else disallowed
T🗸               struct type is still complete; size is as though incomplete array weren't present
 🗸               but struct type can't be used in other structs or in arrays
 🗸                 including if it's (recursively) a member of a union
 🗸           union
 🗸             bit-fields prohibited
 🗸             declarations
 🗸               named
T🗸                 parse
 🗸                 check
 🗸               anonymous
T🗸                 parse
 🗸                 check
 🗸               definitions
T🗸                 parse
 🗸                 check
 🗸               forward
T🗸                 parse
 🗸                 check
 🗸             incomplete types disallowed
 🗸           must contain at least one named member
             variably modified member types disallowed
T🗸           tag inserted into scope immediately after it's declared
             tag is always declared when not used as type specifier
 🗸           when used as type specifier, tag is only declared if no other declaration is visible
 🗸         enum
 🗸           declarations
T🗸             named
T🗸             anonymous
T🗸             definitions
 🗸             forward disallowed
 🗸           fields inserted into scope
T🗸             parse
 🗸             no linkage
 🗸           tag inserted into scope immediately after definition ends
T🗸           parse
T🗸           inserted into scope
T🗸           file scope
             block scope
               disallowed for variably modified types
T🗸             allowed otherwise
         storage specifiers
T🗸         auto
T🗸           parse
T🗸         static
T🗸           parse
T🗸           parse
             forbids taking address
 🗸             explicitly
               implicitly (e.g. array -> pointer conversion)
               also applies to derivative lvalues
 🗸         extern
T🗸           parse
 🗸           defaults to external linkage
T🗸           uses linkage of identifier visible in scope when it's internal or external
 🗸           can't have initializer
 🗸         typedef
T🗸           parse
T🗸           inserts type into scope
 🗸           can't have initializer
 🗸         at most one storage specifier per declaration (sans thread_local)
 🗸         duplicates disallowed
 🗸       duplicate specifiers
 🗸         disallowed for type specifiers
T🗸         allowed for other specifiers
 🗸       disallow incompatible
 🗸         type specifiers
 🗸         other specifiers
T🗸     qualifiers
T🗸       allowed in any order
T🗸       duplicates allowed
T🗸       pointer
 🗸           named
T🗸           anonymous
 🗸           arrays converted to (qualified) pointers
 🗸             still checked as arrays
 🗸           functions converted to function pointers
             may not shadow typedefs
 🗸           may not declare more than one (non-tag) identifier
 🗸           may not be initialized
T🗸           without storage specifier
T🗸             defaults to implicit auto
             register storage specifier
 🗸             parse
 🗸             declares with register storage
 🗸             ignored for non-definitions
 🗸               may be different in compatible types
               applies even when other declarations' parameters have different/absent storage specifiers
 🗸           all other specifiers disallowed
 🗸         return type requirements
 🗸           must return either complete type or void
 🗸           may not return array
 🗸         warn when function returning complete type doesn't have return statement
T🗸           except for main function in hosted environment
 🗸           all other functions
 🗸         storage specifier must be static or extern (explicit or otherwise)
 🗸         error out when explicitly given qualifiers (via typedef)
 🗸         implicit const qualifier
T🗸         without qualifiers
 🗸         qualifiers apply to element type
 🗸           don't apply to array itself before c23
           ...except as inner-most declarator of function parameter
           qualifiers may only be in square brackets in inner-most declarator of function parameter
 🗸         if present, known size must be greater than zero
 🗸         element type must be
 🗸           complete
 🗸           object
 🗸       identifier only
 🗸       parenthesized
 🗸       type names
 🗸         with declarator
 🗸         without declarator
 🗸     scopes
 🗸       file
 🗸         parse
 🗸         auto/register specifiers disallowed
 🗸         default storage
 🗸           extern for functions
T🗸           external linkage for objects
 🗸       block
 🗸         parse
 🗸         default storage is auto
 🗸       duplicates
 🗸         file scope
T🗸           allowed for compatible declarations
T🗸             composite type
T🗸             compatible specifiers
 🗸           disallowed for incompatible declarations
 🗸           at most one definition allowed
 🗸         block scope
 🗸           disallowed within same scope
T🗸           shadowing in nested scopes
 🗸             shadowing block-scoped declarations
 🗸             shadowing file-scoped declarations
 🗸         struct/union/enum tags must always refer to same type
 🗸       unique namespaces
T🗸         identifiers
T🗸         struct/union/enum
 🗸         goto labels
 🗸     tentative declarations
 🗸     internally-linked declarations must be complete by end of translation unit
       translation unit must not be empty (static assertions are ok)
       internally-linked declaration must be initialized if used
         except sizeof
         except alignof
         all other expressions
 🗸     can't declare an object with type void
T🗸         pre-processing numbers
T🗸         int literals
T🗸           decimal
T🗸           hex
T🗸           octal
T🗸           suffixes
T🗸             u
T🗸             l
T🗸           parse value
T🗸         float literals
T🗸           decimal
T🗸           exponent
T🗸           f suffix
T🗸           parse value
T🗸       char literals
T🗸         plain
T🗸         L prefix
 🗸       string literals
T🗸         plain
T🗸         L prefix
 🗸         string concatenation
 🗸           works for same-prefix strings
 🗸           disallowed for strings with different prefixes
T🗸     sizeof
T🗸       parse
T🗸     array indexing
T🗸       parse
 🗸     struct/union field accessing
 🗸       a.b
 🗸         parse
 🗸       a->b
 🗸         parse
T🗸     unary postfix
T🗸       parse
       unary prefix
T🗸       parse
T🗸         runtime
 🗸         when operand is unary *, neither is evaluated
 🗸           constraints still apply
T🗸     binary
T🗸       parse
T🗸     assignment
T🗸       parse
T🗸     parenthesized
T🗸     operator precedence
 🗸       explicit
T🗸         parse
T🗸         extends implicit type conversion rules
T🗸         integer -> pointer
T🗸         pointer -> integer
T🗸         pointer -> pointer
T🗸           object -> object
T🗸           function -> function
 🗸         everything else disallowed (except object <-> function; see extensions)
         implicit conversion
T🗸         integer promotion
 🗸         lvalue conversion
T🗸           integer -> float
T🗸           float -> float
T🗸           integer -> integer
T🗸           float -> integer
T🗸           pointer -> bool
T🗸           array -> pointer
T🗸           function -> pointer
 🗸           pointer -> pointer
T🗸             pointer -> compatible pointer
T🗸             object pointer -> void pointer
T🗸             void pointer -> object pointer
 🗸             can't implicitly convert to any other pointer
T🗸             qualifiers may be added
 🗸             qualifiers may not be removed
T🗸           any -> void
 🗸           va_list -> va_list
             struct/union -> same struct/union
 🗸             tagged
 🗸           everything else disallowed
T🗸       parse
T🗸       jump to any label within function
 🗸       disallow undefined labels
         may not jump from outside scope of VMT to inside scope
T🗸     compound
T🗸       parse
 🗸       goto label
T🗸         parse
 🗸         unique per-function
T🗸         parse
 🗸         must evaluate to integer constant
           case type is converted to type of controlling expression
 🗸         must only appear in switch body
T🗸           direct descendant
T🗸           within another block
 🗸           disallowed everywhere else
T🗸           only visible to inner-most switch body
 🗸         each case in switch body is unique
T🗸         ...but nested switch statements may contain duplicates
 🗸       default
T🗸         parse
 🗸         must only appear in switch body
T🗸           direct descendant
T🗸           within another block
 🗸           disallowed everywhere else
T🗸           only visible to inner-most switch body
 🗸         at most one per switch body
T🗸         ...but nested switch statements may contain their own
T🗸       labelled statements may themselves be labelled
T🗸     empty
T🗸       parse
 🗸     if
T🗸       parse
 🗸       condition must be scalar
 🗸     while
T🗸       parse
 🗸       condition must be scalar
 🗸     do-while
T🗸       parse
 🗸       condition must be scalar
T🗸       parse
 🗸       condition must be scalar (if present)
T🗸       initializer, condition, and afterthought may all be omitted
         initializer declarations may only have automatic storage duration
         initializer declarations enter scope immediately after declared
T🗸       parse
 🗸       value must be integer
 🗸       value is promoted
         no VMTs not in scope of switch statement may be in scope of any cases
T🗸       parse
T🗸       allowed in loop
T🗸       allowed in switch statement
 🗸       disallowed everywhere else
         causes jump to outside whatever it applies to
T🗸         within check
           for-loop afterthought isn't evaluated
T🗸       parse
T🗸       allowed in loop
 🗸       disallowed everywhere else
         causes jump to end of loop it applies to
T🗸         within check
           so for-loop afterthought is evaluated, then condition
 🗸     return
T🗸       parse
 🗸       type of return value must be convertible to function return type
 🗸       warn when returning non-void from void function
 🗸       warn when used in noreturn function
T🗸     expression statements
T🗸       parse
T🗸       implicitly convert to void
       as initializer of declaration
T🗸       parse
         must be constant expression for static objects
T🗸       may be any expression otherwise
         may be surrounded by braces
           when single scalar object
           when using literal to initialize array (e.g. string literal)
T🗸     in cast expression (compound literal)
T🗸       parse
 🗸   variadic functions
 🗸   function definitions
T🗸     declaration inserted into scope
T🗸       before body is parsed
T🗸       before body is checked
 🗸     declarator must be function
T🗸       non-pointer accepted
 🗸       pointer rejected
 🗸       typedef rejected
T🗸     without arguments (void)
T🗸     with named arguments
 🗸     argument types must be complete (sans (void) case)
 🗸   identifiers
 🗸   no such thing as an invalid token
 🗸   main function
 🗸     freestanding environment
 🗸       not required to be declared
 🗸       no requirements or restrictions imposed when declared
 🗸     hosted environment
 🗸       must have strictly conforming declaration
 🗸         int main(void)
T🗸         int main(int, char **)
T🗸         ...or equivalent, expanding typedefs and performing usual parameter conversions
 🗸         must have external linkage
 🗸         all other forms rejected
 🗸         checked even if no definition is present
 🗸   macro definitions
 🗸     #define
 🗸       variable-like
 🗸       function-like
 🗸       shadowing keywords
 🗸     undef
 🗸       shadowing keywords
 🗸       other identifiers
     macro substitution
 🗸     pre-defined
 🗸       macros
T🗸          __STDC__
T🗸          __STDC_HOSTED__
T🗸          __FILE__
T🗸          __LINE__
 🗸          __DATE__
 🗸          __TIME__
         # (%:)
 🗸       ## (%:%:)
 🗸         with macro arguments
 🗸           only token closest to ## is concatenated
 🗸           when argument expands to no tokens, replace with placemarker
 🗸         with other tokens
 🗸         constructs new token
 🗸           non-pre-processing tokens
 🗸           constructed token won't be used for pre-processing (i.e. no # or ##)
 🗸     # has higher precedence than ##
 🗸     object-like
 🗸     function-like
 🗸       works when followed by left paren
 🗸       doesn't work when not followed by left paren
 🗸       parameters
 🗸         expand non-recursively
 🗸           expands macros
 🗸           everything else
 🗸         don't expand recursively
 🗸           from parameters
 🗸           from macros
 🗸     recursive
 🗸       macros can't expand to themselves
 🗸       everything else expands
 🗸   #include
 🗸     system headers
 🗸     non-system headers
 🗸       header exists
 🗸       header doesn't exist; fall back to system header
 🗸         header name doesn't have angle brackets
 🗸         header name has angle brackets
 🗸     macro expansion
 🗸     <this> is lexed as a system header string literal
 🗸       true in #include
 🗸       false everywhere else
 🗸   conditional
 🗸     #if, #endif
 🗸     #else
 🗸     #elif
 🗸     defined
 🗸       non-parenthesized identifier
 🗸       parenthesized identifier
 🗸       error out when neither of the above forms is matched
 🗸     #ifdef
 🗸     #ifndef
 🗸   #error
 🗸     errors out
 🗸     error message uses all tokens
 🗸     macros aren't expanded
 🗸     only change line number
 🗸     also change filename
       macros are expanded
T🗸   trigraphs
 🗸   k&r-style functions
 🗸     empty parameter list
 🗸     non-empty parameter list without types
 🗸     k&r-style parameter declarations
 🗸   implicit int
     implicit function declaration
 🗸 c95
T🗸   digraphs
 🗸   pragma
 🗸     #pragma
 🗸     _Pragma
 🗸     STDC
 🗸       accept standardized
 🗸         FP_CONTRACT
 🗸         FENV_ACCESS
 🗸       reject unstandardized
 🗸     implementation-defined
 🗸       accepted; ignored
 🗸     parse
       can't be initialized
 🗸   hex float literals
T🗸     prefix / base
 🗸     parse value
T🗸     exponent
     universal character names
T🗸     in char/string literals
       in identifiers
         prior to c17, conform to annex D
 🗸     disallowed: (<0xa0 && !='$' && !='@' && !='\'') || (>=0xd800 && <0xe000)
T🗸   initializer designators
T🗸     array
T🗸     struct
     declaration as for-loop initializer
T🗸     parse
       must be auto or register
         default is auto
T🗸   specifiers
T🗸     type specifiers
T🗸       _Complex
T🗸       _Imaginary
T🗸       long double
T🗸       long long
T🗸       parse
T🗸       may appear more than once
         can't be used outside of function declaration
T🗸       applies to function declaration
         if inline is used on any declaration, there must be a definition
 🗸       internally-linked
 🗸         can be used on any declaration; no change in behavior
         externally-linked; inline definitions
 🗸         function becomes inline definition if all file-scope declarations use inline
 🗸         ...and none explicitly use extern
 🗸         otherwise no change in behavior; function isn't an inline definition
 🗸         inline definition doesn't provide external definition
             may not define a modifiable object with static or thread duration
             may not use identifier with internal linkage
 🗸     static array parameters
T🗸   restrict
T🗸     parse
 🗸   variadic macros
 🗸     use of ... in definition
 🗸     __VA_ARGS__
T🗸   intermixing declarations and statements in compound statements
T🗸   literal suffixes
T🗸     ll int literal suffix
T🗸     l float literal suffix
 🗸   __func__
 🗸     parse
 🗸     resolves to name of current function
 🗸     can't be redeclared at top-level
 🗸     has type 'const char []'
     implicit return 0 from main for hosted targets
       not for freestanding targets
 🗸       parse
 🗸       may not appear alongside auto or register
 🗸       may not appear in block scope when no storage specifiers are present
 🗸       may not be used on function declaration
         must appear on all declarations of an object
T🗸       parse
 🗸       may appear more than once
         applies to function declaration
 🗸       expression
 🗸       type
         duplicates permitted; most strict alignment used
         invalid uses
           alongside typedef/register specifier
           on function
           on bit-field
         must not specify less strict alignment than default
         as specifier
           type in parens must not be qualified
           type in parens must not be array, function, or atomic
         as qualifier
 🗸         parse
           other qualifiers apply to atomic type, not to type being made atomic
 🗸   _Static_assert
T🗸     parse
T🗸       top-level
T🗸       within function
T🗸       within struct/union
 🗸     eval
T🗸       top-level
T🗸       within function
T🗸       within struct/union
 🗸       errors out when condition is false
T🗸   _Alignof
T🗸     parse
       controlling expression conversions
 🗸       lvalue conversion
         array -> pointer
 🗸       function -> function pointer
 🗸     must match exactly one case
 🗸       at most one compatible case
 🗸       at most one default case allowed
 🗸       if no compatible case, default case must be present
T🗸   prefixes
T🗸     char literals
T🗸       u8-
T🗸       u-
T🗸       U-
T🗸     string literals
T🗸       u-
T🗸       U-
     nested anonymous structs/unions
 🗸   pre-defined macros
 🗸     __STDC_IEC_559__
 🗸     __STDC_UTF_16__
 🗸     __STDC_UTF_32__
     unicode identifiers
 🗸     _BitInt
T🗸       signed
T🗸         parse
T🗸       unsigned
T🗸         parse
T🗸       wb int literal suffix
 🗸       operand must be integer constant expression
 🗸       signed width must be >= 2
 🗸       unsigned width must be >= 1
T🗸       doesn't undergo integer promotion
 🗸       has rank below equivalently sized basic integer type
 🗸       otherwise ranked based on width of both types
T🗸     typeof
T🗸       qualified (typeof)
T🗸       unqualified (typeof_unqual)
       float types
       decimal types
T🗸       basic
T🗸         _Decimal32
T🗸           type specifier
T🗸           df suffix
T🗸         _Decimal64
T🗸           type specifier
T🗸           dd suffix
T🗸         _Decimal128
T🗸           type specifier
T🗸           dl suffix
 🗸     constexpr
         system embeds
         non-system embeds
           embed exists
           embed doesn't exist; fall back to system header
             embed name doesn't have angle brackets
             embed name has angle brackets
           duplicates not permitted
           leading+trailing underscores permitted
         actually warns
 🗸       warning message uses all tokens
 🗸       macros aren't expanded
 🗸     #elifdef, #elifndef
       standard pragmas
 🗸         value must be direction
           information persists in check
 🗸         value must be dec-direction
           information persists in check
 🗸   static_assert without reason
     type inferencing with auto
T🗸   nullptr
T🗸   labels
T🗸     labelled declarations
T🗸     at end of compound statement
 🗸   binary int literals
 🗸     parse value
 🗸   function declarations
 🗸     variadic function without parameters
T🗸     parameters in function definition need not be named
T🗸       declaration
T🗸         top-level
T🗸           bindings
T🗸           function definitions
T🗸           as a declaration consisting of only attributes and nothing else
T🗸         within compound body
T🗸         for-loop initializer
T🗸         function parameter
T🗸         struct/union fields
T🗸       binding
T🗸       base type
T🗸       declarator
T🗸         array
T🗸         function
T🗸         pointer
T🗸         with identifier
T🗸         without identifier
         struct/union/enum declaration
T🗸         parse
           allowed when defining struct/union/enum
           allowed in tag declaration where struct/union doesn't act as type specifier
             doesn't apply to enums since they can't be forward declared
           disallowed otherwise
T🗸       enum field
T🗸       statement
         [[noreturn]], [[_Noreturn]]
           disallow argument
           applicable to
             nothing else
           may have optional argument
           applicable to
             struct/union declaration
             typedef name
             struct/union member
             enum member
             nothing else
           __has_c_attribute => 202311L
           warn when name is used
           disallow argument
           applicable to
             lone attribute declaration
             next encountered statement must have case or default label
             if within iteration statement, next statement must also be within said iteration statement
           __has_c_attribute => 202311L
           disallow argument
           applicable to
             struct/union declaration
             typedef name
             struct/union member
             enum member
             nothing else
           __has_c_attribute => 202311L
           may have optional argument
           applicable to
             struct/union definition
             enum definition
             nothing else
           __has_c_attribute => 202311L
           warn when value discarded
       warn on non-standard
       leading+trailing underscores permitted
T🗸     keywords are treated as identifiers
T🗸     with prefix
T🗸     without prefix
T🗸     without arguments
T🗸     with arguments
T🗸       including balanced tokens
T🗸     multiple attribute lists are combined
 🗸     multiple attributes in [[brackets]]
T🗸   u8- string literals
T🗸   empty initializers
 🗸   ' as separator
 🗸     int literals
 🗸     float literals
T🗸   empty declarations
     pre-defined macros
       keyword aliases
 🗸       alignas
T🗸       alignof
T🗸       bool
T🗸       true
T🗸         defined
T🗸         expands to _Bool value
T🗸       false
T🗸         defined
T🗸         expands to _Bool value
 🗸       static_assert
 🗸       thread_local
         when equivalent keyword is re-defined, alias macro doesn't expand said definition
 🗸     constants
 🗸       __STDC_IEC_60559_TYPES__
 🗸       __STDC_IEC_60559_BFP__
 🗸       __STDC_IEC_60559_DFP__
 🗸       __STDC_EMBED_FOUND__
 🗸       __STDC_EMBED_EMPTY__
     explicit enum backing type
 🗸   storage specifiers in cast expression
T🗸     allowed for compound literals
 🗸     disallowed for other casts
T🗸   treat empty parameter list as identical to (void)
     array is qualified equivalently to element type
     redefinition of macros
 🗸     user-defined
 🗸     pre-defined
       warns when new definition isn't identical to old
         doesn't warn when new definition is identical
     defining keywords as macros
 🗸     allowed
T🗸   pre-declared identifiers
 🗸     can't be redeclared at top-level
 🗸     __builtin_va_arg
T🗸       parse
 🗸       check
 🗸     __builtin_va_copy
T🗸       parse
 🗸       check
 🗸     __builtin_va_end
T🗸       parse
 🗸       check
 🗸     __builtin_va_list
T🗸       parse
 🗸       check
 🗸       parse
T🗸         require exactly two arguments before c23
 🗸         require one or two arguments after c23
 🗸       check
         warnings and errors
           error when function isn't variadic
           warn when more than one optional argument is supplied (c23)
           optional argument isn't an identifier
             error before c23
             warn since c23
           optional argument isn't the last named function parameter
             error before c23
             warn since c23
 🗸   pre-defined macros
 🗸     __has_builtin
     __asm__ declarations
 🗸     parse
       disallowed when no linkage
       disallowed for struct/union fields
 🗸     allowed otherwise
     casting between object pointer and function pointer
 🗸     does the Right Thing
 🗸   __DATE__ and __TIME__ use SOURCE_DATE_EPOCH if set
       warn for unrecognized attributes
       supported attributes
       leading+trailing underscores permitted
   additional warnings
     declaring a reserved identifier
       before c23: including potentially reserved
       after c23: excluding potentially reserved
     defining a reserved identifier
       before c23: including potentially reserved
       after c23: excluding potentially reserved
     SOURCE_DATE_EPOCH is invalid
     unused internally-linked objects and functions
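
To make a few of the checklist items above more concrete, here is a rough C translation unit touching some of them: the flexible array member rules, tentative declarations, shadowing in nested scopes, integer promotion, duplicate case labels in nested switches, continue in a for-loop, and array designators. Treat it as a sketch of what a test input could look like; all names are hypothetical and it is not taken from the repo's test suite. The values in run_checks follow from the C standard on common ABIs.

```c
#include <assert.h>
#include <stddef.h>

/* Flexible array member: allowed as the final member because another
 * member precedes it; sizeof behaves as though it weren't present. */
struct packet {
	unsigned char len;
	unsigned char data[];
};

/* Tentative definitions: compatible file-scope duplicates are allowed,
 * with at most one actual definition. */
int counter;
int counter;
int counter = 3;

size_t fam_size(void) { return sizeof(struct packet); }

int shadow_demo(void) {
	int counter = 10; /* shadows the file-scope object */
	{
		int counter = 20; /* shadows the block-scope one */
		(void)counter;
	}
	return counter;
}

/* Both operands are promoted to int, so the sum doesn't wrap at 256. */
int promotion_demo(void) {
	unsigned char c = 200;
	return c + c;
}

int switch_demo(int a, int b) {
	switch (a) {
	case 1:
		switch (b) {
		case 1: /* same value as the outer label; fine in a nested switch */
			return 11;
		default: /* each switch body may have its own default */
			return 10;
		}
	default:
		return 0;
	}
}

int continue_demo(void) {
	int visited = 0;
	for (int i = 0; i < 5; i++) {
		if (i % 2 == 0)
			continue; /* jumps to i++, then the condition is re-tested */
		visited += i;
	}
	return visited;
}

int designator_demo(void) {
	int a[4] = { [2] = 7, [0] = 1 }; /* unnamed elements are zeroed */
	return a[0] + a[1] + a[2] + a[3];
}

/* Runs every demo once; aborts via assert on a mismatch. */
void run_checks(void) {
	assert(fam_size() == offsetof(struct packet, data));
	assert(counter == 3);
	assert(shadow_demo() == 10);
	assert(promotion_demo() == 400);
	assert(switch_demo(1, 1) == 11);
	assert(switch_demo(1, 2) == 10);
	assert(switch_demo(2, 1) == 0);
	assert(continue_demo() == 4);
	assert(designator_demo() == 8);
}
```

As with any input for a work-in-progress compiler, it is probably easiest to trim this down to one function at a time when chasing a specific abort.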