~sebsite/hare-c

C lexer/parser/checker/compiler for Hare

#hare-c

#Stuff in this repo

  • c - the Hare module
  • cmd/cdecl - WIP-ish converter from C gibberish to readable English, and vice-versa. depends on madeline
  • cmd/hacc - C compiler. currently you will run into an abort somewhere cuz most stuff hasn't been implemented lol. if you're developing stuff, you probably want to use different flags here for testing + debugging certain compilation phases:
    • -E: parse + pre-process the input file and unparse to stdout (useful for developing the parser)
    • -dump: check the input file and dump all top-level declarations to stdout (useful for developing the checker)
    • -emit-qbe: emit the qbe .ssa file
    • see -h for other supported flags
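Here's a made-up input file for trying those flags on. It isn't from the repo, and the file name phases.c and the function norm2 are just for illustration. With -E you'd expect SQUARE to be gone from the unparsed output. With -dump you'd expect struct point, origin, and norm2 among the dumped declarations. It deliberately avoids #include so the -E output stays short.

```c
/* phases.c: hypothetical input for poking at one compilation phase at a time */

/* function-like macro; should be fully expanded by the pre-processor */
#define SQUARE(x) ((x) * (x))

/* struct definition at file scope */
struct point {
	int x, y;
};

/* file-scope object with a constant initializer */
static const struct point origin = { 0, 0 };

/* squared distance from origin: a macro, a struct, and plain int arithmetic */
int norm2(struct point p)
{
	return SQUARE(p.x - origin.x) + SQUARE(p.y - origin.y);
}
```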

#hi pls contribute

Send patches to ~sebsite/hare-c@lists.sr.ht (archive)

some stuff you can help with if you want

  • optimization; make it faster and not slow
  • test coverage; look for stuff in the checklist that's checked off but without a "T" and add some tests or something
  • idk, just checking stuff off the checklist i guess. one easy one is fixing float literal lexing: currently it requires a . to appear somewhere, but a float literal needn't have a . if it has an exponent (e.g. 1e3), so uh yeah, that should be pretty straightforward maybe
  • help out getting a good checker going, and basically anything that needs to be done to get hacc up and running
  • help with hacc codegen stuff
  • if you need help or have questions ping me on fedi or irc or wherever
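For the float literal fix, a small test file for the exponent rule might look like this. It's a sketch, not one of the repo's tests, and thousand and sixteen are hypothetical helpers for a runtime check. It's C11, since it leans on _Generic and _Static_assert to pin down the type each literal ends up with. A lexer that insists on a . would likely trip over the first three assertions.

```c
/* decimal floating literals: no '.', but the exponent makes them floating */
_Static_assert(_Generic(1e3, double: 1, default: 0), "1e3 should be double");
_Static_assert(_Generic(2e-2f, float: 1, default: 0), "2e-2f should be float");
_Static_assert(_Generic(5E+1L, long double: 1, default: 0),
               "5E+1L should be long double");

/* hex floats are the opposite case: the binary exponent (p) is mandatory */
_Static_assert(_Generic(0x1p4, double: 1, default: 0), "0x1p4 should be double");

/* hypothetical helpers so the values can be checked at run time too */
double thousand(void) { return 1e3; }  /* 1 * 10^3 */
double sixteen(void) { return 0x1p4; } /* 1 * 2^4 */
```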

#Checklist

A checkmark means the thing is implemented; a T means the thing is properly tested.

   basics
T🗸   comments
T🗸     block
T🗸     line
T🗸   backslashes at end of line
     declarations
       must declare at least one identifier
       specifiers
         type specifiers
T🗸         signed, unsigned
T🗸         void
T🗸         char
T🗸         short
T🗸         int
T🗸         long
T🗸         float
T🗸         double
           struct/union
 🗸           struct
T🗸             bit-fields
T🗸               named
T🗸               anonymous
 🗸             declarations
T🗸               named
T🗸               anonymous
 🗸               definitions
 🗸               forward
             union
               bit-fields prohibited
 🗸             declarations
T🗸               named
T🗸               anonymous
 🗸               definitions
 🗸               forward
             must contain at least one named member
             variably modified member types disallowed
           enum
             declarations
T🗸             named
T🗸             anonymous
T🗸             definitions
               forward disallowed
 🗸           fields inserted into scope
           typedefs
T🗸           parse
T🗸           inserted into scope
T🗸           file scope
             block scope
               disallowed for variably-modified types
T🗸             allowed otherwise
         storage specifiers
 🗸         auto
 🗸           parse
 🗸         static
 🗸           parse
           register
 🗸           parse
             forbids taking address
               including implicit addressing, e.g. array -> pointer conversion
 🗸         extern
 🗸           parse
T🗸         typedef
T🗸           parse
           at most one storage specifier per declaration (sans thread_local)
 🗸       duplicate specifiers
 🗸         disallowed for type specifiers
 🗸         allowed for other specifiers
         disallow incompatible
 🗸         type specifiers
           other specifiers
 🗸     qualifiers
 🗸       allowed in any order
 🗸       duplicates allowed
       declarators
 🗸       pointer
         function
 🗸         named parameters
 🗸         anonymous parameters
 🗸         without storage specifier (defaults to implicit auto)
 🗸         register storage specifier
           all other specifiers disallowed
           array-typed parameters converted to (qualified) pointer-typed parameters
           function-typed parameters converted to pointer-typed parameters pointing to function
           function -> pointer to function conversion
           return type requirements
             must return either complete type or void
             may not return array
         array
 🗸         without qualifiers
           qualifiers apply to element type
           ...except as inner-most declarator of function parameter
           if present, size must be greater than zero
           array -> pointer conversion
 🗸       identifier only
 🗸       parenthesized
 🗸       type names
 🗸         with declarator
 🗸         without declarator
       scopes
 🗸       file
 🗸         parse
 🗸         auto/register specifiers disallowed
 🗸         default storage
 🗸           extern for functions
 🗸           static for objects
         block
 🗸         parse
           default storage is auto
         duplicates
           file scope
 🗸           allowed during parse
             during check
               allowed for compatible declarations
                 equivalent types
                 composite type
               disallowed for incompatible declarations
               at most one definition (including initialization) allowed
 🗸         block scope
 🗸           disallowed within same scope
 🗸           shadowing in nested scopes
 🗸             shadowing block-scoped declarations
 🗸             shadowing file-scoped declarations
 🗸       unique namespaces
 🗸         identifiers
 🗸         struct/union/enum
 🗸         goto labels
     expressions
       literals
         numbers
           pre-processor numbers
 🗸         int literals
T🗸           decimal
T🗸           hex
T🗸           octal
T🗸           suffixes
T🗸             u
T🗸             l
 🗸           parse value
           float literals
T🗸           decimal
             exponent
T🗸           f suffix
 🗸           parse value
T🗸       char literals
T🗸         plain
T🗸         L prefix
 🗸       string literals
T🗸         plain
T🗸         L prefix
 🗸         string concatenation
T🗸     sizeof
T🗸     array indexing
T🗸     unary postfix
T🗸     unary prefix
T🗸     binary
T🗸     parenthesized
T🗸     operator precedence
T🗸     casting
T🗸     assignment
     statements
       goto
T🗸       parse
         jump to any label within function; disallow undefined labels
         may not jump from outside scope of VMT to inside scope (i.e. after it's declared and accessible)
T🗸     compound
T🗸       parse
 🗸     labelled
         goto label
T🗸         parse
           unique per-function
         case
 🗸         parse
           must evaluate to integer constant
           must only appear in switch body
             direct descendant
             within another block
             disallowed everywhere else
             only visible to inner-most switch body
           each case in switch body is unique
           ...but nested switch statements may contain duplicates
         default
 🗸         parse
           must only appear in switch body
             direct descendant
             within another block
             disallowed everywhere else
             only visible to inner-most switch body
           at most one per switch body
           ...but nested switch statements may contain their own
T🗸       labelled statements may themselves be labelled
T🗸     empty
T🗸       parse
 🗸     if
T🗸       parse
 🗸       condition must be scalar
 🗸     while
T🗸       parse
 🗸       condition must be scalar
 🗸     do-while
T🗸       parse
 🗸       condition must be scalar
       for
T🗸       parse
         condition must be scalar
 🗸     switch
T🗸       parse
 🗸       value must be integer
       break, continue
T🗸       parse
         only allowed within loops
       return
T🗸       parse
         type of return value must be compatible with function return type
T🗸     expression statements
T🗸       parse
T🗸       implicitly cast to void
     initializers
       as initializer of declaration
T🗸       parse
         must be constant expression for static objects
T🗸       may be any expression otherwise
         may be surrounded by braces
           when single scalar object
           when using literal to initialize array (e.g. string literal)
T🗸     in cast expression (compound literal)
T🗸       parse
 🗸   variadic functions
     function definitions
       declarator must be function
 🗸     without arguments (void)
 🗸     with named arguments
       argument types must be complete (sans (void) case)
 🗸   identifiers
 🗸   no such thing as an invalid token
   pre-processor
 🗸   macro definitions
 🗸     #define
 🗸       variable-like
 🗸       function-like
 🗸     undef
     macro substitution
 🗸     pre-defined
 🗸       macros
 🗸         __STDC__
 🗸         __STDC_HOSTED__
 🗸         __FILE__
 🗸         __LINE__
 🗸         __DATE__
 🗸         __TIME__
       operators
         # (%:)
 🗸       ## (%:%:)
 🗸         with macro arguments
 🗸           only token closest to ## is concatenated
 🗸           when argument expands to no tokens, replace with placemarker
 🗸         with other tokens
 🗸         constructs new token
 🗸           non-pre-processing tokens
 🗸           constructed token won't be used for pre-processing (i.e. no # or ##)
 🗸     # has higher precedence than ##
 🗸     object-like
 🗸     function-like
 🗸       works when followed by left paren
 🗸       doesn't work when not followed by left paren
 🗸       parameters
 🗸         expand non-recursively
 🗸           expands macros
 🗸           everything else
 🗸         don't expand recursively
 🗸           from parameters
 🗸           from macros
 🗸     recursive
 🗸       macros can't expand to themselves
 🗸       everything else expands
 🗸   #include
 🗸     system headers
 🗸     non-system headers
 🗸       header exists
 🗸       header doesn't exist; fallback to system header
 🗸         header name doesn't have angle brackets
 🗸         header name has angle brackets
 🗸     macro expansion
 🗸     <this> is lexed as a system header string literal
 🗸       true in #include
 🗸       false everywhere else
 🗸   conditional
 🗸     #if, #endif
 🗸     #else
 🗸     #elif
 🗸     defined
 🗸     #ifdef
 🗸     #ifndef
 🗸   #error
 🗸     errors out
 🗸     error message uses all tokens
 🗸     macros aren't expanded
     #line
 🗸     only change line number
 🗸     also change filename
       macros are expanded
   legacy
T🗸   trigraphs
 🗸   k&r-style functions
 🗸     empty parameter list
 🗸     non-empty parameter list without types
 🗸     k&r-style parameter declarations
 🗸   implicit int
     implicit function declaration
 🗸 c95
T🗸   digraphs
 🗸   __STDC_VERSION__
   c99
 🗸   pragma
 🗸     #pragma
 🗸     _Pragma
 🗸     STDC
 🗸       accept standardized
 🗸         FP_CONTRACT
 🗸         FENV_ACCESS
 🗸         CX_LIMITED_RANGE
 🗸       reject unstandardized
 🗸     implementation-defined
 🗸       accepted; ignored
     VLAs
 🗸     parse
       can't be initialized
 🗸   hex float literals
T🗸     prefix / base
 🗸     parse value
T🗸     exponent
     universal character names
T🗸     in char/string literals
       in identifiers
         prior to C17, conform to annex D
 🗸     disallowed: (<0xa0 && !='$' && !='@' && !='`') || (>=0xd800 && <0xe000)
T🗸   initializer designators
T🗸     array
T🗸     struct
     declaration as for-loop initializer
T🗸     parse
       must be auto or register (default is auto)
     specifiers
T🗸     type specifiers
T🗸       _Complex
T🗸       _Imaginary
T🗸       long double
       inline
 🗸       parse
         applies to function declaration, not to return type
 🗸     static array parameters
 🗸   restrict
 🗸   variadic macros
 🗸     use of ... in definition
 🗸     __VA_ARGS__
T🗸   intermixing declarations and statements in compound statements
T🗸   literal suffixes
T🗸     ll int literal suffix
T🗸     l float literal suffix
 🗸   __func__
     implicit return 0 from main
   c11
     specifiers
       _Thread_local
         parse
         may only appear alongside static/extern
           or on its own if static/extern is default
         initializer must be constant expression
       _Noreturn
 🗸       parse
         applies to function declaration, not to return type
       _Alignas
 🗸       expression
 🗸       type
         duplicates permitted; strictest alignment used
         invalid uses
           alongside typedef/register specifier
           on function
           on bit-field
         must not specify less strict alignment than default
       _Atomic
         as specifier
           parse
           type in parens must not be qualified
           type in parens must not be array, function, or atomic
         as qualifier
 🗸         parse
           other qualifiers apply to atomic type, not to type being made atomic
     _Static_assert
T🗸     parse
T🗸       top-level
T🗸       within function
T🗸       within struct/union
       eval
 🗸       top-level
         within function
         within struct
T🗸   _Alignof
T🗸   _Generic
 🗸   prefixes
T🗸     char literals
T🗸       u8-
T🗸       u-
T🗸       U-
T🗸     string literals
T🗸       u-
T🗸       U-
     nested anonymous structs/unions
 🗸   pre-defined macros
 🗸     __STDC_IEC_559__
 🗸     __STDC_UTF_16__
 🗸     __STDC_UTF_32__
   c23
     unicode identifiers
     specifiers
       _BitInt
T🗸       signed
T🗸       unsigned
         wb int literal suffix
T🗸     typeof
T🗸       qualified (typeof)
T🗸       unqualified (typeof_unqual)
       float types
         basic
           real
             _Float32
             _Float64
             _Float80
             _Float128
           imaginary
             _Float32_Imaginary
             _Float64_Imaginary
             _Float80_Imaginary
             _Float128_Imaginary
           complex
             _Float32_Complex
             _Float64_Complex
             _Float80_Complex
             _Float128_Complex
         extended
           real
             _Float32x
             _Float64x
             _Float128x
           imaginary
             _Float32x_Imaginary
             _Float64x_Imaginary
             _Float128x_Imaginary
           complex
             _Float32x_Complex
             _Float64x_Complex
             _Float128x_Complex
       decimal types
         basic
           _Decimal32
T🗸           type specifier
             df suffix
           _Decimal64
T🗸           type specifier
             dd suffix
           _Decimal128
T🗸           type specifier
             dl suffix
         extended
           _Decimal64x
           _Decimal128x
 🗸     constexpr
     pre-processor
       #embed
         system embeds
         non-system embeds
           embed exists
           embed doesn't exist; fallback to system embed
             embed name doesn't have angle brackets
             embed name has angle brackets
         parameters
           standard
             if_empty
             limit
             prefix
             suffix
           vendor-specific
           duplicates not permitted
           leading+trailing underscores permitted
       #warning
         actually warns
 🗸       warning message uses all tokens
 🗸       macros aren't expanded
       __VA_OPT__
 🗸     #elifdef, #elifndef
       standard pragmas
         FENV_ROUND
 🗸         value must be direction
           eval
           information persists in check
         FENV_DEC_ROUND
 🗸         value must be dec-direction
           eval
           information persists in check
 🗸   static_assert without reason
     type inference with auto
T🗸   nullptr
T🗸   labels
T🗸     labelled declarations
T🗸     at end of compound statement
 🗸   binary int literals
 🗸     parse value
     function declarations
       variadic function without parameters
       parameters in function definition need not be named
     attributes
       standard
         [[noreturn]], [[_Noreturn]]
           parse
             without argument
             disallow argument
           applicable to
             function
         [[deprecated]]
           parse
             without argument
             with argument
           applicable to
             struct/union declaration
             typedef name
             object
             struct/union member
             function
             enum
             enum member
           __has_c_attribute => 202311L
           warn when name is used
         [[fallthrough]]
           parse
             without argument
             disallow argument
           applicable to
             lone attribute declaration
               next encountered statement must have case or default label
               if within iteration statement, next statement must also be within said iteration statement
           __has_c_attribute => 202311L
         [[maybe_unused]]
           parse
             without argument
             disallow argument
           applicable to
             struct/union declaration
             typedef name
             object
             struct/union member
             function
             enum
             enum member
             label
           __has_c_attribute => 202311L
         [[nodiscard]]
           parse
             without argument
             with argument
           applicable to
             function
             struct/union definition
             enum definition
           __has_c_attribute => 202311L
           warn when value discarded
         [[reproducible]]
           TODO
         [[unsequenced]]
           TODO
       disallow non-standard
       leading+trailing underscores permitted
T🗸   u8- string literals
     empty initializers
     ' as digit separator
       int literals
       float literals
T🗸   empty declarations
     pre-defined macros
 🗸     alignas
T🗸     alignof
T🗸     bool
       true
 🗸       defined
         expands to _Bool value
       false
 🗸       defined
         expands to _Bool value
 🗸     static_assert
 🗸     thread_local
       __has_c_attribute
       __has_embed
       __has_include
 🗸     constants
 🗸       __STDC_IEC_60559_TYPES__
 🗸       __STDC_IEC_60559_BFP__
 🗸       __STDC_IEC_60559_DFP__
 🗸       __STDC_EMBED_NOT_FOUND__
 🗸       __STDC_EMBED_FOUND__
 🗸       __STDC_EMBED_EMPTY__
     explicit enum backing type
     storage specifiers in compound literal type name
     treat empty parameter list as identical to (void)
 🗸 extensions
 🗸   redefinition of macros
 🗸     user-defined
 🗸     pre-defined
 🗸   shadowing keywords with macro definitions
 🗸     #define
 🗸     #undef
 🗸     expansion
T🗸   pre-declared identifiers
T🗸     __builtin_va_arg
T🗸     __builtin_va_copy
T🗸     __builtin_va_end
T🗸     __builtin_va_list
T🗸     __builtin_va_start
 🗸   pre-defined macros
 🗸     __has_builtin
 🗸   __asm__ declarations
   additional warnings
     use of reserved identifier
     SOURCE_DATE_EPOCH is invalid
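For the "checked off but no T" items above, a test could be as plain as a C file that exercises a few of them at once. This sketch covers ## with an empty argument (placemarker), string concatenation, and __func__. The names (CAT, value, greeting, whoami, and so on) are made up. How the repo's test harness picks up files isn't shown here, so treat this as plain C11 you'd adapt rather than a drop-in test.

```c
/* ## with empty arguments: an empty argument becomes a placemarker, so
 * pasting it onto a real token just leaves that token */
#define CAT(a, b) a ## b

static int value = 42;

int empty_left(void) { return CAT(, value); }  /* expands to: return value; */
int empty_right(void) { return CAT(value, ); } /* expands to: return value; */
int pasted(void) { return CAT(val, ue); }      /* val ## ue -> value */

/* adjacent string literals are concatenated into one array */
const char greeting[] = "hello, " "world";
_Static_assert(sizeof greeting == sizeof "hello, world",
               "concatenated literal should have one terminating NUL");

/* __func__ acts like: static const char __func__[] = "whoami"; */
const char *whoami(void) { return __func__; }
```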