This repository contains an implementation of C17 language compiler from
scratch. No existing open source compiler infrastructure is being reused. The
main priority is self-sufficiency of the project, compatibility with platform
ABI and compliance with C17 language standard. Some exceptions to the standard
were made (see Exceptions
section below).
Kefir supports modern x86-64 Linux, FreeBSD, OpenBSD and NetBSD environments
(see Supported environments
section below). Compiler is also able to produce
JSON streams containing program representation on various stages of compilation
(tokens, AST, IR), as well as printing source code in preprocessed form. By
default, the compiler outputs GNU As-compatible assembly (Intel syntax
with/without prefixes and ATT syntax are supported). Position-independent code
generation is supported. Kefir features cc
-compatible command line interface.
Kefir compiler is named after fermented milk drink, no other connotations are meant or intended.
Kefir targets x86-64 ISA and System-V ABI. The main focus is on modern Linux
environments (with full range of automated tests is executed there), however
Kefir also has support for modern FreeBSD, OpenBSD and NetBSD operating systems
(base test suite and bootstrap are executed successfully in these environments).
A platform is considered supported if the base test suite and 2-stage compiler
bootstrap can be executed there -- no other guarantees and claims are made. On
Linux, glibc
and musl
standard libraries are supported (musl
is
recommended because it's header files are more compilant with standard C
language without extensions), on BSDs system libc
can be used (additional
macro definitions, such as __GNUC__
, __GNUC_MINOR__
, could be necessary
depending on used system libc features). Kefir supports selection of target
platform via --target
command line option.
For each respective target, a set of environment variables (e.g.
KEFIR_GNU_INCLUDE
, KEFIR_GNU_LIB
, KEFIR_GNU_DYNAMIC_LINKER
) needs to be
defined so that the compiler can find necessary headers and libraries.
The main motivation of the project is deeper understanding of C programming language, as well as practical experience in the broader scope of compiler implementation aspects. Based on this, following goals were set for the project:
libc
implementations.Following things are NON-goals:
Note on the language standard support: initially the compiler development was focused on C11 language standard support. The migration to C17 happened when the original plan was mostly finished. During the migration applicable DRs from those included in C17 were considered and code was updated accordingly. The compiler itself is still written in compliance with C11 language standard.
See CHANGELOG
.
The initial implementation has been finished: at the momement the compiler features all necessary components, and with some known exceptions (and unknown bugs) supports C17 language standard. Kefir is able to re-use standard library provided by host system. Further effort is concentrated on improving and extending the compiler, including:
Following exceptions were made in C17 implementation:
_Complex
floating-point number support. This feature is not being
used particularly frequently, but, at the same time, it complicates target code
generator.STDC
pragmas are implemented in preprocessor. Respective standard library
parts are out-of-scope, thus implementing these pragmas have no value at the
moment.At the moment, Kefir implements following builtins for compatibility with GCC:
__builtin_va_list
, __builtin_va_start
, __builtin_va_end
,
__builtin_va_copy
, __builtin_va_arg
, __builtin_alloca
,
__builtin_alloca_with_align
, __builtin_alloca_with_align_and_max
,
__builtin_offsetof
. __builtin_types_compatible_p
, __builtin_choose_expr
,
__builtin_constant_p
, __builtin_classify_type
, __builtin_trap
,
__builtin_unreachable
, __builtin_return_address
, __builtin_frame_address
,
__builtin_extract_return_addr
, __builtin_frob_return_addr
,
__builtin_ffs[l,ll]
, __builtin_clz[l,ll]
, __builtin_ctz[l,ll]
,
__builtin_clrsb[l,ll]
, __builtin_popcount[l,ll]
, __builtin_parity[l,ll]
,
__builtin_bswap16
, __builtin_bswap32
, __builtin_bswap64
,
__builtin_huge_valf
, __builtin_huge_val
, __builtin_huge_vall
,
__builtin_inff
, __builtin_inf
, __builtin_infl
, and provides compatiblity
stubs for some others.
Kefir supports __attribute__(...)
syntax on parser level, however attributes
are ignored in most cases except aligned
/__aligned__
and __gnu_inline__
attributes. Presence of attribute in source code can be turned into a syntax
error by CLI option.
Several C language extensions are implemented for better compatibility with GCC. All of them are disabled by default and can be enabled via command-line options. No specific compability guarantees are provided. Among them:
int function_name()
is automatically defined. The feature was
part of previous C standards, however it's absent from C11 onwards.int
as implicit function return type -- function definition may omit return
type, int
will be used instead.fieldname:
form -- old, deprecated form which is
still supported by GCC.&&
operator, gotos support
arbitratry addresses in goto *
form.Kefir also defines a few non-standard macros by default. Specifically, macros
indicating data model (__LP64__
), endianess (__BYTE_ORDER__
and
__ORDER_LITTLE_ENDIAN__
), as well as __KEFIRCC__
which can be used to
identify the compiler.
Kefir has support of asm
directive, both in file and function scope.
Implementation supports output and input parameters, parameter constraints
(immediate, register, memory), clobbers and jump labels, however there is no
compatibility with respective GCC/Clang functionality (implemented bits behave
similarly, though, thus basic use cases shall be compatible). Additionally,
asm
-labels are supported for non-static non-local variables.
Kefir can be used along with musl libc standard
library, with the exception for <complex.h>
and <tgmath.h>
headers which are
not available due to lacking support of _Complex
types. Kefir also supports
glibc
, as well as libc
implementations provided with FreeBSD, OpenBSD and
NetBSD, however the support is limited due to presence of non-standard C
language extensions in header files which might cause compilation failures
(additional macros/patched stdlib headers might need to be defined for
successful compilation).
Attention: code produced by Kefir shall be linked with a runtime library
libkefirrt.a
. The library is linked automatically if environment is configured
correctly. Kefir can also link built-in versions of runtime, however make sure
that correct --target
is specified during link phase. Kefir provides own
versions of some header files as well -- if environment is configured correctly,
they are also added to include path automatically.
Usage is strongly discouraged. This is experimental project which is not meant for production purposes.
Kefir depends on C11 compiler (tested with gcc
and clang
), GNU As assembler,
GNU Makefile as well as basic UNIX utilities for build. Development and test
dependencies include valgrind
(for test execution) as well. After installing
all dependencies, kefir can be built with a single command: make all OPT="-O2 -march=native -DNDEBUG" DBG="" -j$(nproc)
. By default, kefir builds a shared
library and links executables to it. Static linkage can be used by specifying
USE_SHARED=no
in make command line arguments. Sample PKGBUILD
is provided in
dist
directory.
It is also advised to run basic test suite:
LC_ALL=C.UTF-8 make test all OPT=-O3 -j$(nproc) # Linux
gmake test all PLATFORM=freebsd OPT=-O3 CC=clang # FreeBSD
gmake test all CC=clang AS=gas PLATFORM=openbsd OPT=-O3 # OpenBSD
gmake test all CC=gcc AS=gas PLATFORM=netbsd OPT=-O3 # NetBSD
Optionally, Kefir can be installed via: make install DESTDIR=...
. Short
reference on compiler options can be obtained by running kefir --help
.
At the moment, Kefir is automatically tested in Ubuntu 22.04 (full range of tests), FreeBSD 13.2 and OpenBSD 7.2 (base test suite) environments. Arch Linux is used as primary development environment.
Kefir supports compilation with Emscripten into a
WebAssembly library, which can be invoked from client-side JavaScript in
Web-applications. Kefir functionality in that mode in limited due to absence of
normal POSIX environment, linker and assembler utilities: only text output
(assembly code, preprocessed code, tokens, ASTs, IR) can be produced from a
single input file. Furthermore, all include files need to be explicitly supplied
from JavaScript side in order to be available during compilation. Note that this
does not imply support for WebAssembly as a compilation target: it only serves
as a host environment. To build kefir.js
and kefir.wasm
in bin/web
directory, use:
make web -j$(nproc) # Requires Emscripten installed
A simple playground Web application is also available. It bundles Kefir web build with Musl include files and provides a simple Godbolt-like interface. Build as follows:
make webapp -j$(nproc)
The Web application is static and requires no server-side logic. An example of simple server command-line:
python -m http.server 8000 -d bin/webapp
A hosted version of the Web application is available at Kefir playground (please note that the Web page uses JavaScript and WebAssembly).
Kefir is capable of bootstraping itself (that is, compiling it's own source code). At the moment, the feature is under testing, however stage 2 bootstrap is working well. It can be performed as follows:
make bootstrap -j$(nproc)
Alternatively, bootstrap can be performed manually:
# Stage 0: Build & Test initial Kefir version with system compiler.
# Produces dynamically-linked binary in bin/kefir and
# shared library bin/libs/libkefir.so
make test all -j$(nproc)
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(pwd)/bin/libs
# Stage 1: Use previously built Kefir to compile itself.
# Replace $MUSL with actual path to musl installation.
# Produces statically-linked binary bin/bootstrap1/kefir
make -f bootstrap.mk bootstrap SOURCE=$(pwd)/source HEADERS=$(pwd)/headers BOOTSTRAP=$(pwd)/bootstrap/stage1 KEFIRCC=./bin/kefir \
LIBC_HEADERS="$INCLUDE" LIBC_LIBS="$LIB" -j$(nproc)
rm -rf bin # Remove kefir version produced by stage 0
# Stage 2: Use bootstrapped Kefir version to compile itself once again.
# Replace $MUSL with actual path to musl installation.
# Produces statically-linked binary bin/bootstrap2/kefir
make -f bootstrap.mk bootstrap SOURCE=$(pwd)/source HEADERS=$(pwd)/headers BOOTSTRAP=$(pwd)/bootstrap/stage2 KEFIRCC=./bootstrap/stage1/kefir \
LIBC_HEADERS="$INCLUDE" LIBC_LIBS="$LIB" -j$(nproc)
# Stage 3: Diff assembly files generated by Stage 1 and Stage 2.
# They shall be identical
./scripts/bootstrap_compare.sh bootstrap/stage1 bootstrap/stage2
Furthermore, kefir
can also be bootstrapped using normal build process:
make all CC=$PATH_TO_KEFIR -j$(nproc)
Kefir relies on following tests, most of which are executed as part of CI:
*.c
files which are
compiled either using system compiler or kefir depending on file
extension. Everything is then linked together and executed. The test suite
is executed on Linux with gcc and clang compilers, on FreeBSD with clang and
on OpenBSD with clang. In Linux and FreeBSD environments valgrind
is used
to control test suite correctness at runtime.compile
& execute
parts of GCC torture test suite are
executed with kefir compiler, with some permissive options enabled. At the
moment, out of 3445 tests, 584 fail and 29 are skipped due to being irrelevant
(e.g. SIMD or profiling test cases; there is no exhaustive skip list yet). All
failures happen on compilation stage, no abortions occur at runtime. The work
with torture test suite will be continued in order to reduce the number of
failures. The torture tests are included into CI pipeline with some basic test
result checks.sqllogictest
is planned.Own test suite is deterministic (that is, tests do not fail spuriously), however there might arise problems when executed in unusual environments. For instance, some tests contain unicode characters and require the environment to have appropriate locale set. Also, issues with local musl version might cause test failures.
Currently, extension of the test suite is a major goal. It helps significantly in eliminating bugs, bringing kefir closer to C language standard support, improving compiler UX in general.
In order to simplify translation and facilitate portability, intermediate
representation (IR) layer was introduced. It defines architecture-agnostic
64-bit stack machine bytecode, providing generic calling convention and
abstracting out type layout information. Compiler is structured into separate
modules with respect to IR: code generation and AST analysis and translation.
Two code generators are available: naive (old) code generator that produces
straightforward threaded code, and (not yet) optimizing code generator. The
latter includes separate optimizer representation which is derived from the IR.
IR layer provides several interfaces for AST analyzer to retrieve necessary
target type layout information (for instance, for constant expression analysis).
AST analysis and translation are separate stages to improve code structure and
reusability. Parser uses recursive descent approach with back-tracking. Lexer
was implemented before preprocessor and can be used independently of it
(preprocessing stage can be completely omitted), thus both lexer and
preprocessor modules share the same lexing facilities. Driver links kefir as a
library and uses fork
syscalls in order to isolate each file processing.
Author: Jevgenijs Protopopovs
The code base also includes patches from:
License: