If this implementation had existed in the 1980s, the C standard would have a rule that different tokens hashing to the same 16-bit value invoke undefined behavior, and optimizing compilers in the 2000s would simply optimize such tokens away to a no-op. ;)
Oh, it looks like my X86-16 boot sector C compiler that I made recently [1]. Writing boot sector games has a nostalgic magic to it, when programming was actually fun and showed off your skills. It's a shame that the AI era has terribly devalued these projects.
Er, what? The article describes a compiler for a not-quite-C programming language which fits entirely in 512B. Your project, if I see this correctly, can optionally produce code meant to execute as boot sector.
Both interesting projects, but other than the words 'boot sector', 'C' and 'compiler', I don't see a similarity.
An interesting use case - for the compiler as-is or for the essentiall idea of barely-C - might be in bootstrapping chains, i.e. starting from tiny platform-specific binaries one could verify the disassembly of, and gradually building more complex tools, interpreters, and compiler, so that eventually you get to something like a version of GCC and can then build an entire OS distribution.
This is very nice. I'm currently writing a minimalist C compiler although my goal isn't fitting in a boot sector, it's more targeted at 8-bit systems with a lot more room than that.
This is a great demonstration of how simple the bare bones of C are, which I think is one reason I and many others find it so appealing despite how Spartan it is. C really evolved from B which was a demake of Fortran, if Ken Thompson is to be trusted.
Would and how much would it shrink when if, while, and for were replaced by the simple goto routine? (after all, in assembly there is only jmp and no other fancy jump instruction (I assume) ).
And PS, it's "chose your own adventure". :-)
I love minimalism.
What fancy jumps are present in assembly depends on the CPU architecture. But there are always conditional jumps, like JNZ that jumps if the Zero flag isn't set.
The “fancy jump” is the branch instruction. As far as I know all ISAs have them. Even rv32i which is famously minimal has several branch instructions in addition to two forms of unconditional jump. Branches are typically used to construct if / for / while as well as && and || (because of short circuiting) and ternary (although some architectures may have special instructions for that that may or may not be faster than branches depending on the exact model). Without it you would have to use computed goto with a destination address computed without conditional execution using constant time techniques.
It only does if & while, not for. A goto in a single-pass thing would need separate handling for forwards vs backwards jumps, which involves keeping track of data per name (in a form where you can tell when it's not yet set; whereas if/while data is freely held in recursion stack). And you'd still need to handle at least `if ( expr ) goto foo;` to do any conditionals at all.
This is the kind of project that reminds you how far removed modern development is from the actual machine. We pile abstractions on abstractions until "Hello World" needs 200MB of node_modules, and then someone fits a C compiler in 512 bytes.
Not saying we should all write boot sector code, but reading through projects like this is genuinely humbling. Great educational resource too.
It's a fun comparison, but with the notable difference that that one can compile the Linux kernel and generate code for multiple different architectures, while this one can only compile a small proportion of valid C. It's a great project, but it's not so much a C compiler, as a compiler for a subset of C that allows all programs this compiler can compile to also be compiled by an actual C compiler, but not vice versa.
And it doesn't for the compiler in question either. As long as the headers exist in the places it looks for them. No compiler magically knows where the headers are if you haven't placed them in the right location
stddef.h (et al) should be shipped by the compiler itself, and so it should know where it is. But they rely on gcc for it, hence it doesn't always know where to look. Seems totally fine for a prototype.
Shipping GPL headers that explicitly state that they are part of GCC with a creative commons licensed compiler would probably make a lot of people rather unhappy, possibly even lawyers.
I've certainly encountered clang & gcc not finding or just not having header files a good couple times. Mostly around cross-compilation, but there was a period of time for which clang++ just completely failed to find any C++ headers on my system.
So we're down to a missing or unclear description of a dependency in a README - note following the instructions worked for others -, from implications the compiler didn't work.
I mean we know it can be done in little space, given the many tiny C compilers. I think what is most interesting about this one is exactly the creative shortcuts. It's an interesting design space for e.g. bootstrapping to impose extra restrictions.
I actually "shipped" a parser using the symbols' hash(as the only identifier) for a test tool once. Hopefully, the users never used enough symbols to collide 32-bits.
For me is not interesting because it fits in 512 bytes, it's interesting because it's very simple. I think it would be a great introduction to learning about compilers.
That would be true of one using a libc, but in a boot sector, you only have the bios, so the atoi being referenced is the one defined in c near the beginning of the article
I've had enough of headlines that overpromise and underdeliver. It's essentially false advertising. It's not like the word "subset" would put it over the length limit.
Nice, now you can dd it to your boot sector and ... Wait, it is 2026, there are 1000 ways of booting and memory mapping on so-called unified ARM architecture @,@