What are some good resources/courses/books for learning compilers, especially in a hands-on way? "Crafting Interpreters" seems to be well-liked, is that a decent place to start? Appreciate any other tips.
This paper is my favorite introduction to compilers. It's short and hands-on, and goes from compiling a primitive program that does nothing but return a single integer to a full-blown implementation of a real programming language in 24 small steps: http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf
I wrote a tutorial series for my students: Let’s make a Teeny Tiny compiler. It’s much smaller than Crafting Interpreters, so I’d recommend it before you go there to test the waters.
Writing an Interpreter in Go / Writing a Compiler in Go, the two part series by Thorsten Ball, was what I really cut my teeth on. It straightforwardly takes you through some parsing, ASTs, a basic tree-walk interpreter, and then in part two refactors the whole thing to compile to bytecode and introduces a virtual machine.
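If it helps to see the shape of the first book's approach, here is a minimal tree-walk sketch in Go (my own toy types, not Ball's code): the interpreter just recurses over the AST and computes values directly.

```go
package main

import "fmt"

// Node is any AST node; a type switch below stands in for a visitor.
type Node interface{}

type NumberLiteral struct{ Value float64 }
type InfixExpr struct {
	Op          string
	Left, Right Node
}

// Eval walks the tree and computes a value directly, no bytecode involved.
func Eval(n Node) float64 {
	switch n := n.(type) {
	case NumberLiteral:
		return n.Value
	case InfixExpr:
		l, r := Eval(n.Left), Eval(n.Right)
		switch n.Op {
		case "+":
			return l + r
		case "*":
			return l * r
		}
	}
	panic(fmt.Sprintf("unhandled node or operator: %#v", n))
}

func main() {
	// (1 + 2) * 3
	ast := InfixExpr{"*", InfixExpr{"+", NumberLiteral{1}, NumberLiteral{2}}, NumberLiteral{3}}
	fmt.Println(Eval(ast)) // 9
}
```

The second book's refactor essentially replaces that Eval with a compiler that emits bytecode plus a VM loop that executes it.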
Like other sibling comments mention about Crafting Interpreters, I recommend following the book but implementing in another language you're familiar with. I ran through "Writing an Interpreter" in Go once, but felt like it stuck a lot harder when I went back through and did it again in Swift. Though you're still following code with the hard parts figured out, the simple act of translating the syntax will mean you're looking harder at what's actually happening.
Someone on here a few weeks ago also mentioned chapter 8 of Rob Pike and Brian Kernighan's "The UNIX Programming Environment". I've had an old eBay-recovered copy of this book on my shelf for a few years; it's a neat intro to shell as well, but I've been working through chapter 8 this Memorial Day weekend to dip my toes back into the vintage C waters. The book builds up a simple calculator, adding variables, builtins, complex control flow, and functions. It starts as a pure yacc parser and then shifts to a strategy reminiscent of bytecode. This time I'm using a process I'm familiar with (language implementation) to sharpen my C skills.
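For a feel of that "strategy reminiscent of bytecode" (the book builds its calculator, hoc, in C; this is just a toy Go sketch of the same idea): compile down to a flat list of instructions and run them on a stack.

```go
package main

import "fmt"

// A toy stack machine: the program is a flat list of instructions
// rather than a tree that gets re-walked on every evaluation.
type Op int

const (
	OpPush Op = iota // push the instruction's constant
	OpAdd            // pop two, push their sum
	OpMul            // pop two, push their product
)

type Instr struct {
	Op  Op
	Arg float64 // used only by OpPush
}

func Run(prog []Instr) float64 {
	var stack []float64
	for _, ins := range prog {
		switch ins.Op {
		case OpPush:
			stack = append(stack, ins.Arg)
		case OpAdd:
			a, b := stack[len(stack)-2], stack[len(stack)-1]
			stack = append(stack[:len(stack)-2], a+b)
		case OpMul:
			a, b := stack[len(stack)-2], stack[len(stack)-1]
			stack = append(stack[:len(stack)-2], a*b)
		}
	}
	return stack[len(stack)-1]
}

func main() {
	// (1 + 2) * 3, already "compiled" to instructions
	prog := []Instr{{OpPush, 1}, {OpPush, 2}, {OpAdd, 0}, {OpPush, 3}, {OpMul, 0}}
	fmt.Println(Run(prog)) // 9
}
```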
By doing either of these projects, you also end up with a testbed language where you can start developing more advanced concepts: GC, a type system, externalizable bytecode, linking, the list goes on forever. About four years ago I started working on the Apex language at Salesforce, and the problems encountered in these projects are absolutely still relevant at the scale at which we're compiling and executing code across our datacenters.
EDIT: I'll also throw in... it gets a lot more fun when the parser's done. If you're finding recursive descent to be a bit of a brain twister, fight through!
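If recursive descent is the part that isn't clicking, it can help to see how little it really is. Here's a hand-rolled toy sketch in Go (not from any of the books above) that parses and evaluates + and * with correct precedence: one function per grammar rule, each calling the next tighter-binding one.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Grammar:  expr -> term ("+" term)*   term -> factor ("*" factor)*   factor -> NUMBER
type parser struct {
	toks []string
	pos  int
}

func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

// expr handles the loosest-binding operator and loops, calling term for its operands.
func (p *parser) expr() float64 {
	v := p.term()
	for p.peek() == "+" {
		p.pos++
		v += p.term()
	}
	return v
}

func (p *parser) term() float64 {
	v := p.factor()
	for p.peek() == "*" {
		p.pos++
		v *= p.factor()
	}
	return v
}

func (p *parser) factor() float64 {
	n, _ := strconv.ParseFloat(p.toks[p.pos], 64) // error ignored; this is a sketch
	p.pos++
	return n
}

func main() {
	// Precedence falls out of which function calls which: * binds tighter than +.
	p := &parser{toks: strings.Fields("1 + 2 * 3")}
	fmt.Println(p.expr()) // 7
}
```

This one evaluates as it parses; building AST nodes instead is the same structure with each function returning a node rather than a number.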
Thanks for the thorough reply. I had seen that Crafting Interpreters used Java as the language, glad to hear that implementing in a different language is ideal.
Seconding reading Crafting Interpreters but writing it in a different language. I took it as an excuse to hone my Rust skills, as well as learn what goes on behind the scenes of a compiler. I don't think I would have learned nearly as much if I didn't have to read a paragraph, stop, and then really thoroughly examine the ideas/assumptions behind it to ensure an accurate translation.
Another strategy I used was to look at the title of the chapter, then work ahead as much as possible to implement that. It really helped my learning process to naturally explore the problem space myself, then read through and see how my naive attempts compared to a more seasoned implementation. But as the parent said, ymmv!
Keep in mind that Crafting Interpreters starts with Java, but switches to C later on. If you want to write your code in a different language (which I did), don't choose either of those.
I used C# for the first part. It's easy to translate from the Java code, but you can make things more concise with modern C# features while still having access to a GC, which is kinda necessary for the first part of the book.
I got rid of the code gen and visitor pattern stuff by using records in C# with pattern matching which felt a lot simpler to me.
Depending on where you're coming from, you can pick C, Java, or Standard ML as the implementation language to follow along with and implement the Tiger language in its variants across the book.
Here is my second attempt at interpreters. It is a 7-part series that uses JSON as the AST and mostly goes into what happens after parsing: things like how scope, variables, and functions can be represented.
It isn't perfect and it might not contain all the best practices out there; consider it notes from a student struggling to learn the topic.
If for no other reason, compilers are just great fun and really satisfying to write. Watching your compiler take a simple expression and spit out a bunch of code: referencing locals, referencing globals, type conversions and coercions, function calls, managing precedence. This little thing created all this code.
It’s exciting. Especially the first time, it’s magic. It’s a lot of fun.
Once I wrote a DSL for a rule system, compiled down into Java source. Later, I wrote another higher level rule system that generated the DSL.
At one point I think there was a USENET .sig line that read “Will write code that writes code that writes code for food.” Indeed.
Writing a compiler has been on my list for a few years now, but every time I start I get stuck in the parser. It just doesn't seem to click with me. On the next couple of weekends I'll start again with Flix (flix.dev) and maybe Crafting Interpreters. Wish me luck.
During my 2015 compilers class, it was indeed the worst step and the hardest to get right. We (my teammate and I) spent over 60% of our time making that parser actually parse; the lexing, tokenizing, etc. are nothing compared to it. In another project I used Antlr to avoid the headaches, but there are definitely better alternatives out there (someone mentioned Sly in the comments).
Right now, if I want to mess with some transformation passes for an existing language, I would just reach for LLVM and Clang (for C, for example): a ready-made front end and back end with a hackable middle end.
P.S. I am a hobbyist at this stage and you can consider me to know nothing about LLVM. My advice is based on the experience of a friend (also a colleague) who works at nVidia on their LLVM-based CUDA compiler.
Another reason for me to be interested in LLVM is that I have used JVM languages (Scala, Java) for the past 20+ years, and GraalVM's polyglot support extends to LLVM-based languages.
Something to consider is a parser generator for your language. They get a lot of flak online these days from people saying "a recursive descent parser is faster and not that hard", but hand-writing a parser is generally time-consuming if you just want to get something going. PLY (or maybe it's SLY now) for Python is really great.
Good, it's not the most important or interesting part imo. I also had an entire course about parsing (just not about compilers), and I don't see why many professors give it so much attention.
> And because compilers are largely deterministic closed-system problems -- for each input, there is one and only output
Not even close! Optimization and code generation are fantastically hard problems full of heuristics and tradeoffs.
There is definitely more than one way to compile C source code to x86 binary code. There is even more than one way to compile TypeScript to JavaScript if you consider things like whitespace, indentation, and constant folding.
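As a toy illustration of one such choice, here is a constant-folding pass sketched in Go (my own throwaway types, not any real compiler's): running it or skipping it yields two different, equally valid outputs for the same source.

```go
package main

import "fmt"

type Expr interface{}
type Num struct{ V int }
type Add struct{ L, R Expr }

// fold collapses constant additions; without this pass the compiler would
// happily emit code for the original Add node instead, which is no less correct.
func fold(e Expr) Expr {
	a, ok := e.(Add)
	if !ok {
		return e
	}
	l, r := fold(a.L), fold(a.R)
	if ln, lok := l.(Num); lok {
		if rn, rok := r.(Num); rok {
			return Num{ln.V + rn.V}
		}
	}
	return Add{l, r}
}

func main() {
	fmt.Printf("%+v\n", fold(Add{Num{2}, Num{3}})) // {V:5}
}
```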
He probably meant that a compiler is a pure function for a given source file and a set of flags. Of course, if you change a compiler or flags, you can get something radically different, but you shouldn't get something radically different because it's Tuesday, midnight, your username starts with n, you have more than 3 hard drives, or because it reads some bytes from /dev/urandom.
It's a bit of a stretch, sure, but I don't expect my performance problems to go away by recompiling stuff over and over again, without changing either the code or the flags, and expecting the optimizer to make a better decision next time. It doesn't work that way.
> a compiler is a pure function for a given source file and a set of flags.
This is true, but not what they actually said.
> you shouldn't get something radically different because it's Tuesday, midnight [...] I don't expect my performance problems to go away by recompiling stuff over and over again without changing either the code or flags and expecting the optimizer to make a better decision next time
Maybe we should, though. I shouldn't have to be forever locked into the performance profile that matches the -O level that my distribution's package maintainers used at compile time.
I'd love to see a shift in programming systems towards languages designed with particular attention to how fast the toolchain can get _something_ onto disk, with further optimization deferred to a later stage and fulfilled by a separate, asynchronous process. Imagine if there were no tradeoff between time spent waiting on the compiler and runtime performance, because your program, no matter how large, would never take more than 20 seconds* to compile. For more modestly sized programs, the effect would be the ability to test almost immediately, while the compiler keeps optimizing all the while; you could even go to sleep on Wednesday and wake up on Thursday morning with a program that's even snappier. The expected outcome should be faster compile times and faster binaries.
Optimizing compilers often have budgets, so you can get different outcomes if performance jitter means the compiler bails out of some optimization strategies.
The budgets I've seen have less been "if it's taken N seconds, bail" and more "if we've processed X million instructions, bail." At least in the compilers I work on, nondeterminism is considered a bug, although it can be harder to avoid nondeterminism than you'd think.
>and expecting the optimizer to make a better decision next time. It doesn't work that way
But in practice it does work exactly that way because of PGO, and if we're precluding PGO then the observation is as banal as "programs that have no side-effects are pure". Like, I get that the post is trying to paint some beautiful picture about how compiler passes are abstract, beautiful transformations of representations of programs, but it's not a useful picture at all.
I don't remember using PGO like ever. And even if you do use it, isn't it like breaking compilation into two phases that are themselves deterministic?
It is a pretty useful picture in my eyes, because that's what I've been observing since like forever. A non-deterministic compiler would be pretty hard to test or reason about.
It might be a banal observation, but an important one if you want to contrast a compiler with something like an OS kernel or most modern programming projects that interact heavily with outside world. When you throw hardware, network traffic, or users into the mix, it gets crazy.
> Like I get that the post is trying to paint some beautiful picture about how compiler passes are abstract beautiful transformations of representations of programs but it's not a useful picture at all.
It is actually a very useful picture especially to a beginner.
PGO is an edge case and not used a whole lot in practice. Many compilers do not support it at all, even production compilers. Someone learning to write a compiler does not need to think about PGO. And besides, a compiler with PGO is still a pure function of source + flags + profile.
>PGO is an edge case and not used a whole lot in practice. Many compilers do not support it at all, even production compilers.
Both clang and gcc support `-fprofile-generate`. Beyond that generic infrastructure, in my area (DL compilers) if you're not doing PGO/autotuning you're not serious.
>Someone learning to write a compiler does not need to think about PGO.
That's like saying someone learning to write a compiler doesn't need to think about optimization at all - sure maybe the first time. But then every time after that it should be at the forefront of your mind.
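For reference, the `-fprofile-generate` workflow mentioned above looks roughly like this with gcc (clang supports the same flags, though it merges raw profiles with llvm-profdata first); the file names here are just placeholders:

```sh
# 1. Build an instrumented binary
gcc -O2 -fprofile-generate app.c -o app
# 2. Run it on representative input to collect profile data
./app < training-input.txt
# 3. Rebuild, letting the profile guide inlining and branch layout
gcc -O2 -fprofile-use app.c -o app
```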
> Both clang and gcc support `-fprofile-generate`. Beyond that generic infrastructure, in my area (DL compilers) if you're not doing PGO/autotuning you're not serious.
How many binaries in a typical Linux distro are built with PGO for example? The answer is approximately zero.
Your point? Because your first goalpost was "most compilers don't even support PGO". And now the goalpost is Linux distros? How about this one - how many binaries at FB/G are built with PGO? Answer: a great deal (llvm/bolt is a FB project and MLGO is a G project).
My point is that the view of a compiler as a series of transformation passes from a source program to machine code, implemented as pure functions, is an entirely appropriate way to describe how they work in a random introductory blog post?
Congratulations on your recent PhD in quantum machine learning or whatever -- I hope you find some way to speed up FB/G's ad serving algorithms by 1% and land that promotion. I'm sure you're very smart.
There are multiple valid ways to compile a source with the same semantics, but a decent compiler should absolutely be deterministic. Imagine libraries which compile fine 99% of the time, but produce the wrong semantics every 1%:
- Bug in your program? Could be a dependency of a dependency built wrong, better do a full clean and rebuild.
- All of your pre-release tests passed? The code you shipped doesn't work, because you hit that unlucky 1%. Imagine how bad this would be back when software was released on CDs...
- Have 70 dependencies which are like this (not hard in the webscale era)? Now your CI needs to retry every build, because each build has a roughly 50% chance of failing (1 - 0.99^70 ≈ 0.5). And even with 10 retries there's still a ~0.1% chance your CI fails anyway because you got 10 unlucky builds in a row.
I'm sure there are more stories of these kinds of issues...
> There are multiple valid ways to compile a source with the same semantics, but a decent compiler should absolutely be deterministic. Imagine libraries which compile fine 99% of the time, but produce the wrong semantics every 1%:
That's not the only kind of non-determinism, you know?
Lots of algorithms benefit from access to random bits, too.
A randomized algorithm which uses non-determinism but produces the same output is OK.
Also, pseudo-randomness is OK as long as the same program compiled with the same version of the compiler always gets the same seed.
But if the algorithm can produce different output causing the same program to compile to different assembly, that's a problem. Unless you can formally prove that every possible variation will have the same behavior, but formal methods + non-determinism is hard, so it usually isn't worth it.
If you are using something like a SAT solver to produce code, non-determinism might be unavoidable. Like, say the solver is heavily multi-threaded for perf reasons (these things are slow, right?). Or, I guess, any FPGA compiler would fit that mold too.
Or anything that compiles a corpus into a model, though we typically call that training rather than compilation :).
Deterministic parallelism has a non-trivial cost that can negate the perf you were hoping to gain by using multiple cores (or even GPU cores) in parallel.
So, the downvotes and replies warrant a follow-up comment.
I support deterministic compilers due to deterministic builds. For a given input source code, compiler executable, and command-line flags, it should always produce the same binary, regardless of the time of day, random number generator, I/O, threading, environment, etc.
My refutation of the article is that he made it sound like a compiler is like computing Fibonacci numbers: here are all the valid inputs, here is how each one maps to its one and only correct output, so fill in the code to accomplish that.
Also in the previous paragraph he wrote:
> Because the most basic compiler architecture is standard, you have limited boundaries and degrees of freedom when designing one. This might sound restricting, but actually, it's freeing because you get to play in a predictable sandbox.
Or unless you iterate over randomized hashtables to produce some output, e.g. function sections in an executable. Though that doesn't change the semantics and is basically rearranging coins in your pocket.
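Go maps happen to be a handy illustration here, since their iteration order is deliberately randomized: a code generator that walks one naively emits sections in a different order on every run, and the usual fix is to sort the keys first. A toy sketch (not any real compiler's code):

```go
package main

import (
	"fmt"
	"sort"
)

func main() {
	// Pretend these are functions waiting to be laid out in the output file.
	funcs := map[string]string{"main": "...", "helper": "...", "init": "..."}

	// Naive: map iteration order is randomized in Go, so section order
	// differs between runs even though the program's semantics don't.
	for name := range funcs {
		fmt.Println("emit section", name)
	}

	// Deterministic: sort the keys, then emit in that fixed order.
	names := make([]string, 0, len(funcs))
	for name := range funcs {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Println("emit section", name)
	}
}
```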
The real discovery to me seems to be that you are actually supposed to read through the errors, and that it's OK not to understand them completely. As a TA I got to see huge jumps in productivity and independence in students who read a couple more lines and reasoned about them instead of bailing out at the first ugly line.