
> i worry that caching pre-processed files is a red herring - is it really faster than re-including?

For future reference: In large C++ projects, it's not at all unusual for greater than 90% of the compile time to be spent parsing (and re-parsing, and re-parsing, and re-parsing, ad nauseam) header files.

> what about macros in include files?

The AST of the header is persisted. If the parser can parse macro definitions, macros will continue to work normally. The only case where you would need to fall back to #include is those rare times when you want a definition in the source file to alter the parsing of the header (and even in most of those, you should just define the constant on the compiler command line, e.g. "clang foo.c -DWITH_FEATURE_X").
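To illustrate the command-line route, here is a minimal sketch. The file name feature.cpp and the function feature_x_enabled are made up for the example; only WITH_FEATURE_X and the -D flag come from the comment above.

```cpp
// feature.cpp (hypothetical file name, for illustration only)
//
// Build with the feature:     clang++ -c feature.cpp -DWITH_FEATURE_X
// Build without the feature:  clang++ -c feature.cpp
//
// The macro is supplied by the compiler invocation, so no source file has to
// #define anything before including a header.
constexpr bool feature_x_enabled() {
#ifdef WITH_FEATURE_X
    return true;   // -DWITH_FEATURE_X was passed on the command line
#else
    return false;  // macro not defined for this build
#endif
}
```

Because the definition lives in the build command rather than in any one source file, every translation unit built with the same flags sees the header the same way, which is the property a persisted AST would rely on.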

> i feel that the preprocessor ultimately ends up with the same amount of work, just an extra pass for each included header to build a version to be cached… not to mention the complexity required to handle the multiplicity of pre-processor states required for this. maybe i am being dim and missing the obvious.

The preprocessor merely slurps the text of the included file into the including file; the compiler then parses the entire gigantic soup of <text of source file> + <text of all files included by source file>. The semantics of this require that every included header be re-parsed once per compilation unit. Imagine you have 5 .cpp files, each containing 200 characters, and each #including <iostream> (which weighs in at roughly 1 million characters once its own includes are expanded). Each .cpp file, after the preprocessor phase, will be 1,000,200 characters long. A full compilation of the project will require parsing 5,001,000 characters. Any subsequent full build will require parsing the same 5,001,000 characters again. Changing one .cpp file will require parsing 1,000,200 characters.

In this proposal, by contrast, an included file need only ever be parsed once; its AST can then be persisted and referenced for as long as the header itself doesn't change. In our above example, the iostream header will be parsed once, and each .cpp file will be parsed once. This means a full build, the very first time the header is ever referenced in any compilation on the system, will require parsing 1,001,000 characters. Any subsequent full build will require parsing merely 1,000 characters. Changing one .cpp file will require parsing merely 200 characters.
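If you want to check the arithmetic, here is a small sketch of the cost model. It assumes parse cost is simply the number of characters parsed, which is a simplification; the struct and function names are made up for the example.

```cpp
#include <cstdint>

// Hypothetical cost model: "cost" is just the number of characters parsed.
struct Project {
    std::uint64_t source_files;      // number of .cpp files
    std::uint64_t chars_per_source;  // characters in each .cpp file
    std::uint64_t header_chars;      // characters in the expanded header
};

// Textual #include: every translation unit re-parses the header.
constexpr std::uint64_t full_build_textual(const Project& p) {
    return p.source_files * (p.chars_per_source + p.header_chars);
}
constexpr std::uint64_t one_file_rebuild_textual(const Project& p) {
    return p.chars_per_source + p.header_chars;
}

// Persisted AST: the header is parsed once and then reused.
constexpr std::uint64_t first_build_cached(const Project& p) {
    return p.header_chars + p.source_files * p.chars_per_source;
}
constexpr std::uint64_t later_full_build_cached(const Project& p) {
    return p.source_files * p.chars_per_source;
}
constexpr std::uint64_t one_file_rebuild_cached(const Project& p) {
    return p.chars_per_source;
}

// The numbers from the example: 5 files, 200 chars each, 1M-char header.
constexpr Project example{5, 200, 1000000};

static_assert(full_build_textual(example) == 5001000, "textual full build");
static_assert(one_file_rebuild_textual(example) == 1000200, "textual rebuild");
static_assert(first_build_cached(example) == 1001000, "first cached build");
static_assert(later_full_build_cached(example) == 1000, "later cached build");
static_assert(one_file_rebuild_cached(example) == 200, "cached rebuild");
```

The static_asserts are checked at compile time, so the file compiling at all confirms the figures quoted above.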



you have completely missed my point. i know full well how much time is spent parsing these things and how the mechanism works - i don't believe this proposal will actually improve that. i also don't believe your answers address the point i was trying to make either... namely that whatever preprocessed import module thing is created, it still has to be included into the compilation unit somehow... even if there is some kind of linkage type solution going on with a lightweight interface - that feels functionally equivalent to what most of the standard library headers already are - so i don't understand what could possibly be that much faster or better about it.

not to mention that this is not a problem if you encapsulate your use of standard libraries properly... maybe 10-20 compilation units have to use it if you like to split your stuff into files a lot.

standard headers are poorly written/designed by including so much crap everywhere. why can't i have specific - per function headers which include minimal stuff?

fix the headers, not the preprocessor.


> i know full well how much time is spent parsing these things and how the mechanism works - i don't believe this proposal will actually improve that.

Well, then you're pretty much 100% wrong in most C++ projects.

I honestly don't know what to tell you here. That persisting header ASTs between translation units is faster than re-parsing should be trivially obvious, and if it isn't trivially obvious, then the mere fact that precompiled headers and ccache dramatically speed up builds ought to make it empirically obvious.

The facts just aren't on your side.

> namely that whatever preprocessed import module thing is created, it still has to be included into the compilation unit somehow

Well, yes, obviously. In the current model the compiler slurps the header into the source file and parses the entire combination, resulting in the parse tree of the header + the parse tree of the rest of the file. In the proposed model the compiler pulls the parse tree of the header out of cache and just builds the parse tree of the file. Since in C++ header parse trees are often quite expensive to build (template declarations have to live in the headers, and their parse trees are incredibly expensive to build), this ought to be a blindingly obvious win.
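A toy sketch of the difference, in case it helps. This is not how any real compiler stores ASTs; ParseTree, HeaderCache and the helper functions are invented for illustration, and the cache assumes the header's meaning does not depend on macros defined by the including file.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Toy stand-in for a parse tree; a real AST would be far richer.
struct ParseTree {
    std::size_t chars_parsed;
};

// Hypothetical cache keyed by header name.
class HeaderCache {
public:
    // Returns the cached tree, "parsing" the header only on first use.
    const ParseTree& get(const std::string& name, const std::string& text) {
        auto it = trees_.find(name);
        if (it == trees_.end()) {
            ++parses_;  // the expensive step happens here, once per header
            it = trees_.emplace(name, ParseTree{text.size()}).first;
        }
        return it->second;
    }
    std::size_t parses() const { return parses_; }

private:
    std::map<std::string, ParseTree> trees_;
    std::size_t parses_ = 0;
};

// Proposed model: n translation units share one cached parse of the header.
inline std::size_t header_parses_with_cache(std::size_t n,
                                            const std::string& header_text) {
    HeaderCache cache;
    for (std::size_t i = 0; i < n; ++i) {
        cache.get("header.h", header_text);
    }
    return cache.parses();
}

// Current model: the header text is pasted into, and parsed with, every unit.
inline std::size_t header_parses_textual(std::size_t n) {
    return n;
}
```

With five translation units, the textual model parses the header five times and the cached model parses it once; pulling the tree out of the map is the "included into the compilation unit somehow" step, and it is a lookup rather than a re-parse.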



