Project            Parsing/Lexing Ratio
----------------------------------------
jdk8               5.34x
Spring Framework   3.81x
Elasticsearch      5.76x
RxJava             5.69x
JUnit4             4.51x
Guava              5.37x
Log4j              2.92x
Obviously this is just one toolkit, one language, and one set of examples. But in these benchmarks, the tokenization (lexing) time is a small fraction of the total parse time.
Well, "small" is relative. It's certainly not a bottleneck, but it's still between 17.4% and 34.2% of the total time. That's definitely still in the range where optimizing it could have a measurable impact on performance (depending on how much room is left for optimization).
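The 17.4% to 34.2% range matches reading each ratio r as (total parse time) / (lexing time), so the lexing share is 1/r. The sketch below assumes that reading and applies Amdahl's law to bound what a faster lexer alone could buy. The 2x lexer speedup is a hypothetical figure chosen only for illustration, not a measured result.

```python
# Ratios from the table, ASSUMED to mean (total parse time) / (lexing time).
RATIOS = {
    "jdk8": 5.34,
    "Spring Framework": 3.81,
    "Elasticsearch": 5.76,
    "RxJava": 5.69,
    "JUnit4": 4.51,
    "Guava": 5.37,
    "Log4j": 2.92,
}


def lexing_share(ratio: float) -> float:
    """Fraction of total parse time spent in the lexer."""
    return 1.0 / ratio


def amdahl_speedup(share: float, lexer_speedup: float) -> float:
    """Overall speedup when only the lexing portion runs `lexer_speedup` times faster."""
    return 1.0 / ((1.0 - share) + share / lexer_speedup)


if __name__ == "__main__":
    for project, ratio in RATIOS.items():
        f = lexing_share(ratio)
        # Hypothetical 2x faster lexer, and the upper bound of an infinitely fast one.
        print(
            f"{project:<17} lexing {f:6.1%}  "
            f"2x lexer -> {amdahl_speedup(f, 2.0):.2f}x  "
            f"limit {amdahl_speedup(f, float('inf')):.2f}x"
        )
```

Even an infinitely fast lexer caps the overall gain at 1 / (1 - f), roughly 1.21x for Elasticsearch and 1.52x for Log4j under this reading, which is "measurable" but bounded by how much of the time the lexer actually takes.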
Right, but it's clearly not the situation I was describing, where reducing the time of every stage after tokenization to zero would only slightly decrease the total time.
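To put numbers on that, assume (as the 17.4% to 34.2% figures imply) that each ratio is $r = T_{\text{total}} / T_{\text{lex}}$. If every post-tokenization stage took zero time, only $T_{\text{lex}}$ would remain, so the fraction of total time removed would be:

$$
\frac{T_{\text{total}} - T_{\text{lex}}}{T_{\text{total}}} = 1 - \frac{1}{r},
\qquad
1 - \frac{1}{2.92} \approx 65.8\%\ \text{(Log4j)},
\qquad
1 - \frac{1}{5.76} \approx 82.6\%\ \text{(Elasticsearch)}
$$

Under that reading, eliminating the later stages would remove roughly two-thirds or more of the parse time in every project listed, which is far from a "slight" decrease.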