
I found some benchmarks in the ANTLR project: https://github.com/antlr/grammars-v4/blob/master/java/java/B...

  Project          Parsing/Lexing Ratio
  ----------------------------------------
  jdk8               5.34x
  Spring Framework   3.81x
  Elasticsearch      5.76x
  RxJava             5.69x
  JUnit4             4.51x
  Guava              5.37x
  Log4j              2.92x

Obviously this is just one toolkit, one language, and one set of examples. But in these benchmarks, the tokenization (lexing) time is a small fraction of the total parse time.


Well, "small" is relative. It's certainly not a bottleneck, but it's still between 17.4% and 34.2% of the total time. That's definitely still in the range where optimizing could have a measurable impact on performance (depending on how much room is left for optimization).

100/2.92 = 34.2%

100/5.76 = 17.4%
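
For anyone who wants to redo that arithmetic for every row, here is a rough sketch. It assumes the "Parsing/Lexing Ratio" column means total parse time (lexing included) divided by lexing time, which is how the percentages above read it; if the benchmark defines the ratio differently, the numbers shift. The function names and the 2x lexer speedup are made up for illustration, and the second function is just Amdahl's law applied to the lexing share.

```python
# Ratios copied from the table quoted in this thread.
RATIOS = {
    "jdk8": 5.34,
    "Spring Framework": 3.81,
    "Elasticsearch": 5.76,
    "RxJava": 5.69,
    "JUnit4": 4.51,
    "Guava": 5.37,
    "Log4j": 2.92,
}


def lexing_share(ratio: float) -> float:
    """Fraction of total time spent lexing.

    ASSUMPTION: ratio = total parse time / lexing time.
    """
    return 1.0 / ratio


def speedup_if_lexer_faster(ratio: float, k: float) -> float:
    """Overall speedup (Amdahl's law) if only the lexer gets k times faster."""
    f = lexing_share(ratio)
    return 1.0 / ((1.0 - f) + f / k)


if __name__ == "__main__":
    # Hypothetical scenario: the lexer alone becomes 2x faster.
    for project, ratio in RATIOS.items():
        share = lexing_share(ratio)
        gain = speedup_if_lexer_faster(ratio, 2.0)
        print(f"{project:<18}{share:6.1%}  overall speedup {gain:.2f}x")
```

Under that assumption the gain from lexer work is capped by the lexing share, so it is largest for Log4j and smallest for Elasticsearch.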


Right, but it's clearly not the situation I was describing, where cutting the time for every stage after tokenization to zero would only slightly decrease the total time.
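
To put a number on that, a quick sketch, again assuming the ratio is total time divided by lexing time (the helper name is hypothetical): if everything after tokenization cost nothing, the total would drop by 1 - 1/ratio.

```python
def reduction_if_rest_free(ratio: float) -> float:
    """Fraction of total time saved if every post-tokenization stage took zero time.

    ASSUMPTION: ratio = total parse time / lexing time, so lexing is 1/ratio
    of the total and everything else is the remainder.
    """
    return 1.0 - 1.0 / ratio


if __name__ == "__main__":
    # Lowest and highest ratios from the quoted table.
    for name, ratio in (("Log4j", 2.92), ("Elasticsearch", 5.76)):
        print(f"{name}: total time would drop by {reduction_if_rest_free(ratio):.1%}")
```

For the ratios in the table that works out to a drop of roughly 66% to 83%, which is nowhere near "slightly".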


Thank you!



