It always amazes me that so many "famous" people still take time to read HN and respond to questions (even insulting ones)! I used Antlr4 in the past and have to say it's a well-designed system, though at the time (2016) the Python runtime, which I was most interested in, was not very stable.
I've implemented several parser generators myself (nothing as polished as Antlr) and worked on a "rapid prototyping" parser generator tool a while ago (https://github.com/adewes/parsejoy). The idea was a tool that could create parsers on-the-fly without requiring compilation of any code, and let the user control and modify parsing logic using Lua if necessary. I started writing it in Go but switched to C++ later in order to get more predictable performance.
In the tool I implemented several parsing strategies, notably "parsing expression grammars (PEGs)" as well as a GLR parser (based on the Elkhound paper). And while I find GLR parsing powerful, it's not without problems, as e.g. errors are hard to debug. That said, it's pretty amazing that you can throw almost any grammar at a GLR parser and it will (usually) be able to parse it. As you said, though, writing a "bad" grammar can yield exponential time complexity on some input strings.
PEGs are also nice in general but require you to put more work into the grammar, as you are basically required to resolve ambiguities yourself using the "and" or "not" operators. Also, a naive implementation of a PEG parser will have horrid performance, as it basically evaluates all possible alternatives via backtracking until it hits a valid one, and for real-world languages (I've implemented e.g. a Python parser for testing) this will ruin your performance due to the large nesting depth of the rules (Python has around 8-12 levels of rules I think). Using a prefix tree can alleviate this, as can packrat parsing, though that comes with its own cost: memoizing parse results requires memory allocation.
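To illustrate (this is a toy sketch, not the parsejoy implementation): a recursive-descent PEG parser with packrat memoization keyed on (rule, position), so each rule is evaluated at most once per input position even when ordered choice backtracks. The grammar here is a made-up one for sums of digits.

```python
# Toy packrat PEG parser: grammar is  expr <- digit "+" expr / digit
# memo maps (rule name, position) -> (ok, value, new position), which is
# what turns naive exponential backtracking into linear-time parsing.

def make_parser(text):
    memo = {}

    def rule(fn):
        # packrat memoization: cache each rule's result per position
        def wrapped(pos):
            key = (fn.__name__, pos)
            if key not in memo:
                memo[key] = fn(pos)
            return memo[key]
        wrapped.__name__ = fn.__name__
        return wrapped

    @rule
    def digit(pos):
        if pos < len(text) and text[pos].isdigit():
            return True, int(text[pos]), pos + 1
        return False, None, pos

    @rule
    def expr(pos):
        # ordered choice: try  digit "+" expr  first, fall back to  digit
        ok, left, p = digit(pos)
        if ok and p < len(text) and text[p] == "+":
            ok2, right, p2 = expr(p + 1)
            if ok2:
                return True, left + right, p2
        return (True, left, p) if ok else (False, None, pos)

    return expr

expr = make_parser("1+2+3")
ok, value, end = expr(0)
# ok=True, value=6, end=5
```

The memory cost mentioned above is visible here: the memo table holds an entry per (rule, position) pair, which for a large grammar and long input is a lot of allocation.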
Anyway, in terms of popularity and number of actually implemented grammars, Antlr4 is probably by far the most relevant OS parser generator out there. Thanks for writing it :)
One suggestion I have is to consider improving the tooling for tokenization, as it's often a problem that can be quite challenging in itself. For example, tokenizing Python code is not easy, as it's not context-free (due to the indentation that indicates the nesting) and there are several types of newline characters (those that occur inside bracketed expressions vs. those that occur outside of them).
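The standard trick for the indentation part (this is the general technique CPython's tokenizer uses, sketched from scratch here, not code from any particular tool) is an indentation stack that emits explicit INDENT/DEDENT tokens, so the parser downstream can stay context-free:

```python
# Convert leading whitespace into INDENT/DEDENT tokens using a stack of
# currently open indentation levels (tabs and line continuations ignored
# for brevity).

def tokenize_indents(lines):
    tokens = []
    stack = [0]  # open indentation levels, outermost first
    for line in lines:
        if not line.strip():
            continue  # blank lines don't change nesting
        indent = len(line) - len(line.lstrip(" "))
        if indent > stack[-1]:
            stack.append(indent)
            tokens.append("INDENT")
        while indent < stack[-1]:
            stack.pop()
            tokens.append("DEDENT")
        tokens.append(("LINE", line.strip()))
    while stack[-1] > 0:  # close any blocks still open at EOF
        stack.pop()
        tokens.append("DEDENT")
    return tokens

src = [
    "if x:",
    "    y = 1",
    "z = 2",
]
toks = tokenize_indents(src)
# [("LINE", "if x:"), "INDENT", ("LINE", "y = 1"), "DEDENT", ("LINE", "z = 2")]
```

The bracketed-expression issue is the part this sketch skips: a real Python tokenizer also tracks open brackets and suppresses NEWLINE/INDENT handling inside them, which is exactly the kind of stateful logic that's awkward to express in a plain lexer grammar.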
Howdy. Agreed. It'd be nice to have a simpler "match x if NOT followed by y" operator, and something to handle context-sensitive lexical stuff like Python. I often just send all chars to the parser and do scannerless parsing. :)
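For readers unfamiliar with the "match x if NOT followed by y" idea: it's the PEG not-predicate, and regex negative lookahead is the closest everyday stand-in (used here purely as an illustration; the point of the comment is that ANTLR's lexer has no such simple operator built in):

```python
import re

# Match the keyword "in" only when NOT followed by an identifier character,
# so that "index" doesn't accidentally yield an "in" keyword token.
IN_KW = re.compile(r"in(?![A-Za-z0-9_])")

assert IN_KW.match("in range") is not None  # "in" followed by a space: keyword
assert IN_KW.match("index") is None         # "in" followed by "d": not a keyword
```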