I understand the allure of an objective way to evaluate the "quality" of your code... but that seems ridiculously naive. I'm pretty confident I could come up with something that uses features in the spec that nobody ever implemented, so it would be fully validated and correct and yet totally nonfunctional for actual users.