
This is a great lesson in "knowing your data" vs. "creating a model". The validity of the generated model is undermined by the GIGO (garbage in, garbage out) principle. Put another way, modeling bad data will get you bad models.

I see this shockingly often in my professional life. A data scientist will spend days, weeks, or months building a "perfect" model that replicates unclean, biased, or otherwise bad data. When they are done, it gets thrown away because it cannot solve any real-world problem.



[flagged]


I thought I was pretty clear. The data is garbage, ergo the model is garbage.


This kind of analysis sells because it is like a Rorschach test. People see a book they know, have some feelings about it, and think that gwern and his algorithm felt it too.

Creating multiple lists is a good bet on his part: with no objective criterion for success, more lists mean more ways to win.


If by "sell" you mean "create a model no one in their right mind would pay for", then sure.



