
Kaggle competitions rarely produce interesting algorithmic results.

But I highly encourage you to read the winners' solutions. They are full of clever data insight, augmentations, regularizations, feature engineering, and preprocessing and postprocessing tricks.

But above all, compared to the academic literature, it's shocking how much time and creativity they spend on validation. Maybe I'm reading the wrong papers, but the flashy new neural architectures rarely even mention their validation setup; Kaggle winners sometimes devote half of their explanation to it. It's part of their secret sauce.

Two personal favorites:

(1) https://www.kaggle.com/c/severstal-steel-defect-detection/di.... The "random defect blackout" was a really clever data augmentation.

(2) https://www.kaggle.com/c/ieee-fraud-detection/discussion/111.... Particularly how they reduced overfitting with adversarial validation. They trained a separate model to distinguish between train and test sets, and then dropped features that ranked highly in feature importance on that model, since those are the features whose distribution differs most between the two sets. That's probably a well-known technique in some circles, but I had never seen anything like it before.
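
For anyone who wants to see the idea in (2) in code, here is a rough sketch. It is not the winners' implementation: I'm assuming a plain logistic regression written with the standard library, absolute standardized weights as the "feature importance", and made-up synthetic data in which only feature 2 drifts between train and test. All function names (`fit_train_vs_test`, `drop_columns`, `make_demo_data`) are hypothetical. The AUC is computed in-sample to keep things short; a real setup would score on held-out folds.

```python
import math
import random


def fit_train_vs_test(train_rows, test_rows, epochs=300, lr=0.5):
    """Fit logistic regression to predict "does this row come from test?".

    Returns (auc, importances): the in-sample AUC of that classifier and
    |weight| per standardized feature as a crude importance score.
    """
    rows = train_rows + test_rows
    labels = [0] * len(train_rows) + [1] * len(test_rows)
    n, d = len(rows), len(rows[0])

    # Standardize columns so weight magnitudes are comparable across features.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    X = [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

    def score(x, w, b):
        z = b + sum(wj * xj for wj, xj in zip(w, x))
        return max(-30.0, min(30.0, z))  # clip to keep exp() well-behaved

    # Plain batch gradient descent on the logistic loss.
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(X, labels):
            err = 1.0 / (1.0 + math.exp(-score(x, w, b))) - y
            grad_b += err
            for j in range(d):
                grad_w[j] += err * x[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n

    # AUC = probability that a random test row outscores a random train row.
    scores = [score(x, w, b) for x in X]
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg)), [abs(wj) for wj in w]


def drop_columns(rows, cols):
    """Return copies of the rows without the given column indices."""
    return [[v for j, v in enumerate(r) if j not in cols] for r in rows]


def make_demo_data(seed=0, n=200):
    """Synthetic data (an assumption): feature 2 is shifted in the test set."""
    rng = random.Random(seed)
    train = [[rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    test = [[rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(2, 1)] for _ in range(n)]
    return train, test


train, test = make_demo_data()
auc_before, importances = fit_train_vs_test(train, test)
worst = max(range(len(importances)), key=importances.__getitem__)
auc_after, _ = fit_train_vs_test(drop_columns(train, {worst}),
                                 drop_columns(test, {worst}))
print(f"AUC before: {auc_before:.2f}, dropped feature: {worst}, "
      f"AUC after: {auc_after:.2f}")
```

An AUC well above 0.5 means the model can tell train from test, so some feature is leaking the split. Once the top-ranked feature is dropped, the AUC should fall back toward 0.5, which is the signal that the remaining features look alike in both sets.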



> But I highly encourage you to read the winners' solutions. They are full of clever data insight, augmentations, regularizations, feature engineering, and preprocessing and postprocessing tricks.

> But above all, compared to the academic literature, it's shocking how much time and creativity they spend on validation. Maybe I'm reading the wrong papers, but the flashy new neural architectures rarely even mention their validation setup; Kaggle winners sometimes devote half of their explanation to it

I agree, but in the end it is a competition, and the highest-scoring solution is not always the "most interesting" one (or the most practical, or the best in real-world cases).

Still, the details you mention are interesting and can definitely apply to real-life solutions.



