> There is so much focus on the machine learning algorithms rather than getting data ready for the algorithms.
Generally, once a problem at work has reached the point of being a "kaggle problem", it's trivially easy. The main problem is unstructured data: there are endless ways of recording what is essentially the same attribute, and plenty of leeway to build an unmaintainable data pipeline between the data generation process and the model at the end.
I disagree that a "kaggle problem" is trivially easy, but I strongly agree with the sentiment that dealing with unstructured data is often a much bigger, deeper, and broader problem than the choice of a particular algorithm or ensemble of them.
The ability to efficiently and effectively derive insights from such data is scarce.
Right, by "kaggle problem" I mean the general case where we roughly know what we're going to want to have on the right-hand side of the model we're going to run (plus or minus some feature engineering, model choice and other hyperparameter specification, etc.).