
Saying big data is data too large to process on a single machine purposefully leaves out the spec of the machine.

That's because a reasonably sized machine today is much larger than one from five years ago. And an unreasonably large machine today is also larger, yet more attainable than it used to be.

A basic dual Epyc system can have 128 cores and 2 TB of RAM. Someone mentioned 24 TB of RAM, which is probably not a two-socket system.

You can do a lot with 2 TB of RAM.
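To make that concrete, here's a back-of-envelope sketch (the record size is an assumed figure, not from the thread) of how many fixed-size records fit in 2 TB of RAM:

```python
# Back-of-envelope: record capacity of a 2 TiB single machine.
# The 100-byte record size is an illustrative assumption.
ram_bytes = 2 * 1024**4            # 2 TiB
record_size = 100                  # assumed: 100 bytes per record
records = ram_bytes // record_size
print(f"{records:,} records fit in RAM")  # roughly 22 billion
```

At that scale, a dataset many people would call "big data" sits entirely in memory on one box.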



And there are still some use cases beyond the single machine: e.g. CERN.

But I think it's quite safe to say that it's usually not because you need to process so much data, but rather that your experiment is a fire hose of data, and you're not sure what you want to keep and what you can summarize until after you've looked at the data.

And there might be a reason to keep an archive of the raw data as well.
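One common way to handle that fire hose in a single pass is to keep running aggregates plus a bounded uniform sample, deciding later what to keep. A minimal sketch, assuming a stream of numeric readings (the function name and parameters are illustrative):

```python
import random

def summarize_stream(stream, sample_size=1000, seed=42):
    """One pass over a fire hose: running aggregates plus a uniform
    reservoir sample, so the raw stream can be archived or dropped."""
    rng = random.Random(seed)
    reservoir = []
    count, total = 0, 0.0
    lo, hi = float("inf"), float("-inf")
    for x in stream:
        count += 1
        total += x
        lo, hi = min(lo, x), max(hi, x)
        if len(reservoir) < sample_size:
            reservoir.append(x)
        else:
            # Classic reservoir sampling: replace with prob sample_size/count.
            j = rng.randrange(count)
            if j < sample_size:
                reservoir[j] = x
    return {"count": count, "mean": total / count,
            "min": lo, "max": hi, "sample": reservoir}

stats = summarize_stream(range(1_000_000), sample_size=100)
```

The summary is tiny and fixed-size no matter how large the stream, which is exactly the "summarize now, decide later" trade-off.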

Another common use case would be seismic data from geological/oil surveys.

But "human generated" data, where you're doing some kind of precise, high-value recording, like click streams or card transactions, might be "dense" but is usually quite small compared to such "real world sampling".
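A rough sizing sketch makes the contrast vivid. All the figures below are illustrative assumptions, not data from the thread:

```python
# Assumed figures throughout, for scale only.
# "Human generated": a year of card transactions.
txn_per_year = 10_000_000_000          # assumed: 10 billion transactions
txn_bytes = 200                        # assumed: 200 bytes per record
human_tb = txn_per_year * txn_bytes / 1e12      # 2.0 TB per year

# "Real world sampling": one day of a seismic survey.
sensors = 10_000                       # assumed: 10k geophones
sample_rate = 2_000                    # assumed: 2 kHz, 4-byte samples
sensor_tb = sensors * sample_rate * 4 * 86_400 / 1e12   # ~6.9 TB per day
```

Under these assumptions, a single day of physical sampling outweighs a whole year of high-value human records, which is why the "sampling" workloads are the ones that outgrow one machine.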



