>*NOTE: there are people in the world who would laugh at my definition and say t...

PeterisP · on Jan 24, 2015

From the technology standpoint, where it comes from and how it's used doesn't really make a difference - if it can be processed on a single very beefy machine when done properly, then the appropriate/efficient way to work with this data is by avoiding big data techniques.

If it cannot, then you pay the price of all the complexity and overheads of big data processing techniques so that you can get your processing done.

It's correlated with data size, but not so strictly - you can get, for example, NLP problems where you need a painful pipeline split over a huge cluster for a single GB of input data, and you can have problems where the best way to process a petabyte dataset is just to stick with a single powerful machine to get the performance benefits of locality and low latency, and avoid managing splits/failed nodes/whatever.

So, in the first problem you would need to use Big Data techniques and in the second you don't; the second problem is not related to big data, and recommendations on how best to handle it won't help people who need to do big data processing.

gaius · on Jan 24, 2015

So some other definition of "big" than umm "big" then?

calinet6 · on Jan 24, 2015

Yes, absolutely. When people talk about big data, more often than not it's a measure of complexity and difficulty, not size.

saalweachter · on Jan 24, 2015

Yeah, for everyone but physicists it's really "big enough" data: it's a big enough data set that you've started recording things you didn't even try to record.

An excellent example was on HN the other day, using the NYC taxi data to determine which drivers are observant Muslims. It's not something anyone set out to record, but the data set has gotten so large that if you turn it sideways and shake, random facts like that fall out.

gaius · on Jan 25, 2015

Do you think that when people make 1000+ table relational databases it's because a) it's fun, b) they're stupid, or c) it's modelling something that is inherently complex?

Big data is neither big nor particularly complex.

sp332 · on Jan 24, 2015

I have seen several news articles using it that way. I think that's going to become an alternate definition soon.