
Saying big data is data too large to process on a single machine purposefully leaves out the spec of the machine.

That's because a reasonably sized machine today is much larger than one from five years ago. And an unreasonably large machine today is also larger, yet more attainable than it used to be.

A basic dual Epyc system can have 128 cores and 2 TB of RAM. Someone mentioned 24 TB of RAM, which is probably not a two-socket system.

You can do a lot with 2 TB of RAM.
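To make that concrete, here's a back-of-envelope sketch (the record size is an assumed figure, not from the thread) of how many fixed-size records fit in 2 TB of RAM:

```python
# Back-of-envelope: record capacity of a 2 TiB single machine.
# The 100-byte record size is an illustrative assumption.
ram_bytes = 2 * 1024**4            # 2 TiB
record_size = 100                  # assumed: 100 bytes per record
records = ram_bytes // record_size
print(f"{records:,} records fit in RAM")  # roughly 22 billion
```

At that scale, a dataset many people would call "big data" sits entirely in memory on one box.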



And there are still some use cases beyond the single machine: e.g. CERN.

But I think it's quite safe to say that it's usually not because you need to process so much data, but rather that your experiment is a fire hose of data, and you're not sure what you want to keep and what you can summarize until after you've looked at the data.

And there might be a reason to keep an archive of the raw data as well.
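One common way to handle that fire hose in a single pass is to keep running aggregates plus a bounded uniform sample, deciding later what to keep. A minimal sketch, assuming a stream of numeric readings (the function name and parameters are illustrative):

```python
import random

def summarize_stream(stream, sample_size=1000, seed=42):
    """One pass over a fire hose: running aggregates plus a uniform
    reservoir sample, so the raw stream can be archived or dropped."""
    rng = random.Random(seed)
    reservoir = []
    count, total = 0, 0.0
    lo, hi = float("inf"), float("-inf")
    for x in stream:
        count += 1
        total += x
        lo, hi = min(lo, x), max(hi, x)
        if len(reservoir) < sample_size:
            reservoir.append(x)
        else:
            # Classic reservoir sampling: replace with prob sample_size/count.
            j = rng.randrange(count)
            if j < sample_size:
                reservoir[j] = x
    return {"count": count, "mean": total / count,
            "min": lo, "max": hi, "sample": reservoir}

stats = summarize_stream(range(1_000_000), sample_size=100)
```

The summary is tiny and fixed-size no matter how large the stream, which is exactly the "summarize now, decide later" trade-off.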

Another common use case would be seismic data from geological/oil surveys.

But "human generated" data, where you're doing some kind of precise, high-value recording, like click streams or card transactions, might be "dense" but is usually quite small compared to such "real world sampling".
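A rough sizing sketch makes the contrast vivid. All the figures below are illustrative assumptions, not data from the thread:

```python
# Assumed figures throughout, for scale only.
# "Human generated": a year of card transactions.
txn_per_year = 10_000_000_000          # assumed: 10 billion transactions
txn_bytes = 200                        # assumed: 200 bytes per record
human_tb = txn_per_year * txn_bytes / 1e12      # 2.0 TB per year

# "Real world sampling": one day of a seismic survey.
sensors = 10_000                       # assumed: 10k geophones
sample_rate = 2_000                    # assumed: 2 kHz, 4-byte samples
sensor_tb = sensors * sample_rate * 4 * 86_400 / 1e12   # ~6.9 TB per day
```

Under these assumptions, a single day of physical sampling outweighs a whole year of high-value human records, which is why the "sampling" workloads are the ones that outgrow one machine.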



