Hacker News

I tend to think of them as databases where rows are an aggregate count of events over a period of time.

An overly simplistic example being an API that captures upvotes. Instead of storing the individual upvotes, you hold incoming requests in a queue and only write out a count of upvotes over, say, 1 minute. That way, if you want to get a count over a larger period of time, you're setting a ceiling on the number of records involved in your operation. If you have 1-minute resolution and you're looking over a year of data, you're reading at most 60 × 24 × 366 = 527,040 records for each post. This is really handy for analytics data where you don't necessarily care about the individual records.
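A minimal sketch of that bucketing idea (names and bucket size are my own, not from any particular database): individual upvote events are collapsed into per-minute counts keyed by (bucket, post), so a year-long query touches at most one row per minute per post.

```python
from collections import defaultdict

def aggregate_upvotes(events, bucket_seconds=60):
    """Roll individual (timestamp, post_id) upvote events into
    per-bucket counts, capping the number of rows ever stored."""
    counts = defaultdict(int)
    for timestamp, post_id in events:
        # Truncate the timestamp down to the start of its bucket.
        bucket = timestamp - (timestamp % bucket_seconds)
        counts[(bucket, post_id)] += 1
    return dict(counts)

# Four raw events collapse into three stored rows.
events = [(0, "p1"), (30, "p1"), (61, "p1"), (75, "p2")]
print(aggregate_upvotes(events))
# → {(0, 'p1'): 2, (60, 'p1'): 1, (60, 'p2'): 1}
```

In a real system the queue would flush each bucket to storage once its minute has passed, rather than keeping everything in memory.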

The project I was specifically referencing was one that captured data streams from sensors like accelerometers and thermometers, which is inherently time-series data, because it's literally just (timestamp, sensor-reading). But to make use of that, you need a library of tools that understand that the underlying data is a time series, to do things like smoothing data or highlighting key events. For example, a torque spike had a "signature" which involved looking at the difference over time periods at a particular resolution. But a temperature spike would look different. Etc.
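One way a "signature" like that can be expressed (this is an illustrative sketch, not the actual project's code; the window and threshold are made-up parameters): compare each reading against the mean of the preceding few samples, and flag indices where the jump exceeds a threshold.

```python
def spike_indices(samples, window=3, threshold=5.0):
    """Flag indices where a reading jumps by more than `threshold`
    above the mean of the preceding `window` samples."""
    spikes = []
    for i in range(window, len(samples)):
        baseline = sum(samples[i - window:i]) / window
        if samples[i] - baseline > threshold:
            spikes.append(i)
    return spikes

readings = [1.0, 1.1, 0.9, 1.0, 9.0, 1.2]
print(spike_indices(readings))  # → [4]
```

A different sensor would get a different signature function: a temperature spike might use a longer window and a slower ramp rather than a single-sample jump, which is why the tooling needs to know the data is a time series in the first place.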



I see, thanks for the explanation. Is aggregation in general a required feature of a time-series database? Or specifically the ability to calculate aggregates continuously over changing (incoming?) data?

Do time series databases store the raw underlying data for the aggregates or just enough to calculate the next window?



