Saving money, definitely, but also delaying having to scale the hardware as long as possible. I'm at 3.5 TiB NVMe (RAID 1), and it would cost another $52 USD/mo to add a second 1.5 TiB NVMe RAID 1 at Hetzner, not cool. Generally I'm seeing around 6:1 compression, so effectively going from 3.5 TiB to 21 TiB is a big deal for me.
I'm manually doing zlib compression for large text columns when there's an obvious opportunity, basically DIY TOAST [1]. Doing that took one SQLite DB from about 205 GiB to 35 GiB. I haven't really felt any performance impact when working with the data, but I definitely feel the coding overhead. And there's still so much missed opportunity for compression.
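For anyone curious, the "DIY TOAST" pattern is only a few lines with the stdlib. A minimal sketch, assuming a made-up schema, a 1 KiB compression threshold, and a one-byte tag to mark which values are compressed (all of that is my invention, not necessarily how the commenter did it):

```python
import sqlite3
import zlib

COMPRESS_THRESHOLD = 1024  # skip small values; zlib overhead isn't worth it

def pack(text: str) -> bytes:
    """Compress large text values; tag each blob so reads know what to do."""
    raw = text.encode("utf-8")
    if len(raw) >= COMPRESS_THRESHOLD:
        return b"z" + zlib.compress(raw, 6)
    return b"r" + raw

def unpack(blob: bytes) -> str:
    tag, payload = blob[:1], blob[1:]
    if tag == b"z":
        payload = zlib.decompress(payload)
    return payload.decode("utf-8")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body BLOB)")

doc = "some highly repetitive log line\n" * 1000
conn.execute("INSERT INTO docs (id, body) VALUES (1, ?)", (pack(doc),))

(blob,) = conn.execute("SELECT body FROM docs WHERE id = 1").fetchone()
print(len(doc.encode("utf-8")), "->", len(blob), "bytes stored")
```

The coding overhead the parent mentions is real: every read and write path has to go through `pack`/`unpack`, and you lose the ability to query the column with SQL string functions.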
My largest RocksDB grows at 1+ GiB/day (poorly tuned, with zlib compression). I just couldn't use SQLite for that one: lots of small rows, but they compress extremely well. I never wrapped up the compression experiments on it, but looking at some rough notes, snappy was 430 GiB and lz4 level 6 was 86 GiB. Unfortunately, using RocksDB has made coding more difficult.
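The "lots of small rows that compress extremely well" effect is easy to reproduce with the stdlib. A toy sketch (the row format is invented) of why per-block compression, which RocksDB applies per SST data block, beats compressing values one at a time:

```python
import zlib

# Many small, similar rows, as in a typical event log.
rows = [f"user={i % 50},event=click,page=/home,status=200\n".encode()
        for i in range(10_000)]

raw = sum(len(r) for r in rows)

# Compressing each tiny row alone barely helps: zlib's per-stream
# overhead dominates and there's no shared context between rows.
per_row = sum(len(zlib.compress(r)) for r in rows)

# Compressing one big block lets the codec exploit the redundancy
# *across* rows, which is where the big ratios come from.
block = zlib.compress(b"".join(rows), 6)

print(f"raw={raw} per-row={per_row} block={len(block)}")
```

This is also why storage engines that compress pages or blocks (RocksDB, btrfs, TOAST with many values per page) do so much better on this shape of data than value-at-a-time schemes.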
I think one day I'm just going to snap and build a ZipFS-like extension [2]; until then I'm just trying to keep an eye out, and putting out this call for help. :3
For my bachelor thesis, I used Postgres on a compressed btrfs partition. For my text-heavy dataset, this gave excellent results without compromising on ergonomics.
As the implementation is block-based, it is also faster than the naive approach of just zipping your data files.
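The block-based point is worth spelling out: btrfs compresses extents in chunks of up to 128 KiB, so a read only decompresses the chunks it touches, whereas one big zipped file must be decompressed from the start. A minimal sketch of the idea (block size matches btrfs; everything else is illustrative):

```python
import zlib

BLOCK = 128 * 1024  # btrfs compresses extents in up-to-128 KiB chunks

data = b"Postgres page full of text... " * 8000  # ~240 KB of sample data

# Compress each block independently; a read of [offset, offset+length)
# then only needs to decompress the blocks that range overlaps.
blocks = [zlib.compress(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]

def read(offset: int, length: int) -> bytes:
    first = offset // BLOCK
    last = (offset + length - 1) // BLOCK
    out = bytearray()
    for b in range(first, last + 1):
        out += zlib.decompress(blocks[b])
    start = offset - first * BLOCK
    return bytes(out[start:start + length])

# Random access that spans a block boundary still works.
chunk = read(131_000, 200)
```

This is exactly the property a database needs: Postgres reads 8 KiB pages at arbitrary offsets, so whole-file compression would be unusable, while per-block compression keeps random reads cheap.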
That is really cool. I tried btrfs a couple of times and hit some weirdness; sorry for being unspecific, I'm not trying to do a drive-by hit job on btrfs, but I moved on.
That said, for a specific case like this, I think it could be really compelling, especially going from compressed source material to a Postgres instance running on a loopback block device with btrfs on top. That could be amazing for text analysis like you said. You could actually see performance improvements from this: roughly 4 GB/s from PCIe 4 storage, and with today's massive core counts and large L3 caches, you could probably boost that to an effective 20 GB/s after decompression.
If you can use compression algorithms that allow regex search in the compressed domain, that could mean an effective search speed of tens to hundreds of GB/s.
I believe compression is still a work in progress in DuckDB. My current use is with Parquet files, which compress well but don't lend themselves to updates the way a DB does.
DuckDB is a column-oriented DB, so you'll see a throughput increase for queries that only touch a handful of columns.