I’ve used LZ4 and Snappy in production for compressing cache and MQ payloads, on a service serving billions of clicks a day. So far I'm very happy with the results. I know zstd requires more CPU than LZ4 or Snappy on average, but has anyone used it under heavy traffic loads on web services? I am really interested in trying it out, but at the same time held back by “don’t fix it if it ain’t broken”.
Use Lz4 where latency matters, Zstd if you can afford some CPU.
I have a server that spools off the entire New York stock and options market every day, plus Chicago futures, using Lz4. But when we copy to archive, we recompress it with Zstd, in parallel using all the cores that were tied up all day.
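A minimal sketch of that spool-fast, archive-hard pattern, using only the Python standard library. Note the stand-ins: zlib at level 1 plays the role of Lz4 and lzma plays the role of Zstd, since bindings for both are third-party packages. The function names and the chunk layout are hypothetical, not anything from my actual setup.

```python
import lzma
import os
import zlib
from concurrent.futures import ThreadPoolExecutor


def spool_chunk(raw: bytes) -> bytes:
    """Hot path: cheap, fast compression (zlib level 1 standing in for Lz4)."""
    return zlib.compress(raw, 1)


def archive_chunk(spooled: bytes) -> bytes:
    """Archive path: undo the fast codec, then recompress harder
    (lzma standing in for Zstd)."""
    return lzma.compress(zlib.decompress(spooled))


def archive_all(spooled_chunks, workers=None):
    """Recompress independent chunks in parallel, preserving input order."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(archive_chunk, spooled_chunks))
```

Threads are assumed to be enough here because CPython's compression modules generally release the GIL during the heavy work; if that doesn't hold for the binding you use, a process pool is the usual alternative.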
There is not much size benefit beyond Zstd compression level 3; I would never use more than 6. And there's not much CPU benefit below level 1, even though the levels go into negative numbers; at that point, switch to Lz4 instead.
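It's worth checking where the curve flattens on your own payloads rather than taking my numbers on faith. Here is a rough sketch of a level sweep. The `sweep_levels` helper is hypothetical, and the demo passes `zlib.compress` only because it ships with Python; the idea is to pass in the compress function from whichever Zstd binding you use, assuming it takes `(data, level)`.

```python
import time
import zlib


def sweep_levels(data: bytes, compress, levels):
    """Return a (level, compressed_size, seconds) tuple for each level tried."""
    results = []
    for level in levels:
        start = time.perf_counter()
        size = len(compress(data, level))
        results.append((level, size, time.perf_counter() - start))
    return results


if __name__ == "__main__":
    # Made-up sample payload; use a representative capture of real traffic instead.
    sample = b'{"user": 12345, "event": "click", "ts": 1700000000}\n' * 20000
    for level, size, secs in sweep_levels(sample, zlib.compress, range(1, 10)):
        print(f"level {level}: {size} bytes in {secs * 1000:.2f} ms")
```

Single-shot timings like this are noisy, so repeat the run a few times before drawing conclusions.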
Maybe. The thing is, zstd is quite close, and unlike lz4, zstd supports a broad curve of speed/ratio tradeoffs. Unless you're huge and engineering effort is essentially free, or at least the micro-optimization for one specific ratio is worth the tradeoff, you may be better off choosing the solution that's less opinionated about the settings. If it then turns out that you care mostly about decompression speed + compression ratio and a little less about compression speed, it's trivial to go there. Or maybe it turns out you only sometimes need the speed, but usually can afford spending a little more CPU time, so you default to higher compression ratios, but under load use lower ones (there's even a streaming mode built-in that does this for you for large streams). Or maybe your dataset is friendly to the parallelization options, and zstd actually outperforms lz4.
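To make the "higher ratio by default, lower under load" idea concrete, here's a minimal sketch that picks the level by hand. It is not zstd's built-in adaptive mode. zlib stands in for zstd so it runs with just the standard library, and `pick_level`, the `load` figure, and the 0.8 threshold are all made up for illustration.

```python
import zlib


def pick_level(load: float, fast: int = 1, normal: int = 6,
               threshold: float = 0.8) -> int:
    """Use the normal level by default; fall back to the fast one under load.

    `load` is a hypothetical 0.0-1.0 utilization figure supplied by the caller.
    """
    return fast if load >= threshold else normal


def compress_payload(payload: bytes, load: float) -> bytes:
    """Compress at whatever level the current load allows."""
    return zlib.compress(payload, pick_level(load))
```

The reader side doesn't change: decompression doesn't need to know which level the writer used, which is true of zstd frames as well, so the level can vary per payload.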
If you know your use case well and are sure the situation won't change (or don't mind swapping compression algorithms when it does), then lz4 still has a solid niche, especially where compression speed matters more than decompression speed. But in many if not most cases I'd say it's probably a kind of premature optimization at this point, even if you think you're close to lz4's sweet spot.