Faster: Fast persistent recoverable log and key-value store (github.com/microsoft)
94 points by LAC-Tech on Feb 25, 2024 | 54 comments


I wish people would push back against using a generic adjective as a name. Naming something "Faster" is just trying to get someone to remember by being intentionally confusing.


What! Nonsense.


Well you do make a compelling argument.


I think I know what they're getting at, but "can quickly saturate disk bandwidth" doesn't exactly sound like a selling point, on its own!


Reminds me of a Russian joke.

A secretary applies for a job; the interviewer asks her: "In your CV you claim that you can type 1000 characters per minute - for real?!" "Yes!", she replies, then adds in a low voice: "but such nonsense comes out..."


SSDs have very good bandwidth, in excess of 10 GB/s.

The bottleneck is often the CPU. At 10 GB/s, even a 10 GHz CPU could spend only 1 cycle per byte. That’s the scale we’re at now.

https://www.tomshardware.com/features/ssd-benchmarks-hierarc...
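The arithmetic behind that claim, as a quick sketch: the per-byte cycle budget is clock rate times core count divided by bytes per second. The 10 GHz figure is the comment's round number; the 4 GHz clock and 8 cores below are illustrative assumptions, not benchmarks.

```python
def cycles_per_byte(cpu_hz: float, bytes_per_sec: float, cores: int = 1) -> float:
    """CPU cycles available per byte when `cores` cores split one stream evenly."""
    return cpu_hz * cores / bytes_per_sec


GB = 10**9  # decimal gigabyte, the unit SSD bandwidth is usually quoted in

print(cycles_per_byte(10e9, 10 * GB))           # 1.0: the round-number case above
print(cycles_per_byte(4e9, 10 * GB))            # 0.4: one illustrative 4 GHz core
print(cycles_per_byte(4e9, 10 * GB, cores=8))   # 3.2: the same stream over 8 cores
```

Spreading the work across cores buys headroom, but only a few cycles per byte either way.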


How is that not a selling point? You want the disk to be the bottleneck


Not necessarily. For example, an uncompressed log will saturate disk more easily than a compressed log but if compression is fast enough the compressed log will write more data in the same amount of time.
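A back-of-the-envelope model of that trade-off, under simplifying assumptions (compression fully overlapped with I/O, a fixed compression ratio, and hypothetical GB/s figures): the log can absorb logical data no faster than the disk rate times the compression ratio, and no faster than the compressor itself.

```python
def logical_write_rate(disk_gbps: float, ratio: float = 1.0,
                       compress_gbps: float = float("inf")) -> float:
    """Uncompressed GB/s a log can accept.

    ratio: uncompressed bytes / compressed bytes (1.0 means no compression).
    compress_gbps: compressor throughput, measured on uncompressed input.
    """
    disk_limit = disk_gbps * ratio  # each byte on disk carries `ratio` logical bytes
    return min(disk_limit, compress_gbps)


print(logical_write_rate(2.0))                              # 2.0: raw log, disk-bound
print(logical_write_rate(2.0, ratio=3.0, compress_gbps=5))  # 5: compressor-bound, 2.5x the raw log
print(logical_write_rate(2.0, ratio=3.0, compress_gbps=1))  # 1: a slow compressor loses
```

Only the uncompressed case saturates the disk, yet the second case moves the most data, which is the point being made above.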

A more complex case: a column store might write in batches. Later an insert in the middle might require the entire batch to be read from disk and then rewritten. This makes queries faster later on but at the cost of more disk io up front. In this case disk bandwidth is also saturated but write performance might be worse than an append-only log that does not optimize at all for reads/queries.
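To make that up-front cost concrete, here is a toy model (not how any particular column store works): keys live in sorted, fixed-capacity batches, and inserting a key reads and rewrites the whole batch it lands in, splitting it when it overflows. An append-only log writes each row once. The 100-byte row size and the capacity of 64 are hypothetical.

```python
import bisect
import random

ROW_BYTES = 100  # hypothetical fixed row size


class AppendOnlyLog:
    def __init__(self):
        self.rows, self.bytes_read, self.bytes_written = [], 0, 0

    def insert(self, key):
        self.rows.append(key)            # no ordering work at all
        self.bytes_written += ROW_BYTES


class SortedBatchStore:
    def __init__(self, capacity):
        self.capacity = capacity
        self.batches = [[]]              # sorted, non-overlapping batches of keys
        self.bytes_read = self.bytes_written = 0

    def insert(self, key):
        # Find the batch whose key range should hold `key`.
        firsts = [b[0] if b else key for b in self.batches]
        i = max(bisect.bisect_right(firsts, key) - 1, 0)
        batch = self.batches[i]
        self.bytes_read += len(batch) * ROW_BYTES      # read the batch back in
        bisect.insort(batch, key)
        self.bytes_written += len(batch) * ROW_BYTES   # rewrite all of it
        if len(batch) > self.capacity:                 # split an overfull batch
            mid = len(batch) // 2
            self.batches[i:i + 1] = [batch[:mid], batch[mid:]]

    def scan(self):
        return [k for b in self.batches for k in b]    # already in key order


keys = list(range(1000))
random.Random(0).shuffle(keys)
log, store = AppendOnlyLog(), SortedBatchStore(capacity=64)
for k in keys:
    log.insert(k)
    store.insert(k)
print(log.bytes_written, store.bytes_written)  # the batch store's total is many times larger
```

The batch store pays that extra I/O so that `scan()` comes back already sorted; the log does no such work and would have to sort at query time.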


Those are good points. Perhaps a better way to phrase it is you want the system to be able to utilize as much disk bandwidth as you give it.


Related:

Faster – A fast concurrent persistent key-value store and log, in C# and C++ - https://news.ycombinator.com/item?id=25741670 - Jan 2021 (8 comments)

Faster – Fast key-value store from Microsoft Research - https://news.ycombinator.com/item?id=17785002 - Aug 2018 (76 comments)

Faster – A key-value store for large state management - https://news.ycombinator.com/item?id=17267403 - June 2018 (34 comments)


FASTER is awesome! I just wish the C version was as feature complete as the C# version. I know how to use FFI to call C stuff, but I have no idea how to call C# stuff. It's the only reason I haven't used it in real life.


Which one have you used, the kv store or log?

The reason I submitted this is that I'm curious about people's real-world experience.


I was looking to use both, actually. I discovered FASTER when looking to port Durable Functions to PHP (it’s called durable-php if you want to google it, though the implementation is nothing like it), and the Netherite engine uses FASTER.

It’s perfect for my use case, more so now than when I originally researched it as a possibility. Back then, I didn’t even have threading solved for PHP. Now that’s all a solved problem (threads ftw), and I’m refactoring log storage to better support things like FASTER.


Have you considered Meta's RocksDB as an option?


I wrote a time-traveling database completely from scratch, built on Hadoop/HBase (you could query a table/row as of a specific point in time and join it to data at another point in time; we used this for AI training to predict users' future behavior). That was the coolest work project ever, btw. I understand RocksDB is fairly similar ... however, I want to stay as far away from any of those kinds of APIs as possible. I have scars from dealing with HBase, writing query planners, and figuring out how to do performant joins in a white-room type environment. No. Thank. You.

It was fun at the time, but I don't want to go near it ever again.


[RocksDB](https://rocksdb.org/) isn’t a distributed storage system, fwiw. It’s an embedded KV engine similar to LevelDB, LMDB, or really sqlite (though that’s full SQL, not just KV)


Yes, it's based on the same paper as hbase, IIRC.


To be perhaps overly detailed: Hbase is an open source approximation of bigtable. Bigtable _uses_ leveldb as its per-shard local storage mechanism; Rocks is a clone+extension of leveldb.

Bigtable and hbase are higher level and provide functionality across shards and machines. Level and rocks are building blocks that provide a log-structured merge tree storage and retrieval mechanism.
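A minimal sketch of the log-structured merge idea those engines build on, heavily simplified: writes go to an in-memory table that is frozen into an immutable sorted run once it reaches a (hypothetical) size limit, and reads check the memtable first, then runs from newest to oldest. There is no write-ahead log, compaction, bloom filter, or on-disk format here, and none of the names are LevelDB's or RocksDB's actual API.

```python
import bisect


class TinyLSM:
    """Toy log-structured merge tree: a memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}        # recent writes, mutable, unordered
        self.runs = []            # frozen sorted runs, oldest first
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into a sorted, immutable run (an SSTable, in Bigtable terms).
        items = sorted(self.memtable.items())
        self.runs.append(([k for k, _ in items], [v for _, v in items]))
        self.memtable = {}

    def get(self, key, default=None):
        if key in self.memtable:                   # newest data first
            return self.memtable[key]
        for keys, values in reversed(self.runs):   # then newest run to oldest
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return default

    def scan(self, lo, hi):
        # Naive merge for illustration; real engines merge sorted iterators.
        merged = {}
        for keys, values in self.runs:             # oldest first, newer overwrite
            merged.update(zip(keys, values))
        merged.update(self.memtable)
        return [(k, merged[k]) for k in sorted(merged) if lo <= k < hi]
```

Because newer runs shadow older ones, `get("a")` returns the latest value even after both versions have been flushed; compaction would eventually merge the runs and drop the shadowed copies.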


> Bigtable _uses_ leveldb as its per-shard local storage mechanism

Ah, that's probably what I'm conflating with it then.

Thanks for the information.


You could, in theory, export certain methods with [UnmanagedCallersOnly] and AOT* compile it; those become plain C exports. Alternatively, you can host the .NET runtime within a C++ process and call arbitrary methods from the loaded assemblies. Or you could skip all that and just use C# :) (which has the advantage of not using something worse)

* I don't know whether it uses anything that requires JIT, but likely not, or only in certain features.


I’ve specifically gone hunting for this documentation and never found it. Thanks for the tip!

I love C#, but I’ve been in PHP/Go land for a while now. PHP is an interesting language; I’d never have thought I’d like it as much as I do.

It’d also be cool to see a Go implementation, but there’s also cgo.


:(

But in case you do go the first route, the docs are here:

- https://learn.microsoft.com/en-us/dotnet/core/deploying/nati... +adjacent sections

- https://learn.microsoft.com/en-us/dotnet/api/system.runtime....

This pretty much comes down to writing glue exports the way you would do in Rust and then 'dotnet publish'ing it as .dll/.so (you could also produce .lib/.a for static linking but it is trickier).

Overall, I feel like the ungodly number of projects of this kind (albeit of lower complexity) written in Go (which is a weaker platform) could have benefited from using C# instead: it has zero-cost abstractions through struct generics like Rust and allows expressing complex data structures in a terser way.


I think a lot of what go has going for it is in static compilation. Things just work. The C# community leans a lot further from open source (paid libraries vs. free).


Go receives so much unjustified good will it is almost unbelievable...

Either way, if you do look at the current state of the .NET ecosystem, you may be surprised. But I guess such is the perception of a public that may have read a bit too much into Go's promises (FASTER would have abysmal performance had it been written in Go). C#, seen as a less popular, weird Java, is now an underdog after all (look at GitHub LOC statistics).


There are good reasons to give Go good will:

Simplicity: editor, GitHub, golang download, and I have a working dev env for a LOT of use cases: sudo apt install vim git wget tar, wget golang, untar, set an env var. I have a script to do it for me on Debian boxes. Python, Ruby, and PHP are pretty close. I'm guessing C# is a bit more complicated, but not by much.

Dependency and library management: Go wins vs. Python (venvs), Ruby, Node, PHP... go mod and how it deals with pinning and pulling from GitHub. Again, I don't know what C# has here, but Go feels both magical and easy to understand on this front.

go build / go run: the fact that it is this easy and fast to get to a running binary is impressive. I had a badly behaving container the other day, and the residents of it were not giving back helpful errors. One Go program (under 100 lines) later, I was getting usable error messages and quickly worked through the network issues. There are plenty of Go apps that work like this! MediaMTX is great (an RPi camera server): just grab the binary blob and go... The same thing in Python is going to be a lot more complicated.

Testing: a friend of mine and I recently started a project together, and his comment, coming from working on large Ruby and Node projects, was "how is this testing this fast?" Go is eating its own dog food here with its concurrency model. I'm guessing C# can run the same way.

Golang's good will isn't because it is the best at something. If we're comparing features, Go is going to come in 2nd or 3rd or 4th every time. That's the thing: Go is consistently very good at feature "X". It's not the best, but it remains in the top 5. Speed, concurrency, compile time, ease of setup, ease of deployment, portability, scaling...

Golang is the Toyota of programming languages. Lovable in its reliability.


Why do you bother commenting if you don't have any experience in the other language?

With C#, you literally just `apt install` the cli and it keeps up to date. I write go every weekend, and spent years in C#. I still don't know how to properly install go and I have to run it in docker containers. It's so undocumented (or rather the documentation was lacking at least a few years ago and I've not bothered checking because my workflow works for me), that I got lost as a new go dev.

As far as building and running ... C# is on par with Go. There's really not much difference, at least from the cli.

Editors ... I dunno. I pay for a Visual Studio license, and the IDE is simply magical (esp. with ReSharper). I use GoLand for Go, and these two IDEs are barely comparable in some respects.

The testing story in C# leaves something to be desired. I would rather build a random project to test my code than write tests in C#. It's a mess over there.

But yeah, if I were to start a project completely from scratch, I'd choose PHP over either of them, so maybe I don't know what I'm talking about.

I'll see myself out.


> It's so undocumented (or rather the documentation was lacking at least a few years ago and I've not bothered checking because my workflow works for me)

Massive improvements since the days of early Go. Between ease of install (Mac, Linux; haven't looked at Windows in 2-ish years) and how it deals with packaging (go mod), there has been a ton of progress here.

I have not used C# in at least a decade. My knowledge is dated! (Python, Ruby, Node, C, and Rust are all much more recent.) It wasn't a bad language at the time, but it was very MS-centric. And installing it on Linux is a bit more complicated than "apt install", sadly.

As someone who wrote PHP for years (and it paid me well) I would say that you should take a look at installing go "from scratch" on your local system ( https://go.dev/doc/install ) and building a few throw away tools!


If you haven't used C# in a decade you missed that .NET Framework has been completely rewritten as open source .NET. To install, it's literally just:

    apt-get install dotnet-sdk-8.0


and then getting helloworld to run is literally

    dotnet new console -o MyConsole && cd MyConsole && dotnet run
(dotnet run uses debug build by default, it's very similar to cargo)

as for package management, it is

    dotnet add package {name}
It's by far one of the best package managers, and when you do need to edit the .csproj by hand, it's as easy as Cargo.toml.


> Go receives so much unjustified good will it is almost unbelievable...

Yes. Yes it does, and it really is annoying. I can't tell if it is the language or just people not deeply understanding the abstractions it provides, because it doesn't use sane defaults (or the defaults are geared towards high-throughput, Google-scale nonsense).

I help out a lot on FrankenPHP, and lately I've been digging into a weird bug, deep in Go, that causes FIN_ACK packets to get delayed by hundreds and hundreds of ms. There are so many layers of abstraction (nearly 100) to dig through. I know for a fact that in C# there are fewer than 10 down to the CLR; after that, once you can rule out the language, you are just tracing IR. But no, I'm digging through hand-written asm, hunting for a bug, or at least trying to figure out how to report it with enough information that it doesn't sound like "my email can't be sent more than 500 miles" (google that one btw).


> I've been digging into a weird bug, deep in Go, that causes FIN_ACK packets to get delayed by hundreds and hundreds of ms.

Just going to put it out there: have you considered the problem might actually be [Nagle’s Algorithm](https://en.wikipedia.org/wiki/Nagle's_algorithm)? This algorithm is the source of 10s of thousands of hours of wasted debugging time of “weird bugs related to latency of networking.” Even the “greats” have wasted time debugging something that ended up being “I forgot about Nagle.”
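Roughly, Nagle's rule (RFC 896) is: a full-sized segment goes out immediately, but a small one is held back while earlier data is still unacknowledged. Combined with delayed ACKs on the peer, that hold can add noticeable latency. A simplified sketch of the send decision, ignoring the receive window and the many other details a real TCP stack adds:

```python
def should_send_now(pending: int, mss: int, unacked: int, nodelay: bool = False) -> bool:
    """Decide whether `pending` buffered bytes may be transmitted right now.

    pending: bytes queued by the application but not yet sent
    mss:     maximum segment size for the connection
    unacked: bytes already sent but not yet acknowledged by the peer
    nodelay: True when TCP_NODELAY is set, i.e. Nagle is disabled
    """
    if pending == 0:
        return False
    if nodelay or pending >= mss:
        return True        # Nagle off, or a full segment is ready
    return unacked == 0    # small segment: only if nothing is in flight


print(should_send_now(10, 1460, unacked=500))                # False: wait for the ACK
print(should_send_now(10, 1460, unacked=500, nodelay=True))  # True: Nagle disabled
```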


Yes. Go doesn't even offer the ability to use Nagle's algorithm:

https://withinboredom.info/2022/12/29/golang-is-evil-on-shit...

Nagle's algorithm deals with data on a connection. These are FIN_ACKs, meaning the connection was told to close and we need to ACK that close.


Of course it does: just call TCPConn.SetNoDelay(false).

See https://github.com/golang/go/blob/master/src/net/tcpsock.go#...
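SetNoDelay toggles the standard TCP_NODELAY socket option, and Go's documented default for TCP connections is true (no delay), i.e. Nagle off. For comparison, the same switch at the plain socket level, sketched with Python's standard socket module (no connection is needed to set or read the option):

```python
import socket


def set_nodelay(sock: socket.socket, enabled: bool) -> None:
    """enabled=True disables Nagle (Go's default); False re-enables it, like SetNoDelay(false)."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1 if enabled else 0)


def nodelay_enabled(sock: socket.socket) -> bool:
    # Some platforms may report a value other than 1 when the option is on, so test for non-zero.
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    set_nodelay(s, True)
    print(nodelay_enabled(s))   # True: small writes go out immediately
    set_nodelay(s, False)
    print(nodelay_enabled(s))   # False: the kernel may coalesce small writes
```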


You have to get access to the TCP connection first, which is impossible once you are more than a few layers above it.


We've arrived at a weird spot.

There's a lot of very good software written in Go now. Much of that software could benefit (performance wise, at least, leaving ergonomics aside due to subjectivity) from being written in C# instead.

But does Go deserve some credit for being associated with those projects, or was it marketing and the Google effect?

.NET AOT is still quite young, so maybe the tide could turn somewhat.


I think language simplicity is a blind spot for most developers. Programming languages shouldn't be judged just based on their performance (otherwise assembly language would be #1) but also how simple they are. The simpler the better.

Go is simpler than C# and that's giving it the advantage over C#.


That’s what makes it worse. Go was designed, in a way, as a toy language to solve the assault on the codebase quality by all the fresh graduates Google was hiring. I’m not joking - this is paraphrasing Rob Pike. Go is a language that wastes my time.


As the GP said:

> There's a lot of very good software written in Go now.

I would further argue: most of this “very good” Go software wouldn’t have been written at all, if it were required to be written in a more complex language. The companies involved would never have made it happen.

That’s not an indictment of complex languages, though. It’s an indictment of those companies! Specifically, I would argue that the companies involved in producing this software, have no idea how to train developers in use of, and “style taste” for, complex languages. Their other business practices, around hiring, corporate culture, etc. do not enable the (long-term) use of complex languages.

I say this as an Elixir dev (though I also write Go, Java, C#, Rust, and several other languages in the course of my day-to-day work.) Elixir stands at the pinnacle of a particular “saying you know the language is shorthand for saying you’ve thoroughly studied a particular problem domain through the lens of the language” spectrum. And there’s nothing wrong with that!

But I say there's nothing wrong with that because I personally believe in training people to use complex languages that provide primitives, frameworks, and towers of stdlib abstractions that all make working in particular domains intuitive.

AFAIK no bigcorp does, or ever really has. They find it too hard to retain the senior talent that would train the junior talent, I think. Easier to just hire the junior talent and throw them at a language with an expressivity ceiling, “exactly one way to do it”, and enforced guardrails around everything.

And as long as bigcorps think that way… languages like Go have their niche.


Software developers tend to be far more productive and write less bug-filled code in "toy" languages. Programming language complexity is usually just plain bad for developers at any skill level.


A more sophisticated design allows you to richly represent a certain problem and offers an idiomatic way of solving it, rather than making you write an extra 200 LOC of boilerplate like open-coded loops and if err != nils. Not to mention the dozen DSLs people keep inventing in the Go ecosystem, which is usually a sign of language weakness, similar to Ruby.


Generally speaking, more sophisticated designs perform worse both on time to implement and on code correctness.

That's just the way it usually shakes out.

DSLs are known in academic literature as 4th-generation languages (whereas Go/C# would only be 3rd-generation languages); they are a really good thing as long as you aren't the person implementing them.


There's also cgo. Say what you will, but an easy-to-use way to reach into highly performant libraries and existing code was smart from a language design perspective. But yeah, other than that, I agree with you.


Is this actually used anywhere by MS? Or is it just a random research project (MSR?)


I'm curious about that too; apparently it was created for their SimpleStore research project.

https://www.microsoft.com/en-us/research/project/simplestore...


Durable Functions uses it in Netherite, which is how I originally discovered this library.


I really like the idea of tiering a log device such that you go from memory -> NVMe -> object storage. You get some nice properties, like fast, low-latency commit times from NVMe and reading your own local writes (usually good enough), but have most of your cold data on cheaper, durable object storage, with some intelligence to pull down/warm up pages when you suspect you'll need them.
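A minimal sketch of that tiering, with plain dicts standing in for the tiers (a real system would use a memory buffer, an NVMe file, and an object-store client; the class, tier names, and slot counts are hypothetical, and this is not FASTER's device API). Appends land in memory, the coldest entries spill down a tier when a tier is full, and a read served from a lower tier copies the record back into memory to warm it.

```python
from collections import OrderedDict


class TieredLog:
    """Toy log whose records move memory -> nvme -> object store as they go cold."""

    def __init__(self, mem_slots=2, nvme_slots=4):
        self.tiers = [("memory", mem_slots, OrderedDict()),
                      ("nvme", nvme_slots, OrderedDict()),
                      ("object", None, OrderedDict())]   # None = unbounded
        self.next_addr = 0

    def append(self, record):
        addr = self.next_addr
        self.next_addr += 1
        self._insert(0, addr, record)
        return addr

    def _insert(self, level, addr, record):
        _, cap, store = self.tiers[level]
        store[addr] = record
        store.move_to_end(addr)                           # hottest at the end
        if cap is not None and len(store) > cap:
            old_addr, old_rec = store.popitem(last=False)  # evict the coldest
            self._insert(level + 1, old_addr, old_rec)

    def read(self, addr):
        """Return (record, tier_name); records found below memory are warmed up."""
        for level, (name, _, store) in enumerate(self.tiers):
            if addr in store:
                record = store[addr]
                if level > 0:
                    del store[addr]
                    self._insert(0, addr, record)         # pull it back into memory
                return record, name
        raise KeyError(addr)


log = TieredLog(mem_slots=2, nvme_slots=2)
for i in range(5):
    log.append(f"r{i}")
print(log.read(4))   # ('r4', 'memory')
print(log.read(0))   # ('r0', 'object'): fetched from the cold tier, now in memory
```

To keep the bookkeeping short, the sketch moves records between tiers; a real design would more likely keep the durable copy in object storage, cache it above, and prefetch whole pages rather than single records.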

It'd be nice if this had a more language-agnostic frontend like gRPC or something.


How does this compare with RocksDB?

Also, are there any performance benchmarks?


There’s a paper that does some benchmarking against other options here: https://www.microsoft.com/en-us/research/uploads/prod/2018/0...


RocksDB supports point queries and range queries. This only supports point queries. Also, I'm not sure whether FASTER supports transactions, as the paper didn't mention them.
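Why the index type matters here: FASTER's paper describes a hash index, which answers "what is the value for key k?" but keeps keys in no useful order, so "all keys in [lo, hi)" means examining every entry. A sorted index, like the sorted runs in an LSM engine such as RocksDB, answers it with a seek plus a short walk. A small illustration with a dict and a sorted key list (hypothetical classes, not either engine's API):

```python
import bisect


class HashIndex:
    def __init__(self):
        self.table = {}

    def put(self, key, value):
        self.table[key] = value

    def get(self, key):
        return self.table.get(key)          # O(1) average point lookup

    def range(self, lo, hi):
        # No key order to exploit: examine every entry, then sort.
        return sorted((k, v) for k, v in self.table.items() if lo <= k < hi)


class SortedIndex:
    def __init__(self):
        self.keys, self.values = [], []

    def put(self, key, value):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value          # overwrite an existing key
        else:
            self.keys.insert(i, key)
            self.values.insert(i, value)

    def get(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

    def range(self, lo, hi):
        # Seek to the first key >= lo, then walk forward until hi.
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_left(self.keys, hi)
        return list(zip(self.keys[i:j], self.values[i:j]))
```

Both give the same answers; the difference is that the hash version's range cost grows with the whole table, which is one reason hash-indexed stores tend to offer point lookups only.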


My notes from 2018 say FASTER has no multi-key transactions.


Looks like it's exponentially faster than RocksDB, but in most use cases the bottleneck would be the network and available sockets on the machine running FASTER. Unless you're doing high throughput embedded data ingestion. Maybe Neuralink could use this.


How does it compare to FoundationDB? Neither the paper nor the GitHub page mentioned it.


It would be a bit of an apples-to-oranges comparison. FoundationDB is a distributed KV database that supports range queries. FASTER is an embedded KV store that only supports point lookups. The use cases for each are rather different.


Are there Python bindings? I couldn't find any.



