Hacker News

Yes, exactly. Which is why we switched from Go to Python (using Twisted) and dropped our memory use substantially for holding open hundreds of thousands (and soon millions) of websocket connections.

M:N schedulers and the primitives they use for their 'lightweight' threading are always going to use more memory than a single-threaded event loop.
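To make the single-threaded event-loop side of this comparison concrete, here is a minimal sketch using Python's stdlib `selectors` module (not the commenter's actual Twisted code): one loop services every connection, and the per-connection state is just an fd registration plus whatever buffer you keep, rather than a per-thread stack. Names like `serve_echo` are illustrative, and `socketpair` stands in for real network connections.

```python
# Minimal single-threaded event loop: one selector, many connections,
# no per-connection thread or stack. Illustrative sketch only.
import selectors
import socket

sel = selectors.DefaultSelector()

def serve_echo(conn_count=3):
    """Echo bytes on several connections from a single loop.

    Per-connection "state" here is only the selector registration;
    compare that with a full stack per green thread.
    """
    pairs = [socket.socketpair() for _ in range(conn_count)]
    for server_side, _ in pairs:
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ)

    # Simulate clients sending a request on every connection.
    for _, client_side in pairs:
        client_side.sendall(b"ping")

    echoed = 0
    while echoed < conn_count:
        for key, _ in sel.select(timeout=1):
            data = key.fileobj.recv(1024)
            key.fileobj.sendall(data)  # echo back on the same socket
            echoed += 1

    replies = [client.recv(1024) for _, client in pairs]
    for server_side, client_side in pairs:
        sel.unregister(server_side)
        server_side.close()
        client_side.close()
    return replies
```

The point of the sketch is the shape, not the echo: scaling `conn_count` adds selector registrations, not stacks.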



Why doesn't Elixir/Erlang have this problem too?

It's using green threads as well, with 1 "process" per connection, right? Is it not the same kind of scheduling?


I believe Erlang can only block at the top level. This means it doesn't need to keep a stack per thread around, only enough space for a single activation frame.


I'm guessing some kind of multiplexing of the sockets is being used in the Erlang case. There's no green thread for every connection.

I don't understand what the grandparent is saying though. Yeah, if you create a thread for every request, then you're going to be killed by the memory overhead. This isn't unique to Go, and I remember it being a problem when many noob Java programmers would spawn a thread per request before the introduction of java.nio.

The downside with Twisted, Tornado, or whatever in Python is that your code isn't parallelized. It's concurrent, yes, but you aren't taking advantage of your multiple cores without forking another Python process, due to the GIL.
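The standard workaround the comment alludes to looks like this: for CPU-bound work, fork worker processes (each with its own interpreter and GIL) instead of threads. A hedged stdlib sketch, with `cpu_bound` as a made-up stand-in workload:

```python
# Parallelism despite the GIL: multiprocessing forks separate interpreters,
# so CPU-bound chunks genuinely run on separate cores. Illustrative sketch.
from multiprocessing import Pool

def cpu_bound(n):
    """A CPU-bound task; threads in one process would serialize on the GIL."""
    return sum(i * i for i in range(n))

def parallel_sum_of_squares(inputs):
    # Each worker is a separate OS process with its own GIL.
    with Pool(processes=4) as pool:
        return pool.map(cpu_bound, inputs)

if __name__ == "__main__":
    parallel_sum_of_squares([10, 100])
```

For I/O-bound websocket fan-out, of course, the event loop itself is usually enough and the GIL rarely bites; this matters when the per-connection work is CPU-heavy.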

Go, Scala, Java, etc. are truly multi-threaded and compile to native code, either ahead of time or via a JIT. Saying your performance advantage is due to switching from Go to Python is a spurious claim. You weren't doing it right.


For most scaled up examples in Erlang and Elixir, there is a BEAM process (green thread) per connection as well as others (usually arranged as a supervised pool so one can avoid an acceptor bottleneck). There are a few reasons it does better but the biggest is a carefully tuned SMP scheduling mechanism and aggressive preemption policies. Some of these choices actually hurt throughput in favor of fairness and latency. All in the name of reliability over speed.
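One way to avoid the acceptor bottleneck mentioned above, outside of BEAM's supervised pools, is to have several acceptor processes each listen on the same port with `SO_REUSEPORT` and let the kernel load-balance incoming connections. A Linux/macOS-only sketch (the BEAM does its own thing internally; this just illustrates the pattern):

```python
# Acceptor-pool pattern via SO_REUSEPORT: multiple independent listening
# sockets on one port, no shared accept lock. Each acceptor process in a
# real pool would call make_acceptor() itself. Illustrative sketch.
import socket

def make_acceptor(port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(); every acceptor in the pool sets it.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    s.listen(128)
    return s

# Two independent listeners sharing one port; the kernel spreads
# incoming connections across them.
a = make_acceptor(0)                     # port 0: let the OS pick
shared_port = a.getsockname()[1]
b = make_acceptor(shared_port)           # second listener, same port
```

Each listener then runs its own accept loop, so no single acceptor serializes connection setup.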


That's helpful. Thanks.


Yeah, compared to that, Go also does some extra locking, extra syscalls to wake up the event loop if channels are used, etc. So for this kind of load I would expect Go to be slower than any single-threaded event loop.


I know nothing about the implementation of the Go scheduler (or Go in general, really), but there is in principle no reason why it would require additional locking or syscalls compared to a plain event-based design. Is it just a limitation of the current implementation?


It's both: the implementation and the ideas behind it.

These should give you some idea on what is going on there:

https://golang.org/src/net/fd_mutex.go
https://golang.org/src/net/fd_unix.go#L237
https://golang.org/src/syscall/exec_unix.go#L17


Oh $DEITY. That can't possibly be right. Are they serialising all FD creation through a single mutex?? Why can't they just close all sockets that need closing after the fork?

Also, the FD mutex that needs to be taken before each read and write is nasty (is it just to prevent a race with close, or is it for something else?), but at least that won't usually require any syscall, and in sane applications it could be optimised with a Java-like biased lock.
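The alternative the commenter gestures at is marking descriptors close-on-exec so an exec'd child simply never inherits them, rather than serialising fd creation against fork. In Python this is how it works by default since 3.4 (PEP 446): new file descriptors are non-inheritable, and where the platform supports atomic flags like `SOCK_CLOEXEC` there is no create-then-flag race window to lock around. A sketch making that visible:

```python
# Close-on-exec instead of a fork lock: descriptors marked non-inheritable
# are dropped by the kernel at exec() time, no global mutex required.
# Python 3.4+ does this by default (PEP 446). Illustrative sketch.
import socket

def cloexec_socket():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Already non-inheritable by default; calling this makes the intent
    # explicit. An exec'd child will never see this fd.
    s.set_inheritable(False)
    return s

s = cloexec_socket()
```

Whether this fully removes the need for a fork lock depends on the platform: on kernels without atomic `SOCK_CLOEXEC`/`O_CLOEXEC` there is still a window between creating the fd and flagging it, which is presumably part of what Go's design is guarding against.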



