Golang implements a custom scheduler in userspace too, and this is why its FFI is not as fast or as well integrated with the OS as the calling conventions of C or C++ are. Heck, Golang's FFI is even less straightforward than that of Java, since the JVM is typically 1:1, not M:N.
Once you've gone M:N you're effectively in your own little world. There's a reason Golang's runtime is compiled with the Golang toolchain, and why it invokes syscalls directly as opposed to going through libc.
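To make the "invokes syscalls directly" point concrete, here's a minimal sketch (Linux-only; fd 1 is stdout) of what that looks like from user code. The runtime does the same thing internally: a kernel trap, with no libc write(2) wrapper in between.

    package main

    import "syscall"

    func main() {
        msg := []byte("written without libc\n")
        // syscall.Write traps straight into the kernel;
        // no C runtime is involved at any point.
        syscall.Write(1, msg)
    }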
I did some cgo (Go's FFI) work. Go's FFI is vastly superior to JNI (which is in a league of its own when it comes to awfulness).
No FFI is really great. Go's isn't as good as C#'s, LuaJIT's, or Swift's, but it's better than the FFIs of Java, Python, Ruby, or plain Lua (where you have to write a wrapper for each function you want to expose to the language).
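For a sense of the gap with JNI: a complete cgo program is just a comment and an import, as in the sketch below (sqrt from math.h is an arbitrary example of mine), whereas JNI needs generated headers, a separately compiled shared library, and hand-written marshalling on the C side.

    package main

    // #cgo LDFLAGS: -lm
    // #include <math.h>
    import "C"

    import "fmt"

    func main() {
        // C symbols are called like Go functions; type
        // conversions across the boundary are explicit.
        fmt.Println(float64(C.sqrt(C.double(2))))
    }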
The C and C++ calling conventions are the OS calling conventions, so I'm not sure what that was supposed to mean.
It's true that Go is M:N, and that does have a bearing on bridging some C code (in the rare cases where C code only works when called on the main thread).
However, gccgo has a Go runtime written in C and compiled with GCC, so Go's runtime isn't tied to one toolchain.
I don't know why the Go designers chose to (mostly) avoid libc, but it certainly is great for portability and ease of cross-compilation. If you take a dependency on libc (which differs between Linux, Mac, and Windows), you pretty much throw away the ability to cross-compile, and given Go's static compilation model you would need to bundle a C cross-compiler. (Runtime-based languages like Java, Python, or Ruby are different: only the runtime has to be compiled on the target OS, so that complexity is contained away from programs written in the language.)
I don't see why you ding Go in particular for being "its own little world". Compared to C, of course it is. But compared to Java or Python or Elixir? Much less so.
cgo-style stack switching (which I assume BEAM also uses) adds a lot of overhead at runtime, which Java and Python don't need since they're 1:1.
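A crude sketch of measuring that overhead (nop is a throwaway C helper I've defined inline; exact numbers vary by machine and Go version): every C.nop() call switches the goroutine off its small Go stack onto a C-compatible stack and back, which a plain Go call never pays for.

    package main

    /*
    static void nop(void) {}
    */
    import "C"

    import (
        "fmt"
        "time"
    )

    func main() {
        const n = 1_000_000
        start := time.Now()
        for i := 0; i < n; i++ {
            C.nop() // stack switch out and back on every crossing
        }
        fmt.Printf("avg cgo call: %v\n", time.Since(start)/n)
    }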
The speed of the FFI really affects how a language ecosystem uses it; if it's a lot slower to call out to external libraries than to call code written in the same language, then there's a large incentive to rewrite all dependencies in the language as opposed to using what's already there. Sun's libraries are a bit of a special case in that Sun really tried to rewrite everything for strategic/political reasons, but look at Android; the heavy lifting in the Android stack is done by Skia, OpenGL, and Blink/WebKit (to name a few), a strategy which works because JNI is relatively fast. Python also heavily favors using C libraries where appropriate, again because Python C bindings are fast.
I don't understand the issue about cross-compilation. You don't need a cross-compiler to statically link against native libraries; you just need a binary library to link to and a cross-linker (which can be language-independent). And, of course, if you dynamically link, you don't even need that much.
I'm not really trying to ding Golang, in any case. M:N scheduling has benefits as well as drawbacks. FFI is one of the downsides. There are upsides, such as fast thread spawning. It's a tradeoff.
It's off topic, but I think it was a great idea to have terrible C interop. It really forced people to write Java all the way down. This meant the JVMs could really evolve, unlike the Python and Ruby ones.
I am not sure the new Java FFI is such a great idea in the long run. I would rather they spent more time focusing on object layout and GPU compute ;)
> It really forced people to write Java all the way down. This meant the JVMs could really evolve, unlike the Python and Ruby ones.
Java's FFI is horrible; Python and Ruby are mediocre. LuaJIT2's is fantastic. Not so surprisingly, Python ate Java's lunch in places like scientific computing, where it is much more beneficial to build on existing work.
Python is hard to dethrone from that spot right now, mostly because of momentum - but if the competition were started again, I'm sure LuaJIT2 would take the crown (Torch7 is based on it, but that's the only example I know of).
I think my bottom line is: if you want your VM environment to be self-sufficient, have a horrible FFI like Java's. If you want your VM environment to thrive with existing codebases, you have to have at least a mediocre one like Python's. But you can have the best of all worlds, like LuaJIT[2] - and that's who Oracle should be copying.
I think Python will lose momentum as soon as Julia gets more adoption, and likewise to languages like Go and the ML derivatives. Unless PyPy gets more widespread, that is.
Java's upcoming FFI is based on JNR, an evolution of JNA, which JRuby uses for native FFI.
Nevertheless, everyone seems to have forgotten about CNI, implemented in GCJ, which mapped C++ classes directly to Java ones.
Any situation where you have M userspace jobs running on N system threads, i.e. the number of tasks differs from the number of system threads.
Normally this occurs because you're running a large number of "green" threads on your own scheduler, which schedules them onto a thread pool underneath. This is good if all your threads are small/tiny, since userspace thread creation is cheaper than creating an OS thread. But if your jobs are long-lived, then your userspace job scheduler is really just adding scheduling overhead on top of the overhead the OS already has for thread scheduling, and you would have been better off with OS threads. If your M:N threading requires separate stack space for each job, there can be a sizeable overhead (this is why Rust abandoned M:N threading).
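Using Go as the concrete example, the small/tiny-jobs case looks like the sketch below: spawning 100,000 goroutines is routine, since each starts on a stack of a few KB, while 100,000 OS threads would exhaust memory on most systems.

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        var wg sync.WaitGroup
        for i := 0; i < 100_000; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                // tiny job: starts on a few-KB growable stack and is
                // multiplexed onto a small pool of OS threads by the
                // runtime scheduler
            }()
        }
        wg.Wait()
        fmt.Println("all jobs done")
    }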
If you're crossing the FFI boundary a lot, any overhead adds up quick. For example, drawing a bunch of small objects using Skia, performing lots of OpenGL draw calls, allocating LLVM IR nodes, or calling a C memory allocator…
One of the nice things about M:N is that it decouples concurrency from parallelism. That is, your application can spawn as many inexpensive green threads as the task requires, without worrying about optimizing for core count or capping threads to avoid overhead, etc. With Go 1.5, GOMAXPROCS (the number of OS threads simultaneously executing Go code) defaults to the system CPU core count.
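You can check (or override) that default from the runtime package; a quick sketch:

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        // GOMAXPROCS(0) queries the current setting without
        // changing it; since Go 1.5 it defaults to NumCPU().
        fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
        fmt.Println("NumCPU:", runtime.NumCPU())
    }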
It's noticeable to the end user only in its negative performance implications in certain situations, making things slower than they would otherwise be on the same hardware. It's a low-level construct, not directly noticeable to the user either way. The negative performance implications show up largely under heavy load; the post you replied to gave some more specific situations.