Threads Don’t Scale, Processes Do

August 24, 2008

Threads Don’t Scale, Processes Do

Posted by Bartosz Milewski under Concurrency, D Programming Language, Erlang, Multicore, Multithreading, Programming
[18] Comments

With the multicore explosion in the making, are we going to be running hundreds of thousands of threads in a single program? Erlang programmers would emphatically say, yes! C++ and Java programmers would probably say no.

Why this discrepancy?

Thread model based on heavy-duty OS threads and mutexes has its limitations. You can ask server writers, or Google for “thread per connection” to convince yourself. Servers use thread pools exactly because of that.

Thread pools are an admission of defeat for the thread model. Having to pool threads tells you that:

Thread creation is not fast enough
Threads’ consumption of resources is substantial, so it makes sense to keep their numbers down

Granted, these are technical problems that might be overcome in future by improvements in operating systems.

The more fundamental problem with threads has its root in memory sharing. It seems like sharing offers great advantage in terms of performance, but sharing requires locking. It’s a well known fact that locking doesn’t scale (or compose). Between races and deadlocks, it’s also extremely hard to get right.

Here’s what Erlang does

Erlang gives up on sharing!

Threads that don’t share memory are called processes. We tend to think of processes as heavyweight beasts implemented by operating systems. That’s because one needs the operating system to strictly enforce the no-sharing policy (the isolation of address spaces). Only the OS can manage separate address spaces and the passing of data between them.

But that’s not the only way. The isolation might instead be enforced by the language. Erlang is a functional language with strict copy semantics and with no pointers or references. Erlang processes communicate by message passing. Of course, behind the scenes, messages are passed through shared memory, thus avoiding a large performance hit of inter-process communication. But that’s invisible to the client.

Erlang rolls out its own threads!

Erlang interpreter provides lightweight processes (so lightweight that there’s a benchmark running 20 million Erlang processes).

And there is a bonus: Erlang code that uses lightweight processes will also work with heavyweight processes and in a distributed environment.

Why don’t we all switch to Erlang?

As far as I know there are two main reasons:

It’s a weird language. Don’t get me wrong, I love functional programming for its mathematical beauty, terseness, and elegance. But I had to rewire my brain to be able to write pure functional programs. Functional paradigm is as alien to our everyday experience as quantum mechanics. (CS grad students: you’re not typical programmers.)
Messages have to be copied. You can’t deep-copy a large data structure without some performance degradation, and not all copying can be optimized away (it requires behind-the-scenes alias analysis). This is why mainstream languages (and I will even include Scala in this category) don’t abandon sharing. Instead they rely on programmer’s discipline or try to control aliasing.

Conclusions

Native threads are expensive. Interpreted languages, like Erlang, can afford to implement their own lightweight threads and schedulers. I don’t see that happening in compiled languages. The hope is that operating systems will improve their implementations of threads–I hear Linux’s NPTL already is a big improvement in this area. In the meanwhile, we have to rely on thread pools.

Shared memory concurrency model is the reason why multithreaded programming is so difficult and error-prone. Separating address spaces simplifies programming, but at a performance cost. I believe that some kind of compromise is possible. A message-passing or an actor model can work with sharing, as long as aliasing is under control.

Inspiration for this post

After my last post on thin lock implementation, I was asked the question, why I used such a small number, 1024, for the maximum number of threads. It was actually a number I’ve found in the D compiler runtime. A thin lock could easily work with a much larger number of threads. In fact I’m going to substantially increase the number of bits for thread ID at the expense of recursion count. But this question got me thinking about scalability of threading in general.

Fibers

Fibers (as well as Java’s green threads) are not an alternative to heavyweight threads. Fibers can’t run in parallel, so they have no way to use multiple processors. They are more of a flow-of-control construct, often used interchangeably with coroutines.

Interesting reading

The Problem with Threads by Edward E. Lee.
Kilim: Isolation-Typed Actors for Java by Sriram Srinivasan and Alan Mycroft.

18 Responses to “Threads Don’t Scale, Processes Do”

snakescaly Says:

August 25, 2008 at 5:19 am
I’ve got an impression that, since all data in Erlang is immutable, interpreter doesn’t need to actually copy messages. It can safely use a shared copy—unless the message is passed between different nodes, i.e. to another system process, or over network.

But I have also got an impression that Erlang doesn’t integrate well. There’s no much sense in writing tools in Erlang, you can’t gradually port an existing system to Erlang and see if it helps. You must build a new system which is a major decision and is hard to prove feasible. I think this is the main reason Erlang is still exotic.
llucax Says:

August 25, 2008 at 6:49 am
Oz is another language with *very* cheap concurrency, and somehow multi-paradigm.

http://www.mozart-oz.org/documentation/apptut/node9.html

I couldn’t do that test in D because of the hard limit on threads you mentioned =S
Bartosz Milewski Says:

August 25, 2008 at 12:46 pm
It’s true that data in Erlang is immutable. But every functional language worth its salt tries to optimize the copying behavior. For instance, it might not copy a whole list if you only change one element. So, behind the scenes, it has to keep track of aliasing. And you can’t carry the aliases between threads!
Top Posts « WordPress.com Says:

August 25, 2008 at 5:06 pm
[…] Threads Don’t Scale, Processes Do With the multicore explosion in the making, are we going to be running hundreds of thousands of threads in a single […] […]
patryksharks321 Says:

August 26, 2008 at 4:25 am
isn’t Lisp also a functional language also?
greetings
Hundreds of thousands of threads? Yes, with Erlang | insideHPC Says:

August 27, 2008 at 7:23 am
[…] with the pointer to a blog post by Bartosz Milewski on some of the problems with threaded programming in the mainstream languages […]
Sean Kelly Says:

September 5, 2008 at 9:17 am
Like most interpreted languages, there are native code compilers available for Erlang. So that aspect of performance isn’t an issue. But I think talk of performance may be a red herring anyway–certainly insofar as the upper-level control structures of a program are concerned. It’s been well established that the performance gains we’ll see in the future will come from parallelism rather than from optimizing a single thread or process, so a massively parallel language like Erlang should be ideally suited to scale over time, and to scale automatically with no changes to the program.

Also, I don’t see much of a problem with copying vs. shared data. If a system must maintain a large amount of data that isn’t feasible to copy then simply make one or more agents a broker for that data. It’s no different than interacting with a database. Of course, this has the potential to create a choke-point for program flow, but it does in a shared-memory program as well. The only real advantage with shared-memory programs in this case comes from fast in-process access to the data, but this is at the expense of program integrity (shared memory means that one badly behaved thread can bring down others as well).

As for syntax issues… any programmer worth her salt should be able to pick up a new language without much fuss. In fact, some universities teach functional languages first, and they are successful with this approach. So I also do not think that functional languages are inherently confusing. As for Erlang specifically, it is far more imperative in its syntax than, say, Lisp, and so should be fairly easy to pick up anyway.
Bartosz Milewski Says:

September 9, 2008 at 7:28 pm
Sean, as far as I know, Erlang native compilers map Erlang processes into Posix threads or Unix processes. That would defeat the advantages of lightweight threading.

As far as performance gains from parallelism go, this is a highly disputed topic. It was a different story when we could speed up any program by increasing processor clock speed. Now you have to re-write programs to take advantage of parallelism, which is far from trivial (see Amdahl’s law). As far as automatic parallelization, it’s an active research topic with positive results few and far between.

I agree that copying is much safer than sharing. In most applications sharing is a premature optimization. I remember when C++ COW strings were all the rage. Then people got used to using them and, when the implementation switched to copying, they didn’t notice the difference.

Having said that, there is a subset of applications where copying is not an option. Multimedia programmers simply cannot afford such an overhead. Being a systems language, D must support sharing. Of course, the safer we can make it the better.
Sean Kelly Says:

September 19, 2008 at 12:35 pm
I completely agree that D must support sharing because it is a systems app, and also agree that no language is ideally suited for every problem. However, I’m less clear on whether multiple threads of a multimedia app would actually need to share access to a large chunk of raw data. Most data processing of the kind that I’m aware of uses the scatter/gather method, for which message-passing is an excellent solution. But then I’m not much of a multimedia programmer, so I’m mostly speculating. Is there some multimedia problem that doesn’t decompose well to process-based multiprogramming?
Bartosz Milewski Says:

September 19, 2008 at 2:58 pm
Have you looked at the SharC paper I mentioned a few times? It gives an example of a multi-threaded multi-media application. Essentially, data is acquired from a device (either audio or video) and then goes through several stages of processing, Each stage runs in a separate thread. Large amounts of data are pipelined from stage to stage and copying would be unfeasable (and unnecessary). This setup is characteristic of many (almost-) real-time audio or graphics applications, including games.
Sean Kelly Says:

September 19, 2008 at 5:36 pm
Thanks, I’d missed that paper. From what I can see, the program flow it describes is logically a message-passing design. And as I believe you mention above, in-process message passing can actually be implemented via sharing (in fact, I think the Erlang message passing mechanism does this implicitly). However, Id’ forgotten about a related fly in the ointment for functional languages: the fact that data is immutable. Even if Erlang could avoid copying when transferring data between stages, each stage would have to create a new “post-processed” copy of the buffer to pass on to the next stage, and this is probably not ideal given the performance requirements of multimedia apps. I suppose the same issue would apply to D in that pure functions couldn’t be used for multimedia work because they would impose the same requirement.
CadiaMefefave Says:

November 19, 2008 at 5:56 pm
Hello. It is test.
NetGaituipket Says:

December 2, 2008 at 5:46 pm
I am here at a forum newcomer. Until I read and deal with the forum.
Let’s learn!
Types for Concurrency « Bartosz Milewski’s Programming Cafe Says:

March 23, 2009 at 2:16 pm
[…] perfectly feasible to have a concurrent language that disallows sharing–Erlang is one (see my post about Erlang). The trick is to always pass data between threads by value. This is especially easy in functional […]
How to Get Six Pack Fast Says:

April 15, 2009 at 11:13 am
I can tell that this is not the first time at all that you write about this topic. Why have you decided to write about it again?
Bartosz Milewski Says:

April 15, 2009 at 11:30 am
I’m in charge of designing multithreaded support in the D programming language, so I have to study the ways other languages deal with it. The same ideas come back over and over again in various contexts, so there’s bound to be some redundancy.
A Drop In The Stream › links for 2009-05-30 Says:

May 30, 2009 at 9:04 pm
[…] Threads Don’t Scale, Processes Do « Bartosz Milewski’s Programming Cafe Good explanation of the constraints of multithreading and concurrency. (tags: concurrency erlang programming performance architecture scalability threading threads) […]
alkaway Says:

June 22, 2009 at 1:57 pm
stagecoach casino music casino 7 clans casino thief river .stanley casino derby cuban casino indiana live casino myspace mgm grand foxwoods casino .no casino in plymouth nb casino .reel deal casino download casino royale music online casino sportsbook new casino ac .with mgm hotel x26 casino las vegas .Here .follow .follow could little creek casino hotel philadelphia sugarhouse casino or Here hollywood park casino poker tournament After montreal hotel casino rushmore city casino Best jouer casino in the attached now sometime four queens casino x26 hotel reviews choctaw casino dallas Ok, here Buy camp verde casino Come to multiplayer casino tyler florence clams casino blue water casino in parker az or someone prescott casino resort tukwila casino washington catfish casino sportsbook casino promotion code This was feather falls casino x26 lodge ought to
fitzgerald casino blackhawk royal casino movie .This website about in the attached quotes from casino royale In crown casino waterfront too site casinopauma.com pauma casino .The .follow .is the same as too century casino blackhawk sun cruz casino port richey was For daily star casino Cool stuff – showboat casino atlantic city parking las vegas casino list hotel casino cucuta iron horse casino in auburn wa Best wigan casino address which this .