How would you like a job in the supercomputing industry? Programming those powerful Ks, Jaguars, Roadrunners, Blue Genes, or gigantic clusters of computers? How inspiring would that be?
Not much, according to the luminaries of the field. I went to a panel about the future of supercomputing at SC11, and learned that the future is… Fortran, MPI, OpenMP and CUDA. I have no reason to doubt the experts; after all, some of them were with the industry when it was all ferrite core memory and punch cards. But it makes me wonder whether there is a future at all for supercomputing if things keep going in this direction.
Let me explain: Programming in Fortran, MPI (Message Passing Interface), OpenMP (a system of annotations for C or Fortran to help the compiler parallelize the program), and CUDA (Compute Unified Device Architecture for programming GPGPUs) is tedious, uninspiring, and boring.
I talked to a CS student who was demonstrating his summer work at the booth belonging to one of the large national labs. It was a project to improve Monte Carlo simulations of some physical processes. It was done, unsurprisingly, using MPI and OpenMP. I asked him what the exciting part of the job was. It was the learning of the Monte Carlo method. The rest was the tedium of combining barely compatible clunky programming paradigms into a workable program.
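To give a flavor of what he was up against, here is a minimal sketch of a hybrid MPI+OpenMP Monte Carlo kernel (estimating pi, a toy stand-in for his actual physics problem; this is my illustration, not his code). Notice how two unrelated parallelism models, each with its own boilerplate, must be stitched together by hand:

```cpp
// A sketch of the hybrid style: MPI between nodes, OpenMP within a node.
#include <mpi.h>
#include <omp.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);               // MPI boilerplate: start the runtime
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const long samples = 10000000;        // samples per MPI process
    long localHits = 0;

    // Second, unrelated paradigm: OpenMP annotations for the threads.
    #pragma omp parallel reduction(+:localHits)
    {
        unsigned seed = 1234u * (unsigned)(rank + 1)
                      + (unsigned)omp_get_thread_num();
        #pragma omp for
        for (long i = 0; i < samples; ++i) {
            // tiny inline LCG: good enough for an illustration
            seed = seed * 1664525u + 1013904223u;
            double x = seed / 4294967296.0;
            seed = seed * 1664525u + 1013904223u;
            double y = seed / 4294967296.0;
            if (x * x + y * y <= 1.0) ++localHits;
        }
    }

    // Back to the first paradigm: hand-coded reduction across nodes.
    long totalHits = 0;
    MPI_Reduce(&localHits, &totalHits, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        std::printf("pi ~ %f\n", 4.0 * totalHits / ((double)samples * nprocs));

    MPI_Finalize();                       // more boilerplate to shut down
    return 0;
}
```

Even in this toy, the actual calculation is a few lines; everything else is plumbing, and the plumbing is different at each level of the machine.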
Why does it matter? Because a thriving industry or company must attract talent. And talent can’t be bought, at least not easily. There was once a study which showed that, above a certain compensation level, talented people don’t care so much about salaries as they do about novelty, excitement, and freedom. Google knows this very well: They create exciting work environments (I call them day-care centers for programmers), and encourage their employees to spend 20% of their time pursuing their own projects. No wonder there is an underground pipeline from Microsoft to Google through which the talent keeps leaking out.
By the way, I worked for Microsoft back when it was exciting. Our salaries were rather mediocre, but we felt the urge to work long hours and weekends because we felt that our contributions mattered. Unlike today, sales and marketing were not driving the company, developers were.
To confuse matters even more for the executives, programmers are relatively cheap. The cooling bill for a data center dwarfs the cost of software development. Let’s face it, from a distance a programmer might look just like another commodity, like a computer rack, an air conditioner, or a router. This is even more pronounced in supercomputing, where a single rack might go for a million dollars, the equivalent of 10-20 programmer-years.
If you drain all the excitement from work, your company, or the whole industry, is bound to stagnate. Bored people don’t innovate. And we know from experience that, in high tech industries, if you don’t innovate, you die. Old programming paradigms might have worked for years, but new unmet challenges are piling up. A lot of work that required supercomputers in the past is now done on clusters of off-the-shelf components. Google owns one of the largest supercomputers in the world, and it’s all built from cheap commodity boxes. But Google lets its people innovate.
But not everything is bleak in the land of supercomputers. I have met two teams that were brimming with ideas and enthusiasm: one was Brad Chamberlain’s Cray Chapel team and the other was Hartmut Kaiser’s Louisiana State University Ste||ar team. I’m sure there were many others, but those were the ones I had the pleasure of meeting outside of the exhibition hall.
You can tell that a team is dedicated to a task if they can’t stop talking about their work even after a few beers. Young creative people are attracted like moths to interesting and challenging projects. I don’t think writing simulations using OpenMP and MPI, even if they run on a Cray X-MP, can generate this kind of enthusiasm.
November 21, 2011 at 12:41 pm
This is a sad comment on the depth of understanding of the basic problem. As your respondents noted, it is the infrastructure that is the hard bit, not the operation code. Yet this should not be so. What is missing is a translatable design notation for the infrastructure, the glue that holds the operations together. Such a notation must be fully translatable; the generated code, and any run-time components, must completely abstract both the hardware and software environment, and must provide an environment that is a pleasure to use, not the brain ache and inflexibility of OpenMP, DDS or any of that ilk.

The prime problem with all of the above is that they all constrain the design to address the problem of the distribution model well before the designer is ready to do so. Chapel has addressed at least some of those concerns: it has the capability to define the distribution model last, but that distribution model is effectively static. The designer defines that a certain method be run at a certain place; the data must always follow the code. If the full resources of the hardware are to be harnessed, then code must be allowed to follow the data; the run time must have the freedom to run a method where best suited, where the data already is.

The designer must have a notation that describes the concurrency in a lucid way, and allows the distribution model to be completely decoupled from the functional model. UML has signally failed in this regard and is, I believe, the wrong place to start.
November 21, 2011 at 12:51 pm
Bartosz, we really had a great time in Seattle, thanks!
Adrian, code following the data is exactly what hpx is about.
November 21, 2011 at 12:58 pm
Formula 1 is boring. Drivers shift gears manually. It can’t attract real talent. Guess what: no audio system either.
November 21, 2011 at 1:09 pm
http://groups.google.com/group/comp.sys.super/msg/38b7e0f5416908d6 is where I wrote about this blogpost on Usenet.
November 21, 2011 at 1:50 pm
People use Fortran because it’s fast, has native support for vector operations, high-quality compilers exist for it, and there is a huge base of numerical libraries that has already been optimized for nearly every platform (ATLAS, for example) so you don’t have to implement everything from scratch.
When you’ve paid $100 million for a machine, you want to squeeze every last bit of juice out of it. I don’t like Fortran either, but there is simply no other way (well, except maybe for C) to get that level of performance.
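For example, instead of hand-rolling a matrix multiply you make one call into a library that was auto-tuned for your machine at install time. A sketch using the standard CBLAS interface that ATLAS provides (the function name and build flags here are mine, and the flags vary by platform):

```cpp
// Sketch: lean on a pre-tuned BLAS (e.g., ATLAS) instead of hand-optimizing.
// Computes C = A * B for n-by-n row-major matrices.
// Typical build (platform-dependent): g++ gemm.cpp -lcblas -latlas
#include <cblas.h>
#include <vector>

void multiply(int n, const std::vector<double>& A,
              const std::vector<double>& B, std::vector<double>& C) {
    // dgemm computes C = alpha*A*B + beta*C; the blocking and
    // vectorization were tuned when ATLAS was installed.
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n,
                1.0, &A[0], n, &B[0], n,
                0.0, &C[0], n);
}
```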
November 21, 2011 at 2:54 pm
The choice of language is almost irrelevant. Text languages are inherently sequential and will never be able to express parallelism with any degree of clarity. The problem is one of design tools, not language. My posting above gave a hint of just some of what I consider to be the requirements for a design tool, if anyone can suggest solutions I would love to hear of them. Chapel, I think, goes some of the way but, being a text language, starts from the wrong point. It is still a language, not a design tool.
November 21, 2011 at 3:45 pm
Bartosz: I was wondering, what are your thoughts on C++ AMP?
http://www.gregcons.com/KateBlog/DidYouNoticeCAMPYouReallyNeedTo.aspx
For me the open-spec and modern C++ aspects make it potentially very exciting!
November 21, 2011 at 4:09 pm
I have a wait and see approach to AMP. It promises a lot but so far delivers little. It’s nicer than CUDA, for sure.
November 21, 2011 at 4:15 pm
Bartosz: OK, fair enough. I also find the heterogeneous aspect interesting. While Thrust with CUDA 4.0 arguably delivers (or starts getting closer to delivering) a nicer syntax than CUDA for C, it’s still somewhat NVIDIA-GPU-centric (although I’ve heard there are ongoing attempts to bring support for CPUs/SIMD, too). If C++ AMP handles heterogeneity as well as it promises to, that might turn out to be reason enough to switch.
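For reference, this is roughly what the announced programming model looks like, going only by the published spec (a sketch; I obviously haven’t been able to run it, since the toolchain isn’t out yet):

```cpp
// Sketch of the announced C++ AMP model (untested; based on the public spec).
// Scales a vector on whatever accelerator the runtime selects.
#include <amp.h>
#include <vector>
using namespace concurrency;

void scale(std::vector<float>& data, float factor) {
    array_view<float, 1> av((int)data.size(), data);  // wraps host memory
    parallel_for_each(av.extent,                      // one thread per element
        [=](index<1> idx) restrict(amp) {             // restrict(amp): GPU-legal subset
            av[idx] *= factor;
        });
    av.synchronize();  // copy results back into the vector
}
```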
I guess we’ll see — will be looking forward to your thoughts when it comes out and you have time to review it! 🙂
November 21, 2011 at 5:12 pm
The problem is partly that, as a problem gets more difficult, there are fewer people who are capable of coming up with a solution. I usually look for a similar domain and see if there is a solution there. Producing faster parallel-processed results is the domain of hardware design, so you can just reuse that methodology with C++:
http://parallel.cc
– just evolution 😉
November 22, 2011 at 12:29 am
http://www.release-project.eu/
this might be interesting to you! Erlang on a Blue Gene 🙂
November 22, 2011 at 1:06 am
IMHO, language-based solutions are poor and will remain poor. No one wants to tackle learning one more variation of C/C++/Fortran with funky annotations or whatnot. For 50 years we have been promised some do-all, end-all parallel compiler/language/guacamole… and nothing has yet reared its head and solved our parallel problem.
Attacking the parallel programming problem as one monolithic domain is foolish; approaches based on DSLs (or EDSLs), as in Scala, are, I think, on a good track: solving the problem domain by domain, using an underlying infrastructure so every bit is interoperable with the others.
Until we have something that domain experts like physicists or biologists can grok (and those people usually don’t want to have to grok FORTRAN; they’re forced to), it’ll be doomed to failure. No amount of fancy new standards (like AMP or OpenACC) will fix this.
@acetoline: so do C and C++, really. Fortran is no longer the king of the performance hill.
November 22, 2011 at 3:10 am
I have recently been intrigued by reading about STAPL from Stroustrup’s laboratory – http://bit.ly/sfWfAR
It seems to be a generic interface over MPI/OpenMP, but I have difficulty seeing whether it is only about hybrid parallelism or whether there are plans for heterogeneous parallelism.
Did anyone try it ?
Does anyone know if it is in the same field as OpenHMPP ?
Thanks 🙂
November 22, 2011 at 7:37 am
If you think HPX/ParalleX was exciting, have a look at Charm++. Unlike the former, Charm++ has been in production use for 15 years, and the current crop of students working on it (myself included) still have exciting new things that we’re doing with it (cf. 3 SC papers, and some posters and PhD forum presentations). It just won a performance award in the HPC Challenge competition; that project was a heck of a lot of fun and gave a clear sense of what directions we need to take the system.
If you’d be interested, it’d be cool if you came to visit the lab and the department. I’m sure there’s a lot of potential for cross-pollination between your work on parallelism and concurrency and ours.
November 22, 2011 at 1:47 pm
@Phil: I’ve heard people talk about Charm++ at SC11. My knowledge of it is limited to reading a Wikipedia article, so I won’t comment on it. I’m watching the competition between message passing and PGAS with interest. From the programming point of view, message passing suffers from inversion of control, while PGAS doesn’t have a good story for graph algorithms. Maybe the answer is in the convergence of the two.
November 25, 2011 at 10:57 am
[…] Supercomputing: An Industry in Need of a Revolution. Bartosz Milewski wants decently paying, interesting and meaningful jobs for all (supercomputer programmers). Get on that Santa. […]
December 7, 2011 at 3:56 pm
“A lot of work that required supercomputers in the past is now done on clusters of off-the-shelf components” — isn’t that the core of this? More than supercomputing suffering, it’s split into two fields — computing with datacenters full of (or clusters of) commodity components, and “supercomputing proper,” with extreme compute needs (GPUs) and/or connectivity needs (low-latency 100GbE) and maybe more of an emphasis on code efficiency than programmer productivity (or programmers working closer to the hardware than datacenter ones do, anyway).
That’s a first approximation; if it’s wrong, I hope it’s interestingly so and someone can informatively set me straight. And if it’s about right in outline, the question is what it means for finding interesting, useful problems (and, relatedly, jobs), or for making inspiring, fun computing tools.
December 8, 2011 at 12:56 am
This seems relevant –
http://www.infoq.com/presentations/We-Really-Dont-Know-How-To-Compute
If we knew how our own brains worked, we’d be most of the way there. On the upside, the human brain was produced by the random process of evolution, so it shouldn’t be too hard to replicate in silicon – no intelligence required?
December 13, 2011 at 7:18 pm
A while ago (2008) I wrote a set of three blog posts about how a quick one-evening web search for parallel programming languages turned up 101 of them, with general agreement that none of them were in any kind of use except MPI and OpenMP.
See http://perilsofparallel.blogspot.com/2008/09/101-parallel-languages-part-1.html for the start of that three-part post series.
I shared this with a fairly high-level HPC sub-community; their consolidated responses are in the post series.
My personal opinion, which I now don’t think was well expressed there, is that no programming language succeeds on performance alone, which is all that parallelism brings to the table. To succeed, a language must do something that makes the programming task itself better, easier, or less complex in a way that programmers immediately respond to.
December 14, 2011 at 2:34 pm
There’s more than one way of interpreting results. 101 languages failed: does that mean a new language is not the solution? Or does it mean that there is a desperate need for a new language, and people keep trying?
I agree that no language succeeds on performance alone — if that were true, we’d still be programming in assembly. New languages succeed on the right combination of abstractions, as long as those abstractions don’t kill performance.
In HPC, a new language has slim chances of succeeding without strong corporate backing. So as long as top executives believe that the future is Fortran, MPI, and OpenMP, there will never be enough push for change. Until, of course, it’s too late.
December 14, 2011 at 10:36 pm
I kinda disagree that “no language succeeds solely on performance,” and that otherwise we would be writing in assembly. C/C++ are primarily used for performance, and they beat assembly because compilers got better than people at optimizing at the assembly level. As such, it might be worth looking at the problem from this perspective: what parallel language constructs can compilers handle better than people, and what programming styles suit the tools?
December 15, 2011 at 9:47 am
Yes, nowadays C & C++ are used for performance.
Back in the mid-70s when C was originally taken up, however, it provided the unique ability to do systems programming in a high-level language. Compared with the alternative – assembler – it allowed a simply awesome increase in programmer productivity. Don’t forget, this was a time when the choices were assembler, FORTRAN, COBOL, PL/I, Pascal, and the like. The issue wasn’t performance – although a systems programming language has to provide that – but the ability to manipulate addresses as needed, among other things.
C++ added OO to C, and when it was introduced everybody thought OO was a really cool way to organize programs. Even the way C++ did it.
December 15, 2011 at 10:33 am
More importantly, C++ introduced Generic Programming (GP), with STL (algorithms-iterators-containers) and templates — this paradigm is sometimes a much better (more natural) fit than OOP for high-performance numerics (and one of the reasons I prefer C++ to, say, Fortran).
Of course, operator overloading and the resulting easy-to-read linear algebra (matrix & vector) operations syntax help, too! 🙂
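As a toy illustration of both points, a small templated vector type gives you math that reads like math, while the compiler still sees plain loops over contiguous storage (a sketch with made-up names; real libraries use expression templates to avoid the temporaries):

```cpp
// Sketch: templates + operator overloading for readable, fast numerics.
#include <array>
#include <cstddef>

template <typename T, std::size_t N>
struct Vec {
    std::array<T, N> v;
    T& operator[](std::size_t i)             { return v[i]; }
    const T& operator[](std::size_t i) const { return v[i]; }
};

template <typename T, std::size_t N>
Vec<T, N> operator+(const Vec<T, N>& a, const Vec<T, N>& b) {
    Vec<T, N> r;
    for (std::size_t i = 0; i < N; ++i) r[i] = a[i] + b[i];
    return r;
}

template <typename T, std::size_t N>
Vec<T, N> operator*(T s, const Vec<T, N>& a) {
    Vec<T, N> r;
    for (std::size_t i = 0; i < N; ++i) r[i] = s * a[i];
    return r;
}

// y = a*x + y reads like the formula; the loops are trivially inlined
// and vectorized, since everything is contiguous and fixed-size.
Vec<double, 3> axpy(double a, const Vec<double, 3>& x, const Vec<double, 3>& y) {
    return a * x + y;
}
```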
January 15, 2012 at 2:49 am
What the post kind of misses is that most people who actually use these so-called “super computers” are not programmers.
I have actually done some work in this field. I am a statistician by trade, and a programmer “only” by soul. Granted, I worked in the public sector, but my gut feeling is that attracting talent in this field is not so much about brilliant programmers as about talented mathematicians, statisticians, biologists, et cetera. For these people Fortran is usually quite fine. To reword this: these people write bad code in any language (no offense to anyone).
Milewski wrote:
“It was the learning of the Monte Carlo method. The rest was the tedium of combining barely compatible clunky programming paradigms into a workable program.”
Indeed.
I see the “supercomputing” field heading towards more and more domain-specific languages. It may be Fortran or C that runs under the hood, but what the “punch card guys” want is Mathematica, Matlab, SAS, R.
Let Dongarra et al. do their own work. The users of “supercomputers” want results. They definitely do not want C++.
If you are doing weather forecasts, you don’t really care about programming per se. Where I worked we did (almost) real-time analysis of stock data, and I assure you, the actual programming was a lesser problem. But if I could’ve done it with R, I’d be one year younger.
January 15, 2012 at 11:51 am
I understand that a programming language is a tool, just as a supercomputer is a tool. It’s a matter of finding the right tool for a given job. If Fortran is adequate for your task then, by all means, use it.
But what do you do when it isn’t? The current path is to keep hacking: extending Fortran (or C) with MPI, CUDA, OpenMP, and so on. Pretty quickly you find yourself learning arcane programming techniques and computer architecture, hand-coding the distribution of your calculation between multiple nodes and/or GPUs. This is much harder, more unpleasant, and more boring than it should be. It is throwing good money after bad.
Sometimes incremental changes end up pushing you into a dead end. But I believe in a better world where scientists, programmers, and computer scientists work together to make each other’s lives easier and more interesting.
April 26, 2012 at 2:03 pm
The main problem of high-performance computing, in my experience, is inventing algorithms with high locality of data: algorithms that operate only on independent chunks of data small enough to fit into some kind of core-local memory, yet are still equivalent to the quite complex straightforward algorithms and not (much) worse in computational complexity. Pretty challenging and, as a consequence, interesting. No compiler or library can do that, and I suspect this is not going to change any time soon. The problem of distributing the work is trivial by comparison.
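The classic small-scale instance of that transformation is loop tiling: the naive triple loop and the tiled version do exactly the same arithmetic, but the tiled one reuses each block while it is hot in core-local memory (a sketch; real codes pick the tile size to match the actual cache or scratchpad):

```cpp
// Sketch: loop tiling for locality in matrix multiply (C += A * B).
// Same O(n^3) work as the naive triple loop; only the traversal order
// changes, so each TILE-by-TILE block stays in core-local memory in use.
#include <cstddef>

const std::size_t TILE = 64;  // in practice, tuned to the local memory size

// Caller must zero-initialize C before the first call.
void matmul_tiled(std::size_t n, const double* A, const double* B, double* C) {
    for (std::size_t ii = 0; ii < n; ii += TILE)
        for (std::size_t kk = 0; kk < n; kk += TILE)
            for (std::size_t jj = 0; jj < n; jj += TILE)
                // one independent tile-triplet at a time
                for (std::size_t i = ii; i < ii + TILE && i < n; ++i)
                    for (std::size_t k = kk; k < kk + TILE && k < n; ++k) {
                        const double aik = A[i * n + k];
                        for (std::size_t j = jj; j < jj + TILE && j < n; ++j)
                            C[i * n + j] += aik * B[k * n + j];
                    }
}
```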
April 26, 2012 at 3:57 pm
@Pavel – that’s the same job as designing hardware, and we have methodology and tools for it. The downside is that you have to write your code in Verilog, VHDL or the SystemC subset of C++. I hear the FPGA folks may be moving to providing OpenCL compilers, but I’d say the levels of abstraction the tools use at the moment are wrong for both hardware and software design.
It’s not that hard a problem unless you go about it the wrong way, but then again not everybody makes a good hardware engineer.
November 27, 2016 at 11:09 am
The excitement in supercomputing programming is not the program itself but the problem that you want to solve.
I like Fortran: there is no language better for numerical programming. It is SIMPLE for the tasks that need to be accomplished; array handling (elemental procedures, sectioning, indexing, …) is nowhere better, and this is the bread and butter of our calculations. And of course its efficiency and optimization are among the best on the market. It is understandable, of course, that large projects requiring system interaction, modularity, and a high level of abstraction might make someone prefer a different language.
As for MPI and OpenMP I will not comment, but is there anything better available?
November 28, 2016 at 1:52 pm
@Konstantinos Anagnostopoulos: I’m sure that there are problems for which Fortran is the perfect tool. But it’s possible that, if Fortran is all you have at your disposal, you won’t even consider attacking problems for which Fortran is not enough.
Look at the history of operating systems for the PC. They used to be written in assembly. The last such project was IBM’s OS/2, which was a spectacular failure. It was easily overtaken by Windows, which was written in C.
November 28, 2016 at 2:17 pm
In new code development at leading-edge organizations (primarily the US DOE laboratories), most code is being written in C++, not Fortran. Its relatively stronger facilities for abstraction and high-level expression are being heavily exploited. As one’s program structures and computational methods become more irregular, Fortran’s strengths become less helpful.
As for MPI+OpenMP, I’d offer the system I work on, Charm++, as one potential alternative.