Why is climate modeling stuck?
There is no obvious reason why we can’t do better, and yet the promised progress in this direction isn’t appearing anywhere near as fast as the policy sector needs it.
Specifically, is it possible to provide useful regional prognostics of the consequences of global change? That’s what most people want us to do and what most people think we are up to. Whether that is what we want to do or not, I think some of us should rise to the occasion.
I would like to make the case that I can help do something about this. It’s a personal ambition to actually influence the architecture of a significant climate modeling effort before I retire. I think my skill set is unusual and strong in this regard, but unfortunately my track record is less so. It’s not that I haven’t accomplished anything as a coder, a software architect, a EE, or a manager, actually. It’s just that academia treats my time in industry as tantamount to “unemployment” and vice versa.
At the least, though, I can try to start a conversation. Are we stuck? If so, why? What could be done with a radical rethinking of the approach?
I have an essay on one view of the problem on another blog of mine. … I think both climate dynamics and software engineering issues are germane, and I’d welcome any informed discussion on it. There seem to be enough people from each camp lending me an ear occasionally that we might be able to make some progress.
Update: Well, since I’ve failed to move the discussion over there, I’ll move the article here. Thanks for any feedback.
I believe that progress in climate modeling has been relatively limited since the first successes in linking atmosphere and ocean models without flux corrections. (That’s about a decade now, long enough to start being cause for concern.) One idea is that tighter codesign of components such as atmosphere and ocean models in the first place would help, and there’s something to be said for that, but I don’t think that’s the core issue.
I suggest that there is a deeper issue based on workflow presumptions. The relationship between the computer science community and the application science community is key. I suggest that the relationship is ill-understood and consequently the field is underproductive.
The relationship between the software development practitioners and the domain scientists is misconstrued by both sides, and both are limited by past experience. Success in such fields as weather modeling and tide prediction provide a context which inappropriately dominates thinking, planning and execution.
Operational codes are the wrong model because scientists do not modify operational codes. Commercial codes are also the wrong model because bankers, CFOs and COOs do not modify operational codes. The primary purpose of scientific codes as opposed to operational codes is to enable science, that is, free experimentation and testing of hypotheses.
Modifiability by non-expert programmers should be and sadly is not treated as a crucial design constraint. The application scientist is an expert on physics, perhaps on certain branches of mathematics such as statistics and dynamics, but is typically a journeyman programmer. In general the scientist does not find the abstractions of computer science intrinsically interesting and considers the program to be an expensive and balky laboratory tool.
Being presented with codes that are not designed for modification greatly limits scientific productivity. Some scientists have enormous energy for the task (or the assistance of relatively unambitious and unmarketable Fortran-ready assistants) and take on the task with energy and panache, but the sad fact is that they have little idea of what to do or how to do it. This is hardly their fault; they are modifying opaque and unwelcoming bodies of code. Under the daunting circumstances these modifications have the flavor of “one-offs”, scripts intended to perform a single calculation, and treated as done more or less when the result “looks reasonable”. The key abstractions of computer science and even its key goals are ignored, just as if you were writing a five-liner to, say, flatten a directory tree with some systematic renaming. “Hmm, looks right. OK, next issue.”
This, while scientific coding has much to learn from the commercial sector, the key use case is rather atypical. The key is in providing an abstraction layer useful to the journeyman programmer, while providing all the verification, validation, replicability, version control and process management the user needs, whether the user knows it or not. As these services become discovered and understood, the value of these abstractions will be revealed, and the power of the entire enterprise will resume its forward progress.
It’s my opinion that Python provides not only a platform for this strategy but also an example of it. When a novice Python programmer invokes “a = b + c”, a surprisingly large number of things potentially happen. An arithmetic addition is commonly but not inevitably among the consequences and the intentions. The additional machinery is not in the way of the young novice counting apples but is available to the programmer extending the addition operator to support user defined classes.
Consider why Matlab is so widely preferred over the much more elegant and powerful Mathematica platform by application scientists. This is because the scientists are not interested in abstractions in their own right; they are interested in the systems they study. Software is seen as a tool to investigate the systems and not as a topic of intrinsic interest. Matlab is (arguably incorrectly) perceived as better than Mathematica because it exposes only abstractions that map naturally onto the application scientist’s worldview.
Alas, the application scientist’s worldview is largely (see Marshall McLuhan) formed by the tools with which the scientist is most familiar. The key to progress is the Pythonic way, which is to provide great abstraction power without having it get in the way. Scientists learn mathematical abstractions as necessary to understand the phenomena of interest. Computer science properly construed is a branch of mathematics (and not a branch of trade-school mechanics thankyouverymuch) and scientists will take to the more relevant of its abstractions as they become available and their utility becomes clear.
Maybe starting from a blank slate we can get moving again toward a system that can actually make useful regional climate prognoses. It is time we took on the serious task of determining the extent to which such a thing is possible. I also think the strategies I have hinted at here have broad applicability in other sciences.
I am trying to work through enough details of how to extend this Python mojo to scientific programming to make a credible proposal. I think I have enough to work with, but I’ll have to treat the details as a trade secret for now. Meanwhile I would welcome comments.