Friday, March 24, 2006

Open source scientific computing - follow up comments

My comments on open source software in science generated more feedback last month than any other article.

As well as a thread on sci.math.symbolic, there was an interesting response from Tim Daly, lead developer on the Axiom project.

There were three basic themes:

There was some heated debate on the best form of open source licensing. I have no particular expertise here, so I shall just duck that question.

There were several points that I had argued were a problem for open source, about which people rightly asked, "So why is that more of a problem for open source than for professional development?" Let me run through those now.

1) "The knowledge required for the topic and the interdependent parts of the system reduces the pool of possible contributors." It is quite true that the same issue applies to commercial development. But what I should have pointed out is that commercial development has an answer: it can pay more. Following the usual supply and demand laws of capitalism, the shortage of appropriate programmers forces salaries up until enough of them are attracted and equilibrium is reached. Or, as Richard Fateman pointed out in the case of Macsyma Inc, a failing company cannot attract the appropriate team and the failure is accelerated. In a contributor-driven open source project, there is no salary or similar compensation mechanism. Indeed, economics works in the opposite direction. Because the contributors that you want to attract are particularly skilled, their time is more valuable, as they probably command high salaries in their day jobs, so it requires greater generosity to give their time for free.

The counterpoint to this is PhD students, who are typically highly skilled and expect to work for free in pursuit of the qualification.

2) Tim Daly questioned why the complexity was a greater problem for part-time contributors. This is another point that I didn't make clearly enough. When I am deeply involved 8 hours a day in a programming project, I know every line of code that I am working on; the data structures, the function names and the argument orders are all in my head. My 8 hours are all productive (apart from my human weaknesses of coffee breaks, day-dreaming etc). When I come back to that same code weeks or months later to fix a bug that might otherwise have taken 5 minutes, I spend the first two hours familiarizing myself with the flow of the code, which I have long since forgotten. If you take this oversimplified example and extrapolate (two hours of relearning for a five-minute fix is a factor of 24), one full-time programmer achieves in an hour what 24 programmers achieve by each working for one hour. Or, put another way, a full-time programmer achieves in 2 hours what a part-time programmer manages in a year of working for an hour a week. Of course, the reality depends greatly on the amount you forget per break for a given complexity of code, but I am sure you get the point.
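
The arithmetic behind this example can be sketched in a few lines of Python. This is a toy model only: the five-minute fix and the two-hour relearning time are the illustrative figures from the paragraph above, the function names are invented for this sketch, and it assumes every piece of part-time work carries the same relearning overhead. Counting the fix itself as well as the relearning gives a factor of 25 rather than the rounded 24 quoted above.

```python
# Toy model of the example above. The figures are the post's illustrative
# numbers, and the function names are hypothetical, chosen for this sketch.

FIX_MINUTES = 5        # time to fix the bug while the code is fresh in mind
RELEARN_MINUTES = 120  # time to re-familiarize after weeks or months away


def slowdown_factor(fix_minutes: float, relearn_minutes: float) -> float:
    """Elapsed time a returning contributor spends per unit of useful work."""
    return (fix_minutes + relearn_minutes) / fix_minutes


def full_time_equivalent_hours(part_time_hours: float,
                               fix_minutes: float = FIX_MINUTES,
                               relearn_minutes: float = RELEARN_MINUTES) -> float:
    """Hours a fully immersed programmer needs to match the part-time total."""
    return part_time_hours / slowdown_factor(fix_minutes, relearn_minutes)


if __name__ == "__main__":
    factor = slowdown_factor(FIX_MINUTES, RELEARN_MINUTES)
    yearly = full_time_equivalent_hours(52)  # one hour a week for a year
    print(f"slowdown factor: {factor:.0f}")
    print(f"52 part-time hours ~ {yearly:.2f} full-time hours")
```

With these inputs, a year of one-hour weeks comes to roughly two full-time hours, in line with the rough figures in the text. Changing `RELEARN_MINUTES` shows how sensitive the result is to how much is forgotten between sessions.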

As was pointed out, there are many projects that are low on complexity and require widely available skills, e.g. documentation translation, but these are pointless unless the central features are delivered.

The third strand was from Tim Daly and is the most interesting, coming from a very different point of view (and well worth reading). If I can oversimplify his arguments: Axiom's purpose is not competition or dominance but the act of donation, in which your work is available to others. He talks of a "30 year" time frame, over which companies may well have gone bust but free information lives on.

I was very much drawn to his view, but my cynical free-market thinking took over again. You can't separate yourself from the market just because you are non-commercial. Even ideas need marketing; this is the purpose of publishing in journals and presenting at conferences. If the aim were just to make the information public, it would be sufficient to put a copy of your research in the library and let it sit there waiting to be discovered. There is a wealth of research out there that is doing no good to anyone because it has long been forgotten or because competing ideas have been presented in a more compelling way. So Axiom must compete. Not for revenue perhaps, but at least for some level of mind share, and users, to keep attracting contributors.

The central question that I came back to was "If I produce some original piece of work that I want to share, should I publish it as a contribution to Axiom, or as a free piece of Mathematica or Matlab code and submit it to their share libraries?". Just like publishing a paper, I want it in the most prestigious place which gives it the widest exposure. Axiom must try to be that place, and I don't think it can take 30 years to get there.

1 comment:

Anonymous said...

You claim:

"There is a wealth of research out there that is doing no good to anyone
because it has long been forgotten or because competing ideas have been
presented in a more compelling way. So Axiom must compete. Not for revenue
perhaps, but at least for some level of mind share, and users, to keep
attracting contributors."

As Bill Clinton might say, "That depends on what you mean by 'compete'."
If we look at the available playing fields where the competition takes
place we can begin to 'segment the market', so to speak.

Playing field 1: Price. Who cares?

Playing field 2: Mindshare. Hmmm. Clearly Axiom does not have the marketing
and sales team necessary to support hundreds of real customers who want to
do teaching or scientific computation in the workplace. But that's a good
thing because Axiom isn't really structured to support a large customer
base. All of that support effort would mean spending a lot of time answering
questions that are already covered in the documentation.

Playing field 3: Computational Science Mindshare. Here, we care.
There are algorithms that Axiom cannot yet do, such as symbolic summation.
We would like to have this but, so far, no one has written it. But I
believe that over the next 30 years someone will. I wish we could get
NSF or company grant funding or university funding to support such work
but that hasn't happened. Note, however, that the Risch integration
algorithm is based on work from the 1800s. The integration algorithm
belonged to the "wealth of research out there that is doing no good to
anyone because it has long been forgotten". You want it now. I want it right.

Playing field 4: Research. Here I believe we are the strongest on the field.
Axiom allows you to completely modify anything and everything about itself.
It is an ideal research platform. Indeed, that was the original idea. We have
several threads of ideas making the rounds on the Axiom mailing lists that
represent new directions in computational mathematics. I'm spending time
researching the question of provisos. The end result will deeply restructure
the system and change the way all algorithms work. This kind of research
cannot be done in MMA or Maple as you cannot change the core machinery.

Playing field 5: Quality. Here we care. In the long term the issue of
quality comes down to taking the time to do it right. I believe we can
win on this playing field because we have the time and we have people
who work on Axiom because they care.

Axiom competes. Not on dollars or mass markets but on fundamental
computational science. And Axiom will certainly be available and useful
30 years from now. That isn't true of the previous market leader, MACSYMA.
And you can't make that claim about MMA and Maple. Where do you want to
invest your computational science effort?

Tim Daly
Axiom Lead Developer