Tuesday, February 28, 2006

Security risks in web based scientific computing

Pretty much every science software company has spun off a web version of their products. But there is something odd about many of them - there is no sign of them on the makers' websites. Sure, there are pages of prose and pretty pictures telling you about them, but rarely any live demo. No chance to interact with an example deployment. It seems obvious that this would be the best way to showcase or trial such products.

And so it is with the Matlab Web Server that I was researching when this fact dawned on me. It's not that Mathworks doesn't consider its website to be an important part of its business and promotion. Then it occurred to me that it might be precisely because their website is important to them that they have no live demo.

The architecture of the system is this: the web browser connects to a TCP/IP client, "matweb", on the server, which passes all requests to the matlabserver, which in turn connects to Matlab, where the task is computed. So the matlabserver is exposed to the full scrutiny of the internet community, including the unfriendly ones.

There is a great power struggle going on out there between security companies and the hackers, with mighty companies like Microsoft and real experts like The Apache Foundation struggling to keep up with the latest tricks of the enemy. Would you entrust the security of your web server to a company whose expertise is not in security? Well, it looks like neither would Mathworks!

Perhaps this is paranoid, but when you look at the documentation for the Matlab Web Server, there does not appear to be even a mention of security, let alone any serious instruction.

Since most scientific software packages are essentially programming languages, they are very dangerous to give unrestricted access to. You must control two levels of access:

First you must control the tasks that the user requests to get computed (Matlab, like most other languages, can be instructed to delete files, launch programs like telnet daemons, or copy and send files such as your password file).

Second you must make sure that you are opening a door only to the application that you intended.

One line of the documentation that interested me was:
"observe that the line
<input name="mlmfile" value="webmagic" type="hidden">
sets argument mlmfile to the value webmagic. The mlmfile argument contains the name of the MATLAB M-file to run."

So the choice of what program to run is visible in the page source! Sure enough, if you find some user's pages and do "Show source" you can see this "hidden" field in the HTML. This means that I can instruct that user's Matlab to run any .m file that I can predict to exist on his server, just by copying the HTML, editing that line and opening it in a browser. Now hands up who knows the Matlab installation and the most popular add-ons well enough to know which could be misused if a malicious user could run them? Also, if I can place a file on the server by some other exploit, I now have a way to execute it. I'm sure a real hacker would have plenty of ideas.
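One common way to close this kind of hole is to check the requested M-file name on the server against a fixed allowlist before anything is handed to Matlab. The sketch below is only an illustration of that idea: the function name `resolve_mfile`, the allowlist contents and the notion of a filter sitting in front of the server are all assumptions, not part of the Matlab Web Server or its documentation.

```python
# Hypothetical server-side filter; NOT part of the Matlab Web Server.
# Only M-files listed here may ever be requested from a web form.
ALLOWED_MFILES = frozenset({"webmagic"})


def resolve_mfile(form_fields):
    """Return the requested M-file name if it is explicitly allowed.

    form_fields is a dict of submitted form values,
    e.g. {"mlmfile": "webmagic"}.
    Raises PermissionError for any name not on the allowlist,
    including a missing or empty mlmfile field.
    """
    requested = form_fields.get("mlmfile", "")
    if requested not in ALLOWED_MFILES:
        raise PermissionError("M-file not permitted: %r" % (requested,))
    return requested
```

With a check like this, editing the hidden field in a copied page only gets you a refusal, because the browser's value is treated as a request rather than as an instruction.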

If you are considering any other web based deployment system for a language, a good test might be to see if the supplier trusts it enough to run on their own websites.

Thursday, February 23, 2006

Open source technical software - dead end?

The open source movement has had a large impact on the software industry in recent years. I for one am a huge fan of Firefox and Thunderbird from the Mozilla project and a modestly satisfied user of OpenOffice. Can scientific software expect the same kind of open source benefits?

First, it has to be said that at the small-scale end, scientific software is, and always has been, dominated by open source code. When small specialist problems are solved in software, there is often no commercial opportunity, so developers put code into the public domain rather than do nothing with it.

But the reality is very different with large software products. Complexity of software development rises disproportionately with the size of source code. This requires teams, which in turn require coordination and more people. It rapidly becomes expensive. And it becomes too involved to have developers spend just small amounts of their time on it.

When you look around at large scale open source projects, they are nearly all commercial software that has been released, e.g. Firefox was once Netscape and OpenOffice was once StarOffice.

On this basis one can probably assume that any future major scientific software open source project already exists. It is either one of the existing open source projects (Axiom, Maxima, Reduce, Scilab) or it is one of the commercial products (Matlab, Mathematica, Maple, MuPAD etc), should its owners fail to make it pay or fail as companies. Since all these products are the central revenue of their companies, we can rule out the code being donated, as IBM did with Eclipse. With the exception of MuPAD, their companies look pretty healthy too.

So let us turn our attention to the three main large open source projects Axiom, Maxima and Scilab. Do any of them have the potential to become the killer free science app? Well no, and let me argue why:

1) The knowledge required to be a contributor is a level above that required to be a contributor to, say, Firefox. You need to be both a competent programmer AND a competent mathematician to be able to code and debug complicated algorithms. The world may be full of keen students who want the kudos of getting a few lines of code into Firefox, but there are many fewer who are actually good enough programmers, and fewer still who might know the math to add to, say, Axiom. This means that these projects will never be able to keep up with the development rate of the commercial packages. They are behind already, and will only fall further back.

2) Large technical computing systems have a lot of internal dependencies. You may be able to take a task like "change the bookmark mechanism" in Firefox and be pretty confident that the team doing it won't affect the way your browser renders pages. Ask someone to add features to, say, linear algebra, and you might affect equation solving, statistics, ODE solvers and many more areas that each affect other components. This again raises the technical requirement of your contributors, reducing the pool further, AND adds a significant management and system design overhead.

3) To support the costs, free software needs major financial contribution. OpenOffice gets the backing of Sun, who would like to sink Microsoft Office; Eclipse is backed by lots of big companies like IBM (who want it as a platform for their commercial tools). Only Scilab has any backers. The Axiom group recently commented that they had their first funded project: a $4500 grant from the Google "Summer of Code", a one-time charity. And that 2 month project does not appear to have been delivered after 5 months!

4) The major selling point of free software is "it's free". But of course there are other costs to installing new software, principally the training cost. It takes next to no time to learn how to use a word processor, so OpenOffice is still almost free. But the learning time for scientific software can be major. So if it takes a month to learn, then Axiom is now a $2000 product. If you add the same cost to, for example, Mathematica, it is a $4000 product, and arguably with the better support that comes with a commercial system (the large user base, lots of materials, books, training etc), perhaps it is only $3000. Free is still cheaper, but the relative difference is much smaller than it is with simple consumer products.
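To make the arithmetic in point 4 explicit, here is a small sketch. The $2000 training figure is the one quoted above; the $2000 licence price is only implied by the $4000 total, and the $1000 training saving for the better-supported commercial product is my reading of "perhaps it is only $3000". Treat all three as illustrative assumptions.

```python
def total_cost(license_price, training_cost):
    """Total cost of adoption: what you pay plus the time it takes to learn."""
    return license_price + training_cost


training = 2000        # one month of learning time, as quoted in the post
license_price = 2000   # assumption: implied by the $4000 total in the post
support_saving = 1000  # assumption: training saved by books, courses, users

axiom = total_cost(0, training)
mathematica = total_cost(license_price, training)
mathematica_supported = total_cost(license_price, training - support_saving)

print(axiom, mathematica, mathematica_supported)   # 2000 4000 3000
print(mathematica / axiom)                         # 2.0
print(mathematica_supported / axiom)               # 1.5
```

Without training costs the comparison is $0 against $2000; once training is counted, the commercial product is only 1.5 to 2 times the price of the free one.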

5) Every successful open source project has established itself as THE alternative. Firefox is THE choice of browser after IE (who has really heard of Opera?). OpenOffice is THE office suite, etc. This isn't true yet in science software. Two must die to give the third any recognition, and if MuPAD were to join the group, the problem would get worse.

Tuesday, February 21, 2006

Maple handwriting recognition: useful or gimmick?

One of the more original features of Maple 10 was its handwriting recognition tool. Quoting the PR:

" With over 1000 symbols, it could be a daunting task to find the symbol you want. But with the handwriting recognition built into Maple 10, it's easy!

Simply use the mouse to sketch the symbol in the Symbol Recognition palette, just like you would draw it on paper. The symbol recognizer will search through all the symbols and find the symbol you need."

The question is: "Is this a useful tool or just a gimmick?" I set out to find out.

The first problem is a human failure. With only a mouse to do the handwriting, it is actually quite hard to draw the symbol you want unless you slow down and concentrate. Here are screenshots of my first attempts at drawing an Infinity symbol:


The list of characters below the two buttons is Maple's suggestions for what I meant. With this writing, it should probably be forgiven for its suggestions of "M" and "&".

However, when you take more care, the suggestions don't get much better.

These seem like pretty good infinity symbols to me, but I never managed to get a match, even though the symbol is supported. "%" was the most common response and I can't see why it matched to "D" in this example.

The results varied depending on the symbol. Arrows and comparisons frequently gave useful suggestions, but lots of the symbols were frustratingly hard or impossible to match. Following are some more screenshots of decent attempts to draw supported characters, with the failed matches:

The most surprising was Pi, since it was on the face of the button you press for a match and very likely to be needed. I never managed to get a match for Pi out of lots of attempts. Most often Maple suggested "H".


Where it did do well, with almost flawless matching, was ASCII characters, presumably because ASCII character recognition was already a solved problem. Of course, those are the characters where it is of no use in this context.

In reality, the need isn't really there anyway. 1000 characters may sound like a lot, but there are over 100 on your keyboard already, and once the remaining 900 have been broken down into 12 categories, it is pretty easy to scan, say, 70 arrow symbols by eye for the right one because, unlike Maple, your eyes are good at character recognition.

This feature was a lot of fun, seeing what the character lottery would throw up, but useful or gimmick? Strictly gimmick.

Friday, February 17, 2006

Mathematica released for Mac Intel

It was not long ago that I wrote about the hints and statements coming out of the scientific software companies about support for the new Mac Intel platform.

http://scientificcomputing.blogspot.com/2006/01/mac-intel-race-is-on.html

Well the race has a winner, and it is Wolfram Research, who have now started shipping a native Mac Intel version of Mathematica. Wolfram even claims that this is the first professional application, not made by Apple, for Mac Intel.

http://www.wolfram.com/news/intelmac.html

Now we must wait to find out if it was a close run race or if the other horses are heading for the glue factory!

While Apple will be pleased that there is serious support for its new platform, this will mean little to Wolfram's sales in the short term. But it does tell us much about Wolfram's attitude to new technology and the efficiency of its development group.

Tuesday, February 14, 2006

MAA Placement tests in MapleTA

Back in July, Maplesoft and the MAA (Mathematical Association of America) announced a plan to create Placement Tests using Maplesoft's MapleTA online test software. Today they have released the first version.

http://www.maplesoft.com/products/placement/

Placement tests help to establish which courses a student needs and online delivery will make the process more efficient.

The MAA is not the only provider and had previously discontinued its Placement Test program. It clearly feels confident that online delivery makes the program worthwhile again, since it has signed a five year agreement with Maplesoft and mapped out a plan for the product well into that period:

  • CPTS v 1.0. Consists of four parallel forms of each of six tests: Arithmetic and Skills, Basic Algebra, Algebra, Advanced Algebra, Trigonometry and Elementary Functions, and Calculus Readiness.

  • CPTS v 2.0. will consist of the six tests included in Version 1.0; added to these tests will be parallel forms of each of four calculator-based tests that use a “pop-up” calculator. The calculator-based versions of Arithmetic and Skills, Basic Algebra, Algebra, and Calculus Readiness are the ones included in Version 2.0 that were not a part of Version 1.0.

  • CPTS v 3.0. will consist of placement tests that include some items for which students supply the responses; that is, these items will not be multiple-choice ones.

  • CPTS v 4.0. will consist of the placement tests of Version 3.0 and will be administered in a computer-sequenced environment.

  • CPTS v 5.0. will be a full-fledged Computer Adaptive Testing System.

Since the details of the agreement are not public, it is not clear who is taking the risk. With prices claimed to be as little as $2, margins will be tight while customers are being found, but once signed up, customers should take little effort to support.

Friday, February 10, 2006

Statistical Inference package for Mathematica

Wolfram Research have released a new Statistical Inference Package.

This is an add-on to Mathematica and claims the following features:

  • Likelihood-based statistical inference for sophisticated statistical models
    • Profile-likelihood-based confidence intervals for general parametric functions
    • Likelihood ratio test for general parametric hypotheses
    • Maximum likelihood estimates

  • Large collection of generalized statistical distributions and models
  • Generalized regression models
  • Generalized stochastic and hierarchical models
    • Generalized hidden Markov models

  • Automatic handling of censored data
  • Random observation from any statistical distribution or model
  • Symbolic computation of statistical model properties

With recent versions of Mathematica emphasizing speed in handling numeric data, more statistics functionality seems logical. I wouldn't be surprised if there was more to come.

Details can be found at http://www.wolfram.com/products/applications/sip/

Tuesday, February 07, 2006

Rumored product from Maplesoft: Blockbuilder for Simulink

Rumor has it that Maplesoft are working on a product that will allow Maple to be used with Mathworks' Simulink.

Details are not clear: "This product will combine the power and ease-of-use of Maple with specialized dynamic analysis tools and an S-Function Generator that will allow you to automatically convert Maple-derived mathematical models into a Simulink block (in C or MATLAB) that can be used as part of your simulation, ultimately for real-time execution in hardware-in-the-loop applications."

In practice this sounds like the Maple Code Generation package, which is used to generate C, FORTRAN and Visual Basic code, is being extended to support the S-Function syntax and sold as a separate add-on.

If this is true, the problem will be that the existing Code Generation package just isn't that good. First there is a conceptual issue: a computer algebra system contains lots of operations that can have no equivalent in a numerical language, e.g. symbolic integration or simplification of an expression. So you have to limit your programming in Maple to the parts of the system that are numeric operations.

But even when you adjust to this, parts that could be translated simply aren't. The documentation for Code Generation contains phrases like "Support for modules is limited" and "Translation of repetition statements is not fully supported."

By the time you have limited yourself to a subset of the system and learned the limitations of what can be translated, you have little benefit left compared to just writing directly in the target language. The Code Generation tools are good for little more than converting a formula into appropriate syntax - 2**3 for FORTRAN, 2^3 for Visual Basic etc.
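To show how small that remaining job is, here is a toy sketch of formula syntax conversion for a single power expression. It is not how Maple's Code Generation package works; the function `render_power` and the target names are hypothetical. The FORTRAN and Visual Basic operators are the ones mentioned above, and the C case uses the standard `pow` function from `math.h`.

```python
# Toy illustration only; unrelated to Maple's actual Code Generation package.
# Infix power operators for targets that have one.
POWER_OPERATOR = {
    "fortran": "**",
    "visual_basic": "^",
}


def render_power(base, exponent, target):
    """Render base-to-the-exponent in the syntax of the target language."""
    if target == "c":
        # C has no power operator; math.h provides pow().
        return "pow({}, {})".format(base, exponent)
    if target in POWER_OPERATOR:
        return "{}{}{}".format(base, POWER_OPERATOR[target], exponent)
    raise ValueError("unsupported target: {!r}".format(target))


for target in ("fortran", "visual_basic", "c"):
    print(target, render_power(2, 3, target))
```

A lookup table and a string format cover this case, which is roughly the level of work the post argues the tools are good for.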

If you want symbolic computation converted to other languages, you should consider MathCode C++ or MathCode F90 from MathCore http://www.mathcore.com. This still has the conceptual barrier, but what it does cover, it covers well.

A less likely possibility is that this will provide for Maple to be a component within a Simulink system. But a Maple link is likely far too slow to make this viable for all but the simplest systems.

Thursday, February 02, 2006

Firefox 1.5.0.1 Released

Firefox's steady growth in share of the browser market and steady development cycles carry a small but significant feature for scientific computing: it is the only serious browser with built-in support for MathML - the XML markup language for mathematical expressions.

MathML has been around for some years now, but Microsoft's lack of interest in it has confined its use to a small number of specialist websites.

If Firefox can ever reach parity with IE, perhaps MathML use in technical websites will become a reasonable expectation. For now mathematical expressions typically appear as images - pretty, but useless.
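For readers who have not seen MathML, the sketch below builds the presentation markup for x squared using Python's standard `xml.etree.ElementTree` module. The element names (`math`, `msup`, `mi`, `mn`) and the namespace are standard MathML; the helper name `power_mathml` is just an illustrative choice.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def power_mathml(identifier, exponent):
    """Return presentation MathML for identifier raised to a numeric exponent."""
    math = ET.Element("math", {"xmlns": MATHML_NS})
    msup = ET.SubElement(math, "msup")      # superscript: base, then exponent
    ET.SubElement(msup, "mi").text = identifier     # mi = identifier
    ET.SubElement(msup, "mn").text = str(exponent)  # mn = number
    return ET.tostring(math, encoding="unicode")


print(power_mathml("x", 2))
```

Unlike an image, markup like this stays as text the browser can render at any size and that other software can read.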
