Friday, December 16, 2005

Math documents, XML and hidden traps


I took a closer look at the file formats of the three main computational
products that have a document system: MathCAD, Maple and Mathematica.

MathCAD and Maple both sport new XML formats, while Mathematica has a
LISP-like functional text format and an option to save in XML.

With XML so widely supported, this seems like good news for portable
technical information. But beware, danger lurks below the surface.

The first half of the MathCAD file is nice, understandable text like

<math optimize="false" export="false" disable-calc="false">
  <ml:define>
    <ml:id subscript="15">z</ml:id>
    <ml:apply>
      <ml:id>Z</ml:id>
      <ml:sequence>
        <ml:real>1</ml:real>
        <ml:real>5</ml:real>
      </ml:sequence>
    </ml:apply>
  </ml:define>
</math>

which one could import into another system to manipulate, index, search,
etc. But suddenly, halfway down, you find entries like

<binaryContent>
<item item-id="1">iVBORw0KGgoAAAANSUhEUgAAAHgAAAAvCAIAAACnjZnKAAAAAXNSR0IArs4c6QAAAARnQU1B
AACxjwv8YQUAAAAgY0hSTQAAeiYAAICEAAD6AAAAgOgAAHUwAADqYAAAOpgAABdwnLpRPAAA
AkVJREFUeF7tmdtyBCEIRJP//+jJZbYmFmCDLBKsYp9S2RGaI7au83ld10d/Egh8g67zueut
oydQSa2qGnTg1KJQDbpBywQeb1cBtXWoiKYPjNuJurU0aD/ocWSDjuGIo6iUf87QGULMOU7c
DI2aG7S5C...
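To make the contrast concrete, here is a minimal Python sketch using only the standard library. It assumes the `ml` prefix maps to a placeholder namespace URI (`urn:example:mathcad-ml`, invented here because the excerpt omits the real declaration), and it uses only the first line of the `<binaryContent>` item, since the rest is truncated. The readable region can be indexed with a stock XML parser; the binary item yields only bytes. Its leading `iVBORw0KGgo` is the base64 form of the PNG file signature, so the sketch checks for that and reads the image size from the PNG header. The helper names `index_identifiers` and `describe_item` are hypothetical.

```python
import base64
import struct
import xml.etree.ElementTree as ET

# The readable region from the excerpt. The "ml" namespace URI is a
# placeholder assumption so that the fragment parses on its own.
MATH_REGION = """<math xmlns:ml="urn:example:mathcad-ml" optimize="false">
<ml:define>
<ml:id subscript="15">z</ml:id>
<ml:apply>
<ml:id>Z</ml:id>
<ml:sequence><ml:real>1</ml:real><ml:real>5</ml:real></ml:sequence>
</ml:apply>
</ml:define>
</math>"""

NS = {"ml": "urn:example:mathcad-ml"}

def index_identifiers(xml_text):
    """Return every identifier name in the region, in document order."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iterfind(".//ml:id", NS)]

# First line of <item item-id="1"> from the excerpt (72 base64 characters).
ITEM_1 = "iVBORw0KGgoAAAANSUhEUgAAAHgAAAAvCAIAAACnjZnKAAAAAXNSR0IArs4c6QAAAARnQU1B"
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def describe_item(b64_text):
    """Decode a base64 item; if it is a PNG, return (width, height) from IHDR."""
    data = base64.b64decode(b64_text)
    # A PNG starts with an 8-byte signature, then the IHDR chunk:
    # 4-byte length, the type "IHDR", then big-endian width and height.
    if not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", data[16:24])
    return width, height

print(index_identifiers(MATH_REGION))  # ['z', 'Z']
print(describe_item(ITEM_1))           # (120, 47)
```

The first function works only because the region is ordinary XML; the second shows that the item is an encoded image, but says nothing about what the image depicts.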

The Maple file plays a similar trick: images and typesetting are stored
in some opaque, encoded binary format.

Only the Mathematica formats are entirely open text, although you would
need to map the LISP-like syntax

Cell[BoxData[
SqrtBox["2"]], "Input"]

to the XML version

<Cell class='Input'>
  <BoxData>
    <List>
      <SqrtBox>
        <String>2</String>
      </SqrtBox>
    </List>
  </BoxData>
  <Style>
    <String>Input</String>
  </Style>
</Cell>

That is, unless you remember to use Save As XML as your standard format.
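The mapping itself is mechanical. Below is a minimal Python sketch of one hypothetical scheme, not the one Mathematica's own Save As XML uses (which, as the excerpt shows, adds a `class` attribute, a `List` wrapper and a `Style` element): each `Head[args]` becomes a `<Head>` element, quoted strings become `<String>`, numbers `<Number>`, and bare symbols `<Symbol>`. It covers only that small subset of the syntax (no operators, comments or escape processing), and the function names are illustrative.

```python
import re
import xml.etree.ElementTree as ET

# One token: optional whitespace, then a quoted string, a symbol name,
# an unsigned number, or one of the delimiters [ ] ,
TOKEN = re.compile(r'\s*("(?:[^"\\]|\\.)*"|[A-Za-z][A-Za-z0-9]*|\d+(?:\.\d+)?|[\[\],])')

def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at offset {pos}: {text[pos:pos + 10]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse(tokens, i=0):
    """Parse the expression starting at tokens[i]; return (element, next_index)."""
    tok = tokens[i]
    if tok.startswith('"'):
        leaf = ET.Element("String")
        leaf.text = tok[1:-1]          # escape sequences are kept as written
        return leaf, i + 1
    if tok[0].isdigit():
        leaf = ET.Element("Number")
        leaf.text = tok
        return leaf, i + 1
    if tok in ("[", "]", ","):
        raise ValueError(f"unexpected {tok!r} at token {i}")
    if i + 1 < len(tokens) and tokens[i + 1] == "[":
        node = ET.Element(tok)         # Head[args] becomes <Head>args</Head>
        i += 2
        while tokens[i] != "]":
            child, i = parse(tokens, i)
            node.append(child)
            if tokens[i] == ",":
                i += 1
        return node, i + 1             # step past the closing "]"
    leaf = ET.Element("Symbol")        # a bare symbol such as True or x
    leaf.text = tok
    return leaf, i + 1

def to_xml(text):
    tokens = tokenize(text)
    try:
        root, end = parse(tokens)
    except IndexError:
        raise ValueError("unbalanced brackets or empty input") from None
    if end != len(tokens):
        raise ValueError("unexpected input after the expression")
    return ET.tostring(root, encoding="unicode")

print(to_xml('Cell[BoxData[\nSqrtBox["2"]], "Input"]'))
# <Cell><BoxData><SqrtBox><String>2</String></SqrtBox></BoxData><String>Input</String></Cell>
```

Because the notebook text is fully readable, any such converter can be written without the vendor's help; the same is not true of the base64 items above.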

So why adopt XML and then obscure the contents?

The first part is easy: XML is a buzzword that everyone wants to claim
support for, and because open-source code for manipulating it is readily
available, it cuts R&D costs.

I think the two offenders obscure their content for different reasons.
MathSoft also sells a document management tool. How can you hope to sell
that against better, and probably cheaper, XML management tools? Easy:
make the contents choke generic XML tools so that only your tool can do
the job.

For Maplesoft, which has no tools for handling the documents other than
the application that creates them, the reason is, I suspect, very
traditional: locking the content into an unintelligible form makes it
hard for users to migrate to something else.

Well done, Wolfram, for staying away from one of the oldest
customer-exploiting tricks in software. But please make XML the default
format soon.

1 comment:

Anonymous said...

Hello,

Mathcad defaults to saving the rendering of each math region in a worksheet in image form as well as in XML form.

The binary content you see at the end of a Mathcad file is simply a text encoding of that binary image data. That's all there is to it... no sinister motivation on PTC/Mathsoft's part. The Mathcad XML format was designed with end-user cataloging and manipulation tools in mind.