Content MathML in text/html [was: Re: Exploring new vocabularies for HTML]

Neil Soiffer wrote:
> 
> 
> On Sat, Mar 29, 2008 at 9:02 PM, Ian Hickson <ian@hixie.ch> wrote:
>     Nobody includes the equivalent of content MathML when writing their
>     papers using LaTeX. Why would they do so with HTML?
> 
> 
> They don't do it in LaTeX because they can't do much with it.  In 
> MathML, you can copy and paste the expression you see on a web page into 
> a computation or graphing system and do something with it.  Most of the 
> time, the system can guess correctly what is meant by the MathML, but 
> mathematicians love to reuse notations for similar concepts.  Once you 
> step outside the bounds of  "everyday" math, the likelihood of guessing 
> incorrectly starts to go up quickly.

I believe the main use cases suggested so far for Content MathML inside 
text/html are:

1. Possibility of copy/paste or loading into computer algebra engines
2. Mathematical search
3. Ability to understand an equation out of context

Looking at these in reverse order:

3. Seems like a non-issue to me. In general it is not possible to take a 
random mathematical expression (e.g. in print or hand-written) and 
understand it without knowing something about the 
presentation-to-semantics mapping being used by the author. I see no 
requirement (except insofar as 3 is a superset of 1 and 2) that the web 
do better here.
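
To make the point about the presentation-to-semantics mapping concrete, 
here is a minimal sketch in Python. The expression f(x+1) and the helper 
names visible_text and head_operator are illustrative assumptions, not 
taken from any message in this thread. It shows one Presentation MathML 
rendering that is consistent with two different Content MathML readings.

```python
import xml.etree.ElementTree as ET

# One Presentation MathML rendering of "f(x+1)" (illustrative example).
PRESENTATION = (
    "<mrow><mi>f</mi><mo>(</mo><mi>x</mi>"
    "<mo>+</mo><mn>1</mn><mo>)</mo></mrow>"
)

# Two Content MathML readings of the same rendering:
# the function f applied to x+1, or the product of f and x+1.
AS_APPLICATION = (
    "<apply><ci>f</ci>"
    "<apply><plus/><ci>x</ci><cn>1</cn></apply></apply>"
)
AS_PRODUCT = (
    "<apply><times/><ci>f</ci>"
    "<apply><plus/><ci>x</ci><cn>1</cn></apply></apply>"
)


def visible_text(markup):
    """Concatenate the character data a reader would see."""
    return "".join(ET.fromstring(markup).itertext())


def head_operator(markup):
    """Return the tag of the first child of the outer <apply>."""
    return ET.fromstring(markup)[0].tag


print(visible_text(PRESENTATION))     # f(x+1)
print(head_operator(AS_APPLICATION))  # ci
print(head_operator(AS_PRODUCT))      # times
```

Which reading is intended depends on conventions the author has in mind, 
which is the same situation a reader of print is in.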

My naive reaction to 2. is that it is unlikely to work. Do you have a 
functional example of such a system? In particular, it seems like the 
syntax for entering expressions would have to be very simple, or it 
would be too tedious to use. Moreover, the majority of mathematics on 
the web will not be annotated with extra semantics even if it is 
technically possible to do so. Therefore, it is reasonable to assume that 
an engine that requires such annotations will be of limited use except 
in walled-garden environments.

1. Certainly seems like a "nice to have" feature, but it's hard to see 
it as more than that. As with 2, the fact that most maths will not have 
any additional annotations means that this will simply not work in the 
majority of cases, creating an inconsistent experience for people trying 
to load web pages in e.g. Mathematica. History suggests that features 
that only work on a small subset of pages are not used at all (e.g. 
toolbars to support rel="prev", rel="next", rel="home", etc. have never 
taken off because they are useful on so few pages).

This suggests that the use cases for Content MathML in text/html are not 
compelling from the point of view of the end user.

Given that at the moment the only support for equations in text/html is 
via bitmaps, ASCII art, or using JS to implement the rendering, it seems 
prudent to start by addressing the question of how to give the web a 
*TeX level of support for mathematics without trying to bolt on a large 
degree of additional complexity to support beyond-*TeX-level features. 
This is in part because, as far as I can see, most mathematics intended 
for the web is currently, and is likely to remain in the future, 
produced either using a GUI tool like Word or by converting *TeX. Most 
people producing mathematics already know one or more of these tools and 
have little interest in learning new tools to change their publication 
workflow. All the tools I am familiar with for authoring maths for use 
in a web environment use *TeX-style input (e.g. [1], [2], [3]), which 
implies the output will be limited to the capabilities of the *TeX 
allowed for input.

Sure, there are some people, particularly those in theoretical science, 
who make extensive use of computer algebra tools that necessarily 
process semantics, but those cases are a minority and, even then, the 
work is likely to be published using a presentation-only system.

Despite the assertion that LaTeX has support for defining semantics via 
macros, my personal experience is that this is almost never used and, 
when it is, it is more akin to the way @class is typically used in HTML 
i.e. as a styling hook so that one may, for example, trivially change 
all vectors from bold to underlined. Therefore, I believe that it is 
this level of functionality that we should be aiming for in text/html.

If it can be demonstrated that the case for content markup on the web is 
stronger than I have surmised (and, if you believe it is, it would be 
really useful to demonstrate it with examples of this kind of content 
actually working in the wild) there are ways to add it without requiring 
all the complexity of Content MathML in text/html. For example, we could 
have an attribute that allows a <math> subtree to link to an external 
resource that provides the semantics. The external resource would be a 
well-formed MathML-as-XML file containing the Content MathML. 
Alternatively, we could decide that application/xhtml+xml is sufficient 
to cover the use case of exchanging computer-algebra output over the web, 
as such systems are presumably capable of generating well-formed 
output consistently. If our conclusions change for HTML 6, it shouldn't 
be any harder to add annotations back in than it is to add them now.
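
As a rough sketch of how a consumer might use the external-resource idea, 
the Python below resolves a link from a <math> element to a separate 
Content MathML file. Everything named here is an assumption made for 
illustration: the attribute name "semantics", the file name "eq1.mml", 
the load_semantics helper, and the in-memory dictionary standing in for 
a network fetch. For simplicity the inline fragment is parsed as XML; a 
real text/html consumer would get the element from an HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute name; no such attribute has been specified.
SEMANTICS_ATTR = "semantics"

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Presentation-only markup as it might appear in the page.
PAGE_MATH = '<math semantics="eq1.mml"><mi>x</mi><mo>+</mo><mn>1</mn></math>'

# Stand-in for external resources (a real consumer would fetch these).
EXTERNAL = {
    "eq1.mml": (
        '<math xmlns="%s">'
        "<apply><plus/><ci>x</ci><cn>1</cn></apply>"
        "</math>" % MATHML_NS
    ),
}


def load_semantics(math_markup, fetch):
    """Return the parsed external Content MathML root, or None.

    None means the <math> element carries no link, so the consumer
    has only the presentation markup to work with.
    """
    math = ET.fromstring(math_markup)
    href = math.get(SEMANTICS_ATTR)
    if href is None:
        return None
    # The external file must be well-formed XML; ET.fromstring raises
    # xml.etree.ElementTree.ParseError if it is not.
    return ET.fromstring(fetch(href))


content = load_semantics(PAGE_MATH, EXTERNAL.__getitem__)
print(content[0].tag)  # {http://www.w3.org/1998/Math/MathML}apply
print(load_semantics("<math><mi>x</mi></math>", EXTERNAL.__getitem__))  # None
```

The design point is that the text/html parser never has to see Content 
MathML at all; only tools that care about the semantics follow the link.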


[1] http://www.math.union.edu/~dpvc/jsmath/
[2] http://golem.ph.utexas.edu/instiki/show/HomePage
[3] http://www.latex2html.org/

-- 
"Mixed up signals
Bullet train
People snuffed out in the brutal rain"
--Conor Oberst

Received on Sunday, 30 March 2008 14:50:14 UTC