Re: Reservations about <mchar> from David Carlisle on 2000-05-03 (www-math@w3.org from May 2000)

From: David Carlisle <davidc@nag.co.uk>
Date: Wed, 3 May 2000 10:18:44 +0100 (BST)
To: rbs@maths.uq.edu.au
CC: www-math@w3.org
Message-Id: <200005030918.KAA10618@nag.co.uk>
> Is this option still open/valid?

It depends on what notion of validity we want to work with.

It is of course possible, and practical, to have any MathML renderer
just `know' the MathML entities. This notion of entities was formalised
in SGML as SDATA entities (entities which don't have a definition in the
DTD, other than being declared SDATA, and which are rendered in a
system-specified way). This is how character entities (e.g. the Latin 1
entities in HTML) were typically defined.

I wasn't involved in the discussions leading to XML but somewhere along
the line the SDATA concept was deliberately dropped. In XML all entities
have to have a definition. This is why there is far more pressure to get
math characters into Unicode for XML usage than there was for SGML. In
SGML &rightarrow; could have just been a name, but for XML it has to be
defined to be something: either a Unicode character or an mchar node or
whatever.


Thus a fragment

<mrow><mi>A</mi><mo>&rightarrow;</mo><mi>B</mi></mrow>

is not well-formed XML, even if in practice any system that can do
anything at all with MathML will auto-load the entity definitions,
so that the above will work.

The main problem with such a solution (and I assume it's the reason SDATA
was dropped in the first place) is that such fragments cause problems
with generic XML systems that may not be MathML aware. Indexers, web
crawlers and these days an untold number of other applications may pick
up files that have an XML mime type (or some other designation of being
XML) and pass them through an XML parser. The whole motivating idea of
XML as opposed to SGML was that you should be able to do that and get a
parse tree (even if you couldn't do much else with it) just from the
document instance, without reference to a DTD. But the above will
generate a `fatal error' in an XML parser.
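
As a minimal sketch of that failure, assuming Python's standard
xml.etree.ElementTree as a stand-in for a generic, non-MathML-aware XML
parser (the helper name try_parse is hypothetical), the fragment above
is rejected when no DTD supplies the entity definition:

```python
import xml.etree.ElementTree as ET

# The fragment from the message, with no DTD and so no entity definitions.
FRAGMENT = "<mrow><mi>A</mi><mo>&rightarrow;</mo><mi>B</mi></mrow>"


def try_parse(text):
    """Return (True, root) on success, or (False, error message) on a parse error."""
    try:
        return True, ET.fromstring(text)
    except ET.ParseError as err:
        return False, str(err)


ok, result = try_parse(FRAGMENT)
print(ok, result)
```

The parser stops at the entity reference and returns no tree at all, so
an indexer or crawler gets nothing to work with.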

So by having entities, we pay the price of XML (explicit and verbose
</xxx> end tags everywhere, making the parse tree explicit so as not to
rely on a DTD for any information about implied tags) but we don't
get the intended benefit of DTDless parsing.

But all this is the argument against entities (I've tried to say what
the argument is, without actually arguing one way or the other:
there are pros and cons).

It is not, as such, an argument _for_ mchar.

The argument for mchar is that _if_ we deprecate entities and _if_
we want some way of accessing characters by name using ASCII markup
then we have to use an element or attribute syntax.

Currently in the MathML 2 draft you have any of the following equivalent
forms:


&xarr;
&#x02192;
→
<mchar name="xarr"/>


the first three being equivalent to any XML parser (if I got the third
one right, which I think is UTF-8 for #x02192), and the fourth one of
course is not equivalent to an XML parser but is specified as being
equivalent to a MathML system. All but the first may be parsed by a
non-validating parser without reference to a DTD.
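
A rough sketch of how the four forms listed above look to a generic
parser, again assuming Python's xml.etree.ElementTree. The internal DTD
subset declaring xarr, the MCHAR_NAMES table and the mathml_text helper
are illustrative assumptions, not part of the MathML 2 draft:

```python
import xml.etree.ElementTree as ET

ARROW = "\u2192"  # RIGHTWARDS ARROW

# The entity form only parses when a definition is available; here a
# tiny internal DTD subset (an assumption for this sketch) supplies it.
with_dtd = '<!DOCTYPE mo [<!ENTITY xarr "&#x2192;">]><mo>&xarr;</mo>'
char_ref = "<mo>&#x02192;</mo>"
literal = "<mo>\u2192</mo>"
mchar = '<mo><mchar name="xarr"/></mo>'

# The first three forms all yield the same character data.
texts = [ET.fromstring(doc).text for doc in (with_dtd, char_ref, literal)]

# The mchar form parses without a DTD, but the parser sees an element.
mchar_root = ET.fromstring(mchar)

# Hypothetical name table that a MathML-aware system would carry.
MCHAR_NAMES = {"xarr": ARROW}


def mathml_text(mo):
    """Flatten an <mo> element, resolving <mchar name="..."/> children by name."""
    parts = [mo.text or ""]
    for child in mo:
        if child.tag == "mchar":
            parts.append(MCHAR_NAMES[child.get("name")])
        parts.append(child.tail or "")
    return "".join(parts)


print(texts, mchar_root[0].tag, mathml_text(mchar_root))
```

Only the application-level lookup in mathml_text makes the mchar form
equal to the others; the parse trees themselves differ.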

If entities were deprecated (but available, of course) could we live
with just recommending that people use either the 2nd or 3rd of the
above syntaxes for an arrow? (The third would look a lot more palatable
of course if you were using a system that understood Unicode encodings
and had mappings to a suitable font set.)

David
Received on Wednesday, 3 May 2000 05:21:07 UTC