- From: David Carlisle <davidc@nag.co.uk>
- Date: Wed, 3 May 2000 10:18:44 +0100 (BST)
- To: rbs@maths.uq.edu.au
- CC: www-math@w3.org
> Is this option still open/valid?

It depends on what notion of validity we want to work with.

It is of course possible, and practical, to have any MathML renderer just `know' the MathML entities. This notion of entities was formalised in SGML as SDATA entities (entities which don't have a definition in the DTD, other than being declared SDATA, and which are rendered in a system-specified way). This is how character entities (e.g. the Latin 1 entities in HTML) were typically defined.

I wasn't involved in the discussions leading to XML, but somewhere along the line the SDATA concept was deliberately dropped. In XML all entities have to have a definition. This is why there is far more pressure to get math characters into Unicode for XML usage than there was for SGML: in SGML &rarr; could have just been a name, but for XML it has to be defined to be something, either a Unicode character or an mchar node or whatever.

Thus a fragment

  <mrow><mi>A</mi><mo>&rarr;</mo><mi>B</mi></mrow>

is not well-formed XML, even if in practice any system that can do anything at all with MathML will auto-load the entity definitions and so the above will work.

The main problem with such a solution (and I assume it's the reason SDATA was dropped in the first place) is that such fragments cause problems for generic XML systems that may not be MathML-aware. Indexers, web crawlers and these days an untold number of other applications may pick up files that have an XML MIME type (or some other designation of being XML) and pass them through an XML parser. The whole motivating idea of XML, as opposed to SGML, was that you should be able to do that and get a parse tree (even if you couldn't do much else with it) just from the document instance, without reference to a DTD. But the above will generate a `fatal error' in an XML parser.
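To make the point concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree as a stand-in for a generic, non-MathML-aware XML parser. It assumes the arrow in the fragment is written as the entity reference &rarr; with no DTD supplied; the helper name parses_without_dtd is made up for this illustration.

```python
import xml.etree.ElementTree as ET


def parses_without_dtd(text):
    """Return True if a DTD-less XML parser accepts the text (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # An undefined entity reference is reported as a parse error.
        return False


# The fragment with a named entity and no declaration for it.
with_entity = "<mrow><mi>A</mi><mo>&rarr;</mo><mi>B</mi></mrow>"

# The same fragment using a numeric character reference instead.
with_char_ref = "<mrow><mi>A</mi><mo>&#x2192;</mo><mi>B</mi></mrow>"

print(parses_without_dtd(with_entity))    # False
print(parses_without_dtd(with_char_ref))  # True
```

The first fragment is rejected because the parser has no definition for the entity; the second needs no DTD at all.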
So by having entities, we pay the price of XML (explicit and verbose </xxx> end tags everywhere, making the parse tree explicit so as not to rely on a DTD for any information about implied tags) but we don't get the intended benefit of DTD-less parsing.

But all this is the argument against entities (I've tried to say what the argument is, without actually arguing one way or the other: there are pros and cons). It is not, as such, an argument _for_ mchar. The argument for mchar is that _if_ we deprecate entities and _if_ we want some way of accessing characters by name using ASCII markup, then we have to use an element or attribute syntax.

Currently in the MathML 2 draft you have any of the following equivalent forms:

  &xarr;
  &#x2192;
  →
  <mchar name="xarr"/>

the first three being equivalent to any XML parser (if I got the third one right, which I think is UTF-8 for #x02192), while the fourth one of course is not equivalent in the XML parse but is specified as being equivalent for a MathML system. All but the first may be parsed by a non-validating parser without reference to a DTD.

If entities were deprecated (but still available, of course), could we live with just recommending that people use either the 2nd or 3rd of the above syntaxes for an arrow? (The third would look a lot more palatable, of course, if you were using a system that understood Unicode encodings and had mappings to a suitable font set.)

David
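As a rough illustration of the equivalences discussed above, the sketch below feeds each form to Python's standard xml.etree.ElementTree, used here as a stand-in for a generic XML parser. The numeric reference &#x2192;, the UTF-8 bytes E2 86 92, and the internal-subset declaration of xarr are assumptions made for the example, not text taken from the MathML draft.

```python
import xml.etree.ElementTree as ET

# Form 2: numeric character reference, no DTD needed.
char_ref = ET.fromstring("<mo>&#x2192;</mo>")

# Form 3: the character itself, encoded as UTF-8 bytes.
raw_utf8 = ET.fromstring(b"<mo>\xe2\x86\x92</mo>")

# Form 1: a named entity, which only parses once it has a definition.
# The declaration here is an illustrative assumption.
declared = ET.fromstring(
    '<!DOCTYPE mo [<!ENTITY xarr "&#x2192;">]><mo>&xarr;</mo>'
)

# Form 4: element syntax; the parser sees a child element, not a character.
mchar = ET.fromstring('<mo><mchar name="xarr"/></mo>')

# The first three forms yield the same character data.
print(char_ref.text == raw_utf8.text == declared.text)  # True

# The mchar form yields no text, only a child element with a name attribute.
print(mchar.text, mchar[0].tag, mchar[0].get("name"))
```

Only a MathML-aware application would treat the mchar element as the same arrow; to the XML parser it is a different tree.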
Received on Wednesday, 3 May 2000 05:21:07 UTC