Re: Reservations about <mchar> from David Carlisle on 2000-05-02 (www-math@w3.org from May 2000)

From: David Carlisle <davidc@nag.co.uk>
Date: Tue, 2 May 2000 16:23:46 +0100 (BST)
To: rminer@geomtech.com
CC: www-math@w3.org, rbs@maths.uq.edu.au
Message-Id: <200005021523.QAA20642@nag.co.uk>

> If a MathML processor is
> getting a parse tree from the DOM, then the extra <mchar> nodes are
> more expensive than entities which resolve to character data.

Yes, this is certainly true, as you get more nodes (although doesn't
the DOM produce a node for entity references as well?). But I think
the same would be true of any syntax other than entity references
or character data.
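As a rough illustration of the node-count point, the Python sketch below parses the same short phrase written four ways and counts the nodes under the root element. It uses the standard-library xml.dom.minidom, which is only one DOM implementation among many. The <mchar name="alpha"/> spelling is an assumption about the draft syntax, used purely for illustration. minidom, built on expat, expands internal entity references into ordinary text instead of keeping EntityReference nodes, so in this particular DOM the entity form costs no more than character data. Other DOM implementations may keep EntityReference nodes.

```python
from xml.dom import minidom


def count_nodes(node):
    """Return the number of DOM nodes in the subtree rooted at node."""
    return 1 + sum(count_nodes(child) for child in node.childNodes)


# The same Greek alpha between two words, written four ways.
# The <mchar name="alpha"/> form is an assumed draft syntax (illustration only).
SAMPLES = {
    "unicode character data": "<mtext>a \u03b1 b</mtext>",
    "numeric character reference": "<mtext>a &#x3B1; b</mtext>",
    "entity reference": (
        '<!DOCTYPE mtext [<!ENTITY alpha "&#x3B1;">]>'
        "<mtext>a &alpha; b</mtext>"
    ),
    "mchar element": '<mtext>a <mchar name="alpha"/> b</mtext>',
}


def node_counts():
    """Parse each sample and count the nodes under its root element."""
    return {
        label: count_nodes(minidom.parseString(src).documentElement)
        for label, src in SAMPLES.items()
    }


if __name__ == "__main__":
    for label, n in node_counts().items():
        print(f"{label}: {n} nodes")
```

With this parser the three character-based forms each give 2 nodes (the element plus one merged text node), while the mchar form gives 4, because the extra element splits the surrounding text into two separate text nodes.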

> A statistical profile of the density of characters needing to
> be accessed through <mchar> in a typical document would be very
> useful. 

It could, in some areas, be the majority of the character data in the
MathML expression (but probably not the majority in a typical document
consisting of text interspersed with mathematics).

So the cost is real and not negligible, I would say.

However, the only real alternative if we get rid of mchar is to
deprecate entity references and instead use Unicode character data or
numeric character references. Technically this works well, but it
removes any remaining pretence that it is possible to hand-write or read
MathML without MathML tools. This is another real cost, though perhaps
harder to quantify.

Or we don't deprecate entity references.
The schema issue isn't a total block on using entities, as you can still
use a <!DOCTYPE declaration whether or not you also specify a schema.
The harder issue is ensuring that MathML fragments remain well formed.
That means ensuring that MathML that is "cut and pasted" from one place
to another either has any entities expanded to character data as the
expression is "cut", or is pasted into a document whose DTD has been
modified to include the MathML entity declarations.
(It may be that this is automatic, as the original entities will have
been expanded by the XML parser and so are not there to be cut, but I'd
like to be sure of this :-)
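A minimal sketch of the parenthetical point about expansion, again assuming Python's xml.dom.minidom: once a document has been parsed, the entity reference has already been replaced by character data, so a fragment "cut" from the parsed tree and re-serialized is well formed without any DTD. A fragment cut from the raw source text still contains &alpha; and is not. The single alpha declaration is a hypothetical stand-in for the full set of MathML entity declarations, and the function names are illustrative only. Whether a given editor cuts from the tree or from the source text is exactly the uncertainty raised above.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# A tiny document declaring its own entity in the internal DTD subset.
# This one declaration hypothetically stands in for the MathML entity set.
SOURCE = """<!DOCTYPE math [
  <!ENTITY alpha "&#x3B1;">
]>
<math><mi>&alpha;</mi></math>"""


def cut_fragment(source, tag="mi"):
    """Parse source and re-serialize the first <tag> element, as a
    tree-based editor's "cut" might."""
    doc = minidom.parseString(source)
    return doc.getElementsByTagName(tag)[0].toxml()


def is_well_formed(fragment):
    """Return True if fragment parses as XML with no DTD available."""
    try:
        minidom.parseString(fragment)
    except ExpatError:
        return False
    return True


if __name__ == "__main__":
    cut = cut_fragment(SOURCE)
    print(repr(cut))                            # '<mi>α</mi>': entity already expanded
    print(is_well_formed(cut))                  # True: pastes anywhere, no DTD needed
    print(is_well_formed("<mi>&alpha;</mi>"))   # False: undefined entity
```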

All of these are, I think, viable alternatives, but all of them have some
nasty side effects. Currently I think I'm still happiest with mchar,
although I haven't implemented mchar at all yet in my own TeX-based
MathML renderer, as it would be a pain to implement, needing to duplicate
much of the support for Unicode character data :-)

If, however, the implementation costs of mchar turn out to be too high,
I agree that we might need to reconsider this...


David

Received on Tuesday, 2 May 2000 11:26:55 UTC