Re: comments on MathML last call from David Carlisle on 2003-05-08 (www-math@w3.org from May 2003)

From: David Carlisle <davidc@nag.co.uk>
Date: Thu, 8 May 2003 09:55:42 +0100
To: duerst@w3.org
CC: www-math@w3.org, w3c-i18n-ig@w3.org
Message-Id: <200305080855.JAA28186@penguin.nag.co.uk>
Martin, thanks for your comments.

> This is currently a personal review.

Likewise this is a personal response (a group response will follow once
we've had time to discuss any issues raised).


> Overall: It would be extremely nice to have an index of elements
> and attributes. Given all the technology used for producing the
> spec, this shouldn't be a problem at all.

I've wondered about that in the past. I think the main worry was
increasing the size of an already large spec, but I may see what can be
generated and see if the working group like any of the results. It is
certainly true that every reference to an element or attribute is marked up
and could be indexed, although to make a _good_ index would probably
require a lot of work to distinguish defining or main uses as opposed
to examples, flag ranges which should be indexed as a block, etc. I
suspect that would not be possible at this time.



> "In practice, non-ASCII characters will typically be represented by entity 
> references."
>
> This sentence should be removed. 

I agree (but note this is a personal response).


> If you want
> an easy way out, substitute "character glyphs" with
> "characters/glyphs". This at least makes it clear that
> these are two different things.

The wording could be improved, but I think characters/glyphs would
actually be worse as it implies two different things. mglyph
is for accessing glyphs (representing characters) from fonts,
as opposed to the mchar element, which was part of some of the MathML 2
WDs but removed before the Rec, and which was an element taking a name
representing a character (as an alternative to using entities/character
data).


> It would be good to have a clear explanation of 'encoding' here,
> so that people don't confuse it with the 'encoding' pseudo-attribute
> in the XML declaration.

I'm sure we can think of a clarifying sentence to insert here.


> 6.1 Introduction [several individual comments]

This text was heavily edited in this 2nd edition to remove most
references to the use of "proposed" characters, since the main
motivation for having a 2nd edition is that these characters are now
mostly standardised (and we will just have to learn to live without the
remaining ones which didn't make it into Unicode). However you've shown
a few places where such references were missed. I'm sure this can be
easily fixed.

>> for example, encoded in Latin-1 (ISO-8859-1) then only the 
>> characters in that encoding are available directly. Unfortunately, most 
>> mathematical symbols may not be encoded as character data in this
>> way.

> The last sentence is misleading. Using UTF-8 or UTF-16, the two only
> encodings that all XML processors are required to accept, mathematical
> symbols can be encoded as character data.

The intention of that bit was to highlight that using Latin-1 (for
example) may not be the best idea as it limits direct access to
most of the mathematical characters, and (by implication) authors may
want to consider using utf*. You clearly didn't read it that way, so
perhaps explicitly mentioning utf.. here would clarify things.
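
To illustrate the point about direct access, here is a minimal sketch
(in Python, purely as an illustration, not text proposed for the spec).
The choice of U+211D (double-struck capital R) as the sample character
is an assumption made for the example.

```python
# Double-struck capital R (U+211D), chosen here only as a sample symbol.
symbol = "\u211d"

# In UTF-8 the character can appear directly as character data.
utf8_bytes = symbol.encode("utf-8")

# Latin-1 has no position for it, so encoding it directly fails.
try:
    symbol.encode("latin-1")
    latin1_direct = True
except UnicodeEncodeError:
    latin1_direct = False

# In a Latin-1 document it has to be written as a numeric character
# reference (or an entity reference) instead.
latin1_ref = symbol.encode("latin-1", errors="xmlcharrefreplace")

print(utf8_bytes, latin1_direct, latin1_ref)
```

So in a Latin-1 encoded file the symbol is still usable, just not as
direct character data.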



> This is again more version/history-oriented than necessary. What about:
>
> Most of these characters are in Plane 1, except for a few characters (such 
> as the double-struck letters N, P, Z, Q, R, C, H representing common number 
> sets) which are in the BMP.

As I say above, most of your comments about "future" references to
Unicode 3.1 and 3.2 I accept, but in this case I think that the
historical perspective is useful. Your re-wording tries to make it seem
perfectly natural that these characters should be out of sequence,
whereas the holes in the plane 1 block are a regrettable and potentially
confusing feature, and only make sense at all if some of the history as
to why they are there is given.
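
The holes can be seen with a short sketch (Python, for illustration
only). It assumes the double-struck capitals A..Z are laid out
alphabetically from U+1D538, and relies on Python's unicodedata tables
reporting no name for the reserved positions.

```python
import unicodedata

# Assumed layout: mathematical double-struck capitals A..Z start at
# U+1D538 in Plane 1, one position per letter.
BLOCK_START = 0x1D538
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# A position with no character name is a reserved hole.
holes = [
    letter
    for offset, letter in enumerate(LETTERS)
    if unicodedata.name(chr(BLOCK_START + offset), None) is None
]

# The letter C, for example, lives in the BMP instead.
bmp_c = unicodedata.name("\u2102")

print(holes, bmp_c)
```

The letters reported as holes are exactly those whose double-struck
forms were already encoded in the BMP.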

> This is even more so in that the problems in
> IE, according to our knowledge, have been fixed.

IE6 SP1 did finally fix the problem, but only if you have SP1 installed.
Of course there are thousands (millions?) of copies of IE6 still out
there. James Clark's nsgmls also has the same restriction (although 
onsgmls does now work with characters out of the BMP I believe).
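
As a rough sketch of what "out of the BMP" means in practice (Python,
for illustration; the exact cause of the IE6 and nsgmls restriction is
not described here, so this only shows how such a character differs
from a BMP one):

```python
# MATHEMATICAL DOUBLE-STRUCK CAPITAL A, a Plane 1 character.
ch = "\U0001d538"

# Plane number is the code point divided by 0x10000; the BMP is plane 0.
plane = ord(ch) >> 16

# In UTF-16 a Plane 1 character takes two 16-bit code units
# (a surrogate pair) rather than one.
utf16 = ch.encode("utf-16-be")
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]

print(plane, [hex(u) for u in units])
```

Software that assumes every character fits in a single 16-bit unit
cannot represent such a character directly.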

The MathML DTD uses plane 1 characters, but I see nothing wrong with
giving end users an option not to do that. If they are using a system
that is crippled in this way, removing this feature from the DTD will
not fix their application and does not mean that they can no longer
do the re-mapping, it just means that they will have to redefine half a
dozen parameter entities (for the affected entity sets) to point to some
definition that works on their system. Penalising end users in this way
doesn't seem particularly useful.


> This production is clearly wrong, and needs to be fixed.

Thanks! I don't know how that has slipped past so many eyes
(I suspect that it has been that way since the first drafts of mathml,
ie since before xml...)


David

Received on Thursday, 8 May 2003 04:56:06 UTC