Re: comments on MathML last call from Martin Duerst on 2003-05-08 (www-math@w3.org from May 2003)

From: Martin Duerst <duerst@w3.org>
Date: Thu, 08 May 2003 14:44:33 -0400
To: David Carlisle <davidc@nag.co.uk>
Cc: www-math@w3.org, w3c-i18n-ig@w3.org
Message-Id: <4.2.0.58.J.20030508134636.03ca9bb0@localhost>

Hello David,

Many thanks for your speedy and very positive reply.
A few more comments below:


At 09:55 03/05/08 +0100, David Carlisle wrote:

> > Overall: It would be extremely nice to have an index of elements
> > and attributes. Given all the technology used for producing the
> > spec, this shouldn't be a problem at all.
>
>I've wondered about that in the past. I think the main worry was
>increasing the size of an already large spec, but I may see what can be
>generated and see if the working group like any of the results. It is
>certainly true that every reference to an element or attribute is marked up
>and could be indexed, although to make a _good_ index would probably
>require a lot of work to distinguish defining or main uses, as opposed
>to examples, flag ranges which should be indexed as a block, etc. I
>suspect that would not be possible at this time.

What I was thinking about was not a traditional index where every
even marginally relevant occurrence of a term is indexed, but just
e.g. a list of elements/attributes/properties,... that leads to
the relevant definitions. Some examples:

http://www.w3.org/TR/html401/index/elements.html
http://www.w3.org/TR/html401/index/attributes.html
http://www.w3.org/TR/REC-CSS2/propidx.html
http://www.w3.org/TR/SVG/eltindex.html
http://www.w3.org/TR/SVG/attindex.html
http://www.w3.org/TR/SVG/propidx.html

I'm sure there are more. These kinds of things are extremely
useful, and that's why they are actually linked from the top
of every part of the spec. My guess is that it shouldn't be
too difficult to create such a thing for MathML.


> > If you want
> > an easy way out, substitute "character glyphs" with
> > "characters/glyphs". This at least makes it clear that
> > these are two different things.
>
>The wording could be improved, but I think characters/glyphs would
>actually be worse as it implies two different things. mglyph
>is for accessing glyphs (representing characters) from fonts,
>as opposed to the mchar element which was part of some of the mathml 2
>WDs but removed before the rec, which was an element taking a name
>representing a character (as an alternative to using entities/character
>data).

characters/glyphs was just a very quick shot, sorry.
I'm sure you can do better.



> > 6.1 Introduction [several individual comments]
>
>This text was heavily edited in this 2nd edition to remove most
>references to the use of "proposed" characters (since the main
>motivation for having a 2nd edition is that these characters are now
>mostly standardised (and we will just have to learn to live without the
>remaining ones which didn't make it into Unicode)). However you've shown
>a few places where such references were missed. I'm sure this can be
>easily fixed.
>
> >> for example, encoded in Latin-1 (ISO-8859-1) then only the
> >> characters in that encoding are available directly. Unfortunately, most
> >> mathematical symbols may not be encoded as character data in this
> >> way.
>
> > The last sentence is misleading. Using UTF-8 or UTF-16, the only two
> > encodings that all XML processors are required to accept, mathematical
> > symbols can be encoded as character data.
>
>The intention of that bit was to highlight that using latin-1 (for
>example) may not be the best idea as it limits direct access to
>most of the mathematical characters, and (by implication authors may
>want to consider using utf*). You clearly didn't read it that way, so
>perhaps explicitly mentioning utf.. here would clarify things.

Yes, it was unclear what 'this way' referred to. What about something
like:

for example, encoded in Latin-1 (ISO-8859-1) then only the
characters in that encoding are available directly, which excludes
most mathematical symbols. Encodings such as UTF-8 and UTF-16
can directly encode all characters in Unicode.
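
To make the point concrete, here is a minimal sketch in Python. The
sample symbols and the helper name encodable are picked purely for
illustration and are not taken from the spec.

```python
# Illustrative only: the sample symbols are assumptions chosen for this sketch.
SYMBOLS = {
    "PLUS-MINUS SIGN": "\u00b1",                           # in Latin-1
    "N-ARY SUMMATION": "\u2211",                           # BMP, not in Latin-1
    "MATHEMATICAL DOUBLE-STRUCK CAPITAL A": "\U0001d538",  # Plane 1
}


def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of text can be encoded directly."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for name, ch in SYMBOLS.items():
    # Report, per encoding, whether the symbol is usable as character data.
    print(name, {enc: encodable(ch, enc)
                 for enc in ("iso-8859-1", "utf-8", "utf-16")})
```

Only the plus-minus sign survives in ISO-8859-1; the other two need
UTF-8, UTF-16, or a numeric character reference.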



> > This is again more version/history-oriented than necessary. What about:
> >
> > Most of these characters are in Plane 1, except for a few characters (such
> > as the double-struck letters N, P, Z, Q, R, C, H representing common
> > number sets) which are in the BMP.
>
>As I say above most of your comments about "future" references to
>Unicode 3.1 and 3.2 I accept but in this case I think that the
>historical perspective is useful. Your re-wording tries to make it seem
>perfectly natural that these characters should be out of sequence,

I didn't try to make this sound perfectly natural. I just tried to
state the facts.


>whereas the holes in the plane 1 block are a regrettable

We might agree on 'regrettable', but that doesn't belong in
the spec.


>and potentially
>confusing feature, and only make sense at all if some of the history as
>to why they are there is given.

Well, adding 'for historic reasons' at the end of the above
sentence would probably do the job, in that it tells people
not to search for deeper reasons. What's important for
people is the 'watch out' warning. People interested in
history can find it in various places on the Web.
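
For readers who want to see the 'watch out' case, the following Python
sketch lists which double-struck capitals sit in the BMP. It assumes
only the standard unicodedata module; the function names are made up
for this illustration.

```python
import unicodedata

PLANE1_START = 0x1D538  # MATHEMATICAL DOUBLE-STRUCK CAPITAL A


def double_struck_capitals():
    """Map each letter A-Z to the code point of its double-struck capital."""
    result = {}
    for i in range(26):
        letter = chr(ord("A") + i)
        cp = PLANE1_START + i
        if unicodedata.name(chr(cp), None) is None:
            # Hole in the Plane 1 block: this letter lives in the BMP instead.
            cp = ord(unicodedata.lookup("DOUBLE-STRUCK CAPITAL " + letter))
        result[letter] = cp
    return result


def bmp_letters():
    """Return the letters whose double-struck capital is in the BMP."""
    return [letter for letter, cp in double_struck_capitals().items()
            if cp <= 0xFFFF]


print(bmp_letters())
```

The letters reported are the same seven named in the proposed
sentence (C, H, N, P, Q, R, Z), each one a hole in the Plane 1 run.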


> > This is even more so in that the problems in
> > IE, according to our knowledge, have been fixed.
>
>IE 6 SP1 did finally fix the problem in IE6 if you have IE6 SP1.
>Of course there are thousands (millions?) of copies of IE6 still out
>there. James Clark's nsgmls also has the same restriction (although
>onsgmls does now work with characters out of the BMP I believe).

I expected to find information about this on the MathML home
page. Getting the message out to use SP1 and onsgmls is really
important. This is much more efficient than keeping the spec
the way it is. PLEASE GET THIS MESSAGE OUT LOUDLY AND CLEARLY.

Before these fixes, there was a somewhat justified fear that
adherence to standards would delay the spread of MathML to
an undue extent. This is no longer justified.


>The MathML DTD uses plane 1 characters, but I see nothing wrong with
>giving end users an option not to do that.

There would be a lot wrong with that. First, W3C would acknowledge
that it is okay to have non-conforming XML processors, and would
encourage users to use them. A lot of experience over the last
few years has shown that unfortunately, users don't need any
encouragement to do this. We need to encourage them to do the
right thing.

Second, redefining the entities to point to the PUA changes the
infoset of the document. As soon as that document goes through
some kind of advanced processing (e.g. XSLT), these changes
are frozen. The resulting document will not be understood by a
conforming MathML processor.
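
A small Python sketch of this second point, under stated assumptions:
the internal subset below stands in for the real MathML DTD, and
U+E000 is a hypothetical PUA code point, not the one any actual DTD
uses. Only the mapping of Aopf to U+1D538 is the standard one.

```python
import xml.etree.ElementTree as ET


def mi_text(aopf_value: str) -> str:
    """Parse a tiny document and return the text the parser reports for <mi>.

    aopf_value is the replacement text declared for the Aopf entity.
    """
    doc = (
        '<!DOCTYPE math [<!ENTITY Aopf "%s">]>'
        '<math><mi>&Aopf;</mi></math>' % aopf_value
    )
    return ET.fromstring(doc).find("mi").text


plane1 = mi_text("&#x1D538;")  # standard mapping for Aopf
remapped = mi_text("&#xE000;")  # hypothetical PUA remapping

# The source documents are identical apart from the entity declaration,
# yet the parsed character data differs.
print(hex(ord(plane1)), hex(ord(remapped)))
```

Once the entity has been expanded, anything serialized from the
parsed tree carries the PUA character and no trace of Aopf.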


>If they are using a system
>that is crippled in this way, removing this feature from the DTD will
>not fix their application and does not mean that they can no longer
>do the re-mapping, it just means that they will have to redefine half a
>dozen parameter entities (for the affected entity sets) to point to some
>definition that works on their system. Penalising end users in this way
>doesn't seem particularly useful.

As far as I see, if a document comes in that points to the correct
DTD (with the 'correct' parameter entity), then such a document won't
work on a nonconforming system. So the user has to change something
anyway before it will work. Whether this is adding a parameter entity
definition or changing the location of the DTD doesn't seem to make
too much of a difference for the user. So the clean solution is to
produce a DTD with only the correct stuff (without the parameter entity)
for the spec, and have people with nonconforming systems point to a
different DTD, and mention that on the homepage, but not in the spec.
The way it should be mentioned on the homepage is:

Question: I'm using IE, and have problems with MathML. What to do?
     1) Upgrade to IE6 SP1 (or later)
     2) If that doesn't work, change the reference to the DTD in the file
        to this older one: ...

(and same for sgmls).


Regards,    Martin.
Received on Thursday, 8 May 2003 15:28:25 UTC