Comments on MathML Last Call from Patrick D. F. Ion on 2003-07-02 (www-math@w3.org from July 2003)

From: Patrick D. F. Ion <ion@ams.org>
Date: Wed, 2 Jul 2003 16:42:54 -0400
To: www-math@w3.org
Cc: w3c-math-wg@w3.org
Message-Id: <a05200f00bb28f323da04@[130.44.25.30]>

Dear Martin,

Thank you very much for your characteristically careful
and detailed reading of the MathML 2 Revised Edition draft.
This is, at last, the reply to your various comments and
suggestions dealing with Chapter 6, most of which we have
simply adopted.

Your comments on the front page, Chapters 3 and 4,
and Appendices A and B have been addressed separately.
Your message was:
http://lists.w3.org/Archives/Public/www-math/2003May/0026.html

I hope you will agree that any change not made is properly one of
stylistic preference.

Best regards,

	Patrick

-------
-------

In detail:

---
6.1 Introduction

  >>>>
<< It did not fall naturally within the purview of developing a specification
<< enabling mathematics to be used with HTML and producing a DTD for the
<< Working group this to worry about more than the entities allowed in the DTD.
  >>>>

<< "this" is weird.

FIXED: A typo from another change.

==========
<<
<< More general, the I18N WG has on various occasions requested that the
<< introduction in chapter 6 be seriously shortened to make sure the document
<< stays a spec rather than a historical account of a spec's history.

The text has been shortened quite a bit.  However, the presence of
explanatory text, to outline the situation to readers and implementors
who may not be aware of reasons for what they find strange, was
intentional.  By and large the MathML spec has been felt to read quite
well.  The spec, I suggest, has enough dry technical detail that
few will think it anything else.  A history of how the spec came to
be would require a lot more room.

====
<< "While a long process of review and adoption by UTC and ISO/IEC of the
<< characters of special interest to mathematics and MathML is now  complete
<< (Unicode Work in Progress) there remains the possibility of some further
<< modification of the lists of characters accepted, of the code assignments
<< for those adopted, or of the names given them by Unicode. To make sure any
<< possible corrections to relevant standards are taken into account, and for
<< the latest character tables and font information, see the W3C Math Working
<< Group home page and the Unicode site."

<< This is highly misleading. There is a very strong commitment by
<< Unicode and ISO to not change any codepoints or names. The characters
<< referenced in the spec to our knowledge all have been fully
<< accepted, and any language such as the above suggesting there
<< will be further changes is highly confusing and misleading and
<< should be removed.

As you are no doubt aware, although the invariability of a character
standard like Unicode is as desirable as ever, there seem to be changes
afoot again that will affect both mathematical encoding and W3C.
Unfortunately we do not have a situation in which someone can say,
as in the story of Daniel
"O king, establish the decree and sign the writing, that it
be not changed, according to the laws of the Medes and Persians
which altereth not."
Daniel 6:8-9

Thus it seems reasonable to retain a weakened version of the text above.
The reference to the Unicode Work in progress has been moved and clarified.

=====
<< "The parenthetical notation beginning with U+ is one recommended by Unicode
<< for referring to Unicode characters [see [Unicode], page xxviii]."

<< What about this notation is parenthetical? Proposal: remove 'parenthetical'.
The notation is in parentheses; that's what parenthetical means.
CHANGED for clarity TO
'notation, just introduced in parentheses,'

<< 'is one' -> 'is the one';
CHANGED per grammar TO
'is that'

<< also, just introduce the notation, and then
<< avoid to list the same numbers twice, once without and once with U+.
The redundancy was felt to be of possible assistance to those not
already well familiar with Unicode notations for character codes.
====

6.2.1 Unicode Character Data

  >>>>>>>>
<<      * Using characters directly: For example, an A may be entered as 'A'
<< from a keyboard (character U+0041J). This option is only available if the
<< character encoding specified for the XML document includes the character.
<< Most commonly used encodings will have 'A' in the ASCII position. In many
<< encodings, characters may need more than one byte. Note that if the
<< document is, for example, encoded in Latin-1 (ISO-8859-1) then only the
<< characters in that encoding are available directly. Unfortunately, most
<< mathematical symbols may not be encoded as character data in this way.
  >>>>>>>>

<< The last sentence is misleading. Using UTF-8 or UTF-16, the two only
<< encodings that all XML processors are required to accept, mathematical
<< symbols can be encoded as character data.

As mentioned by David Carlisle
http://lists.w3.org/Archives/Public/www-math/2003May/0029.html
this didn't get across a point we intended.  We can adopt your
sentence:

LAST SENTENCE CHANGED TO
Using UTF-8 or UTF-16, the only two encodings that all XML
processors are required to accept, mathematical symbols can
be encoded as character data.
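
A minimal sketch in Python of the point behind the new sentence. The
symbol chosen, U+2211 N-ARY SUMMATION, is only an assumed example:

```python
# Illustrative only: U+2211 (N-ARY SUMMATION) is an assumed sample symbol.
summation = "\u2211"

# Both encodings that every XML processor must accept can carry it directly.
utf8_bytes = summation.encode("utf-8")       # three bytes
utf16_bytes = summation.encode("utf-16-be")  # one 16-bit code unit

# Latin-1 (ISO-8859-1) has no position for the symbol, so direct entry fails.
try:
    summation.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
```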

====

  >>>>
<< By using Character references it is always possible to access the entire
<< Unicode range.
  >>>>

<< 'Character references': inconsistent capitalization.

FIXED
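
As a rough illustration of the quoted sentence, the sketch below uses
Python's standard xml.etree.ElementTree; the document bytes are made up
for the example. An ASCII-encoded document still reaches U+2211 through
a character reference:

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document, declared as US-ASCII.
doc = b'<?xml version="1.0" encoding="US-ASCII"?><mo>&#x2211;</mo>'

mo = ET.fromstring(doc)  # the parser resolves the character reference
symbol = mo.text

# Serializing back to ASCII turns the character into a reference again.
ascii_out = ET.tostring(mo, encoding="us-ascii")
```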

=====

<< 6.2.2 Special Characters Not in Unicode

  >>>>
<< In these cases one may use the mglyph  element for direct access to a glyph
<< from some font and creation of a MathML character corresponding.
  >>>>

<< corresponding to what?
To the glyph.  The idea is that if you have created a glyph in a font
for mathematical notation not in Unicode, then there's a way to use
it like a character.  For instance, if the overcrossing drawn in
knot theory is used in a discussion of knotting of DNA then it is
quite possible that it may need to occur in an equation.  <mglyph>
is what you use to do this.

CHANGED TO
creation of a MathML substitute for the corresponding character.

=====

<< 6.2.3 Mathematical Alphanumeric Symbols Characters.

<< there should not be a dot after the title
FIXED
====

  >>>>
<<   The new Mathematical Alphanumeric Symbols provided in Unicode 3.1
  >>>>

<< remove 'new'. Otherwise, the spec already looks outdated
<< before it is approved.
The characters expressly introduced by Unicode to facilitate
mathematical formulas certainly are new.  They are the solution
that was found for a specific need in mathematical markup.
It could conceivably have happened that only a few special math
variant markers were introduced, but it did not.

CHANGED
'new' ===> 'additional'

=====
  >>>>
<< ... in contrast to the Basic Multilingual Plane (BMP) which has been used
<< by Unicode so far.
  >>>>

<< remove temporal context ('so far')

The addition of many new (additional) planes was an important
change for Unicode.

'which has been used by Unicode so far'
CHANGED TO
'which was originally the entire extent of Unicode'

====
  >>>>
<< For example, a Mathematical Fraktur alphabet is being added, and the code
<< point for Mathematical Fraktur A is U1D504.
  >>>>

<< 'is being added' seems to refer to some activity that is now complete.
<< Please update. Also, U1D504 -> U+1D504

Wrong tense and wrong code FIXED
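
For reference, a small Python sketch (standard library only) of where
U+1D504 sits. The variable names are just for illustration:

```python
import unicodedata

fraktur_a = "\U0001D504"

name = unicodedata.name(fraktur_a)
plane = ord(fraktur_a) >> 16  # plane number: 0 is the BMP, 1 is Plane 1

# Outside the BMP, so UTF-16 needs a surrogate pair (two 16-bit units).
utf16_hex = fraktur_a.encode("utf-16-be").hex()
```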

=====

6.2.4 Non-Marking Characters

  >>>>
<< Some characters, although important for the quality of print or alternative
<< rendering, do not have glyph marks that correspond directly.
  >>>>

<< correspond to what?
To the character, since it is not supposed to create a mark directly.
There are such characters in Unicode.

ADDED 'to them'
====


  >>>>
<< The Universal Character Set (UCS) of Unicode and ISO 10646 continues to
<< evolve, see Section 6.4.4 Status of Character Encodings. A small number of
<< the changes recently introduced, relative to those resulting from the needs
<< of Asian languages, are those designed exactly to facilitate the use of
<< Unicode by the 'equation-writing' community. This specification is written
<< on the assumption that the code assignments suggested to ISO/IEC
<< JTC1/SC2/WG2 by the UTC will be confirmed as they are in public draft forms
<< of Unicode 3.1 and 3.2. As before, we can only reiterate that for latest
<< developments on details of character standards as far as they influence
<< mathematical formalism the home page of the W3C Math Working Group should
<< be consulted.
  >>>>

<< This seems to be totally outdated. Also, http://www.w3.org/Math/workingGroup
<< does not provide any relevant info. As text such as this has appeared
<< in older versions, http://www.w3.org/Math/workingGroup should contain
<< such info, even if it is just to say that all characters in question have
<< been approved in the meantime.

This is a piece of text that should have been excised and so we have
a new shortened version (see below).  The comments about the character
information that ought to be found on the Math WG page (or IG page
later perhaps) are quite right.   It is intended to keep such
information on updates there.

NEW VERSION ==>

The Universal Character Set (UCS) of Unicode and ISO 10646 continues to
evolve, see Section 6.4.4 Status of Character Encodings.  At the time
of writing the standard is Unicode 4.0.  As before, we can only reiterate
that for latest developments on details of character standards as far as
they influence mathematical formalism the home page of the W3C Math
Activity should be consulted.

====
<< 6.3 Character Symbol Listings

  >>>>
<<   The characters are listed by name, and sample glyphs provided for all of
<< them. Each character name is accompanied by a code for a character grouping
<< chosen from a list given below, a short verbal description, and a Unicode
<< hex code drawn from ISO 10646, now extended in accordance with the proposal
<< forwarded by the UTC to ISO/IEC WG2 in March 2000.
  >>>>

<< outdated, please fix

UPDATED

====
<< 6.3.1 Special Constants

  >>>>
<< These have been accorded new Unicode values.
  >>>>

<< 'have been accorded': remove temporal reference

'have been accorded new'
===>
'now have'

====

6.3.4 Negated Mathematical Characters

  >>>>
<< Note that it is the policy of the W3C and of Unicode that if a single
<< character is already defined for what can be achieved with a combining
<< character, that character must be used instead of the decomposed form. It
<< is also intended that no new single characters representing what can be
<< done by with existing compositions will be introduced.
  >>>>

<< There should be an explicit mention of NFC, with a reference to Unicode
<< Standard Annex #15.

DONE Text and reference added
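
A short sketch of what NFC means for a negated character, using Python's
unicodedata module. The choice of "not equal to" is only an assumed
example:

```python
import unicodedata

# '=' followed by U+0338 COMBINING LONG SOLIDUS OVERLAY (decomposed form).
decomposed = "=\u0338"

# NFC prefers the single precomposed character where one exists.
composed = unicodedata.normalize("NFC", decomposed)

# NFD goes the other way, back to base character plus combining mark.
round_trip = unicodedata.normalize("NFD", composed)
```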

====

<< 6.3.6 Mathematical Alphanumeric Symbols

  >>>>
<< Most of these characters come from the additions to Plane 1, however a few
<< characters (such as the double-struck letters N, P, Z, Q, R, C, H
<< representing common number sets) were already present in Unicode 3.0 and
<< retain their original positions.
  >>>>

<< This is again more version/history-oriented than necessary. What about:

<< Most of these characters are in Plane 1, except for a few characters (such
<< as the double-struck letters N, P, Z, Q, R, C, H representing common number
<< sets) which are in the BMP.

It doesn't seem essential to excise the history here, and it helps
some to understand the context.
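
The split can be checked with a small Python sketch (standard library
only). The particular letters picked are assumptions for the example:

```python
import unicodedata

# Double-struck C for the complex numbers was already in the BMP.
bmp_c = "\u2102"
bmp_c_name = unicodedata.name(bmp_c)
bmp_c_plane = ord(bmp_c) >> 16

# Double-struck A came with the Plane 1 additions.
plane1_a = "\U0001D538"
plane1_a_name = unicodedata.name(plane1_a)
plane1_a_plane = ord(plane1_a) >> 16

# The Plane 1 slot where C would fall in sequence is left unassigned.
gap_name = unicodedata.name("\U0001D53A", None)
```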

=====

<< 6.4.2 Fewer Non-marking Characters

  >>>>
<< It used to be in MathML 1.0 that there were a number more non-marking
<< character entities listed.
  >>>>

<< 'It used to be' reads like 'once upon a time'. But this is a spec, not
<< a fairy tale. What about:

<< MathML 1.0 contained a small number of non-marking character entities that
<< have been removed in MathML 2.0.

I suppose the suggested revision is more machine-friendly.  I see no
difficulty with the other, whether or not this spec is a 'fairy tale',
as some have turned out to be for all their technical writing.

=====

<< 6.4.4 Status of Character Encodings

<< This section needs serious rework. Some of the (updated) text is speaking
<< about events in 2001. The section simply should say that earlier
<< versions may have mentioned that different characters were in different
<< stages of adoption in the standards process, but that all characters
<< now in the spec are fully standardized. This is the message that
<< we need to get out, and this is the way to avoid that the spec
<< looks silly in a few years.


  >>>>
<< Even with the good will shown to the mathenatical community by the Unicode
<< process a small number of characters of special interest to some may not
<< yet have been included. The obvious solution of avoiding their use may not
<< satisfy all. For these characters the Unicode mechanism involving Private
<< Use Area codes could be deployed, in spite of all the dangers of confusion
<< and collisions of conventions this brings with it. However, this is the
<< situation for which mglyph was introduced.
  >>>>

<< This paragraph should be rewritten and shortened, if it belongs
<< into this section at all. It is particularly important to us
<< that mention of the private use area is removed. What about:

Why is it so important to the I18N WG that the existence of the PUA,
which is a recorded part of the UCS and 10646, be denied?  It is
part of a real standard.  It is not being recommended here, but
its existence is worth a warning.

A REVISED VERSION now ends with

"However, this is the situation for which mglyph was introduced.
The use of <mglyph> is recommended to refer to symbols not included
in Unicode."

Received on Wednesday, 2 July 2003 16:43:14 UTC