Re: comments on MathML last call from Martin Duerst on 2003-05-14 (www-math@w3.org from May 2003)

From: Martin Duerst <duerst@w3.org>
Date: Wed, 14 May 2003 17:34:32 -0400
To: www-math@w3.org
Cc: w3c-i18n-ig@w3.org
Message-Id: <4.2.0.58.J.20030514173223.065fb728@localhost>

Dear MathML WG,

At the teleconference yesterday
(http://www.w3.org/mid/4.2.0.58.J.20030513130320.071fb188@localhost),
the I18N WG (core task force) has decided to endorse the comments
below and has actioned me to tell you.

Regards,    Martin.

At 18:02 03/05/07 -0400, Martin Duerst wrote:

>Dear MathML WG,
>
>This is my review of your last call at
>http://www.w3.org/TR/2003/WD-MathML2-20030411/, mainly based
>on the diff-marked HTML version.
>
>This is currently a personal review. I'm sending this to you
>because your last call period is closing shortly.
>
>The core task force of the Internationalization (I18N) WG meets
>again next Tuesday, and will have a look at my comments and may
>then endorse some of the comments, add some, or modify some.
>I hope you can grant the I18N WG an extension of a few days.
>
>
>
>Front page:
>
>"Please refer to the errata for this document, which may include some 
>normative corrections."
>
>According to the new process document (now in review), errata
>pages are not normative.
>
>
>Overall: It would be extremely nice to have an index of elements
>and attributes. Given all the technology used for producing the
>spec, this shouldn't be a problem at all.
>
>
>3.2.8 String Literal (ms)
>
>"In practice, non-ASCII characters will typically be represented by entity 
>references."
>
>This sentence should be removed. It is technologically biased
>(there are encodings and tools that don't need entity references),
>and it is also biased with regard to language. E.g. a Japanese
>mathematician would never represent Japanese (i.e. non-ASCII)
>characters with entity references.
>
>
>3.2.9 Adding new character glyphs to MathML (mglyph)
>
>We have earlier complained about this: "character glyph" is
>not a term that we know, nor is it defined in this spec, nor
>should it be used, because it is highly confusing. The text
>of the section now mostly manages to avoid it. If you want
>an easy way out, substitute "character glyphs" with
>"characters/glyphs". This at least makes it clear that
>these are two different things.
>
>
>
>4.4.11.1 Annotation (annotation)
>
>(and same for 4.4.11.3 XML-based annotation (annotation-xml) )
>
> >>>>
>The annotation element takes the attributes definitionURL and encoding 
>that can be used to override the default semantics. Only the encoding 
>attribute is required whenever the semantics remains unchanged.
> >>>>
>
>It would be good to have a clear explanation of 'encoding' here,
>so that people don't confuse it with the 'encoding' pseudo-attribute
>in the XML declaration.
>
>
>6.1 Introduction
>
> >>>>
>It did not fall naturally within the purview of developing a specification 
>enabling mathematics to be used with HTML and producing a DTD for the 
>Working group this to worry about more than the entities allowed in the DTD.
> >>>>
>
>"this" is weird.
>
>More generally, the I18N WG has on various occasions requested that the
>introduction in chapter 6 be seriously shortened to make sure the document
>stays a spec rather than a historical account of a spec's history.
>
>
>"While a long process of review and adoption by UTC and ISO/IEC of the 
>characters of special interest to mathematics and MathML is now  complete 
>(Unicode Work in Progress) there remains the possibility of some further 
>modification of the lists of characters accepted, of the code assignments 
>for those adopted, or of the names given them by Unicode. To make sure any 
>possible corrections to relevant standards are taken into account, and for 
>the latest character tables and font information, see the W3C Math Working 
>Group home page and the Unicode site."
>
>This is highly misleading. There is a very strong commitment by
>Unicode and ISO to not change any codepoints or names. The characters
>referenced in the spec to our knowledge all have been fully
>accepted, and any language such as the above suggesting there
>will be further changes is highly confusing and misleading and
>should be removed.
>
>
>"The parenthetical notation beginning with U+ is one recommended by 
>Unicode for referring to Unicode characters [see [Unicode], page xxviii]."
>
>What about this notation is parenthetical? Proposal: remove 'parenthetical'.
>'is one' -> 'is the one'; also, just introduce the notation, and then
>avoid listing the same numbers twice, once without and once with U+.
>
>
>6.2.1 Unicode Character Data
>
> >>>>>>>>
>     * Using characters directly: For example, an A may be entered as 'A' 
> from a keyboard (character U+0041J). This option is only available if the 
> character encoding specified for the XML document includes the character. 
> Most commonly used encodings will have 'A' in the ASCII position. In many 
> encodings, characters may need more than one byte. Note that if the 
> document is, for example, encoded in Latin-1 (ISO-8859-1) then only the 
> characters in that encoding are available directly. Unfortunately, most 
> mathematical symbols may not be encoded as character data in this way.
> >>>>>>>>
>
>The last sentence is misleading. Using UTF-8 or UTF-16, the only two
>encodings that all XML processors are required to accept, mathematical
>symbols can be encoded as character data.
>
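To illustrate the point about UTF-8 and UTF-16, here is a minimal Python sketch. It takes Mathematical Fraktur A (U+1D504, the character cited under 6.2.3 below) and encodes it in both encodings, then checks that ISO-8859-1 cannot represent it. The helper name `encodable` is purely illustrative and not taken from any spec.

```python
# Mathematical Fraktur A, a Plane 1 character (U+1D504).
fraktur_a = "\U0001D504"

# Both encodings that every XML processor must accept can carry it
# directly as character data.
utf8_bytes = fraktur_a.encode("utf-8")       # four bytes
utf16_bytes = fraktur_a.encode("utf-16-be")  # one surrogate pair


def encodable(text: str, encoding: str) -> bool:
    """Return True if `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Latin-1 only covers U+0000..U+00FF, so the character is not available.
latin1_ok = encodable(fraktur_a, "iso-8859-1")

print(utf8_bytes.hex(), utf16_bytes.hex(), latin1_ok)
```

In a Latin-1 document the same character would have to be written as a numeric character reference instead.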
> >>>>
>By using Character references it is always possible to access the entire 
>Unicode range.
> >>>>
>
>'Character references': inconsistent capitalization.
>
>
>
>6.2.2 Special Characters Not in Unicode
>
> >>>>
>In these cases one may use the mglyph  element for direct access to a 
>glyph from some font and creation of a MathML character corresponding.
> >>>>
>
>corresponding to what?
>
>
>6.2.3 Mathematical Alphanumeric Symbols Characters.
>
>there should not be a dot after the title
>
> >>>>
>  The new Mathematical Alphanumeric Symbols provided in Unicode 3.1
> >>>>
>
>remove 'new'. Otherwise, the spec already looks outdated
>before it is approved.
>
> >>>>
>... in contrast to the Basic Multilingual Plane (BMP) which has been used 
>by Unicode so far.
> >>>>
>
>remove temporal context ('so far')
>
> >>>>
>For example, a Mathematical Fraktur alphabet is being added, and the code 
>point for Mathematical Fraktur A is U1D504.
> >>>>
>
>'is being added' seems to refer to some activity that is now complete.
>Please update. Also, U1D504 -> U+1D504
>
>
>6.2.4 Non-Marking Characters
>
> >>>>
>Some characters, although important for the quality of print or 
>alternative rendering, do not have glyph marks that correspond directly.
> >>>>
>
>correspond to what?
>
>
> >>>>
>The Universal Character Set (UCS) of Unicode and ISO 10646 continues to 
>evolve, see Section 6.4.4 Status of Character Encodings. A small number of 
>the changes recently introduced, relative to those resulting from the 
>needs of Asian languages, are those designed exactly to facilitate the use 
>of Unicode by the 'equation-writing' community. This specification is 
>written on the assumption that the code assignments suggested to ISO/IEC 
>JTC1/SC2/WG2 by the UTC will be confirmed as they are in public draft 
>forms of Unicode 3.1 and 3.2. As before, we can only reiterate that for 
>latest developments on details of character standards as far as they 
>influence mathematical formalism the home page of the W3C Math Working 
>Group should be consulted.
> >>>>
>
>This seems to be totally outdated. Also, http://www.w3.org/Math/workingGroup
>does not provide any relevant info. As text such as this has appeared
>in older versions, http://www.w3.org/Math/workingGroup should contain
>such info, even if it is just to say that all characters in question have
>been approved in the meantime.
>
>
>6.3 Character Symbol Listings
>
> >>>>
>  The characters are listed by name, and sample glyphs provided for all of 
> them. Each character name is accompanied by a code for a character 
> grouping chosen from a list given below, a short verbal description, and 
> a Unicode hex code drawn from ISO 10646, now extended in accordance with 
> the proposal forwarded by the UTC to ISO/IEC WG2 in March 2000.
> >>>>
>
>outdated, please fix
>
>
>6.3.1 Special Constants
>
> >>>>
>These have been accorded new Unicode values.
> >>>>
>
>'have been accorded': remove temporal reference
>
>
>6.3.4 Negated Mathematical Characters
>
> >>>>
>Note that it is the policy of the W3C and of Unicode that if a single 
>character is already defined for what can be achieved with a combining 
>character, that character must be used instead of the decomposed form. It 
>is also intended that no new single characters representing what can be 
>done by with existing compositions will be introduced.
> >>>>
>
>There should be an explicit mention of NFC, with a reference to Unicode
>Standard Annex #15.
>
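As a small illustration of what an NFC reference would cover, the following Python sketch uses the standard `unicodedata` module. It assumes the negated character in question is NOT EQUAL TO (U+2260), chosen here only as an example; the quoted section does not name a specific character.

```python
import unicodedata

# Decomposed form: EQUALS SIGN followed by COMBINING LONG SOLIDUS OVERLAY.
decomposed = "=\u0338"
# Precomposed form: NOT EQUAL TO.
precomposed = "\u2260"

# NFC composes the pair into the single precomposed character.
nfc = unicodedata.normalize("NFC", decomposed)
# NFD goes the other way and splits the precomposed character again.
nfd = unicodedata.normalize("NFD", precomposed)

print(nfc == precomposed, nfd == decomposed)
```

Text that is kept in NFC therefore never contains the decomposed sequence where a precomposed character exists, which is the policy the quoted paragraph describes in prose.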
>
>
>6.3.6 Mathematical Alphanumeric Symbols
>
> >>>>
>Most of these characters come from the additions to Plane 1, however a few 
>characters (such as the double-struck letters N, P, Z, Q, R, C, H 
>representing common number sets) were already present in Unicode 3.0 and 
>retain their original positions.
> >>>>
>
>This is again more version/history-oriented than necessary. What about:
>
>Most of these characters are in Plane 1, except for a few characters (such 
>as the double-struck letters N, P, Z, Q, R, C, H representing common 
>number sets) which are in the BMP.
>
>
>
>6.4.2 Fewer Non-marking Characters
>
> >>>>
>It used to be in MathML 1.0 that there were a number more non-marking 
>character entities listed.
> >>>>
>
>'It used to be' reads like 'once upon a time'. But this is a spec, not
>a fairy tale. What about:
>
>MathML 1.0 contained a small number of non-marking character entities that
>have been removed in MathML 2.0.
>
>
>
>6.4.4 Status of Character Encodings
>
>This section needs serious rework. Some of the (updated) text is speaking
>about events in 2001. The section simply should say that earlier
>versions may have mentioned that different characters were in different
>stages of adoption in the standards process, but that all characters
>now in the spec are fully standardized. This is the message that
>we need to get out, and this is the way to keep the spec from
>looking silly in a few years.
>
>
> >>>>
>Even with the good will shown to the mathenatical community by the Unicode 
>process a small number of characters of special interest to some may not 
>yet have been included. The obvious solution of avoiding their use may not 
>satisfy all. For these characters the Unicode mechanism involving Private 
>Use Area codes could be deployed, in spite of all the dangers of confusion 
>and collisions of conventions this brings with it. However, this is the 
>situation for which mglyph was introduced.
> >>>>
>
>This paragraph should be rewritten and shortened, if it belongs
>in this section at all. It is particularly important to us
>that mention of the private use area is removed. What about:
>
>To refer to symbols not included in Unicode, please use the <mglyph>
>element.
>
>
>A.1 Use of MathML as Well-Formed XML
>
> >>>>
>The document should be encoded in an encoding (for example UTF-8) in which 
>al needed characters may be encoded as character data,...
> >>>>
>
>al -> all
>
>Finally UTF-8 is mentioned. Great!
>
> >>>>
>However, in many circumstance,
> >>>>
>
>circumstance -> circumstances; rest of this paragraph needs some
>work too, e.g. "specification, Following" -> "specification. Following";
>"the a schema validating processor schema" ->
>"a schema validating processor"
>
>
>A.2.2.2 Plane 1 Characters
>
>As discussed earlier, what this section tries to do
>(to provide workarounds for non-compliant XML implementations)
>is unacceptable. This is even more so in that the problems in
>IE, to our knowledge, have been fixed. This section
>should be removed, and the corresponding DTD fragments fixed
>to eliminate the "plane1D" parameter entity.
>
>
>B Content Markup Validation Grammar
>
> >>>>
>[4]     Char     ::=      Space | [#x21 - #xFFFD] | [#x00010000 - 
>#x7FFFFFFFF]  /* valid XML chars */
> >>>>
>
>This production is clearly wrong, and needs to be fixed.
>
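For comparison, the Char production in XML 1.0 is #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. The sketch below expresses that range check in Python. The function name `is_xml_char` is illustrative only, and this is one possible reading of how the grammar in Appendix B could be aligned, not a proposed replacement text.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point `cp` matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # BMP below the surrogate block
        or 0xE000 <= cp <= 0xFFFD      # BMP above the surrogates, minus FFFE/FFFF
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )


# Code points the quoted production admits but XML 1.0 does not:
# a surrogate, and a value beyond the end of the Unicode code space.
print(is_xml_char(0xD800), is_xml_char(0x7FFFFFFF))
# A Plane 1 mathematical character is a valid XML character.
print(is_xml_char(0x1D504))
```

As written, the quoted production appears to include the surrogate range D800-DFFF within [#x21 - #xFFFD] and to extend far past #x10FFFF.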
>
>XML Schema, at http://www.w3.org/Math/XMLSchema/mathml2/mathml2.xsd
>(several files):
>
>
><?xml version="1.0" encoding="UTF-8"?>
>...
><xs:annotation>
>   <xs:documentation>
>   This is an XML Schema for MathML.
>   Author: St&#233;phane Dalmas, INRIA.
>   </xs:documentation>
></xs:annotation>
>
>"&#233; : If this is UTF-8, then please use UTF-8.
>
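A short Python sketch of why the character reference is unnecessary here: with a UTF-8 declaration, the directly encoded character and the numeric reference parse to the same text. The `<doc>` element is a made-up stand-in for the schema's documentation element, used only to keep the example small.

```python
import xml.etree.ElementTree as ET

declaration = '<?xml version="1.0" encoding="UTF-8"?>'

# Same content, once with a numeric character reference and once with
# the character itself, both serialized as UTF-8 bytes.
with_reference = (declaration + "<doc>St&#233;phane</doc>").encode("utf-8")
with_utf8 = (declaration + "<doc>St\u00e9phane</doc>").encode("utf-8")

text_a = ET.fromstring(with_reference).text
text_b = ET.fromstring(with_utf8).text

print(text_a == text_b)
```

The parsed result is identical, so the only difference is the readability of the source file.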
>
>Regards,    Martin.
Received on Wednesday, 14 May 2003 17:34:44 UTC