- From: Martin Duerst <duerst@w3.org>
- Date: Wed, 07 May 2003 18:02:43 -0400
- To: www-math@w3.org
- Cc: w3c-i18n-ig@w3.org
Dear MathML WG,

This is my review of your last call at http://www.w3.org/TR/2003/WD-MathML2-20030411/, mainly based on the diff-marked HTML version. This is currently a personal review. I'm sending this to you because your last call period is closing shortly. The core task force of the Internationalization (I18N) WG meets again next Tuesday, and will have a look at my comments and may then endorse some of the comments, add some, or modify some. I hope you can grant the I18N WG an extension of a few days.

Front page:

"Please refer to the errata for this document, which may include some normative corrections."

According to the new process document (now in review), errata pages are not normative.

Overall:

It would be extremely nice to have an index of elements and attributes. Given all the technology used for producing the spec, this shouldn't be a problem at all.

3.2.8 String Literal (ms)

"In practice, non-ASCII characters will typically be represented by entity references."

This sentence should be removed. It is technologically biased (there are encodings and tools that don't need entity references), and it is also biased with regard to language. E.g. a Japanese mathematician would never represent Japanese (i.e. non-ASCII) characters with entity references.

3.2.9 Adding new character glyphs to MathML (mglyph)

We have earlier complained about this: "character glyph" is not a term that we know, nor is it defined in this spec, nor should it be used, because it is highly confusing. The text of the section now mostly manages to avoid it. If you want an easy way out, substitute "character glyphs" with "characters/glyphs". This at least makes it clear that these are two different things.

4.4.11.1 Annotation (annotation)
(and same for 4.4.11.3 XML-based annotation (annotation-xml) )

>>>>
The annotation element takes the attributes definitionURL and encoding that can be used to override the default semantics.
Only the encoding attribute is required whenever the semantics remains unchanged.
>>>>

It would be good to have a clear explanation of 'encoding' here, so that people don't confuse it with the 'encoding' pseudo-attribute in the XML declaration.

6.1 Introduction

>>>>
It did not fall naturally within the purview of developing a specification enabling mathematics to be used with HTML and producing a DTD for the Working group this to worry about more than the entities allowed in the DTD.
>>>>

"this" is weird. More generally, the I18N WG has on various occasions requested that the introduction in chapter 6 be seriously shortened to make sure the document stays a spec rather than an account of a spec's history.

"While a long process of review and adoption by UTC and ISO/IEC of the characters of special interest to mathematics and MathML is now complete (Unicode Work in Progress) there remains the possibility of some further modification of the lists of characters accepted, of the code assignments for those adopted, or of the names given them by Unicode. To make sure any possible corrections to relevant standards are taken into account, and for the latest character tables and font information, see the W3C Math Working Group home page and the Unicode site."

This is highly misleading. There is a very strong commitment by Unicode and ISO to not change any codepoints or names. The characters referenced in the spec have, to our knowledge, all been fully accepted, and any language such as the above suggesting there will be further changes is highly confusing and misleading and should be removed.

"The parenthetical notation beginning with U+ is one recommended by Unicode for referring to Unicode characters [see [Unicode], page xxviii]."

What about this notation is parenthetical? Proposal: remove 'parenthetical'. 'is one' -> 'is the one'; also, just introduce the notation, and then avoid listing the same numbers twice, once without and once with U+.
6.2.1 Unicode Character Data

>>>>>>>>
* Using characters directly: For example, an A may be entered as 'A' from a keyboard (character U+0041J). This option is only available if the character encoding specified for the XML document includes the character. Most commonly used encodings will have 'A' in the ASCII position. In many encodings, characters may need more than one byte. Note that if the document is, for example, encoded in Latin-1 (ISO-8859-1) then only the characters in that encoding are available directly. Unfortunately, most mathematical symbols may not be encoded as character data in this way.
>>>>>>>>

The last sentence is misleading. Using UTF-8 or UTF-16, the only two encodings that all XML processors are required to accept, mathematical symbols can be encoded as character data.

>>>>
By using Character references it is always possible to access the entire Unicode range.
>>>>

'Character references': inconsistent capitalization.

6.2.2 Special Characters Not in Unicode

>>>>
In these cases one may use the mglyph element for direct access to a glyph from some font and creation of a MathML character corresponding.
>>>>

Corresponding to what?

6.2.3 Mathematical Alphanumeric Symbols Characters.

There should not be a dot after the title.

>>>>
The new Mathematical Alphanumeric Symbols provided in Unicode 3.1
>>>>

Remove 'new'. Otherwise, the spec already looks outdated before it is approved.

>>>>
... in contrast to the Basic Multilingual Plane (BMP) which has been used by Unicode so far.
>>>>

Remove the temporal context ('so far').

>>>>
For example, a Mathematical Fraktur alphabet is being added, and the code point for Mathematical Fraktur A is U1D504.
>>>>

'is being added' seems to refer to some activity that is now complete. Please update. Also, U1D504 -> U+1D504

6.2.4 Non-Marking Characters

>>>>
Some characters, although important for the quality of print or alternative rendering, do not have glyph marks that correspond directly.
>>>>

Correspond to what?
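A minimal sketch illustrating the 6.2.1 comment above, assuming Python 3 and its standard codecs. The helper names (can_encode, char_ref) and the sample characters are hypothetical and not taken from the spec. It checks which encodings can carry a character directly as character data, and shows the numeric character reference that works regardless of encoding.

```python
# Hypothetical helpers for illustration; only standard library codecs are used.
SAMPLES = {
    "A": "\u0041",              # U+0041, available in ASCII and Latin-1
    "for all": "\u2200",        # U+2200, in the BMP but not in Latin-1
    "Fraktur A": "\U0001D504",  # U+1D504, in Plane 1
}


def can_encode(ch: str, encoding: str) -> bool:
    """Return True if ch can be written directly as character data in encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def char_ref(ch: str) -> str:
    """Return the XML numeric character reference for ch."""
    return f"&#x{ord(ch):X};"


for label, ch in SAMPLES.items():
    support = {enc: can_encode(ch, enc) for enc in ("iso-8859-1", "utf-8", "utf-16")}
    print(f"U+{ord(ch):04X} {label}: direct={support} reference={char_ref(ch)}")
```

Under these assumptions, the Latin-1 column is the only one that rejects the two mathematical characters; UTF-8 and UTF-16 accept all three directly.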
>>>>
The Universal Character Set (UCS) of Unicode and ISO 10646 continues to evolve, see Section 6.4.4 Status of Character Encodings. A small number of the changes recently introduced, relative to those resulting from the needs of Asian languages, are those designed exactly to facilitate the use of Unicode by the 'equation-writing' community. This specification is written on the assumption that the code assignments suggested to ISO/IEC JTC1/SC2/WG2 by the UTC will be confirmed as they are in public draft forms of Unicode 3.1 and 3.2. As before, we can only reiterate that for latest developments on details of character standards as far as they influence mathematical formalism the home page of the W3C Math Working Group should be consulted.
>>>>

This seems to be totally outdated. Also, http://www.w3.org/Math/workingGroup does not provide any relevant info. As text such as this has appeared in older versions, http://www.w3.org/Math/workingGroup should contain such info, even if it is just to say that all characters in question have been approved in the meantime.

6.3 Character Symbol Listings

>>>>
The characters are listed by name, and sample glyphs provided for all of them. Each character name is accompanied by a code for a character grouping chosen from a list given below, a short verbal description, and a Unicode hex code drawn from ISO 10646, now extended in accordance with the proposal forwarded by the UTC to ISO/IEC WG2 in March 2000.
>>>>

Outdated, please fix.

6.3.1 Special Constants

>>>>
These have been accorded new Unicode values.
>>>>

'have been accorded': remove temporal reference.

6.3.4 Negated Mathematical Characters

>>>>
Note that it is the policy of the W3C and of Unicode that if a single character is already defined for what can be achieved with a combining character, that character must be used instead of the decomposed form. It is also intended that no new single characters representing what can be done by with existing compositions will be introduced.
>>>>

There should be an explicit mention of NFC, with a reference to Unicode Standard Annex #15.

6.3.6 Mathematical Alphanumeric Symbols

>>>>
Most of these characters come from the additions to Plane 1, however a few characters (such as the double-struck letters N, P, Z, Q, R, C, H representing common number sets) were already present in Unicode 3.0 and retain their original positions.
>>>>

This is again more version/history-oriented than necessary. What about:

Most of these characters are in Plane 1, except for a few characters (such as the double-struck letters N, P, Z, Q, R, C, H representing common number sets) which are in the BMP.

6.4.2 Fewer Non-marking Characters

>>>>
It used to be in MathML 1.0 that there were a number more non-marking character entities listed.
>>>>

'It used to be' reads like 'once upon a time'. But this is a spec, not a fairy tale. What about:

MathML 1.0 contained a small number of non-marking character entities that have been removed in MathML 2.0.

6.4.4 Status of Character Encodings

This section needs serious rework. Some of the (updated) text is speaking about events in 2001. The section should simply say that earlier versions may have mentioned that different characters were in different stages of adoption in the standards process, but that all characters now in the spec are fully standardized. This is the message that we need to get out, and this is the way to keep the spec from looking silly in a few years.

>>>>
Even with the good will shown to the mathenatical community by the Unicode process a small number of characters of special interest to some may not yet have been included. The obvious solution of avoiding their use may not satisfy all. For these characters the Unicode mechanism involving Private Use Area codes could be deployed, in spite of all the dangers of confusion and collisions of conventions this brings with it. However, this is the situation for which mglyph was introduced.
>>>>

This paragraph should be rewritten and shortened, if it belongs in this section at all. It is particularly important to us that mention of the private use area is removed. What about:

To refer to symbols not included in Unicode, please use the <mglyph> element.

A.1 Use of MathML as Well-Formed XML

>>>>
The document should be encoded in an encoding (for example UTF-8) in which al needed characters may be encoded as character data,...
>>>>

al -> all

Finally UTF-8 is mentioned. Great!

>>>>
However, in many circumstance,
>>>>

circumstance -> circumstances; the rest of this paragraph needs some work too, e.g. "specification, Following" -> "specification. Following"; "the a schema validating processor schema" -> "a schema validating processor"

A.2.2.2 Plane 1 Characters

As discussed earlier, what this section tries to do (to provide workarounds for non-compliant XML implementations) is unacceptable. This is even more so in that the problems in IE, to our knowledge, have been fixed. This section should be removed, and the corresponding DTD fragments fixed to eliminate the "plane1D" parameter entity.

B Content Markup Validation Grammar

>>>>
[4] Char ::= Space | [#x21 - #xFFFD] | [#x00010000 - #x7FFFFFFFF] /* valid XML chars */
>>>>

This production is clearly wrong, and needs to be fixed.

XML Schema, at http://www.w3.org/Math/XMLSchema/mathml2/mathml2.xsd (several files):

<?xml version="1.0" encoding="UTF-8"?>
...
<xs:annotation>
<xs:documentation>
This is an XML Schema for MathML.
Author: Stéphane Dalmas, INRIA.
</xs:documentation>
</xs:annotation>

"é": If this is UTF-8, then please use UTF-8.

Regards, Martin.
Received on Wednesday, 7 May 2003 18:05:50 UTC