- From: Martin Duerst <duerst@w3.org>
- Date: Wed, 14 May 2003 17:34:32 -0400
- To: www-math@w3.org
- Cc: w3c-i18n-ig@w3.org
Dear MathML WG,

At the teleconference yesterday
(http://www.w3.org/mid/4.2.0.58.J.20030513130320.071fb188@localhost),
the I18N WG (core task force) decided to endorse the comments
below and actioned me to tell you.

Regards, Martin.

At 18:02 03/05/07 -0400, Martin Duerst wrote:
>Dear MathML WG,
>
>This is my review of your last call at
>http://www.w3.org/TR/2003/WD-MathML2-20030411/, mainly based
>on the diff-marked HTML version.
>
>This is currently a personal review. I'm sending this to you
>because your last call period is closing shortly.
>
>The core task force of the Internationalization (I18N) WG meets
>again next Tuesday, and will have a look at my comments and may
>then endorse some of the comments, add some, or modify some.
>I hope you can grant the I18N WG an extension of a few days.
>
>
>
>Front page:
>
>"Please refer to the errata for this document, which may include some
>normative corrections."
>
>According to the new process document (now in review), errata
>pages are not normative.
>
>
>Overall: It would be extremely nice to have an index of elements
>and attributes. Given all the technology used for producing the
>spec, this shouldn't be a problem at all.
>
>
>3.2.8 String Literal (ms)
>
>"In practice, non-ASCII characters will typically be represented by entity
>references."
>
>This sentence should be removed. It is technologically biased
>(there are encodings and tools that don't need entity references),
>and it is also biased with regard to language. E.g. a Japanese
>mathematician would never represent Japanese (i.e. non-ASCII)
>characters with entity references.
>
>
>3.2.9 Adding new character glyphs to MathML (mglyph)
>
>We have earlier complained about this: "character glyph" is
>not a term that we know, nor is it defined in this spec, nor
>should it be used, because it is highly confusing. The text
>of the section now mostly manages to avoid it. If you want
>an easy way out, substitute "character glyphs" with
>"characters/glyphs".
>This at least makes it clear that
>these are two different things.
>
>
>
>4.4.11.1 Annotation (annotation)
>
>(and same for 4.4.11.3 XML-based annotation (annotation-xml))
>
> >>>>
>The annotation element takes the attributes definitionURL and encoding
>that can be used to override the default semantics. Only the encoding
>attribute is required whenever the semantics remains unchanged.
> >>>>
>
>It would be good to have a clear explanation of 'encoding' here,
>so that people don't confuse it with the 'encoding' pseudo-attribute
>in the XML declaration.
>
>
>6.1 Introduction
>
> >>>>
>It did not fall naturally within the purview of developing a specification
>enabling mathematics to be used with HTML and producing a DTD for the
>Working group this to worry about more than the entities allowed in the DTD.
> >>>>
>
>"this" is weird.
>
>More generally, the I18N WG has on various occasions requested that the
>introduction in chapter 6 be seriously shortened to make sure the document
>stays a spec rather than a historical account of a spec's history.
>
>
>"While a long process of review and adoption by UTC and ISO/IEC of the
>characters of special interest to mathematics and MathML is now complete
>(Unicode Work in Progress) there remains the possibility of some further
>modification of the lists of characters accepted, of the code assignments
>for those adopted, or of the names given them by Unicode. To make sure any
>possible corrections to relevant standards are taken into account, and for
>the latest character tables and font information, see the W3C Math Working
>Group home page and the Unicode site."
>
>This is highly misleading. There is a very strong commitment by
>Unicode and ISO to not change any codepoints or names. The characters
>referenced in the spec to our knowledge all have been fully
>accepted, and any language such as the above suggesting there
>will be further changes is highly confusing and misleading and
>should be removed.
>
>
>"The parenthetical notation beginning with U+ is one recommended by
>Unicode for referring to Unicode characters [see [Unicode], page xxviii]."
>
>What about this notation is parenthetical? Proposal: remove 'parenthetical'.
>'is one' -> 'is the one'; also, just introduce the notation, and then
>avoid listing the same numbers twice, once without and once with U+.
>
>
>6.2.1 Unicode Character Data
>
> >>>>>>>>
> * Using characters directly: For example, an A may be entered as 'A'
>   from a keyboard (character U+0041J). This option is only available if the
>   character encoding specified for the XML document includes the character.
>   Most commonly used encodings will have 'A' in the ASCII position. In many
>   encodings, characters may need more than one byte. Note that if the
>   document is, for example, encoded in Latin-1 (ISO-8859-1) then only the
>   characters in that encoding are available directly. Unfortunately, most
>   mathematical symbols may not be encoded as character data in this way.
> >>>>>>>>
>
>The last sentence is misleading. Using UTF-8 or UTF-16, the only two
>encodings that all XML processors are required to accept, mathematical
>symbols can be encoded as character data.
>
> >>>>
>By using Character references it is always possible to access the entire
>Unicode range.
> >>>>
>
>'Character references': inconsistent capitalization.
>
>
>
>6.2.2 Special Characters Not in Unicode
>
> >>>>
>In these cases one may use the mglyph element for direct access to a
>glyph from some font and creation of a MathML character corresponding.
> >>>>
>
>corresponding to what?
>
>
>6.2.3 Mathematical Alphanumeric Symbols Characters.
>
>there should not be a dot after the title
>
> >>>>
> The new Mathematical Alphanumeric Symbols provided in Unicode 3.1
> >>>>
>
>remove 'new'. Otherwise, the spec already looks outdated
>before it is approved.
>
> >>>>
>... in contrast to the Basic Multilingual Plane (BMP) which has been used
>by Unicode so far.
> >>>>
>
>remove temporal context ('so far')
>
>
> >>>>
>For example, a Mathematical Fraktur alphabet is being added, and the code
>point for Mathematical Fraktur A is U1D504.
> >>>>
>
>'is being added' seems to refer to some activity that is now complete.
>Please update. Also, U1D504 -> U+1D504
>
>
>6.2.4 Non-Marking Characters
>
> >>>>
>Some characters, although important for the quality of print or
>alternative rendering, do not have glyph marks that correspond directly.
> >>>>
>
>correspond to what?
>
>
> >>>>
>The Universal Character Set (UCS) of Unicode and ISO 10646 continues to
>evolve, see Section 6.4.4 Status of Character Encodings. A small number of
>the changes recently introduced, relative to those resulting from the
>needs of Asian languages, are those designed exactly to facilitate the use
>of Unicode by the 'equation-writing' community. This specification is
>written on the assumption that the code assignments suggested to ISO/IEC
>JTC1/SC2/WG2 by the UTC will be confirmed as they are in public draft
>forms of Unicode 3.1 and 3.2. As before, we can only reiterate that for
>latest developments on details of character standards as far as they
>influence mathematical formalism the home page of the W3C Math Working
>Group should be consulted.
> >>>>
>
>This seems to be totally outdated. Also, http://www.w3.org/Math/workingGroup
>does not provide any relevant info. As text such as this has appeared
>in older versions, http://www.w3.org/Math/workingGroup should contain
>such info, even if it is just to say that all characters in question have
>been approved in the meantime.
>
>
>6.3 Character Symbol Listings
>
> >>>>
> The characters are listed by name, and sample glyphs provided for all of
> them.
> Each character name is accompanied by a code for a character
> grouping chosen from a list given below, a short verbal description, and
> a Unicode hex code drawn from ISO 10646, now extended in accordance with
> the proposal forwarded by the UTC to ISO/IEC WG2 in March 2000.
> >>>>
>
>outdated, please fix
>
>
>6.3.1 Special Constants
>
> >>>>
>These have been accorded new Unicode values.
> >>>>
>
>'have been accorded': remove temporal reference
>
>
>6.3.4 Negated Mathematical Characters
>
> >>>>
>Note that it is the policy of the W3C and of Unicode that if a single
>character is already defined for what can be achieved with a combining
>character, that character must be used instead of the decomposed form. It
>is also intended that no new single characters representing what can be
>done by with existing compositions will be introduced.
> >>>>
>
>There should be an explicit mention of NFC, with a reference to Unicode
>Standard Annex #15.
>
>
>
>6.3.6 Mathematical Alphanumeric Symbols
>
> >>>>
>Most of these characters come from the additions to Plane 1, however a few
>characters (such as the double-struck letters N, P, Z, Q, R, C, H
>representing common number sets) were already present in Unicode 3.0 and
>retain their original positions.
> >>>>
>
>This is again more version/history-oriented than necessary. What about:
>
>Most of these characters are in Plane 1, except for a few characters (such
>as the double-struck letters N, P, Z, Q, R, C, H representing common
>number sets) which are in the BMP.
>
>
>
>6.4.2 Fewer Non-marking Characters
>
> >>>>
>It used to be in MathML 1.0 that there were a number more non-marking
>character entities listed.
> >>>>
>
>'It used to be' reads like 'once upon a time'. But this is a spec, not
>a fairy tale. What about:
>
>MathML 1.0 contained a small number of non-marking character entities that
>have been removed in MathML 2.0.
>
>
>
>6.4.4 Status of Character Encodings
>
>This section needs serious rework.
>Some of the (updated) text speaks
>about events in 2001. The section should simply say that earlier
>versions may have mentioned that different characters were in different
>stages of adoption in the standards process, but that all characters
>now in the spec are fully standardized. This is the message that
>we need to get out, and this is the way to keep the spec from
>looking silly in a few years.
>
>
> >>>>
>Even with the good will shown to the mathenatical community by the Unicode
>process a small number of characters of special interest to some may not
>yet have been included. The obvious solution of avoiding their use may not
>satisfy all. For these characters the Unicode mechanism involving Private
>Use Area codes could be deployed, in spite of all the dangers of confusion
>and collisions of conventions this brings with it. However, this is the
>situation for which mglyph was introduced.
> >>>>
>
>This paragraph should be rewritten and shortened, if it belongs
>in this section at all. It is particularly important to us
>that mention of the private use area is removed. What about:
>
>To refer to symbols not included in Unicode, please use the <mglyph>
>element.
>
>
>A.1 Use of MathML as Well-Formed XML
>
> >>>>
>The document should be encoded in an encoding (for example UTF-8) in which
>al needed characters may be encoded as character data,...
> >>>>
>
>al -> all
>
>Finally UTF-8 is mentioned. Great!
>
> >>>>
>However, in many circumstance,
> >>>>
>
>circumstance -> circumstances; rest of this paragraph needs some
>work too, e.g. "specification, Following" -> "specification. Following";
>"the a schema validating processor schema" ->
>"a schema validating processor"
>
>
>A.2.2.2 Plane 1 Characters
>
>As discussed earlier, what this section tries to do
>(to provide workarounds for non-compliant XML implementations)
>is unacceptable. This is even more so in that the problems in
>IE, according to our knowledge, have been fixed.
>This section
>should be removed, and the corresponding DTD fragments fixed
>to eliminate the "plane1D" parameter entity.
>
>
>B Content Markup Validation Grammar
>
> >>>>
>[4] Char ::= Space | [#x21 - #xFFFD] | [#x00010000 -
>#x7FFFFFFFF] /* valid XML chars */
> >>>>
>
>This production is clearly wrong, and needs to be fixed.
>
>
>XML Schema, at http://www.w3.org/Math/XMLSchema/mathml2/mathml2.xsd
>(several files):
>
>
><?xml version="1.0" encoding="UTF-8"?>
>...
><xs:annotation>
>  <xs:documentation>
>    This is an XML Schema for MathML.
>    Author: Stéphane Dalmas, INRIA.
>  </xs:documentation>
></xs:annotation>
>
>"é : If this is UTF-8, then please use UTF-8.
>
>
>Regards, Martin.
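
Note on production [4] of Appendix B, quoted above: the review does not say what exactly is wrong with it. One plausible reading, comparing it with the Char production in section 2.2 of XML 1.0, is that the quoted ranges admit the surrogate block (U+D800 to U+DFFF) and code points above U+10FFFF. The Python sketch below only makes that comparison concrete. All names in it are illustrative, and treating "Space" as tab, line feed, carriage return and space is an assumption, since the Space production is not quoted in this message.

```python
# Char production from XML 1.0, section 2.2:
#   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
XML_CHAR_RANGES = [
    (0x9, 0x9), (0xA, 0xA), (0xD, 0xD),
    (0x20, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF),
]

# Ranges as quoted from production [4] in the review.
# ASSUMPTION: "Space" is taken to mean tab, LF, CR and U+0020.
QUOTED_RANGES = [
    (0x9, 0x9), (0xA, 0xA), (0xD, 0xD), (0x20, 0x20),
    (0x21, 0xFFFD), (0x00010000, 0x7FFFFFFFF),
]


def in_ranges(code_point, ranges):
    """Return True if code_point falls inside any inclusive (low, high) range."""
    return any(low <= code_point <= high for low, high in ranges)


def is_xml_char(code_point):
    """Check a code point against the XML 1.0 Char production."""
    return in_ranges(code_point, XML_CHAR_RANGES)


def is_quoted_char(code_point):
    """Check a code point against the ranges quoted from production [4]."""
    return in_ranges(code_point, QUOTED_RANGES)


if __name__ == "__main__":
    samples = {
        "MATHEMATICAL FRAKTUR CAPITAL A": 0x1D504,
        "first surrogate": 0xD800,
        "first value past the Unicode range": 0x110000,
    }
    for label, cp in samples.items():
        print(f"U+{cp:04X} ({label}): "
              f"XML 1.0 Char={is_xml_char(cp)}, quoted [4]={is_quoted_char(cp)}")
```

Under these assumptions the two definitions agree on U+1D504 but disagree on U+D800 and on 0x110000, which the quoted ranges accept and XML 1.0 does not.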
Received on Wednesday, 14 May 2003 17:34:44 UTC