- From: John Cowan <cowan@ccil.org>
- Date: Thu, 21 May 2009 10:54:12 -0400
- To: "Grosso, Paul" <pgrosso@ptc.com>
- Cc: public-xml-core-wg@w3.org
Addison scripsit:

> _Unicode_ (rule C06) says that canonically equivalent
> sequences of characters ought to be treated as identical.
> However, XML _parsed entities_ (including _document
> entities_) that are canonically equivalent according to
> Unicode but which use distinct code point (character)
> sequences are considered distinct by XML processors.
> Therefore, all XML parsed entities SHOULD be created in a
> "fully normalized" form per _[CharMod-Norm]_. Otherwise the
> user might unknowingly create canonically equivalent but
> unequal sequences that appear identical to the user but which
> are treated as distinct by XML processors.
>
> A document is still well-formed, even if it is not in a
> normalized form. XML processors MAY verify that the document
> being processed is in a fully-normalized form and report to
> the application whether it is or not.

Looks good to me.

> This sequence is not "full normalized", but, we think it is
> both your and our intention that it be valid and that the
> element 'foo' contain the character U+0301, even though
> U+0301 is a combining mark. In considering our proposed text
> above, we are concerned that the term "parsed entity" might
> be too broad, if it is considered to include attribute and
> element content (and not just the names of XML document
> structures). Please consider this when implementing our
> proposed text and/or advise us whether or not parsed entity
> is the right choice for the meaning imputed here.

Informally, "full normalization" means that when you strip the
markup away, the resulting plain text is still normalized. This
is a Good Thing, but sometimes not the Right Thing. I believe
that the SHOULD in the above text covers this contingency.

-- 
While staying with the Asonu, I met a man from      John Cowan
the Candensian plane, which is very much like       cowan@ccil.org
ours, only more of it consists of Toronto.          http://www.ccil.org/~cowan
        --Ursula K. Le Guin, Changing Planes
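
To make the informal description above concrete, here is a minimal Python sketch. It shows two canonically equivalent strings that differ as code point sequences, and a rough check of whether the text left after stripping markup is still in NFC. The helper names, the sample strings, and the regex-based tag stripping are all assumptions made for illustration; this is not the CharMod-Norm definition of "fully normalized", and a real check would use an XML parser.

```python
import re
import unicodedata

# Hypothetical helper: two strings are canonically equivalent when their
# canonical decompositions (NFD) are identical.
def canonically_equivalent(a: str, b: str) -> bool:
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# Hypothetical helper: a string is in NFC if normalizing leaves it unchanged.
def is_nfc(text: str) -> bool:
    return unicodedata.normalize("NFC", text) == text

# Crude assumption: anything between '<' and '>' is markup. This ignores
# comments, CDATA sections, and entity references.
_TAG = re.compile(r"<[^>]*>")

def normalized_after_stripping_markup(xml_text: str) -> bool:
    return is_nfc(_TAG.sub("", xml_text))

composed = "caf\u00e9"      # e-acute as the single code point U+00E9
decomposed = "cafe\u0301"   # 'e' followed by combining acute U+0301

# Illustrative document: the element 'foo' contains only U+0301.
doc = "<p>e<foo>\u0301</foo></p>"

print(composed == decomposed)                        # code point comparison
print(canonically_equivalent(composed, decomposed))  # Unicode equivalence
print(is_nfc(doc), normalized_after_stripping_markup(doc))
```

In this sketch the document string is itself in NFC, because the markup separates 'e' from the combining mark, but the text left after stripping the tags is not. That is the sort of case where the SHOULD leaves room for an author who really does want U+0301 as element content.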
Received on Thursday, 21 May 2009 14:54:49 UTC