- From: John Cowan <cowan@ccil.org>
- Date: Thu, 21 May 2009 10:54:12 -0400
- To: "Grosso, Paul" <pgrosso@ptc.com>
- Cc: public-xml-core-wg@w3.org
Addison scripsit:
> _Unicode_ (rule C06) says that canonically equivalent
> sequences of characters ought to be treated as identical.
> However, XML _parsed entities_ (including _document
> entities_) that are canonically equivalent according to
> Unicode but which use distinct code point (character)
> sequences are considered distinct by XML processors.
> Therefore, all XML parsed entities SHOULD be created in a
> "fully normalized" form per _[CharMod-Norm]_. Otherwise the
> user might unknowingly create canonically equivalent but
> unequal sequences that appear identical to the user but which
> are treated as distinct by XML processors.
>
> A document is still well-formed, even if it is not in a
> normalized form. XML processors MAY verify that the document
> being processed is in a fully-normalized form and report to
> the application whether it is or not.
Looks good to me.
> This sequence is not "fully normalized", but we think it is
> both your and our intention that it be valid and that the
> element 'foo' contain the character U+0301, even though
> U+0301 is a combining mark. In considering our proposed text
> above, we are concerned that the term "parsed entity" might
> be too broad, if it is considered to include attribute and
> element content (and not just the names of XML document
> structures). Please consider this when implementing our
> proposed text and/or advise us whether or not parsed entity
> is the right choice for the meaning imputed here.
Informally, "full normalization" means that when you strip the markup
away, the resulting plain text is still normalized. This is a Good
Thing, but sometimes not the Right Thing. I believe that the SHOULD in
the above text covers this contingency.
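To make the informal definition concrete, here is a minimal Python sketch. It is an illustration only and comes from neither the proposed text nor [CharMod-Norm]. The helper names strip_markup and is_nfc and the sample document are hypothetical, tags are removed with a crude regex rather than a real XML parser, and NFC is assumed as the normalization form. It shows a document that is normalized as a raw string but whose text stops being normalized once the markup is stripped, because the combining U+0301 ends up next to a base letter.

```python
import re
import unicodedata


def strip_markup(xml_text: str) -> str:
    """Crudely remove tags; a regex stand-in for a real XML parser."""
    return re.sub(r"<[^>]*>", "", xml_text)


def is_nfc(text: str) -> bool:
    """Return True if the text is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", text) == text


# Canonically equivalent, yet distinct code point sequences.
precomposed = "\u00e9"   # e-acute as a single code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT
assert precomposed != decomposed
assert unicodedata.normalize("NFC", decomposed) == precomposed

# Hypothetical document: element 'foo' contains only U+0301,
# and the character just before the start tag is the base letter "e".
doc = "<p>e<foo>\u0301</foo></p>"

doc_is_nfc = is_nfc(doc)                  # markup keeps "e" and U+0301 apart
text_is_nfc = is_nfc(strip_markup(doc))   # stripped text is "e" + U+0301

print(doc_is_nfc, text_is_nfc)  # prints: True False
```

A check of this kind could back the MAY in the proposed text: the processor reports the result to the application, and the document remains well-formed either way.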
--
John Cowan    cowan@ccil.org    http://www.ccil.org/~cowan
While staying with the Asonu, I met a man from
the Candensian plane, which is very much like
ours, only more of it consists of Toronto.
        --Ursula K. Le Guin, Changing Planes
Received on Thursday, 21 May 2009 14:54:49 UTC