Re: Unicode Normalization in XML 1.0 5e

Addison scripsit:

> _Unicode_ (rule C06) says that canonically equivalent 
> sequences of characters ought to be treated as identical. 
> However, XML _parsed entities_ (including _document 
> entities_) that are canonically equivalent according to 
> Unicode but which use distinct code point (character) 
> sequences are considered distinct by XML processors. 
> Therefore, all XML parsed entities SHOULD be created in a 
> "fully normalized" form per _[CharMod-Norm]_. Otherwise the 
> user might unknowingly create canonically equivalent but 
> unequal sequences that appear identical to the user but which 
> are treated as distinct by XML processors.
> 
> A document is still well-formed, even if it is not in a 
> normalized form. XML processors MAY verify that the document 
> being processed is in a fully-normalized form and report to 
> the application whether it is or not.

Looks good to me.

> This sequence is not "fully normalized", but we think it is 
> both your and our intention that it be valid and that the 
> element 'foo' contain the character U+0301, even though 
> U+0301 is a combining mark. In considering our proposed text 
> above, we are concerned that the term "parsed entity" might 
> be too broad, if it is considered to include attribute and 
> element content (and not just the names of XML document 
> structures). Please consider this when implementing our 
> proposed text and/or advise us whether or not parsed entity 
> is the right choice for the meaning imputed here.

Informally, "full normalization" means that when you strip the markup
away, the resulting plain text is still normalized.  This is a Good
Thing, but sometimes not the Right Thing.  I believe that the SHOULD in
the above text covers this contingency.
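
A rough sketch of that informal reading, in Python, assuming NFC as the
normalization form and a deliberately crude regex as the "strip the
markup" step.  The helper names (is_nfc, stripped_text_is_nfc), the TAG
pattern, and the sample document are all made up for illustration; this
is not the [CharMod-Norm] algorithm and not what any XML processor
actually does.

```python
import re
import unicodedata

# Hypothetical, crude tag matcher: ignores comments, CDATA sections,
# entity references, and '>' inside attribute values.
TAG = re.compile(r"<[^>]*>")


def is_nfc(text):
    """True if text is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", text) == text


def stripped_text_is_nfc(xml):
    """Informal check: remove the markup, then test what is left."""
    return is_nfc(TAG.sub("", xml))


# Canonically equivalent, yet distinct code point sequences.
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
same_code_points = composed == decomposed
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed

# Each run of character data is NFC on its own, but the text with the
# markup stripped away is not.
doc = "<p>e<b>\u0301</b></p>"
pieces = [p for p in TAG.split(doc) if p]
pieces_are_nfc = all(is_nfc(p) for p in pieces)
whole_is_nfc = stripped_text_is_nfc(doc)

# Content that starts with a combining mark, as in the 'foo' case.
# unicodedata.combining() is only an approximation of "composing
# character" as [CharMod-Norm] uses the term.
starts_with_combining = unicodedata.combining("\u0301") != 0
```

Here the two spellings compare unequal as code point sequences and equal
only after normalization, and the sample document passes piece by piece
while failing once the tags are gone.  That is the sort of case the
SHOULD leaves room for.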

-- 
While staying with the Asonu, I met a man from      John Cowan
the Candensian plane, which is very much like       cowan@ccil.org
ours, only more of it consists of Toronto.          http://www.ccil.org/~cowan
        --Ursula K. Le Guin, Changing Planes

Received on Thursday, 21 May 2009 14:54:49 UTC