(draft) suitable xml 5e health warning

All,

In our last call, we discussed the need for a better health warning regarding normalization than the one proposed in [1]. The XML-WG proposed using this text:

--
All XML _parsed entities_ (including _document entities_) SHOULD
 be fully normalized as per _[CharMod]_.

 However, a document is still well-formed even if it is not fully
 normalized. XML processors MAY verify that the document being
 processed is in fully normalized form and report to the application
 whether it is or not.
--

Here is my proposal:

--
Although _Unicode_ (conformance clause C6) says that canonically equivalent sequences of characters ought to be treated as identical, XML _parsed entities_ (including _document entities_) that are canonically equivalent according to Unicode but that use distinct code point (character) sequences are considered distinct by XML processors. Therefore, all XML parsed entities SHOULD be "fully normalized" per _[CharMod-Norm]_. Otherwise, entities that appear to be identical can be treated as distinct, even though this might not be the intention of the user.

A document is still well-formed even if it is not fully normalized. XML processors MAY verify that the document being processed is in fully normalized form and report to the application whether it is or not.
--
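
To make the concern concrete, here is a rough sketch in Python of what the first paragraph describes. It is only an illustration under some assumptions: it uses Python's standard unicodedata and xml.etree.ElementTree modules, the helper names (parsed_text, is_nfc) are made up for this example, and it checks NFC only. NFC is just one part of what [CharMod-Norm] calls "fully normalized", so this is not a complete check.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Two canonically equivalent spellings of the same word.
PRECOMPOSED = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
DECOMPOSED = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT


def parsed_text(value: str) -> str:
    """Hypothetical helper: return the text an XML parser reports for <doc>value</doc>."""
    return ET.fromstring(f"<doc>{value}</doc>").text


def is_nfc(value: str) -> bool:
    """Hypothetical helper: NFC check only, not the full [CharMod-Norm] definition."""
    return unicodedata.normalize("NFC", value) == value


a = parsed_text(PRECOMPOSED)
b = parsed_text(DECOMPOSED)

# The parser passes the code points through unchanged, so the strings differ.
print(a == b)  # False

# After normalizing both to NFC, they compare equal.
print(unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b))  # True

# A processor that chose to verify normalization could report something like this.
print(is_nfc(a), is_nfc(b))  # True False
```

The two documents look identical when rendered, but an application comparing the parsed values sees two different strings. That is the surprise the warning is meant to flag.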


Addison

[1] http://lists.w3.org/Archives/Public/public-i18n-core/2009AprJun/0031.html

Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.

Received on Tuesday, 5 May 2009 04:22:18 UTC