Draft wording of normalization note for XML 1.0

> > Addison Phillips of I18N sent email about 
> > Unicode Normalization in XML 1.0 5th Ed.; see
> > http://lists.w3.org/Archives/Public/public-xml-core-wg/2009Feb/0019
> > 
> > Allowing canonical equivalents to be treated as identical
> > directly in XML implies that an element's start tag and
> > end tag could be character-for-character different.  This
> > is not currently the case (such input would not be
> > well-formed and therefore would not be XML), and the WG
> > does not want to make it the case.
> > 
> > We had no objections to adding some "motherhood" notes saying 
> > that XML producers SHOULD produce normalized output.
> > 
> > We are still considering whether we should put XML 1.1 wording
> > about normalization checking into XML 1.0.
> > 
> > ACTION to Henry:  Discuss with others at the AC meeting
> > the possibility of adding to XML 1.0 via erratum the 
> > "should" normalization checking from XML 1.1.
> > 
> 
> Henry points out that "XML processors SHOULD provide a user option..."
> implies a new feature, which means we cannot do this in an erratum,
> so he doesn't think we can change 1.0 to add normalization
> verification.
> 
> John suggests we could perhaps use some MAY wording associated
> with the motherhood note.
> 
> ACTION to Paul:  Send suggested draft wording to the WG mailing list.

Compared to XML 1.0, XML 1.1 adds section 2.13, Normalization Checking.
Given that we just want to add an informative note to XML 1.0, I do
not think we should add a new section.  Therefore, I am suggesting
that we add a note to the bottom of section 2.2 Characters (though
I am open to other suggestions).  I do not want a reference from
XML 1.0 to XML 1.1, so I am suggesting we add CharMod to the list
of references in A.2 Other References in XML 1.0.

We could add all the text from XML 1.1 2.13 Normalization Checking
plus appendix B Definitions for Character Normalization, but I am
opting not to do that (though I could perhaps be talked into it).

My suggestion is that we add the following (where things delimited
by underscores should be links to the appropriate definition or
reference) to the end of section 2.2 Characters in XML 1.0:

 Note:

 All XML _parsed entities_ (including _document entities_) SHOULD
 be fully normalized as per _[CharMod]_.

 However, a document is still well-formed even if it is not fully
 normalized. XML processors MAY verify that the document being
 processed is in fully normalized form and report to the application
 whether it is or not.
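To make the MAY check concrete, here is a rough Python sketch (an illustration only, not proposed note text). It tests only Unicode NFC using the standard library's unicodedata.is_normalized (Python 3.8+); CharMod's "fully normalized" adds further conditions (for instance, about constructs that begin with a composing character) that this sketch does not check. The helpers is_nfc and is_well_formed are hypothetical names. The last two checks also show the point from the minutes: a start tag and end tag that differ only in normalization form are a mismatch, while unnormalized but consistent names remain well-formed.

```python
import unicodedata
import xml.etree.ElementTree as ET


def is_nfc(text: str) -> bool:
    """Hypothetical helper: True if text is already in Unicode NFC.

    Only an approximation of CharMod's "fully normalized"; the extra
    constraints on composing characters are not checked here.
    """
    return unicodedata.is_normalized("NFC", text)


def is_well_formed(doc: str) -> bool:
    """Hypothetical helper: True if ElementTree (expat) accepts doc."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


precomposed = "caf\u00e9"   # "é" as U+00E9
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

# Canonically equivalent, but not character-for-character identical.
print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(is_nfc(precomposed), is_nfc(decomposed))                  # True False

# Start and end tags differing only in normalization form do not match,
# so the parser rejects the document as not well-formed.
print(is_well_formed(f"<{precomposed}>x</{decomposed}>"))       # False

# Being unnormalized does not by itself affect well-formedness.
print(is_well_formed(f"<{decomposed}>x</{decomposed}>"))        # True
```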

Then we should also add to A.2 Other References in XML 1.0:

 CharMod
    W3C. Character Model for the World Wide Web 1.0.
    Martin J. Dürst, François Yergeau, Richard Ishida, Misha Wolf,
    Tex Texin. (See http://www.w3.org/TR/2005/REC-charmod-20050215/.)

paul

Received on Friday, 10 April 2009 14:35:48 UTC