- From: Grosso, Paul <pgrosso@ptc.com>
- Date: Mon, 18 May 2009 11:32:36 -0400
- To: "Phillips, Addison" <addison@amazon.com>, <public-xml-core-wg@w3.org>
- Cc: <public-i18n-core@w3.org>, <w3c-html-cg@w3.org>
Addison,

Will you be sending any further input in time for the
XML Core WG telcon this Wednesday?

paul

> -----Original Message-----
> From: Phillips, Addison [mailto:addison@amazon.com]
> Sent: Thursday, 2009 April 30 13:00
> To: Grosso, Paul; public-xml-core-wg@w3.org
> Cc: public-i18n-core@w3.org; w3c-html-cg@w3.org
> Subject: RE: Unicode Normalization in XML 1.0 5e
>
> Hi Paul,
>
> Thanks for the note and additional background information.
> It's quite helpful.
>
> We appreciate the restrictions on XML 1.0. I think our
> concern is not that we want new features, but rather to
> clarify specifically what the "old features" are and, as
> necessary, provide useful health warnings for end users.
>
> The unpleasant task before our WG is that, given that XML
> considers two (Unicode) "canonically equivalent" elements
> represented by different code point sequences to be distinct,
> how or when should we encourage or insist that other Specs
> built upon XML normalize items for string identity
> operations? Clearly specs like XPath and the like are "in
> trouble" if they normalize (can't select certain discrete
> elements discretely) and "in trouble" in a different way if
> they don't (a user's request in one place, even though
> canonically equivalent to that in the XML document being
> processed, doesn't match).
>
> This suggests that early normalization is a requirement for
> certain kinds of XML operation to be reliable, an idea that
> is already unpopular with implementers :-).
>
> Regards,
>
> Addison
>
> Addison Phillips
> Globalization Architect -- Lab126
>
> Internationalization is not a feature.
> It is an architecture.
>
> > -----Original Message-----
> > From: Grosso, Paul [mailto:pgrosso@ptc.com]
> > Sent: Thursday, April 30, 2009 10:35 AM
> > To: Phillips, Addison; public-xml-core-wg@w3.org
> > Cc: public-i18n-core@w3.org; w3c-html-cg@w3.org
> > Subject: RE: Unicode Normalization in XML 1.0 5e
> >
> > Addison,
> >
> > Thanks for your update.
> > Please allow me to give you some
> > more background for your WG's discussion.
> >
> > Whereas XML 1.1 does include a normalization checking
> > option, we cannot add such a feature to XML 1.0. At
> > http://www.w3.org/TR/xml11/#sec-normalization-checking
> > XML 1.1 starts with a sentence that is basically the
> > first paragraph of the note we propose below (with a
> > reference to CharMod).
> >
> > Then it follows with what is basically the first sentence
> > of the second paragraph of the note proposed below. That
> > paragraph in XML 1.1 goes on to talk about a user option.
> >
> > Our staff contact has informed us that we cannot do something
> > that is effectively introducing a new feature into the
> > language, and a user option is a new feature. Hence the
> > rest of the second paragraph in our proposed note suggests
> > that processors may do what, in XML 1.1, is allowed by
> > user option.
> >
> > regards,
> >
> > paul
> >
> > > -----Original Message-----
> > > From: Phillips, Addison [mailto:addison@amazon.com]
> > > Sent: Thursday, 2009 April 30 11:42
> > > To: Grosso, Paul; public-xml-core-wg@w3.org
> > > Cc: public-i18n-core@w3.org; w3c-html-cg@w3.org
> > > Subject: RE: Unicode Normalization in XML 1.0 5e
> > >
> > > Hello Paul & XML WG,
> > >
> > > At our most recent teleconference [1], the
> > > Internationalization WG discussed your email below regarding
> > > normalization in XML. We have scheduled time in our next
> > > teleconference (scheduled for 6 May 2009) to finalize a
> > > response for you.
> > >
> > > Our initial reaction is that we are not quite satisfied with
> > > the proposed text: we think a stronger health warning is
> > > probably called for here and would like to suggest one. Also,
> > > please note that the reference(s) to CharMod need to be
> > > updated, as Martin Dürst kindly pointed out in [2].
> > >
> > > Kind regards,
> > >
> > > Addison (for I18N)
> > >
> > > Addison Phillips
> > > Globalization Architect -- Lab126
> > > Chair -- W3C Internationalization WG
> > >
> > > Internationalization is not a feature.
> > > It is an architecture.
> > >
> > > [1] http://www.w3.org/2009/04/29-core-minutes.html
> > > [2] http://lists.w3.org/Archives/Public/public-i18n-core/2009AprJun/0037.html
> > >
> > > > -----Original Message-----
> > > > From: Grosso, Paul [mailto:pgrosso@ptc.com]
> > > > Sent: Wednesday, April 22, 2009 10:03 AM
> > > > To: Phillips, Addison; public-xml-core-wg@w3.org
> > > > Cc: public-i18n-core@w3.org; w3c-html-cg@w3.org
> > > > Subject: RE: Unicode Normalization in XML 1.0 5e
> > > >
> > > > Addison et al.,
> > > >
> > > > Regarding this issue, the XML Core WG plans to issue
> > > > an erratum to XML 1.0 5th Edition that adds a note
> > > > as follows (where things delimited by underscores should
> > > > be links to the appropriate definition or reference)
> > > > to the end of section 2.2 Characters in XML 1.0:
> > > >
> > > > Note:
> > > >
> > > > All XML _parsed entities_ (including _document entities_) SHOULD
> > > > be fully normalized as per _[CharMod]_.
> > > >
> > > > However, a document is still well-formed even if it is not fully
> > > > normalized. XML processors MAY verify that the document being
> > > > processed is in fully normalized form and report to the application
> > > > whether it is or not.
> > > >
> > > > Then we would also add to A.2 Other References in XML 1.0:
> > > >
> > > > Charmod
> > > > W3C. Character Model for the World Wide Web 1.0.
> > > > Martin J. Dürst, François Yergeau, Richard Ishida, Misha Wolf,
> > > > Tex Texin. (See http://www.w3.org/TR/2005/REC-charmod-20050215/.)
> > > >
> > > > Please let us know if this resolution of your issue is acceptable.
> > > >
> > > > regards,
> > > >
> > > > paul
> > > >
> > > > Paul Grosso for the XML Core WG
> > > >
> > > > > -----Original Message-----
> > > > > From: public-xml-core-wg-request@w3.org
> > > > > [mailto:public-xml-core-wg-request@w3.org] On Behalf Of Grosso, Paul
> > > > > Sent: Wednesday, 2009 March 11 11:32
> > > > > To: Phillips, Addison; public-xml-core-wg@w3.org
> > > > > Cc: public-i18n-core@w3.org; w3c-html-cg@w3.org
> > > > > Subject: RE: Unicode Normalization in XML 1.0 5e
> > > > >
> > > > > Addison et al.,
> > > > >
> > > > > The XML Core WG has discussed your message during several
> > > > > telcons, and we are still in the process of determining
> > > > > just what we might do in response.
> > > > >
> > > > > At this time, we are quite sure we do not want to change
> > > > > the XML spec so that canonical equivalents could be treated
> > > > > as identical directly in XML. Aside from being a serious
> > > > > change to parser behavior, this would make some previously
> > > > > ill-formed (non-XML) documents well-formed XML as well as
> > > > > make some previously well-formed XML ill-formed (non-XML).
> > > > >
> > > > > We are also pretty sure it would be a good idea to add at
> > > > > least a note to the XML 1.0 spec saying that XML producers
> > > > > SHOULD produce normalized output.
> > > > >
> > > > > We are considering whether we should add (some version of)
> > > > > what the XML 1.1 spec says about normalization checking [1]
> > > > > to the XML 1.0 spec. We haven't made a decision here yet,
> > > > > and given our biweekly telcon schedule and the upcoming AC
> > > > > meeting, we are not likely to do so until some time in April.
> > > > >
> > > > > I will, of course, let you know when we have a further status
> > > > > update to give you.
> > > > >
> > > > > regards,
> > > > >
> > > > > paul
> > > > >
> > > > > for the XML Core WG
> > > > >
> > > > > [1] http://www.w3.org/TR/xml11/#sec-normalization-checking
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: public-xml-core-wg-request@w3.org
> > > > > > [mailto:public-xml-core-wg-request@w3.org] On Behalf Of
> > > > > > Phillips, Addison
> > > > > > Sent: Wednesday, 2009 February 25 0:17
> > > > > > To: public-xml-core-wg@w3.org
> > > > > > Cc: public-i18n-core@w3.org; w3c-html-cg@w3.org
> > > > > > Subject: Unicode Normalization in XML 1.0 5e
> > > > > >
> > > > > > Dear XML Core WG,
> > > > > >
> > > > > > I am writing on behalf of both the Internationalization Core
> > > > > > WG and the HTML Coordination Group (HCG).
> > > > > >
> > > > > > Recently there has been an extensive discussion of
> > > > > > normalization in W3C specifications, mainly related to
> > > > > > handling of element and attribute names and values (as in
> > > > > > CSS3 Selectors). Some of this discussion revolves around how
> > > > > > Unicode normalization should work with XML and XML-derived
> > > > > > specifications, hence I was actioned by HCG [0] to contact
> > > > > > you folks.
> > > > > >
> > > > > > I produced a general summary of the Unicode normalization
> > > > > > problem at [1] for the HCG. Those unfamiliar with Unicode
> > > > > > normalization may wish to review that message.
> > > > > >
> > > > > > The basic question is whether XML can (or should?) take a
> > > > > > clearer stance on Unicode normalization. At present, XML 1.0
> > > > > > 5e, like its predecessors, does not require any particular
> > > > > > normalization form; it says nothing about whether canonical
> > > > > > equivalents in Unicode are "equal" from an XML point of view;
> > > > > > and thus implies that Unicode canonical equivalence does
> > > > > > *not* apply when considering an XML document's formation.
> > > > > > The recommendations in Appendix J (which does include
> > > > > > normalization among its suggestions) further suggest that
> > > > > > this is true.
> > > > > >
> > > > > > On the other hand, it seems reasonable to suppose that
> > > > > > Unicode canonical equivalence might apply to XML. Processes
> > > > > > such as transcoding legacy charsets to Unicode might result
> > > > > > in canonically-equivalent-but-unequal code point sequences,
> > > > > > for example.
> > > > > >
> > > > > > In a survey done at I18N's behest, our Unicode liaison (Mark
> > > > > > Davis) produced a survey of content of the Web, as well as a
> > > > > > summary on performance [2], which found that 99.98% of Web
> > > > > > HTML content was, in fact, in Unicode form NFC. It seems
> > > > > > reasonable to suppose that XML content and documents would
> > > > > > follow a similar pattern.
> > > > > >
> > > > > > Our questions to XML Core WG, thus, are:
> > > > > >
> > > > > > What, precisely, should XML say with regard to Unicode
> > > > > > canonical equivalence?
> > > > > >
> > > > > > Would it be possible to require or allow canonical
> > > > > > equivalents to be treated as identical directly in XML (and
> > > > > > not merely as a side effect of other specifications)?
> > > > > >
> > > > > > Is there a problem if XML permits/requires
> > > > > > canonically-equivalent-yet-different sequences to be treated
> > > > > > as distinct if other specifications require/allow canonical
> > > > > > equivalence to be recognized?
> > > > > >
> > > > > > The Internationalization Core WG would be happy to work with
> > > > > > you on these thorny issues. Please advise if you need more
> > > > > > information, consultation, participation, or just need to
> > > > > > vent :-).
> > > > > >
> > > > > > Kind Regards,
> > > > > >
> > > > > > Addison (for I18N/HCG)
> > > > > >
> > > > > > [0] http://lists.w3.org/Archives/Member/w3c-html-cg/2009JanMar/0061.html
> > > > > > See ACTION-29
> > > > > > [1] http://lists.w3.org/Archives/Public/public-i18n-core/2009JanMar/0259.html
> > > > > > [2] http://www.macchiato.com/unicode/nfc-faq
> > > > > >
> > > > > > Addison Phillips
> > > > > > Globalization Architect -- Lab126
> > > > > > Chair -- W3C Internationalization WG
> > > > > >
> > > > > > Internationalization is not a feature.
> > > > > > It is an architecture.
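
Illustration (not part of the original messages): the thread turns on the point that XML compares names by code point, so two canonically equivalent spellings of a name are distinct unless something normalizes them first. The sketch below is a minimal demonstration using Python's standard `unicodedata` and `xml.etree.ElementTree` modules. The helper names (`canonically_equivalent`, `is_nfc`, `find_child`) and the sample element name are hypothetical, chosen only for this example. Note also that `is_nfc` checks plain Unicode NFC only, which is an assumption made for simplicity; it does not implement the stricter "fully normalized" definition from CharMod that the proposed note refers to.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Two canonically equivalent spellings of the same name (hypothetical sample).
PRECOMPOSED = "caf\u00e9"   # ends in U+00E9 LATIN SMALL LETTER E WITH ACUTE
DECOMPOSED = "cafe\u0301"   # ends in U+0065 followed by U+0301 COMBINING ACUTE ACCENT


def canonically_equivalent(a: str, b: str) -> bool:
    """True if a and b are equal after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def is_nfc(text: str) -> bool:
    """True if text is already in NFC (not CharMod's 'fully normalized')."""
    return unicodedata.normalize("NFC", text) == text


def find_child(root: ET.Element, name: str, normalize: bool = False):
    """Look up a direct child by name.

    With normalize=False the lookup is by exact code points, as XML itself
    compares names. With normalize=True both sides are normalized to NFC
    first, which is the kind of early normalization discussed in the thread.
    """
    if not normalize:
        return root.find(name)
    wanted = unicodedata.normalize("NFC", name)
    for child in root:
        if unicodedata.normalize("NFC", child.tag) == wanted:
            return child
    return None


# The document uses the precomposed spelling for its element name.
root = ET.fromstring(f"<doc><{PRECOMPOSED}/></doc>")

exact_hit = find_child(root, DECOMPOSED)                        # code point comparison
normalized_hit = find_child(root, DECOMPOSED, normalize=True)   # NFC comparison
```

Here the exact lookup misses the element even though the requested name is canonically equivalent to the one in the document, which is the "doesn't match" case described above. The normalized lookup finds it, at the cost of no longer being able to tell the two spellings apart, which is the "can't select certain discrete elements discretely" case.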
Received on Monday, 18 May 2009 15:33:22 UTC