- From: Phillips, Addison <addison@amazon.com>
- Date: Mon, 18 May 2009 10:19:15 -0700
- To: "Grosso, Paul" <pgrosso@ptc.com>, "public-xml-core-wg@w3.org" <public-xml-core-wg@w3.org>
- CC: "public-i18n-core@w3.org" <public-i18n-core@w3.org>, "w3c-html-cg@w3.org" <w3c-html-cg@w3.org>
"Maybe"

The problem is that our next teleconference is also on Wednesday, so depending on relative timing, I might have something for you. Actually, having glanced at the Zakim calendar, I see your call is before ours...

As a pointer, I think we'll propose an alternate form of your proposed text, accepting your resolution in general. There has been discussion of it in I18N Core here:

http://www.w3.org/2009/05/06-core-minutes.html#item05

And you can find some discussion of the wording, etc., here:

http://lists.w3.org/Archives/Public/public-i18n-core/2009AprJun/0053.html

Hopefully as a WG we will finalize all this at our teleconference scheduled for Wednesday, 20 May, at 1900Z:

http://www.w3.org/Guide/1998/08/teleconference-calendar#s_2132

Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.

> -----Original Message-----
> From: Grosso, Paul [mailto:pgrosso@ptc.com]
> Sent: Monday, May 18, 2009 8:33 AM
> To: Phillips, Addison; public-xml-core-wg@w3.org
> Cc: public-i18n-core@w3.org; w3c-html-cg@w3.org
> Subject: RE: Unicode Normalization in XML 1.0 5e
>
> Addison,
>
> Will you be sending any further input in time for the XML Core WG
> telcon this Wednesday?
>
> paul
>
> > -----Original Message-----
> > From: Phillips, Addison [mailto:addison@amazon.com]
> > Sent: Thursday, 2009 April 30 13:00
> > To: Grosso, Paul; public-xml-core-wg@w3.org
> > Cc: public-i18n-core@w3.org; w3c-html-cg@w3.org
> > Subject: RE: Unicode Normalization in XML 1.0 5e
> >
> > Hi Paul,
> >
> > Thanks for the note and additional background information.
> > It's quite helpful.
> >
> > We appreciate the restrictions on XML 1.0. I think our
> > concern is not that we want new features, but rather to
> > clarify specifically what the "old features" are and, as
> > necessary, provide useful health warnings for end users.
> >
> > The unpleasant task before our WG is that, given that XML
> > considers two (Unicode) "canonically equivalent" elements
> > represented by different code point sequences to be distinct,
> > how or when should we encourage or insist that other Specs
> > built upon XML normalize items for string identity
> > operations? Clearly specs like XPath and the like are "in
> > trouble" if they normalize (can't select certain discrete
> > elements discretely) and "in trouble" in a different way if
> > they don't (a user's request in one place, even though
> > canonically equivalent to that in the XML document being
> > processed, doesn't match).
> >
> > This suggests that early normalization is a requirement for
> > certain kinds of XML operation to be reliable, an idea that
> > is already unpopular with implementers :-).
> >
> > Regards,
> >
> > Addison
> >
> > Addison Phillips
> > Globalization Architect -- Lab126
> >
> > Internationalization is not a feature.
> > It is an architecture.
> >
> > > -----Original Message-----
> > > From: Grosso, Paul [mailto:pgrosso@ptc.com]
> > > Sent: Thursday, April 30, 2009 10:35 AM
> > > To: Phillips, Addison; public-xml-core-wg@w3.org
> > > Cc: public-i18n-core@w3.org; w3c-html-cg@w3.org
> > > Subject: RE: Unicode Normalization in XML 1.0 5e
> > >
> > > Addison,
> > >
> > > Thanks for your update. Please allow me to give you some
> > > more background for your WG's discussion.
> > >
> > > Whereas XML 1.1 does include a normalization checking
> > > option, we cannot add such a feature to XML 1.0. At
> > > http://www.w3.org/TR/xml11/#sec-normalization-checking
> > > XML 1.1 starts with a sentence that is basically the
> > > first paragraph of the note we propose below (with a
> > > reference to CharMod).
> > >
> > > Then it follows with what is basically the first sentence
> > > of the second paragraph of the note proposed below. That
> > > paragraph in XML 1.1 goes on to talk about a user option.
> > >
> > > Our staff contact has informed us that we cannot do something
> > > that is effectively introducing a new feature into the
> > > language, and a user option is a new feature. Hence the
> > > rest of the second paragraph in our proposed note suggests
> > > that processors may do what, in XML 1.1, is allowed by
> > > user option.
> > >
> > > regards,
> > >
> > > paul
> > >
> > > > -----Original Message-----
> > > > From: Phillips, Addison [mailto:addison@amazon.com]
> > > > Sent: Thursday, 2009 April 30 11:42
> > > > To: Grosso, Paul; public-xml-core-wg@w3.org
> > > > Cc: public-i18n-core@w3.org; w3c-html-cg@w3.org
> > > > Subject: RE: Unicode Normalization in XML 1.0 5e
> > > >
> > > > Hello Paul & XML WG,
> > > >
> > > > At our most recent teleconference [1], the
> > > > Internationalization WG discussed your email below regarding
> > > > normalization in XML. We have scheduled time in our next
> > > > teleconference (scheduled for 6 May 2009) to finalize a
> > > > response for you.
> > > >
> > > > Our initial reaction is that we are not quite satisfied with
> > > > the proposed text: we think a stronger health warning is
> > > > probably called for here and would like to suggest one. Also,
> > > > please note that the reference(s) to CharMod need to be
> > > > updated, as Martin Dürst kindly pointed out in [2].
> > > >
> > > > Kind regards,
> > > >
> > > > Addison (for I18N)
> > > >
> > > > Addison Phillips
> > > > Globalization Architect -- Lab126
> > > > Chair -- W3C Internationalization WG
> > > >
> > > > Internationalization is not a feature.
> > > > It is an architecture.
> > > >
> > > > [1] http://www.w3.org/2009/04/29-core-minutes.html
> > > > [2] http://lists.w3.org/Archives/Public/public-i18n-core/2009AprJun/0037.html
> > > >
> > > > > -----Original Message-----
> > > > > From: Grosso, Paul [mailto:pgrosso@ptc.com]
> > > > > Sent: Wednesday, April 22, 2009 10:03 AM
> > > > > To: Phillips, Addison; public-xml-core-wg@w3.org
> > > > > Cc: public-i18n-core@w3.org; w3c-html-cg@w3.org
> > > > > Subject: RE: Unicode Normalization in XML 1.0 5e
> > > > >
> > > > > Addison et al.,
> > > > >
> > > > > Regarding this issue, the XML Core WG plans to issue
> > > > > an erratum to XML 1.0 5th Edition that adds a note
> > > > > as follows (where things delimited by underscores should
> > > > > be links to the appropriate definition or reference)
> > > > > to the end of section 2.2 Characters in XML 1.0:
> > > > >
> > > > >   Note:
> > > > >
> > > > >   All XML _parsed entities_ (including _document entities_)
> > > > >   SHOULD be fully normalized as per _[CharMod]_.
> > > > >
> > > > >   However, a document is still well-formed even if it is
> > > > >   not fully normalized. XML processors MAY verify that the
> > > > >   document being processed is in fully normalized form and
> > > > >   report to the application whether it is or not.
> > > > >
> > > > > Then we would also add to A.2 Other References in XML 1.0:
> > > > >
> > > > >   Charmod
> > > > >     W3C. Character Model for the World Wide Web 1.0.
> > > > >     Martin J. Dürst, François Yergeau, Richard Ishida,
> > > > >     Misha Wolf, Tex Texin.
> > > > >     (See http://www.w3.org/TR/2005/REC-charmod-20050215/.)
> > > > >
> > > > > Please let us know if this resolution of your issue is
> > > > > acceptable.
> > > > >
> > > > > regards,
> > > > >
> > > > > paul
> > > > >
> > > > > Paul Grosso for the XML Core WG
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: public-xml-core-wg-request@w3.org
> > > > > > [mailto:public-xml-core-wg-request@w3.org] On Behalf Of Grosso, Paul
> > > > > > Sent: Wednesday, 2009 March 11 11:32
> > > > > > To: Phillips, Addison; public-xml-core-wg@w3.org
> > > > > > Cc: public-i18n-core@w3.org; w3c-html-cg@w3.org
> > > > > > Subject: RE: Unicode Normalization in XML 1.0 5e
> > > > > >
> > > > > > Addison et al.,
> > > > > >
> > > > > > The XML Core WG has discussed your message during several
> > > > > > telcons, and we are still in the process of determining
> > > > > > just what we might do in response.
> > > > > >
> > > > > > At this time, we are quite sure we do not want to change
> > > > > > the XML spec so that canonical equivalents could be treated
> > > > > > as identical directly in XML. Aside from being a serious
> > > > > > change to parser behavior, this would make some previously
> > > > > > ill-formed (non-XML) documents well-formed XML as well as
> > > > > > make some previously well-formed XML ill-formed (non-XML).
> > > > > >
> > > > > > We are also pretty sure it would be a good idea to add at
> > > > > > least a note to the XML 1.0 spec saying that XML producers
> > > > > > SHOULD produce normalized output.
> > > > > >
> > > > > > We are considering whether we should add (some version of)
> > > > > > what the XML 1.1 spec says about normalization checking [1]
> > > > > > to the XML 1.0 spec. We haven't made a decision here yet,
> > > > > > and given our biweekly telcon schedule and the upcoming AC
> > > > > > meeting, we are not likely to do so until some time in April.
> > > > > >
> > > > > > I will, of course, let you know when we have a further status
> > > > > > update to give you.
> > > > > >
> > > > > > regards,
> > > > > >
> > > > > > paul
> > > > > >
> > > > > > for the XML Core WG
> > > > > >
> > > > > > [1] http://www.w3.org/TR/xml11/#sec-normalization-checking
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: public-xml-core-wg-request@w3.org
> > > > > > > [mailto:public-xml-core-wg-request@w3.org] On Behalf Of
> > > > > > > Phillips, Addison
> > > > > > > Sent: Wednesday, 2009 February 25 0:17
> > > > > > > To: public-xml-core-wg@w3.org
> > > > > > > Cc: public-i18n-core@w3.org; w3c-html-cg@w3.org
> > > > > > > Subject: Unicode Normalization in XML 1.0 5e
> > > > > > >
> > > > > > > Dear XML Core WG,
> > > > > > >
> > > > > > > I am writing on behalf of both the Internationalization
> > > > > > > Core WG and the HTML Coordination Group (HCG).
> > > > > > >
> > > > > > > Recently there has been an extensive discussion of
> > > > > > > normalization in W3C specifications, mainly related to
> > > > > > > handling of element and attribute names and values (as in
> > > > > > > CSS3 Selectors). Some of this discussion revolves around
> > > > > > > how Unicode normalization should work with XML and
> > > > > > > XML-derived specifications, hence I was actioned by HCG [0]
> > > > > > > to contact you folks.
> > > > > > >
> > > > > > > I produced a general summary of the Unicode normalization
> > > > > > > problem at [1] for the HCG. Those unfamiliar with Unicode
> > > > > > > normalization may wish to review that message.
> > > > > > >
> > > > > > > The basic question is whether XML can (or should?) take a
> > > > > > > clearer stance on Unicode normalization. At present, XML 1.0
> > > > > > > 5e, like its predecessors, does not require any particular
> > > > > > > normalization form; it says nothing about whether canonical
> > > > > > > equivalents in Unicode are "equal" from an XML point of view;
> > > > > > > and thus implies that Unicode canonical equivalence does
> > > > > > > *not* apply when considering an XML document's formation. The
> > > > > > > recommendations in Appendix J (which does include
> > > > > > > normalization among its suggestions) further suggest that
> > > > > > > this is true.
> > > > > > >
> > > > > > > On the other hand, it seems reasonable to suppose that
> > > > > > > Unicode canonical equivalence might apply to XML. Processes
> > > > > > > such as transcoding legacy charsets to Unicode might result
> > > > > > > in canonically-equivalent-but-unequal code point sequences,
> > > > > > > for example.
> > > > > > >
> > > > > > > In a survey done at I18N's behest, our Unicode liaison (Mark
> > > > > > > Davis) produced a survey of content of the Web, as well as a
> > > > > > > summary on performance [2], which found that 99.98% of Web
> > > > > > > HTML content was, in fact, in Unicode form NFC. It seems
> > > > > > > reasonable to suppose that XML content and documents would
> > > > > > > follow a similar pattern.
> > > > > > >
> > > > > > > Our questions to XML Core WG, thus, are:
> > > > > > >
> > > > > > > What, precisely, should XML say with regard to Unicode
> > > > > > > canonical equivalence?
> > > > > > >
> > > > > > > Would it be possible to require or allow canonical
> > > > > > > equivalents to be treated as identical directly in XML (and
> > > > > > > not merely as a side effect of other specifications)?
> > > > > > >
> > > > > > > Is there a problem if XML permits/requires
> > > > > > > canonically-equivalent-yet-different sequences to be treated
> > > > > > > as distinct if other specifications require/allow canonical
> > > > > > > equivalence to be recognized?
> > > > > > >
> > > > > > > The Internationalization Core WG would be happy to work with
> > > > > > > you on these thorny issues. Please advise if you need more
> > > > > > > information, consultation, participation, or just need to
> > > > > > > vent :-).
> > > > > > >
> > > > > > > Kind Regards,
> > > > > > >
> > > > > > > Addison (for I18N/HCG)
> > > > > > >
> > > > > > > [0] http://lists.w3.org/Archives/Member/w3c-html-cg/2009JanMar/0061.html
> > > > > > >     See ACTION-29
> > > > > > > [1] http://lists.w3.org/Archives/Public/public-i18n-core/2009JanMar/0259.html
> > > > > > > [2] http://www.macchiato.com/unicode/nfc-faq
> > > > > > >
> > > > > > > Addison Phillips
> > > > > > > Globalization Architect -- Lab126
> > > > > > > Chair -- W3C Internationalization WG
> > > > > > >
> > > > > > > Internationalization is not a feature.
> > > > > > > It is an architecture.
Received on Monday, 18 May 2009 17:19:59 UTC
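
The thread's central technical point is that XML 1.0 compares names code point by code point, so two canonically equivalent spellings of a name are distinct. The sketch below illustrates this in Python. It is an illustration only and was not proposed by either WG. It assumes Python 3.9+, because it uses `unicodedata.is_normalized` (added in 3.8) and the `list[str]` annotation. It relies on the standard-library `xml.etree.ElementTree` parser. The helper `nfc_check` is hypothetical. A plain NFC test only approximates CharMod's "fully normalized", which adds further constraints.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Two canonically equivalent spellings of the element name "café".
NFC_NAME = "caf\u00e9"    # precomposed U+00E9 LATIN SMALL LETTER E WITH ACUTE
NFD_NAME = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT


def child_tags(xml_text: str) -> list[str]:
    """Return child element names exactly as the parser reports them (no normalization)."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]


def same_name_raw(a: str, b: str) -> bool:
    """Code-point comparison, which is how XML 1.0 itself treats names."""
    return a == b


def same_name_nfc(a: str, b: str) -> bool:
    """Comparison after NFC-normalizing both sides, as a layered spec might choose to do."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def nfc_check(xml_text: str) -> bool:
    """Hypothetical processor-side check: is the whole text in NFC?

    This only approximates CharMod's "fully normalized", which adds
    constraints beyond NFC.
    """
    return unicodedata.is_normalized("NFC", xml_text)


if __name__ == "__main__":
    # Both spellings may appear as sibling elements; XML treats them as different names.
    doc = f"<r><{NFD_NAME}/><{NFC_NAME}/></r>"
    first, second = child_tags(doc)
    print(same_name_raw(first, second))  # False: distinct under XML's rules
    print(same_name_nfc(first, second))  # True: identical after NFC
    print(nfc_check(doc))                # False: the NFD spelling makes the text non-NFC
```

Normalizing both sides lets an NFC query match the NFD element. It also collapses the two sibling elements into a single name. This is the double bind described in the April 30 message: neither choice is safe unless content is normalized early.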