RE: Unicode Normalization in XML 1.0 5e

"Maybe"

The problem is that our next teleconference is also on Wednesday, so depending on relative timing, I might have something for you. Actually, having glanced at the Zakim calendar, I see your call is before ours...

As a pointer, I think we'll propose an alternate form of your proposed text, accepting your resolution in general. There has been discussion of it in I18N Core here:

  http://www.w3.org/2009/05/06-core-minutes.html#item05 

And you can find some discussion of the wording, etc., here:

  http://lists.w3.org/Archives/Public/public-i18n-core/2009AprJun/0053.html 


Hopefully as a WG we will finalize all this at our teleconference scheduled for Wednesday, 20 May, at 1900Z:

  http://www.w3.org/Guide/1998/08/teleconference-calendar#s_2132 

Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.


> -----Original Message-----
> From: Grosso, Paul [mailto:pgrosso@ptc.com]
> Sent: Monday, May 18, 2009 8:33 AM
> To: Phillips, Addison; public-xml-core-wg@w3.org
> Cc: public-i18n-core@w3.org; w3c-html-cg@w3.org
> Subject: RE: Unicode Normalization in XML 1.0 5e
> 
> Addison,
> 
> Will you be sending any further input in time for the XML Core WG
> telcon this Wednesday?
> 
> paul
> 
> > -----Original Message-----
> > From: Phillips, Addison [mailto:addison@amazon.com]
> > Sent: Thursday, 2009 April 30 13:00
> > To: Grosso, Paul; public-xml-core-wg@w3.org
> > Cc: public-i18n-core@w3.org; w3c-html-cg@w3.org
> > Subject: RE: Unicode Normalization in XML 1.0 5e
> >
> > Hi Paul,
> >
> > Thanks for the note and additional background information.
> > It's quite helpful.
> >
> > We appreciate the restrictions on XML 1.0. I think our
> > concern is not to add new features, but rather to
> > clarify specifically what the "old features" are and, as
> > necessary, to provide useful health warnings for end users.
> >
> > The unpleasant task before our WG is deciding, given that
> > XML considers two (Unicode) "canonically equivalent" elements
> > represented by different code point sequences to be distinct,
> > how or when we should encourage or insist that other specs
> > built upon XML normalize items for string identity
> > operations. Clearly, specs such as XPath are "in trouble"
> > if they normalize (they can't select certain distinct
> > elements individually) and "in trouble" in a different way
> > if they don't (a user's request in one place, even though
> > canonically equivalent to what is in the XML document
> > being processed, doesn't match).
> >
> > This suggests that early normalization is a requirement for
> > certain kinds of XML operation to be reliable, an idea that
> > is already unpopular with implementers :-).
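A minimal sketch of the two kinds of trouble described above, assuming Python's standard `unicodedata` module and NFC as the normalization form. The `select` helper and the element name "café" are hypothetical illustrations, not XPath's actual matching rules.

```python
import unicodedata

# Hypothetical element name "café" spelled two ways: precomposed U+00E9,
# or "e" followed by U+0301 COMBINING ACUTE ACCENT. The two spellings are
# canonically equivalent but are different code point sequences.
PRECOMPOSED = "caf\u00e9"
DECOMPOSED = "cafe\u0301"


def select(element_names, query, normalize):
    """Return the element names that match a query name.

    normalize=False compares code points exactly, as XML does;
    normalize=True converts both sides to NFC before comparing.
    Hypothetical helper for illustration only.
    """
    if normalize:
        query = unicodedata.normalize("NFC", query)
    matches = []
    for name in element_names:
        candidate = unicodedata.normalize("NFC", name) if normalize else name
        if candidate == query:
            matches.append(name)
    return matches


# Trouble when normalizing: two distinct elements can no longer be
# selected separately; the query matches both of them.
both = [PRECOMPOSED, DECOMPOSED]
print(select(both, DECOMPOSED, normalize=True))

# Trouble when not normalizing: a canonically equivalent query
# finds nothing.
print(select([DECOMPOSED], PRECOMPOSED, normalize=False))  # []
```

Normalizing early (before names are stored or compared) removes the second problem but makes the first one unavoidable, which is the trade-off the paragraph describes.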
> >
> > Regards,
> >
> > Addison
> >
> > Addison Phillips
> > Globalization Architect -- Lab126
> >
> > Internationalization is not a feature.
> > It is an architecture.
> >
> >
> > > -----Original Message-----
> > > From: Grosso, Paul [mailto:pgrosso@ptc.com]
> > > Sent: Thursday, April 30, 2009 10:35 AM
> > > To: Phillips, Addison; public-xml-core-wg@w3.org
> > > Cc: public-i18n-core@w3.org; w3c-html-cg@w3.org
> > > Subject: RE: Unicode Normalization in XML 1.0 5e
> > >
> > > Addison,
> > >
> > > Thanks for your update.  Please allow me to give you some
> > > more background for your WG's discussion.
> > >
> > > Whereas XML 1.1 does include a normalization checking
> > > option, we cannot add such a feature to XML 1.0.  The
> > > XML 1.1 section on normalization checking, at
> > > http://www.w3.org/TR/xml11/#sec-normalization-checking,
> > > starts with a sentence that is basically the first
> > > paragraph of the note we propose below (with a
> > > reference to CharMod).
> > >
> > > Then it follows with what is basically the first sentence
> > > of the second paragraph of the note proposed below.  That
> > > paragraph in XML 1.1 goes on to talk about a user option.
> > >
> > > Our staff contact has informed us that we cannot do anything
> > > that effectively introduces a new feature into the
> > > language, and a user option is a new feature.  Hence the
> > > rest of the second paragraph in our proposed note suggests
> > > that processors may do what, in XML 1.1, is allowed by
> > > user option.
> > >
> > > regards,
> > >
> > > paul
> > >
> > > > -----Original Message-----
> > > > From: Phillips, Addison [mailto:addison@amazon.com]
> > > > Sent: Thursday, 2009 April 30 11:42
> > > > To: Grosso, Paul; public-xml-core-wg@w3.org
> > > > Cc: public-i18n-core@w3.org; w3c-html-cg@w3.org
> > > > Subject: RE: Unicode Normalization in XML 1.0 5e
> > > >
> > > > Hello Paul & XML WG,
> > > >
> > > > At our most recent teleconference [1], the
> > > > Internationalization WG discussed your email below regarding
> > > > normalization in XML. We have set aside time in our next
> > > > teleconference (on 6 May 2009) to finalize a
> > > > response for you.
> > > >
> > > > Our initial reaction is that we are not quite satisfied with
> > > > the proposed text: we think a stronger health warning is
> > > > probably called for here and would like to suggest one. Also,
> > > > please note that the reference(s) to CharMod need to be
> > > > updated, as Martin Dürst kindly pointed out in [2].
> > > >
> > > > Kind regards,
> > > >
> > > > Addison (for I18N)
> > > >
> > > > Addison Phillips
> > > > Globalization Architect -- Lab126
> > > > Chair -- W3C Internationalization WG
> > > >
> > > > Internationalization is not a feature.
> > > > It is an architecture.
> > > >
> > > >
> > > > [1] http://www.w3.org/2009/04/29-core-minutes.html
> > > > [2] http://lists.w3.org/Archives/Public/public-i18n-core/2009AprJun/0037.html
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Grosso, Paul [mailto:pgrosso@ptc.com]
> > > > > Sent: Wednesday, April 22, 2009 10:03 AM
> > > > > To: Phillips, Addison; public-xml-core-wg@w3.org
> > > > > Cc: public-i18n-core@w3.org; w3c-html-cg@w3.org
> > > > > Subject: RE: Unicode Normalization in XML 1.0 5e
> > > > >
> > > > > Addison et al.,
> > > > >
> > > > > Regarding this issue, the XML Core WG plans to issue
> > > > > an erratum to XML 1.0 5th Edition that adds a note
> > > > > as follows (where things delimited by underscores should
> > > > > be links to the appropriate definition or reference)
> > > > > to the end of section 2.2 Characters in XML 1.0:
> > > > >
> > > > >  Note:
> > > > >
> > > > >  All XML _parsed entities_ (including _document entities_) SHOULD
> > > > >  be fully normalized as per _[CharMod]_.
> > > > >
> > > > >  However, a document is still well-formed even if it is not fully
> > > > >  normalized. XML processors MAY verify that the document being
> > > > >  processed is in fully normalized form and report to the application
> > > > >  whether it is or not.
> > > > >
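As an illustration of the optional check in the proposed note, here is a minimal sketch assuming Python 3.8 or later and its standard `unicodedata.is_normalized`. The function name `check_nfc` is hypothetical. It tests only for NFC, which is one of the conditions in CharMod's definition of "fully normalized"; the remaining conditions (for example, about constructs that begin with a composing character, or character references whose expansion would break normalization) are not checked.

```python
import unicodedata


def check_nfc(text: str) -> bool:
    """Report whether text, taken as-is, is in Normalization Form C.

    Hypothetical helper. NFC is only one of the conditions for CharMod's
    "fully normalized"; character references are not expanded and
    constructs beginning with a composing character are not detected.
    """
    # unicodedata.is_normalized is available from Python 3.8 onward.
    return unicodedata.is_normalized("NFC", text)


doc = "<name>cafe\u0301</name>"
print(check_nfc(doc))                                # False
print(check_nfc(unicodedata.normalize("NFC", doc)))  # True
```

A processor following the note would report such a result to the application rather than reject the document, since the note keeps non-normalized documents well-formed.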
> > > > > Then we would also add to A.2 Other References in XML 1.0:
> > > > >
> > > > >  Charmod
> > > > >     W3C. Character Model for the World Wide Web 1.0.
> > > > >     Martin J. Dürst, François Yergeau, Richard Ishida, Misha Wolf,
> > > > >     Tex Texin. (See http://www.w3.org/TR/2005/REC-charmod-20050215/.)
> > > > >
> > > > > Please let us know if this resolution of your issue is acceptable.
> > > > >
> > > > > regards,
> > > > >
> > > > > paul
> > > > >
> > > > > Paul Grosso for the XML Core WG
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: public-xml-core-wg-request@w3.org
> > > > > > [mailto:public-xml-core-wg-request@w3.org] On Behalf Of Grosso, Paul
> > > > > > Sent: Wednesday, 2009 March 11 11:32
> > > > > > To: Phillips, Addison; public-xml-core-wg@w3.org
> > > > > > Cc: public-i18n-core@w3.org; w3c-html-cg@w3.org
> > > > > > Subject: RE: Unicode Normalization in XML 1.0 5e
> > > > > >
> > > > > > Addison et al.,
> > > > > >
> > > > > > The XML Core WG has discussed your message during several
> > > > > > telcons, and we are still in the process of determining
> > > > > > just what we might do in response.
> > > > > >
> > > > > > At this time, we are quite sure we do not want to change
> > > > > > the XML spec so that canonical equivalents could be treated
> > > > > > as identical directly in XML.  Aside from being a serious
> > > > > > change to parser behavior, this would make some previously
> > > > > > ill-formed (non-XML) documents well-formed XML as well as
> > > > > > make some previously well-formed XML ill-formed (non-XML).
> > > > > >
> > > > > > We are also pretty sure it would be a good idea to add at
> > > > > > least a note to the XML 1.0 spec saying that XML producers
> > > > > > SHOULD produce normalized output.
> > > > > >
> > > > > > We are considering whether we should add (some version of)
> > > > > > what the XML 1.1 spec says about normalization checking [1]
> > > > > > to the XML 1.0 spec.  We haven't made a decision here yet,
> > > > > > and given our biweekly telcon schedule and the upcoming AC
> > > > > > meeting, we are not likely to do so until some time in April.
> > > > > >
> > > > > > I will, of course, let you know when we have a further status
> > > > > > update to give you.
> > > > > >
> > > > > > regards,
> > > > > >
> > > > > > paul
> > > > > >
> > > > > > for the XML Core WG
> > > > > >
> > > > > > [1] http://www.w3.org/TR/xml11/#sec-normalization-checking
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: public-xml-core-wg-request@w3.org
> > > > > > > [mailto:public-xml-core-wg-request@w3.org] On Behalf Of
> > > > > > > Phillips, Addison
> > > > > > > Sent: Wednesday, 2009 February 25 0:17
> > > > > > > To: public-xml-core-wg@w3.org
> > > > > > > Cc: public-i18n-core@w3.org; w3c-html-cg@w3.org
> > > > > > > Subject: Unicode Normalization in XML 1.0 5e
> > > > > > >
> > > > > > > Dear XML Core WG,
> > > > > > >
> > > > > > > I am writing on behalf of both the Internationalization Core
> > > > > > > WG and the HTML Coordination Group (HCG).
> > > > > > >
> > > > > > > Recently there has been an extensive discussion of
> > > > > > > normalization in W3C specifications, mainly related to
> > > > > > > handling of element and attribute names and values (as in
> > > > > > > CSS3 Selectors). Some of this discussion revolves around how
> > > > > > > Unicode normalization should work with XML and XML-derived
> > > > > > > specifications, hence I was actioned by HCG [0] to contact
> > > > > > > you folks.
> > > > > > >
> > > > > > > I produced a general summary of the Unicode normalization
> > > > > > > problem at [1] for the HCG. Those unfamiliar with Unicode
> > > > > > > normalization may wish to review that message.
> > > > > > >
> > > > > > > The basic question is whether XML can (or should?) take a
> > > > > > > clearer stance on Unicode normalization. At present, XML 1.0
> > > > > > > 5e, like its predecessors, does not require any particular
> > > > > > > normalization form; it says nothing about whether canonical
> > > > > > > equivalents in Unicode are "equal" from an XML point of view;
> > > > > > > and thus implies that Unicode canonical equivalence does
> > > > > > > *not* apply when considering an XML document's
> > > > > > > well-formedness. The recommendations in Appendix J (which
> > > > > > > does include normalization among its suggestions) further
> > > > > > > suggest that this is true.
> > > > > > >
> > > > > > > On the other hand, it seems reasonable to suppose that
> > > > > > > Unicode canonical equivalence might apply to XML. Processes
> > > > > > > such as transcoding legacy charsets to Unicode might result
> > > > > > > in canonically-equivalent-but-unequal code point sequences,
> > > > > > > for example.
> > > > > > >
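To make the transcoding point concrete, here is a minimal sketch assuming Python's built-in `cp1258` codec. windows-1258 (Vietnamese) is chosen here only as an example of a legacy charset that encodes some accents as separate combining characters; the message itself does not name a particular charset.

```python
import unicodedata

# Two windows-1258 byte sequences that both represent Vietnamese "á":
# 0xE1 is the precomposed letter, while 0x61 0xEC is "a" followed by
# the combining acute accent that windows-1258 encodes separately.
precomposed = b"\xe1".decode("cp1258")   # "\u00e1"
decomposed = b"a\xec".decode("cp1258")   # "a\u0301"

# Decoding maps each byte to one code point, so the results are
# canonically equivalent text but unequal code point sequences.
print(precomposed == decomposed)  # False

# Normalizing to NFC after transcoding makes them identical.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Applying NFC immediately after transcoding is one way a producer could follow a "SHOULD produce normalized output" recommendation.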
> > > > > > > At I18N's behest, our Unicode liaison (Mark Davis) produced
> > > > > > > a survey of Web content, which found that 99.98% of Web
> > > > > > > HTML content was, in fact, in Unicode Normalization Form C
> > > > > > > (NFC), as well as a summary of normalization performance
> > > > > > > [2]. It seems reasonable to suppose that XML content and
> > > > > > > documents would follow a similar pattern.
> > > > > > >
> > > > > > > Our questions to XML Core WG, thus, are:
> > > > > > >
> > > > > > >    What, precisely, should XML say with regard to Unicode
> > > > > > > canonical equivalence?
> > > > > > >
> > > > > > >    Would it be possible to require or allow canonical
> > > > > > > equivalents to be treated as identical directly in XML (and
> > > > > > > not merely as a side effect of other specifications)?
> > > > > > >
> > > > > > >    Is there a problem if XML permits/requires
> > > > > > > canonically-equivalent-yet-different sequences to be treated
> > > > > > > as distinct if other specifications require/allow canonical
> > > > > > > equivalence to be recognized?
> > > > > > >
> > > > > > > The Internationalization Core WG would be happy to work with
> > > > > > > you on these thorny issues. Please advise if you need more
> > > > > > > information, consultation, participation, or just need to
> > > > > > > vent :-).
> > > > > > >
> > > > > > > Kind Regards,
> > > > > > >
> > > > > > > Addison (for I18N/HCG)
> > > > > > >
> > > > > > >
> > > > > > > [0] http://lists.w3.org/Archives/Member/w3c-html-cg/2009JanMar/0061.html
> > > > > > >     See ACTION-29
> > > > > > > [1] http://lists.w3.org/Archives/Public/public-i18n-core/2009JanMar/0259.html
> > > > > > > [2] http://www.macchiato.com/unicode/nfc-faq
> > > > > > >
> > > > > > >
> > > > > > > Addison Phillips
> > > > > > > Globalization Architect -- Lab126
> > > > > > > Chair -- W3C Internationalization WG
> > > > > > >
> > > > > > > Internationalization is not a feature.
> > > > > > > It is an architecture.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > >
> >

Received on Monday, 18 May 2009 17:19:58 UTC