- From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Date: Sun, 26 Apr 2009 19:38:15 +0900
- To: "Grosso, Paul" <pgrosso@ptc.com>
- CC: "Phillips, Addison" <addison@amazon.com>, public-xml-core-wg@w3.org, public-i18n-core@w3.org, w3c-html-cg@w3.org
Hello Paul,

This looks good, except that instead of
http://www.w3.org/TR/2005/REC-charmod-20050215/, the reference has to be
to http://www.w3.org/TR/charmod-norm/ (which is still a WD).

Regards, Martin.

On 2009/04/23 2:03, Grosso, Paul wrote:
> Addison et al.,
>
> Regarding this issue, the XML Core WG plans to issue
> an erratum to XML 1.0 5th Edition that adds a note
> as follows (where things delimited by underscores should
> be links to the appropriate definition or reference)
> to the end of section 2.2 Characters in XML 1.0:
>
> Note:
>
> All XML _parsed entities_ (including _document entities_) SHOULD
> be fully normalized as per _[CharMod]_.
>
> However, a document is still well-formed even if it is not fully
> normalized. XML processors MAY verify that the document being
> processed is in fully normalized form and report to the application
> whether it is or not.
>
> Then we would also add to A.2 Other References in XML 1.0:
>
> Charmod
> W3C. Character Model for the World Wide Web 1.0.
> Martin J. Dürst, François Yergeau, Richard Ishida, Misha Wolf,
> Tex Texin. (See http://www.w3.org/TR/2005/REC-charmod-20050215/.)
>
> Please let us know if this resolution of your issue is acceptable.
>
> regards,
>
> paul
>
> Paul Grosso for the XML Core WG
>
>> -----Original Message-----
>> From: public-xml-core-wg-request@w3.org
>> [mailto:public-xml-core-wg-request@w3.org] On Behalf Of Grosso, Paul
>> Sent: Wednesday, 2009 March 11 11:32
>> To: Phillips, Addison; public-xml-core-wg@w3.org
>> Cc: public-i18n-core@w3.org; w3c-html-cg@w3.org
>> Subject: RE: Unicode Normalization in XML 1.0 5e
>>
>> Addison et al.,
>>
>> The XML Core WG has discussed your message during several
>> telcons, and we are still in the process of determining
>> just what we might do in response.
>>
>> At this time, we are quite sure we do not want to change
>> the XML spec so that canonical equivalents could be treated
>> as identical directly in XML. Aside from being a serious
>> change to parser behavior, this would make some previously
>> ill-formed (non-XML) documents well-formed XML as well as
>> make some previously well-formed XML ill-formed (non-XML).
>>
>> We are also pretty sure it would be a good idea to add at
>> least a note to the XML 1.0 spec saying that XML producers
>> SHOULD produce normalized output.
>>
>> We are considering whether we should add (some version of)
>> what the XML 1.1 spec says about normalization checking [1]
>> to the XML 1.0 spec. We haven't made a decision here yet,
>> and given our biweekly telcon schedule and the upcoming AC
>> meeting, we are not likely to do so until some time in April.
>>
>> I will, of course, let you know when we have a further status
>> update to give you.
>>
>> regards,
>>
>> paul
>>
>> for the XML Core WG
>>
>> [1] http://www.w3.org/TR/xml11/#sec-normalization-checking
>>
>>> -----Original Message-----
>>> From: public-xml-core-wg-request@w3.org
>>> [mailto:public-xml-core-wg-request@w3.org] On Behalf Of
>>> Phillips, Addison
>>> Sent: Wednesday, 2009 February 25 0:17
>>> To: public-xml-core-wg@w3.org
>>> Cc: public-i18n-core@w3.org; w3c-html-cg@w3.org
>>> Subject: Unicode Normalization in XML 1.0 5e
>>>
>>> Dear XML Core WG,
>>>
>>> I am writing on behalf of both the Internationalization Core
>>> WG and the HTML Coordination Group (HCG).
>>>
>>> Recently there has been an extensive discussion of
>>> normalization in W3C specifications, mainly related to
>>> handling of element and attribute names and values (as in
>>> CSS3 Selectors). Some of this discussion revolves around how
>>> Unicode normalization should work with XML and XML-derived
>>> specifications, hence I was actioned by HCG [0] to contact
>>> you folks.
>>>
>>> I produced a general summary of the Unicode normalization
>>> problem at [1] for the HCG. Those unfamiliar with Unicode
>>> normalization may wish to review that message.
>>>
>>> The basic question is whether XML can (or should?) take a
>>> clearer stance on Unicode normalization. At present, XML 1.0
>>> 5e, like its predecessors, does not require any particular
>>> normalization form; it says nothing about whether canonical
>>> equivalents in Unicode are "equal" from an XML point of view;
>>> and thus implies that Unicode canonical equivalence does
>>> *not* apply when considering an XML document's formation. The
>>> recommendations in Appendix J (which does include
>>> normalization among its suggestions) further suggest that
>>> this is true.
>>>
>>> On the other hand, it seems reasonable to suppose that
>>> Unicode canonical equivalence might apply to XML. Processes
>>> such as transcoding legacy charsets to Unicode might result
>>> in canonically-equivalent-but-unequal code point sequences,
>>> for example.
>>>
>>> In a survey done at I18N's behest, our Unicode liaison (Mark
>>> Davis) produced a survey of content of the Web, as well as a
>>> summary on performance [2], which found that 99.98% of Web
>>> HTML content was, in fact, in Unicode form NFC. It seems
>>> reasonable to suppose that XML content and documents would
>>> follow a similar pattern.
>>>
>>> Our questions to XML Core WG, thus, are:
>>>
>>> What, precisely, should XML say with regard to Unicode
>>> canonical equivalence?
>>>
>>> Would it be possible to require or allow canonical
>>> equivalents to be treated as identical directly in XML (and
>>> not merely as a side effect of other specifications)?
>>>
>>> Is there a problem if XML permits/requires
>>> canonically-equivalent-yet-different sequences to be treated
>>> as distinct if other specifications require/allow canonical
>>> equivalence to be recognized?
>>>
>>> The Internationalization Core WG would be happy to work with
>>> you on these thorny issues. Please advise if you need more
>>> information, consultation, participation, or just need to vent :-).
>>>
>>> Kind Regards,
>>>
>>> Addison (for I18N/HCG)
>>>
>>> [0]
>>> http://lists.w3.org/Archives/Member/w3c-html-cg/2009JanMar/0061.html
>>> See ACTION-29
>>> [1]
>>> http://lists.w3.org/Archives/Public/public-i18n-core/2009JanMar/0259.html
>>> [2] http://www.macchiato.com/unicode/nfc-faq
>>>
>>> Addison Phillips
>>> Globalization Architect -- Lab126
>>> Chair -- W3C Internationalization WG
>>>
>>> Internationalization is not a feature.
>>> It is an architecture.

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
Received on Sunday, 26 April 2009 10:39:17 UTC
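The canonical-equivalence problem at the heart of this thread can be illustrated with a short sketch. This is an illustration only, not something proposed by either WG: it assumes Python 3.8 or later (for `unicodedata.is_normalized`) and the standard-library expat-based parser, and the names `COMPOSED`, `DECOMPOSED`, `is_nfc` and `parses` are hypothetical. Note that an NFC check is only a necessary condition for CharMod "full normalization", which additionally requires, among other things, that markup constructs not begin with a composing character.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Two canonically equivalent spellings of "café" (hypothetical example names):
# precomposed U+00E9 versus "e" followed by U+0301 COMBINING ACUTE ACCENT.
COMPOSED = "caf\u00e9"
DECOMPOSED = "cafe\u0301"


def is_nfc(text: str) -> bool:
    """Report whether text is already in Unicode Normalization Form C.

    This approximates, but does not fully implement, CharMod full
    normalization: it ignores the extra rules about how constructs begin.
    """
    return unicodedata.is_normalized("NFC", text)


def parses(doc: str) -> bool:
    """Return True if the standard-library XML parser accepts doc."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # Canonically equivalent, yet different code point sequences.
    print(COMPOSED == DECOMPOSED)                                # False
    print(unicodedata.normalize("NFC", DECOMPOSED) == COMPOSED)  # True

    # XML compares names code point by code point, so start and end tags
    # that differ only in normalization do not match.
    print(parses(f"<{COMPOSED}></{COMPOSED}>"))    # True
    print(parses(f"<{COMPOSED}></{DECOMPOSED}>"))  # False

    # A check in the spirit of the proposed "SHOULD be fully normalized".
    print(is_nfc(COMPOSED), is_nfc(DECOMPOSED))    # True False
```

The mismatched-tag case shows why treating canonical equivalents as identical would change parser behavior: a document that is ill-formed today would become well-formed.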