- From: Martin Duerst <duerst@w3.org>
- Date: Tue, 26 Aug 2003 14:43:13 -0400
- To: "Ashok Malhotra" <ashokma@microsoft.com>, <w3c-i18n-ig@w3.org>, <public-qt-comments@w3.org>
Hello Ashok,

Thanks for checking back with us.

At 07:34 03/08/26 -0700, Ashok Malhotra wrote:
>I'm having trouble figuring out how to respond to your comment below from
>
><http://lists.w3.org/Archives/Public/public-qt-comments/2003Jul/0106.html>
>
>Could you please provide some guidance?
>
>[82] 7.4.11 normalize-unicode: 'full normalization' needs a definition of the
> > relevant constructs. For strings, the string itself is most
> > conveniently the relevant construct, but this should be said
> > explicitly.

The Character Model contains various definitions of normalization (http://www.w3.org/TR/charmod/#sec-TextNormalization). In particular, 'full normalization' is designed so that pieces of a format (for example, element content in XML) can easily be concatenated without creating normalization issues at the point of concatenation.

As an example, in

    <foo>some text u</foo><bar>̀ and some more text</bar>

the 'u' and the combining grave accent (̀) would have to be normalized to a precomposed "u with grave" when the contents of the <foo> and <bar> elements are concatenated. So

    a) <bar>̀ and some more text</bar>

is not fully normalized. On the other hand,

    b) <bar> and some more tex̀t</bar>

is fully normalized, because there is no precomposed "x with grave" character in Unicode, and "x̀" is the only way to denote an 'x' with a grave.

But how do we distinguish between case a) and case b)? This distinction is made in the relevant format (e.g. XML 1.1), which defines the 'relevant constructs', in this case "element content". Relevant constructs cannot start with, e.g., a combining grave character, but they can contain such a character internally, except of course in a case such as à (an 'a' followed by a combining grave), where the text would clearly not be in NFC.

So XQuery should define the relevant constructs when it speaks about text normalization, the same way XML 1.1 defines the relevant constructs (see http://www.w3.org/TR/xml11/#sec2.13).
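The distinction between cases a) and b) can be illustrated with a small Python sketch. This is only a simplified approximation, not the Character Model's definition: it treats a construct as acceptable when it is in NFC and does not begin with a character of non-zero canonical combining class, and it ignores character escapes and composing characters whose combining class is zero. The helper names (is_nfc, starts_with_combining, is_fully_normalized_construct) are hypothetical and chosen for illustration.

```python
import unicodedata

COMBINING_GRAVE = "\u0300"  # COMBINING GRAVE ACCENT


def is_nfc(s: str) -> bool:
    """True if s is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", s) == s


def starts_with_combining(s: str) -> bool:
    """True if the first character has a non-zero canonical combining class."""
    return bool(s) and unicodedata.combining(s[0]) != 0


def is_fully_normalized_construct(s: str) -> bool:
    """Simplified check (an assumption, not the normative definition):
    the construct is in NFC and does not start with a combining character."""
    return is_nfc(s) and not starts_with_combining(s)


foo = "some text u"
bar_a = COMBINING_GRAVE + " and some more text"           # case a)
bar_b = " and some more tex" + COMBINING_GRAVE + "t"      # case b)

# Each piece is in NFC on its own, but only b) is safe to concatenate.
print(is_nfc(bar_a), is_fully_normalized_construct(bar_a))
print(is_nfc(bar_b), is_fully_normalized_construct(bar_b))

# Concatenating foo with a) creates a normalization problem at the seam:
# 'u' followed by the combining grave composes to U+00F9 under NFC.
print(is_nfc(foo + bar_a))
print(is_nfc(foo + bar_b))
```

When the relevant construct is simply the whole string, the same check would be applied to the string as a unit.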
My understanding is that in the case above, you are dealing with simple strings, so the only thing you need to say is that the relevant construct for the purpose of full normalization is the whole string.

Hope this helps.

Regards, Martin.
Received on Tuesday, 26 August 2003 15:06:33 UTC