RE: Your comment on the F&O from Martin Duerst on 2003-08-26 (public-qt-comments@w3.org from August 2003)

From: Martin Duerst <duerst@w3.org>
Date: Tue, 26 Aug 2003 14:43:13 -0400
To: "Ashok Malhotra" <ashokma@microsoft.com>, <w3c-i18n-ig@w3.org>, <public-qt-comments@w3.org>
Message-Id: <4.2.0.58.J.20030826142443.04a3dce8@localhost>

Hello Ashok,

Thanks for checking back with us.

At 07:34 03/08/26 -0700, Ashok Malhotra wrote:

>I'm having trouble figuring out how to respond to your comment below from
>
><http://lists.w3.org/Archives/Public/public-qt-comments/2003Jul/0106.html>
>
>Could you please provide some guidance?
>
>[82] 7.4.11 normalize-unicode: 'full normalization' needs a definition of the
>      relevant constructs. For strings, the string itself is most
>      conveniently the relevant construct, but this should be said
>      explicitly.

The Character Model contains various definitions of normalization
(http://www.w3.org/TR/charmod/#sec-TextNormalization).
In particular, 'full normalization' is designed so that pieces
of a format (for example, element content in XML) can easily
be concatenated without creating normalization issues at the
point of concatenation. As an example, in
   <foo>some text u</foo><bar>&#x300; and some more text</bar>
the 'u' and the combining grave accent (&#x300;) would have to
be normalized to a precomposed "u with grave" when the contents
of the <foo> and <bar> elements are concatenated. So
    a)  <bar>&#x300; and some more text</bar>
is not fully normalized. On the other hand,
    b)  <bar> and some more tex&#x300;t</bar>
is fully normalized, because there is no "x with grave"
precomposed character in Unicode, and "x&#x300;" is the only
way to denote an 'x' with a grave. But how do we distinguish
between case a) and case b)? This distinction is made in
the relevant format (e.g. XML 1.1), which defines the
'relevant constructs', in this case "element content".
Fully normalized relevant constructs cannot start with, e.g., a
combining grave accent, but they can contain such a character
internally, except of course in a case such as a&#x300;, which
is clearly not in NFC because a precomposed "a with grave" exists.
So XQuery should define the relevant constructs when it
speaks about text normalization, the same way XML 1.1
defines the relevant constructs (see http://www.w3.org/TR/xml11/#sec2.13).

My understanding is that in the case above, you are dealing
with simple strings, so the only thing you need to say is
that the relevant construct for the purpose of full normalization
is the whole string.
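
To make cases a) and b) concrete, here is a minimal Python sketch that
treats the whole string as the relevant construct. It is only an
illustration under stated assumptions, not a definition taken from the
Character Model or from F&O: the function names are hypothetical, a
"combining" character is approximated as one with a non-zero canonical
combining class, and character escapes (include normalization) are not
handled at all.

```python
import unicodedata


def is_nfc(s: str) -> bool:
    """True if s is unchanged by Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", s) == s


def starts_with_combining(s: str) -> bool:
    """Approximation (assumption): the first character has a non-zero
    canonical combining class."""
    return bool(s) and unicodedata.combining(s[0]) != 0


def is_fully_normalized(s: str) -> bool:
    """Sketch of the check with the whole string as the relevant construct:
    the string is in NFC and does not start with a combining character."""
    return is_nfc(s) and not starts_with_combining(s)


foo = "some text u"
bar_a = "\u0300 and some more text"   # case a): starts with combining grave
bar_b = " and some more tex\u0300t"   # case b): combining grave after 'x'

print(is_nfc(foo), is_nfc(bar_a))     # each piece checked on its own
print(is_nfc(foo + bar_a))            # the concatenation of the two pieces
print(is_fully_normalized(bar_a))     # case a)
print(is_fully_normalized(bar_b))     # case b)
print(is_fully_normalized("a\u0300")) # internal sequence that is not NFC
```

The point of the sketch is that foo and bar_a each pass a plain NFC
check, but their concatenation does not, which is exactly the problem
that the "does not start with a combining character" condition is
meant to catch.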

Hope this helps.     Regards,    Martin.

Received on Tuesday, 26 August 2003 15:06:33 UTC