[Bug 3245] Equality of strings

http://www.w3.org/Bugs/Public/show_bug.cgi?id=3245


cmsmcq@w3.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |needsDrafting




------- Comment #9 from cmsmcq@w3.org  2007-10-14 19:56 -------
The WG discussed this issue both with Query and XSL, and then among
ourselves, at the October 2007 ftf meetings in Redmond.  See also bug
3222, which is closely related in practice.

We discussed several proposals for defining equality conditions for
strings which might depend on normalization and/or collation
information.  Eventually, we converged on a proposal to add a
Unicode-normalization facet applicable to xs:string. Its value will be
an identifier denoting a specific Unicode normalization form (e.g. 'c').
To begin with, the only legal values will be the identifier for
normalization form C and ABSENT.  The default value will be ABSENT,
which means the unnormalized form is used.  Once specified, the facet
cannot be changed (it's effectively fixed from the time of first use).
The meaning of the facet is that the lexical form is prepared by
calculating the named normalization form for the 'normalized value' in
the input infoset, and then performing whitespace normalization to
calculate the candidate lexical form.

We noted that it does matter that Unicode normalization be done first.
Write norm for Unicode normalization and ws for whitespace
normalization.  For the string s = x, y, z, space, space, combining
umlaut, x, y, z, it's clear that norm(ws(s)) = x, y, z, non-combining
umlaut, x, y, z, while ws(norm(s)) = x, y, z, space, non-combining
umlaut, x, y, z.  We thought that in this case the double space in the
original seems a clear signal that two tokens are intended, not one.
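
The order dependence can be reproduced with a small toy model.  The
names toy_norm and ws are hypothetical, and toy_norm encodes only the
assumption made in the example above, namely that normalization
composes a space followed by a combining umlaut into a non-combining
umlaut.  It is not a real Unicode normalization form; as far as we can
tell, Python's unicodedata.normalize("NFC", ...) leaves that pair
unchanged, so the composition is modelled by hand here.

```python
COMBINING_UMLAUT = "\u0308"      # COMBINING DIAERESIS
NON_COMBINING_UMLAUT = "\u00a8"  # DIAERESIS (spacing form)


def toy_norm(s: str) -> str:
    """Toy stand-in for normalization (assumption from the example):
    a space followed by a combining umlaut becomes a non-combining umlaut."""
    return s.replace(" " + COMBINING_UMLAUT, NON_COMBINING_UMLAUT)


def ws(s: str) -> str:
    """Whitespace normalization: collapse runs of whitespace, trim ends."""
    return " ".join(s.split())


s = "xyz" + "  " + COMBINING_UMLAUT + "xyz"

whitespace_first = toy_norm(ws(s))     # norm(ws(s))
normalization_first = ws(toy_norm(s))  # ws(norm(s))

print(whitespace_first == "xyz" + NON_COMBINING_UMLAUT + "xyz")           # True
print(normalization_first == "xyz " + NON_COMBINING_UMLAUT + "xyz")       # True
```

Collapsing whitespace first yields a single token, while normalizing
first keeps the space and therefore two tokens, which is the outcome
the WG preferred.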

After the meeting, it occurred to some WG members that it might be
good to have an explicit identifier for no-normalization, so that the
value of the facet can be fixed that way if desired.  (This would
entail reformulating the rule about changing the facet:  the value
might change from no-normalization to some normalization form, but
not from any specified normalization form to any other value.)
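
A minimal sketch of the reformulated rule, under stated assumptions:
the identifier "none" for no-normalization is hypothetical (no name
was chosen), ABSENT is modelled as None, and facet_change_allowed is
an illustrative helper, not part of any proposal text.

```python
ABSENT = None
NO_NORMALIZATION = "none"    # hypothetical identifier for no-normalization
NORMALIZATION_FORMS = {"c"}  # only form C is legal to begin with


def facet_change_allowed(base, derived) -> bool:
    """Return True if a type whose base has facet value `base` may
    specify `derived` as its own value of the facet."""
    if base is ABSENT:
        # Nothing specified yet, so any legal value may be chosen.
        return derived in (ABSENT, NO_NORMALIZATION) or derived in NORMALIZATION_FORMS
    if base == NO_NORMALIZATION:
        # May stay as is or move to some normalization form.
        return derived == NO_NORMALIZATION or derived in NORMALIZATION_FORMS
    # A specified normalization form cannot change to any other value.
    return derived == base
```

Under this reading, moving from "none" to 'c' is accepted, while moving
from 'c' to "none" is rejected.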

Received on Sunday, 14 October 2007 19:56:37 UTC