Cotton on Collation

The Schema WG has asked me to respond to Paul Cotton's discussion of 
collations, found in
http://lists.w3.org/Archives/Public/www-xml-schema-comments/2000JanMar/0201.html.
This response represents my opinion, and will be taken as further feedback 
on the topic by the Working Group.

>4. Section 3.2.1 string
>This section states "The ordered property of string is the Unicode
>character number sequence."  I wonder why the definition of the string
>datatype does not permit a user to define the "collation" to be used?
>"Unicode character number sequence" is only one "collation" and is not very
>useful.  In addition the specification does not explain why this
>"collation" is needed.

A collation sequence defines how strings are compared to establish order. 
Since we allow minOccurs and maxOccurs to be defined on strings, and a 
minimum or maximum cannot be defined until we have some way to determine 
whether the value of one string is less than the value of another string, 
I believe that collation sequences are needed for our own purposes if we 
are to compare strings in foreign languages appropriately.
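
To make the difference concrete, here is a minimal Python sketch that 
compares Unicode character number order with a simple collation. The 
function names (simple_collation_key, within_bounds) and the folding rule 
(ignore accents and case) are assumptions made for illustration only; they 
are not taken from the Schema draft or from any standard collation.

```python
import unicodedata


def simple_collation_key(s):
    """Hypothetical collation: ignore accents and case, then break ties
    with the original string so the ordering stays total."""
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), s)


def within_bounds(value, low, high, key):
    """True if low <= value <= high under the ordering induced by key."""
    return key(low) <= key(value) <= key(high)


words = ["zebra", "Zulu", "\u00e9clair", "apple"]

# Python compares str values by Unicode code point, which is the
# "Unicode character number sequence" ordering.
by_code_point = sorted(words)
# ['Zulu', 'apple', 'zebra', 'éclair']

by_collation = sorted(words, key=simple_collation_key)
# ['apple', 'éclair', 'zebra', 'Zulu']

# The same minimum/maximum bounds accept or reject a value depending
# on which ordering is used.
print(within_bounds("\u00e9clair", "a", "m", key=lambda s: s))           # False
print(within_bounds("\u00e9clair", "a", "m", key=simple_collation_key))  # True
```

Under code point order, "éclair" sorts after "zebra" and falls outside the 
range "a" to "m", which is unlikely to be what a schema author intends.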

>XML Query will need to support different collations for the string data
>type.  It would be preferable if the collation was defined as part of the
><data type> not as part of the query <predicate>s.  I would recommend you
>consider a solution such as one adopted by SQL to permit the type definer
>to simply name the collation to be used.  No exact definition of the action
>collation needs to be provide since there are several other sources for
>this information.

An advantage of naming the collation in the data type is that strings can 
be sorted or compared appropriately without forcing the person who 
composes the query to state the collation sequence explicitly, which 
simplifies writing queries significantly. I think there would probably 
still be cases in which a query must specify the collation explicitly, 
e.g. if strings with two different collations are compared.
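
A rough Python sketch of that behaviour follows. Everything in it is 
hypothetical: the TypedString class, the COLLATIONS registry, and the 
collation names are invented for illustration and do not correspond to 
anything in the Schema or Query drafts. The comparison uses the collation 
named by the type, and requires an explicit choice only when the two 
operands disagree.

```python
import unicodedata
from dataclasses import dataclass


def _fold(s):
    """Hypothetical accent- and case-insensitive comparison key."""
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), s)


# Hypothetical registry mapping collation names to key functions.
COLLATIONS = {
    "codepoint": lambda s: s,
    "ignore-case-accents": _fold,
}


@dataclass(frozen=True)
class TypedString:
    value: str
    collation: str  # named by the type definer, not by the query


def less_than(a, b, collation=None):
    """Compare with the collation named by the types, unless the query
    overrides it. Mixed collations require an explicit choice."""
    if collation is None:
        if a.collation != b.collation:
            raise ValueError("operands have different collations; "
                             "the query must name one explicitly")
        collation = a.collation
    key = COLLATIONS[collation]
    return key(a.value) < key(b.value)


a = TypedString("\u00e9clair", "ignore-case-accents")
b = TypedString("zebra", "ignore-case-accents")
c = TypedString("zebra", "codepoint")

print(less_than(a, b))                         # True: collation comes from the type
print(less_than(a, c, collation="codepoint"))  # False: the query chose code point order
```

Calling less_than(a, c) without the collation argument raises ValueError, 
which corresponds to the case where the query must be explicit.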

Are there important issues that I'm neglecting here?

Jonathan

Received on Friday, 2 June 2000 11:52:40 UTC