- From: Dimitre Novatchev <dnovatchev@gmail.com>
- Date: Fri, 15 Mar 2024 11:25:24 -0700
- To: Norm Tovey-Walsh <norm@saxonica.com>
- Cc: Christian Grün <cg@basex.org>, public-xslt-40@w3.org
- Message-ID: <CAK4KnZc0_sUkq9GxkvP9Ck9ydSW-V3FK39_qKfMXY_6mS6efCg@mail.gmail.com>
> > Seems to work like a charm 🙂
>
> Doesn't blindly subtracting the code point from 0x110000 run the risk of
> producing a non-Unicode character? I think the original code point would
> have to be in…checks notes…plane 16, so fairly unlikely, but still…

I think there is a more important question: will this work correctly with collations that are not binary, that is, collations in which cp1 > cp2 doesn't guarantee Char(cp1) > Char(cp2)?

The answer is no. There are plenty of collations (any case-insensitive (CI) collation, for example) in which 'X' and 'x' are consecutive in the collation's sorted character set.

When sorting with a collation, we must use not a character's code point but its index in the collation's sorted sequence of characters.

This is why it is important to have a function *fn:collation-characters($collation-name as xs:string) as xs:string* that returns the individual characters of the collation, sorted according to that collation.

Moreover, before passing a collation name to any of the many functions that take a collation as a parameter, the user needs to know exactly which characters the collation contains.

At present we don't have such a function, and it would be very useful to have one.

Thanks,
Dimitre

On Fri, Mar 15, 2024 at 5:42 AM Norm Tovey-Walsh <norm@saxonica.com> wrote:

> Dimitre Novatchev <dnovatchev@gmail.com> writes:
> > Seems to work like a charm 🙂
>
> Doesn't blindly subtracting the code point from 0x110000 run the risk of
> producing a non-Unicode character? I think the original code point would
> have to be in…checks notes…plane 16, so fairly unlikely, but still…
>
> Be seeing you,
>   norm
>
> --
> Norm Tovey-Walsh
> Saxonica
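A minimal Python sketch of the collation point, for illustration only. It assumes the technique under discussion reverses sort order by complementing each code point against 0x110000. The toy case-insensitive ordering, the alphabet, and the helpers `collation_characters`, `descending_by_codepoint` and `descending_by_rank` are all hypothetical; `collation_characters` merely mimics what the proposed fn:collation-characters might return. Complementing code points gives the reverse of codepoint order, while complementing each character's index in the collation order gives the reverse of the (toy) collation order.

```python
# Toy stand-in for a case-insensitive collation over a small alphabet.
# Assumption: characters are ranked by case-folded form, with the original
# character as a tie-breaker so the order is total. Real collations (e.g.
# UCA-based ones) compare strings at several strength levels; this
# per-character ranking is a simplification for illustration only.
ALPHABET = "abcxyzABCXYZ"


def collation_characters(alphabet: str) -> str:
    """Hypothetical analogue of the proposed fn:collation-characters():
    every character of the collation, in collation order."""
    return "".join(sorted(set(alphabet), key=lambda ch: (ch.casefold(), ch)))


ORDER = collation_characters(ALPHABET)          # "AaBbCcXxYyZz"
RANK = {ch: i for i, ch in enumerate(ORDER)}    # character -> collation index


def descending_by_codepoint(words: list[str]) -> list[str]:
    # Complement each code point against 0x110000: correct only for the
    # codepoint (binary) collation. The trailing sentinel exceeds every
    # complement, so a proper prefix sorts after the longer string.
    return sorted(words, key=lambda w: [0x110000 - ord(c) for c in w] + [0x110001])


def descending_by_rank(words: list[str]) -> list[str]:
    # Same trick, but complementing the character's index in the collation
    # order instead of its code point.
    n = len(ORDER)
    return sorted(words, key=lambda w: [n - RANK[c] for c in w] + [n + 1])


words = ["b", "X", "a", "Y"]
print(descending_by_codepoint(words))  # ['b', 'a', 'Y', 'X']
print(descending_by_rank(words))       # ['Y', 'X', 'b', 'a']
```

In the toy collation 'X' and 'x' are adjacent, so the codepoint-based key puts every lowercase letter ahead of every uppercase one, which is not the reverse of the collation order; the index-based key is.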
Received on Friday, 15 March 2024 18:25:42 UTC