Re: String complements

> > Seems to work like a charm 😀
>
> Doesn’t blindly subtracting the code point for 0x110000 run the risk of
> producing a non-Unicode character? I think the original code point would
> have to be in…checks notes…plane 16, so fairly unlikely, but still…
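To make the concern above concrete, here is a minimal Python sketch. It assumes that the complement maps each code point cp to 0x110000 - cp. The thread does not show the actual code, so this is a reconstruction. The sketch then checks each result against the XML 1.0 Char production. The names complement, is_xml_char and unsafe_inputs are hypothetical.

```python
# Hypothetical reconstruction of the "string complement" trick: each code
# point cp is mapped to 0x110000 - cp, which reverses the order of single
# characters compared by code point.
MAX_PLUS_ONE = 0x110000  # one past the largest Unicode code point, U+10FFFF


def complement(s: str) -> str:
    """Return the code-point complement of s (assumed technique)."""
    return "".join(chr(MAX_PLUS_ONE - ord(c)) for c in s)


def is_xml_char(cp: int) -> bool:
    """True if cp matches the XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def unsafe_inputs(s: str) -> list[str]:
    """Characters of s whose complement is not a legal XML character."""
    return [c for c in s if not is_xml_char(MAX_PLUS_ONE - ord(c))]
```

Under this assumption, every problematic input does lie in plane 16 (U+100000 to U+10FFFF). For example, U+102001 through U+102800 map onto the surrogate range U+DFFF down to U+D800, and U+100001 maps to the noncharacter U+FFFF.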

I think there is a more important question: will this work correctly with
collations that are not binary, that is, collations in which cp1 > cp2 does
not guarantee that Char(cp1) sorts after Char(cp2)?

The answer is no, and there are too many such collations (for example, any
case-insensitive (CI) collation) in which 'X' and 'x' are consecutive in the
collation's sorted character set.
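The following Python sketch illustrates the mismatch. It again assumes that the complement maps cp to 0x110000 - cp. It uses str.casefold only as a rough stand-in for a case-insensitive collation. casefold is not the Unicode Collation Algorithm, and no real XPath collation URI is involved.

```python
# Compare a descending sort emulated with code-point complements against a
# descending case-insensitive sort (casefold used as a toy CI collation).


def complement(s: str) -> str:
    """Map each code point cp to 0x110000 - cp (assumed technique)."""
    return "".join(chr(0x110000 - ord(c)) for c in s)


words = ["b", "X", "a"]

# Descending by code point: 'b' > 'a' > 'X', because U+0058 'X' < U+0061 'a'.
by_complement = sorted(words, key=complement)

# Descending case-insensitively: 'X' (compared as 'x') > 'b' > 'a'.
ci_descending = sorted(words, key=str.casefold, reverse=True)

print(by_complement)  # ['b', 'a', 'X']
print(ci_descending)  # ['X', 'b', 'a']
```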

When sorting with a collation, we must use not a character's code point but
its index in the collation's sorted sequence of characters.
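Here is a minimal Python sketch of this idea. It assumes a hypothetical toy collation whose sorted characters are given by the string TOY_COLLATION. That string stands in for the output of the fn:collation-characters function proposed in this message. Each character is replaced by its index in that sequence. For a descending key, the index is complemented against the alphabet size instead of against 0x110000. A terminator is then appended so that a string sorts after any longer string it is a prefix of.

```python
# TOY_COLLATION is a hypothetical, hand-written ordering standing in for the
# result of the proposed fn:collation-characters().
TOY_COLLATION = "aAbBxX"

RANK = {ch: i for i, ch in enumerate(TOY_COLLATION)}  # char -> index
N = len(TOY_COLLATION)


def ascending_key(s: str) -> tuple[int, ...]:
    """Index of each character in the collation's sorted sequence."""
    return tuple(RANK[c] for c in s)


def descending_key(s: str) -> tuple[int, ...]:
    """Complement each index (N - 1 - index) and append the terminator N,
    which is larger than any complemented index, so that "ab" sorts before
    "a" in descending order."""
    return tuple(N - 1 - RANK[c] for c in s) + (N,)


words = ["X", "a", "b", "ab"]
print(sorted(words, key=ascending_key))   # ['a', 'ab', 'b', 'X']
print(sorted(words, key=descending_key))  # ['X', 'b', 'ab', 'a']
```

Characters that are not in TOY_COLLATION raise KeyError. That is exactly the case where the user has to know which characters the collation covers.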

This is why it is important to have a function

    fn:collation-characters($collation-name as xs:string) as xs:string

that returns the individual characters of the collation, sorted according to
that collation.
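The following Python stand-in illustrates the kind of value such a function would return. It orders a given character repertoire with a caller-supplied key. Both the repertoire and the key are assumptions. A real implementation would have to derive the ordering from the named collation itself, which this sketch does not attempt.

```python
import string
from typing import Callable


def collation_characters(repertoire: str, key: Callable[[str], object]) -> str:
    """Hypothetical stand-in for fn:collation-characters: return the distinct
    characters of `repertoire`, sorted by `key`."""
    return "".join(sorted(set(repertoire), key=key))


def case_insensitive_key(c: str) -> tuple[str, str]:
    """Toy case-insensitive ordering: compare case-folded first, then the
    character itself, so 'A' and 'a' end up next to each other."""
    return (c.casefold(), c)


letters = collation_characters(string.ascii_letters, case_insensitive_key)
print(letters[:6])  # AaBbCc
```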

Not to mention that users, before passing a collation name to any of the
many functions that take a collation as a parameter, need to know exactly
which characters the collation contains.

At present we don't have such a function, and it would really be very
useful to have one.


Thanks,
Dimitre

On Fri, Mar 15, 2024 at 5:42 AM Norm Tovey-Walsh <norm@saxonica.com> wrote:

> Dimitre Novatchev <dnovatchev@gmail.com> writes:
> > Seems to work like a charm 😀
>
> Doesn’t blindly subtracting the code point for 0x110000 run the risk of
> producing a non-Unicode character? I think the original code point would
> have to be in…checks notes…plane 16, so fairly unlikely, but still…
>
>                                         Be seeing you,
>                                           norm
>
> --
> Norm Tovey-Walsh
> Saxonica
>

Received on Friday, 15 March 2024 18:25:42 UTC