- From: Christian Grün <cg@basex.org>
- Date: Fri, 15 Mar 2024 18:41:26 +0000
- To: Dimitre Novatchev <dnovatchev@gmail.com>
- CC: Norm Tovey-Walsh <norm@saxonica.com>, "public-xslt-40@w3.org" <public-xslt-40@w3.org>
- Message-ID: <245ce2bf-8c84-4dca-b6d0-b924f2187bd3@email.android.com>
Before requesting new functions, we should clarify whether there’s a problem to solve, or, in other words, whether at least two or three people believe there's a problem.
On 15.03.2024 19:25, Dimitre Novatchev <dnovatchev@gmail.com> wrote:
> > Seems to work like a charm 😀
>
> Doesn’t blindly subtracting the code point from 0x110000 run the risk of producing a non-Unicode character?
> I think the original code point would have to be in…checks notes…plane 16, so fairly unlikely, but still…
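The arithmetic Norm is asking about can be checked mechanically. The XPath code under discussion is not included in this message, so this Python sketch only examines the mapping cp -> 0x110000 - cp. It assumes that "non-Unicode character" means a value that is not a legal XML 1.0 `Char`, which is what an XPath string may contain. The names `is_xml_char`, `reverse_cp` and `problem_inputs` are illustrative only.

```python
# Sketch: which code points become problematic under the "reverse order"
# mapping cp -> 0x110000 - cp discussed in this thread? Assumption: a result
# is "bad" if it is not a legal XML 1.0 Char (the original XPath is not shown).

MAX_CP = 0x10FFFF
OFFSET = 0x110000  # one past the highest Unicode code point


def is_xml_char(cp: int) -> bool:
    """True if cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= MAX_CP
    )


def reverse_cp(cp: int) -> int:
    """The mapping under discussion: larger inputs give smaller outputs."""
    return OFFSET - cp


def problem_inputs() -> list[int]:
    """Legal XML code points whose mapped value is not a legal XML character."""
    return [
        cp
        for cp in range(0, MAX_CP + 1)
        if is_xml_char(cp) and not is_xml_char(reverse_cp(cp))
    ]


if __name__ == "__main__":
    bad = problem_inputs()
    planes = sorted({cp >> 16 for cp in bad})  # plane = code point // 0x10000
    print(f"{len(bad)} problematic inputs, all in plane(s): {planes}")
```

On this reading, every problematic input lies in plane 16 (U+100000 to U+10FFFF), which matches Norm's estimate; the mapped values there include C0 control characters and surrogate code points such as U+D800.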
I think there is a more important question: will this work correctly with collations that are not binary, that is, collations in which cp1 > cp2 does not guarantee that Char(cp1) > Char(cp2)?
The answer is no, and there are a great many collations (any case-insensitive (CI) collation, for example) in which 'X' and 'x' are adjacent in the collation's sorted character set, even though their code points (U+0058 and U+0078) are far apart.
When sorting with a collation, we must use not a character's code point but its index among the collation's sorted characters.
This is why it is important to have a function
fn:collation-characters($collation-name as xs:string) as xs:string
that returns the individual characters of the collation, in the order that collation sorts them.
Moreover, before passing a collation name to any of the many functions that take a collation parameter, the user needs to know exactly which characters the collation contains.
We don't currently have such a function, and one would be very useful.
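The rank-based approach described above can be sketched in Python. Everything here is illustrative: `collation_characters`, `rank_table` and `descending_key` are hypothetical names, and the "collation" is a toy case-insensitive ordering of twelve ASCII letters rather than a real collation URI. An actual fn:collation-characters would have to obtain the order from the processor's collation.

```python
# A minimal sketch of collation-aware sorting by rank instead of code point.
# collation_characters() is a HYPOTHETICAL stand-in for the proposed
# fn:collation-characters: it orders a tiny, fixed repertoire with a toy
# case-insensitive rule (ties broken uppercase-first). That rule is an
# assumption for illustration, not the behaviour of any real collation.

REPERTOIRE = "abcxyzABCXYZ"


def collation_characters() -> str:
    """Hypothetical: the repertoire in the toy collation's sorted order."""
    return "".join(sorted(REPERTOIRE, key=lambda ch: (ch.casefold(), ch)))


def rank_table(chars: str) -> dict[str, int]:
    """Map each character to its index in the collation's sorted characters."""
    return {ch: i for i, ch in enumerate(chars)}


def descending_key(s: str, ranks: dict[str, int]) -> tuple[int, ...]:
    """Sort key that yields descending collation order under an ascending sort.

    Each character's rank is inverted (n - 1 - rank), the rank-based analogue
    of 0x110000 - codepoint. A trailing sentinel n, larger than any inverted
    rank, makes a string sort before its own prefixes, as descending order
    requires. Characters outside the repertoire raise KeyError.
    """
    n = len(ranks)
    return tuple(n - 1 - ranks[ch] for ch in s) + (n,)


if __name__ == "__main__":
    ranks = rank_table(collation_characters())
    words = ["b", "X", "a", "ab", "C"]
    # Descending by code point: uppercase letters all come after lowercase ones.
    print(sorted(words, reverse=True))
    # Descending by collation rank: case no longer splits the alphabet.
    print(sorted(words, key=lambda w: descending_key(w, ranks)))
```

With this toy collation the first line prints `['b', 'ab', 'a', 'X', 'C']` and the second `['X', 'C', 'b', 'ab', 'a']`. The KeyError for unknown characters is the point: without knowing the collation's characters and their order, the rank table cannot be built.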
Thanks,
Dimitre
On Fri, Mar 15, 2024 at 5:42 AM Norm Tovey-Walsh <norm@saxonica.com> wrote:
Dimitre Novatchev <dnovatchev@gmail.com> writes:
> Seems to work like a charm 😀
Doesn’t blindly subtracting the code point from 0x110000 run the risk of producing a non-Unicode character? I think the original code point would have to be in…checks notes…plane 16, so fairly unlikely, but still…
Be seeing you,
norm
--
Norm Tovey-Walsh
Saxonica
Received on Friday, 15 March 2024 18:41:33 UTC