- From: Christian Grün <cg@basex.org>
- Date: Thu, 14 Mar 2024 17:29:01 +0000
- To: Dimitre Novatchev <dnovatchev@gmail.com>, Norm Tovey-Walsh <norm@saxonica.com>
- CC: "public-xslt-40@w3.org" <public-xslt-40@w3.org>
I see; so in a nutshell it’s this?
sort(
$input,
keys := fn { string-to-codepoints(.) ! (0x110000 - .), 0x110000 }
)
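A minimal sketch of the same idea in Python rather than XQuery, for illustration only. The names `TERMINATOR` and `descending_key` are made up here. It assumes plain codepoint ordering and strings without a NUL character, which would collide with the terminator value.

```python
# One past the largest Unicode codepoint (0x10FFFF); plays the role of the
# trailing 0x110000 in the XQuery expression above.
TERMINATOR = 0x110000

def descending_key(s):
    """Map each codepoint c to TERMINATOR - c, then append the terminator."""
    return [TERMINATOR - ord(ch) for ch in s] + [TERMINATOR]

words = ["Z", "ZZ", "A", "", "AB"]

# Sorting ascending on the complemented keys yields descending codepoint order.
result = sorted(words, key=descending_key)
print(result)
```

Sorting ascending on these keys should agree with `sorted(words, reverse=True)` for this input.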
_________________________________
Excellent question.
And yes, I gave an incorrect mapping; the correct one is:
"" : '$' ,
S1 : Sn ,
S2 : Sn-1 ,
. . . . . . .
Sk : Sn-k+1 ,
. . . . . . .
Sn : S1
And, very importantly, every mapped string must have the '$' character appended: just one trailing '$' character per string.
In this way we will have:
Z => "A"||"$",
ZZ => "AA"||"$"
"A$" > "AA$" because, at the second position, $ is the biggest symbol and so is > "A"
Thus "AA$" (the value of the inversion of "ZZ") must be returned before "A$" (the value of the inversion of "Z")
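The corrected mapping can be sketched in Python as follows. This is an illustration under assumptions: the collation is reduced to a list of single symbols in ascending order, the helper name `make_inverter` is hypothetical, and '$' is represented by the integer rank n + 1 rather than by a real character.

```python
def make_inverter(symbols):
    """Build an inversion function for symbols S1..Sn given in ascending order."""
    n = len(symbols)
    rank = {sym: k for k, sym in enumerate(symbols, start=1)}  # Sk -> k
    end = n + 1  # stands for '$', bigger than every symbol rank

    def invert(s):
        # Sk is mapped to the rank of Sn-k+1; exactly one '$' ends the string.
        return tuple(n - rank[ch] + 1 for ch in s) + (end,)

    return invert

invert = make_inverter("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

# The inversion of "Z" is greater than the inversion of "ZZ",
# so "ZZ" is returned before "Z".
ordered = sorted(["Z", "ZZ", "A", ""], key=invert)
print(ordered)
```

Tuples of ranks are used instead of strings so that the terminator does not need to be a real character of the alphabet.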
Thanks,
Dimitre
On 14.03.2024 at 17:06, Dimitre Novatchev <dnovatchev@gmail.com> wrote:
> > For a simplified example, revert("abc") would produce "zyx" . This is doable and really valuable.
>
> In what sense is “zyx” the complement of “abc”? Over what set of codepoints and in what collation?
>
> I am very skeptical that such a function is well defined across all collations and will always produce a single, correct result in all cases.
>
> Can you provide a detailed description of how this would work?
Yes, as Michael Kay already explained, this is doable in either of two ways: if the "biggest" symbol in the collation is not used (which, by the way, happens in some collations; for example, the biggest symbol in the English (American) collation is 0xFE), or if we add an additional symbol that is "bigger" than any other symbol in the collation.
Let us, just for convenience, refer to this special symbol as '$' (this is just a convention on how to refer to this special symbol, not the actual dollar character).
Then, if S1, S2, ..., Sn are all n symbols in the collation ordered by their value in the collation, perform this mapping:
"" : '$' ,
S1 : Sn || '$' ,
S2 : Sn-1 || '$' ,
. . . . . . .
Sk : Sn-k+1 || '$',
. . . . . . .
Sn : S1 || '$'
And certainly, adding a new symbol to a collation actually creates a new collation, and this would perhaps be the most straightforward way of inverting strings.
We may not even need to create a new collation: we could just adopt a convention that a collation named "Inverted" || {Real-Collation-Name} produces the negation of the comparison results produced by the {Real-Collation-Name} collation. Or, as I mentioned before, this is the same as "decorating a collation".
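As a rough illustration of "decorating a collation", here is a Python sketch in which a collation is modelled as a three-way comparison function. The names `inverted` and `codepoint_compare` are hypothetical, and nothing here reflects how an XPath processor would actually resolve an "Inverted" collation name.

```python
from functools import cmp_to_key

def inverted(compare):
    """Decorate a three-way comparator so that it returns the negated result."""
    def inverted_compare(a, b):
        return -compare(a, b)
    return inverted_compare

def codepoint_compare(a, b):
    """Stand-in for a real collation: -1, 0 or 1 by codepoint order."""
    return (a > b) - (a < b)

words = ["b", "a", "c"]

# Sorting with the decorated comparator gives descending order
# without any separate "orders" argument.
descending = sorted(words, key=cmp_to_key(inverted(codepoint_compare)))
print(descending)
```

With this approach no string is rewritten at all; only the comparison result is flipped.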
This is one more way to get rid of the $orders parameter in our current functions.
Thanks,
Dimitre
On Thu, Mar 14, 2024 at 3:24 AM Norm Tovey-Walsh <norm@saxonica.com> wrote:
Dimitre Novatchev <dnovatchev@gmail.com> writes:
> This function can easily handle strings - produce a "string complement" in the value space for a particular collation.
>
> For a simplified example, revert("abc") would produce "zyx" . This is doable and really valuable.
In what sense is “zyx” the complement of “abc”? Over what set of codepoints and in what collation?
I am very skeptical that such a function is well defined across all collations and will always produce a single, correct result in all cases.
Can you provide a detailed description of how this would work?
Be seeing you,
norm
--
Norm Tovey-Walsh
Saxonica
Received on Thursday, 14 March 2024 17:29:09 UTC