- From: Christian Grün <cg@basex.org>
- Date: Thu, 14 Mar 2024 17:29:01 +0000
- To: Dimitre Novatchev <dnovatchev@gmail.com>, Norm Tovey-Walsh <norm@saxonica.com>
- CC: "public-xslt-40@w3.org" <public-xslt-40@w3.org>
I see; so in a nutshell it's this?

  sort(
    $input,
    keys := fn { string-to-codepoints(.) ! (0x110000 - .), 0x110000 }
  )

_________________________________

Excellent question. And yes, I gave an improper mapping; the correct one is:

  ""  :  '$'
  S1  :  Sn
  S2  :  Sn-1
  . . . . . . .
  Sk  :  Sn-k+1
  . . . . . . .
  Sn  :  S1

And, very importantly, every mapped string must have the '$' character appended. Just one ending '$' character.

In this way we will have:

  Z  => "A" || "$"
  ZZ => "AA" || "$"

"A$" > "AA$" because '$' is the biggest symbol and is > "A".

Thus "AA$" (the value of the inversion of "ZZ") must be returned before "A$" (the value of the inversion of "Z").

Thanks,
Dimitre

On 14.03.2024 at 17:06, Dimitre Novatchev <dnovatchev@gmail.com> wrote:

> For a simplified example, revert("abc") would produce "zyx". This is doable and really valuable.
>
> In what sense is "zyx" the complement of "abc"? Over what set of codepoints and in what collation?
>
> I am very skeptical that such a function is well defined across all collations and will always produce a single, correct result in all cases.
>
> Can you provide a detailed description of how this would work?

Yes, as Michael Kay already explained, this is doable if we either:

  - rely on the "biggest" symbol in the collation not being used (which, by the way, happens in some collations; for example, the biggest symbol in the English (American) collation is 0xFE), or
  - add an additional symbol that is "bigger" than any other symbol in the collation.

Let us, just for convenience, refer to this special symbol as '$' (this is just a convention on how to refer to this special symbol, not the actual dollar character).

Then, if S1, S2, ..., Sn are all n symbols in the collation, ordered by their value in the collation, perform this mapping:

  ""  :  '$'
  S1  :  Sn || '$'
  S2  :  Sn-1 || '$'
  . . . . . . .
  Sk  :  Sn-k+1 || '$'
  . . . . . . .
  Sn  :  S1 || '$'

And certainly, adding a new symbol to a collation is actually creating a new collation, and this would maybe be the most straightforward way of inverting strings.

We may not even need to create any new collation; we could just have a convention that a collation named "Inverted" || {Real-Collation-Name} produces the negation of the comparison results produced by the {Real-Collation-Name} collation.

Or, as I mentioned before, this is the same as "decorating a collation".

This is one more way to get rid of the $orders parameter in our current functions.

Thanks,
Dimitre

On Thu, Mar 14, 2024 at 3:24 AM Norm Tovey-Walsh <norm@saxonica.com> wrote:

Dimitre Novatchev <dnovatchev@gmail.com> writes:

> This function can easily handle strings - produce a "string complement" in the value space for a particular collation.
>
> For a simplified example, revert("abc") would produce "zyx". This is doable and really valuable.

In what sense is "zyx" the complement of "abc"? Over what set of codepoints and in what collation?

I am very skeptical that such a function is well defined across all collations and will always produce a single, correct result in all cases.

Can you provide a detailed description of how this would work?

Be seeing you,
norm

--
Norm Tovey-Walsh
Saxonica
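
For readers who want to try the idea outside XQuery, here is a minimal Python sketch of the corrected mapping from the top of this message: invert every symbol, then append exactly one terminator that sorts above all inverted symbols. It assumes plain codepoint order rather than a real collation, and it assumes no string contains U+0000 (whose inverted value would tie with the terminator). The names `inverted_key` and `sort_descending` are illustrative only and do not come from the thread or from any specification.

```python
from typing import List, Sequence

# One past the largest Unicode codepoint (0x10FFFF); plays the role of '$'.
TERMINATOR = 0x110000


def inverted_key(s: str) -> List[int]:
    """Map each codepoint c to 0x110000 - c, then append one terminator.

    Assumption: s contains no U+0000, so every inverted symbol is
    strictly smaller than the terminator.
    """
    return [TERMINATOR - ord(ch) for ch in s] + [TERMINATOR]


def sort_descending(strings: Sequence[str]) -> List[str]:
    """Sort ascending on the inverted key, which yields descending
    codepoint order of the original strings."""
    return sorted(strings, key=inverted_key)


if __name__ == "__main__":
    words = ["Z", "ZZ", "A", "AB"]
    # "ZZ" comes before "Z" because the key for "Z" reaches the
    # terminator first, and the terminator is the biggest symbol.
    print(sort_descending(words))
```

Without the trailing terminator, the key for "Z" would be a prefix of the key for "ZZ" and would sort first, which is the wrong order for a descending sort; the single appended terminator is what reverses the prefix case.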
Received on Thursday, 14 March 2024 17:29:09 UTC