Re: String complements from Liam R. E. Quin on 2024-03-15 (public-xslt-40@w3.org from March 2024)

From: Liam R. E. Quin <liam@fromoldbooks.org>
Date: Fri, 15 Mar 2024 17:34:26 -0400
To: "public-xslt-40@w3.org" <public-xslt-40@w3.org>
Message-ID: <63d2fdab54322f64e0b7390e7040d1e5fd5eeffe.camel@fromoldbooks.org>

On Fri, 2024-03-15 at 13:26 -0700, Dimitre Novatchev wrote:
> 
> SQL Server makes this as easy as:
> 
> ```
> while (@codePoint < 255)
> ```

Um, we have 21-bit codepoints, so we’d need 2097151 (or 1114111, that
is U+10FFFF, if the loop stops at the top of the Unicode range).
This isn’t practical.

Yes, collations are chosen by “word of mouth” — actually by looking
them up.

In any event, knowing how a collation handles ch or æ or ß being sorted
as ss, or combining diacriticals or ij or other multi-character
combinations, comes from the reference documentation, not from
inspecting a character at a time.

We do include the HTML ASCII case-insensitive collation now in XPath,
and that gives case-insensitive comparison for a-z/A-Z only.
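
As a rough illustration of what "ASCII case-insensitive" means, here is
a minimal Python sketch. The function names are made up for this
example and are not part of any XPath API; the point is only that A-Z
folds to a-z and nothing else changes.

```python
def ascii_lower(s: str) -> str:
    """Fold only A-Z to a-z; every other character is left untouched."""
    return "".join(
        chr(ord(c) + 32) if "A" <= c <= "Z" else c
        for c in s
    )


def ascii_ci_equal(a: str, b: str) -> bool:
    """Compare two strings, ignoring case for ASCII letters only."""
    return ascii_lower(a) == ascii_lower(b)


print(ascii_ci_equal("Hello", "hELLO"))    # ASCII letters match
print(ascii_ci_equal("\u00c9", "\u00e9"))  # É vs é: not folded
```

So anything outside a-z/A-Z, accented letters included, still compares
by codepoint.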

In any case, e + combining-accent-grave had better sort the same as
e-grave, and don't even think about character-at-a-time for Hindi.
Spanish sorts S next to W. In Marathi (widely spoken in India) Lla
(ळ, U+0933) sorts after Ha (ह, U+0939), and in Hindi it comes in
codepoint order.
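
To make the e-grave point concrete, here is a small Python sketch using
the standard unicodedata module. The helper name same_text is
hypothetical. It also shows that a plain codepoint sort puts Lla before
Ha, so the Marathi order would need a collation tailoring.

```python
import unicodedata

decomposed = "e\u0300"   # e followed by COMBINING GRAVE ACCENT
precomposed = "\u00e8"   # LATIN SMALL LETTER E WITH GRAVE


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(decomposed == precomposed)           # raw codepoints differ
print(same_text(decomposed, precomposed))  # equal once normalized

lla, ha = "\u0933", "\u0939"               # Devanagari Lla and Ha
print(sorted([ha, lla]) == [lla, ha])      # codepoint order: Lla first
```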

Where multiple combining marks apply to a single base character (e.g.
Hindi, Vietnamese, polytonic Greek), the input must be normalized by
reordering as needed - see http://www.unicode.org/reports/tr15/
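
A minimal sketch of that reordering, again with Python's unicodedata.
The Vietnamese-style example (a with circumflex and dot below) is my
own choice of illustration, not something taken from TR15.

```python
import unicodedata

# The same two marks on "a", typed in two different orders.
circumflex_first = "a\u0302\u0323"   # circumflex, then dot below
dot_below_first = "a\u0323\u0302"    # dot below, then circumflex

# Canonical combining classes decide the normalized order:
# the lower class (dot below) is placed first.
print(unicodedata.combining("\u0323"), unicodedata.combining("\u0302"))

nfd_a = unicodedata.normalize("NFD", circumflex_first)
nfd_b = unicodedata.normalize("NFD", dot_below_first)
print(circumflex_first == dot_below_first)  # raw strings differ
print(nfd_a == nfd_b)                       # same after reordering
```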

The best way right now to do a descending sort is sort($seq) => reverse().
However, there’s no such easy way to do a descending sort keeping all
the null values (or empty strings) at the end.
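
For comparison, here is roughly what "descending, with the empties
last" looks like when written out by hand. This is a Python sketch
with a hypothetical function name, not a proposal for XPath syntax.

```python
def sort_descending_empties_last(values):
    """Sort non-empty values in descending order, then append the
    None / empty-string entries in their original relative order."""
    def is_empty(v):
        return v is None or v == ""

    present = [v for v in values if not is_empty(v)]
    missing = [v for v in values if is_empty(v)]
    return sorted(present, reverse=True) + missing


data = ["pear", None, "apple", "", "fig"]
print(sort_descending_empties_last(data))
```

The split into two lists is the part that has no one-line equivalent
when the sort is done by reversing an ascending result.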

liam

-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations:  http://www.fromoldbooks.org

Received on Friday, 15 March 2024 21:34:52 UTC