- From: Liam R. E. Quin <liam@fromoldbooks.org>
- Date: Fri, 15 Mar 2024 17:34:26 -0400
- To: "public-xslt-40@w3.org" <public-xslt-40@w3.org>
On Fri, 2024-03-15 at 13:26 -0700, Dimitre Novatchev wrote:
> > SQL Server makes this as easy as:
> >
> > ```
> > while (@codePoint < 255)

Um, we have 21-bit code points, so we’d need to go up to 2097151. This isn’t practical.

Yes, collations are chosen by “word of mouth”, or rather by looking them up. In any event, knowing how a collation handles ch or æ, ß being sorted as ss, combining diacritics, ij, or other multi-character combinations comes from the reference documentation, not from inspecting one character at a time.

We do now include the HTML ASCII case-insensitive collation in XPath, and that is case-insensitive only for a-z/A-Z.

In any case, e followed by a combining grave accent (U+0300) had better sort the same as the precomposed è (U+00E8), and don’t even think about character-at-a-time for Hindi. Spanish sorts S next to W. In Marathi (widely spoken in India), Lla (ळ, U+0933) sorts after Ha (ह, U+0939), whereas in Hindi it comes in code point order. Where multiple combining marks apply to a single base character (e.g. in Hindi, Vietnamese, or polytonic Greek), the input must be normalized, reordering the marks as needed; see http://www.unicode.org/reports/tr15/

The best way right now to do a descending sort is `sort(…) => reverse()`. However, there’s no such easy way to do a descending sort that keeps all the null values (or empty strings) at the end.

liam

-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations: http://www.fromoldbooks.org
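To show how narrow ASCII-only case insensitivity is, here is a rough Python approximation of an ASCII case-insensitive equality test. It is an illustrative assumption, not the normative XPath definition, and `ascii_ci_equal` is a hypothetical helper. Only the 26 letters A-Z are mapped to a-z, so É/é and ß/SS stay distinct, whereas Python’s built-in `str.casefold()` applies full Unicode case folding.

```python
import string

# Map only the 26 ASCII capitals to lowercase; every other character is left alone.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_ci_equal(a: str, b: str) -> bool:
    """Hypothetical helper approximating an ASCII case-insensitive comparison."""
    return a.translate(_ASCII_LOWER) == b.translate(_ASCII_LOWER)


print(ascii_ci_equal("Collation", "cOLLATION"))          # True: only ASCII letters differ
print(ascii_ci_equal("\u00c9t\u00e9", "\u00e9t\u00e9"))  # False: É is not ASCII
print(ascii_ci_equal("STRASSE", "stra\u00dfe"))          # False: ß is not folded

# Full Unicode case folding, for contrast: É -> é and ß -> ss.
print("\u00c9t\u00e9".casefold() == "\u00e9t\u00e9")       # True
print("STRASSE".casefold() == "stra\u00dfe".casefold())    # True
```

The contrast is the point: a collation that is “case-insensitive” for a-z/A-Z tells you nothing about accented letters or ß, which is why the reference documentation matters.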
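The normalization point can be illustrated with Python’s standard `unicodedata` module (Python is an editorial choice here, and `same_after_nfc` is a hypothetical helper). A plain code-point comparison treats e + U+0300 and the precomposed è as different strings, while comparing their NFC forms treats them as equal. The Vietnamese ệ case shows two combining marks typed in either order normalizing to the same string, because canonical ordering sorts the marks by combining class.

```python
import unicodedata


def same_after_nfc(a: str, b: str) -> bool:
    """Hypothetical helper: True if a and b are equal after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


decomposed = "e\u0300"   # e + COMBINING GRAVE ACCENT
precomposed = "\u00e8"   # LATIN SMALL LETTER E WITH GRAVE
print(decomposed == precomposed)                # False: different code points
print(same_after_nfc(decomposed, precomposed))  # True: canonically equivalent

# Vietnamese ệ: one base letter with two combining marks, entered in either order.
circumflex_first = "e\u0302\u0323"  # circumflex (ccc 230), then dot below (ccc 220)
dot_first = "e\u0323\u0302"         # dot below, then circumflex
# NFD applies canonical ordering: marks are sorted by combining class, dot below first.
print(unicodedata.normalize("NFD", circumflex_first) == dot_first)  # True
print(unicodedata.normalize("NFC", circumflex_first) == "\u1ec7")   # True
```

Normalization only makes canonically equivalent strings identical; the language-specific ordering (Marathi vs Hindi, say) still has to come from the collation itself.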
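The descending-sort problem can be sketched in Python too. This assumes “null” means `None` or an empty string, uses plain code-point order rather than any collation, and `sort_nulls_last` is a hypothetical helper. Sorting only the non-empty values in the requested direction and then appending the empty ones keeps them at the end either way, whereas reversing an ascending, nulls-last result moves them to the front.

```python
from typing import List, Optional, Sequence


def sort_nulls_last(values: Sequence[Optional[str]],
                    descending: bool = False) -> List[Optional[str]]:
    """Hypothetical helper: sort non-empty strings, keep None/"" at the end."""
    present = [v for v in values if v]       # non-empty strings only
    missing = [v for v in values if not v]   # None and "" in their input order
    return sorted(present, reverse=descending) + missing


data = ["pear", "", "apple", None, "fig"]
ascending = sort_nulls_last(data)
print(ascending)                               # ['apple', 'fig', 'pear', '', None]
print(list(reversed(ascending)))               # [None, '', 'pear', 'fig', 'apple']
print(sort_nulls_last(data, descending=True))  # ['pear', 'fig', 'apple', '', None]
```

Sorting in the requested direction, instead of reversing afterwards, also keeps items with equal keys in their original relative order, which a reversal does not.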
Received on Friday, 15 March 2024 21:34:52 UTC