- From: Liam R. E. Quin <liam@fromoldbooks.org>
- Date: Thu, 18 Aug 2022 16:20:27 -0400
- To: Steven Pemberton <steven.pemberton@cwi.nl>, ixml <public-ixml@w3.org>
On Thu, 2022-08-18 at 18:12 +0000, Steven Pemberton wrote:
> > It is now live.
> >
> > I haven't yet updated the Unicode character classes though.
> Well, I'm slowly adding them, with the priority being classes L and
> Mn which are both used in the ixml grammar.
>
> What a pain though! It's as if the Unicode design committee put no
> thought into it at all. For instance c0-ff are all letters EXCEPT
> they've stuck the multiply sign × in the middle, and the divide sign
> ÷ somewhere else in the middle.

The story (possibly apocryphal) was that in the final vote for ISO 8859, a claim was made that Œ and œ were not needed by any official language, and that these should be replaced by × and ÷ to go with plus and minus. This, it's said, was the Belgian representative being antagonistic towards the French, who weren't present at the meeting. Since œ is also used in English, I suspect it's apocryphal.

> And then the Roman alphabet (in ASCII) has the lowercase letters in
> one range, and the upper case in another.

This was for bit-twiddling purposes: each uppercase letter differs from its lowercase counterpart only in bit 0x20 (A is 0x41, a is 0x61), so changing case is a single bit operation.

> But the Latin range 100-17E has them alternating (upper, lower)*
> EXCEPT at #138 they stick an orphaned character, and then at #149
> they do it again.

Once you get beyond the original US ASCII 7-bit range, the bit-twiddling no longer applies.

It's better than EBCDIC, in which a-z are not contiguous :)

-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations: http://www.fromoldbooks.org
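For anyone who would rather check these ranges against the Unicode Character Database than by eye, here is a rough Python sketch. It uses only the standard library: `unicodedata.category` for the general categories (the same L, Lu and Ll classes the ixml grammar refers to), and the `cp037` codec as one representative EBCDIC code page, which is an assumption on my part. The function names are made up for illustration.

```python
import string
import unicodedata


def toggle_ascii_case(ch: str) -> str:
    """Flip the case of an ASCII letter by XOR-ing bit 0x20.

    Only A-Z and a-z are touched; anything else is returned unchanged,
    since the single-bit trick does not hold outside 7-bit ASCII.
    """
    if ch in string.ascii_letters:
        return chr(ord(ch) ^ 0x20)
    return ch


def non_letters(start: int, end: int) -> list[int]:
    """Code points in [start, end] whose general category is not L*."""
    return [cp for cp in range(start, end + 1)
            if not unicodedata.category(chr(cp)).startswith("L")]


def unpaired(start: int, end: int) -> list[int]:
    """Scan [start, end] expecting (Lu, Ll) pairs; return code points that break the pattern."""
    orphans = []
    cp = start
    while cp <= end:
        if (cp + 1 <= end
                and unicodedata.category(chr(cp)) == "Lu"
                and unicodedata.category(chr(cp + 1)) == "Ll"):
            cp += 2  # a normal upper/lower pair, skip both
        else:
            orphans.append(cp)  # this code point has no partner next to it
            cp += 1
    return orphans


def ebcdic_gaps(letters: str = string.ascii_lowercase,
                codec: str = "cp037") -> list[tuple[str, str]]:
    """Adjacent letters whose encoded byte values are not consecutive."""
    encoded = letters.encode(codec)
    return [(letters[i], letters[i + 1])
            for i in range(len(letters) - 1)
            if encoded[i + 1] != encoded[i] + 1]


if __name__ == "__main__":
    print(toggle_ascii_case("a"), toggle_ascii_case("Q"))  # A q
    print([hex(cp) for cp in non_letters(0xC0, 0xFF)])     # ['0xd7', '0xf7']
    print([hex(cp) for cp in unpaired(0x100, 0x17E)])      # ['0x138', '0x149', '0x178']
    print(ebcdic_gaps())                                   # [('i', 'j'), ('r', 's')]
```

Besides U+0138 (ĸ) and U+0149 (ŉ), the pairing scan also flags U+0178 (Ÿ), whose lowercase ÿ sits back at U+00FF, so the alternation breaks a third time before the end of the block.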
Received on Thursday, 18 August 2022 20:22:12 UTC