
Unassigned characters

From: Steven Pemberton <steven.pemberton@cwi.nl>
Date: Thu, 15 Dec 2022 10:40:46 +0000
Message-Id: <1671100226900.3349799283.413590541@cwi.nl>
To: ixml <public-ixml@w3.org>
 > Unassigned: ~[C; L; LC; M; N; P; S; Z].


It occurred to me that Unicode has a general category Cn, "Unassigned".
Unsurprisingly, no characters in the Unicode database are listed with this
category.
http://www.unicode.org/reports/tr44/#General_Category_Values



So presumably if we had a grammar


input: char*.
char: assigned; unassigned.
-assigned: -~[Cn].
unassigned: [Cn].


this should output only those characters in the input that are not assigned
Unicode characters.
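
As a rough cross-check outside ixml, here is a minimal Python sketch of the same filter. It assumes that the standard unicodedata module is an acceptable stand-in for an ixml processor's character-class test; the function name unassigned_chars is made up for illustration, and the result depends on the Unicode version bundled with the Python build (see unicodedata.unidata_version).

```python
import unicodedata


def unassigned_chars(text: str) -> str:
    """Keep only the characters whose general category is Cn.

    Mirrors the intent of the grammar: characters matching ~[Cn] are
    dropped, characters matching [Cn] are kept. This is an illustrative
    helper, not part of any ixml implementation.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) == "Cn")


if __name__ == "__main__":
    # U+0378 is a reserved (unassigned) code point in the Greek block.
    sample = "a\u0378b"
    kept = unassigned_chars(sample)
    print([f"U+{ord(ch):04X}" for ch in kept])
```

Because the set of Cn code points shrinks as new characters are assigned, two processors built against different Unicode versions could disagree on the same input.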



For discussion at a call sometime.


Steven
Received on Thursday, 15 December 2022 10:41:00 UTC
