Unassigned characters

 > Unassigned: ~[C; L; LC; M; N; P; S; Z].


It occurred to me that Unicode has a General_Category value Cn, "Unassigned".
Unsurprisingly, the Unicode database lists no characters with this category;
it is what is left over for code points that have not been assigned.
See http://www.unicode.org/reports/tr44/#General_Category_Values



So presumably if we had a grammar


input: char*.
char: assigned; unassigned.
-assigned: -~[Cn].
unassigned: [Cn].


this should output only those characters in the input that are not assigned
Unicode characters.
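
As a cross-check outside ixml, here is a minimal Python sketch of the same test. It assumes the standard unicodedata module, which reports Cn for code points that are absent from its copy of the database; the function name unassigned_chars is made up for illustration. The result depends on the Unicode version bundled with the Python in use, so it may not agree with a given ixml processor.

```python
import unicodedata


def unassigned_chars(text):
    """Return the characters of text whose General_Category is Cn.

    Hypothetical helper: it mirrors the 'unassigned' rule of the grammar,
    dropping every character that the 'assigned' rule would match.
    """
    return [ch for ch in text if unicodedata.category(ch) == "Cn"]


if __name__ == "__main__":
    # U+FFFF is a noncharacter, so its category is Cn.
    sample = "a\uffffb"
    for ch in unassigned_chars(sample):
        print(f"U+{ord(ch):04X}")
```

Characters that are assigned in a newer Unicode version than the one the processor knows about would also show up as unassigned, which may be worth keeping in mind for the discussion.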



For discussion at a call sometime.


Steven

Received on Thursday, 15 December 2022 10:41:00 UTC