- From: Martin v. Loewis <martin@loewis.home.cs.tu-berlin.de>
- Date: Tue, 27 Mar 2001 18:30:33 +0200
- To: xml-editor@w3.org
- CC: heninger@us.ibm.com
> BaseChar does not include the remaining Unicode Roman Numerals,
> which encompass the range [#x2160-#x2183]

> I checked with Mark Davis, and there is nothing from a Unicode
> perspective that sets the three included characters apart from the
> rest of the Unicode Roman Numerals. It would seem that they either
> all ought to be allowed or disallowed as BaseChars.

No. Following the procedures explained in section B, it is very clear
that U+2180 should be a BaseChar, and that, say, U+2160 should not.
Here are the relevant lines from the Unicode 2.0 character database:

2160;ROMAN NUMERAL ONE;Nl;0;L;<compat> 0049;;;1;N;;;;2170;
2180;ROMAN NUMERAL ONE THOUSAND C D;Nl;0;L;;;;1000;N;;;;;

Annex B says

# Name start characters must have one of the categories Ll, Lu, Lo,
# Lt, Nl.

Both are of category Nl, so they qualify. The text goes on to say

# Characters which have a font or compatibility decomposition
# (i.e. those with a "compatibility formatting tag" in field 5 of the
# database -- marked by field 5 beginning with a "<") are not allowed.

Now, ROMAN NUMERAL ONE has a compatibility mapping to U+0049, LATIN
CAPITAL LETTER I, so it is not allowed as a Letter. No restriction
applies to U+2180, so it is included.

Regards,
Martin
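
The two checks quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it applies just the two Annex B rules cited in this message (the annex lists further rules that are ignored here), it works on the two database lines quoted above rather than on the full database, and the helper names `parse_record` and `passes_quoted_rules` are hypothetical, not part of any specification or library.

```python
# Categories that the quoted Annex B rule allows for name start characters.
NAME_START_CATEGORIES = {"Ll", "Lu", "Lo", "Lt", "Nl"}

# The two Unicode 2.0 database lines quoted in this message.
RECORDS = [
    "2160;ROMAN NUMERAL ONE;Nl;0;L;<compat> 0049;;;1;N;;;;2170;",
    "2180;ROMAN NUMERAL ONE THOUSAND C D;Nl;0;L;;;;1000;N;;;;;",
]


def parse_record(line):
    """Split one semicolon-separated database line into the fields used here."""
    fields = line.split(";")
    return {
        "code": fields[0],
        "name": fields[1],
        "category": fields[2],        # field 2: general category
        "decomposition": fields[5],   # field 5: decomposition mapping
    }


def passes_quoted_rules(record):
    """Apply only the two rules quoted above (hypothetical helper)."""
    # Rule 1: the category must be one of Ll, Lu, Lo, Lt, Nl.
    if record["category"] not in NAME_START_CATEGORIES:
        return False
    # Rule 2: field 5 beginning with "<" marks a font or compatibility
    # decomposition, which is not allowed.
    if record["decomposition"].startswith("<"):
        return False
    return True


if __name__ == "__main__":
    for line in RECORDS:
        record = parse_record(line)
        verdict = "allowed" if passes_quoted_rules(record) else "not allowed"
        print("U+" + record["code"], record["name"], "->", verdict)
```

For the two quoted lines, the first rule accepts both characters (category Nl) and the second rule rejects only U+2160, whose field 5 begins with `<compat>`.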
Received on Tuesday, 27 March 2001 11:50:30 UTC