BaseChar problem in XML 1.0?

> BaseChar does not include the remaining Unicode Roman Numerals,
> which encompass the range [#x2160-#x2183]

> I checked with Mark Davis, and there is nothing from a Unicode
> perspective that sets the three included characters apart from the
> rest of the Unicode Roman Numerals.  It would seem that they either
> all ought to be allowed or disallowed as BaseChars.

No. Following the procedure explained in Annex B, it is very clear
that U+2180 should be a BaseChar, and that, say, U+2160 should
not. Here are the relevant lines from the Unicode 2.0 character
database:

2160;ROMAN NUMERAL ONE;Nl;0;L;<compat> 0049;;;1;N;;;;2170;
2180;ROMAN NUMERAL ONE THOUSAND C D;Nl;0;L;;;;1000;N;;;;;

Annex B says

# Name start characters must have one of the categories Ll, Lu, Lo,
# Lt, Nl.

Both are of category Nl, so both pass this test. The text goes on to say

# Characters which have a font or compatibility decomposition
# (i.e. those with a "compatibility formatting tag" in field 5 of the
# database -- marked by field 5 beginning with a "<") are not allowed.

Now, ROMAN NUMERAL ONE has a compatibility decomposition to U+0049,
LATIN CAPITAL LETTER I, so it is not allowed as a Letter. U+2180 has
no decomposition at all (field 5 is empty), so this restriction does
not apply to it, and it is included.
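
For anyone who wants to check this mechanically, here is a minimal
sketch in Python that applies only the two Annex B rules quoted above
to the two database lines quoted above. The function names
(passes_quoted_rules, check_records) are made up for illustration, and
the sketch deliberately ignores the other conditions in Annex B, so it
is not a full BaseChar test.

```python
# The two records quoted in this message (Unicode 2.0 character database).
RECORDS = """\
2160;ROMAN NUMERAL ONE;Nl;0;L;<compat> 0049;;;1;N;;;;2170;
2180;ROMAN NUMERAL ONE THOUSAND C D;Nl;0;L;;;;1000;N;;;;;
"""

# Categories that Annex B allows for name start characters.
NAME_START_CATEGORIES = {"Ll", "Lu", "Lo", "Lt", "Nl"}


def passes_quoted_rules(record):
    """Apply only the two rules quoted in this message to one record."""
    fields = record.split(";")
    category = fields[2]        # general category, e.g. "Nl"
    decomposition = fields[5]   # field 5: decomposition mapping, may be empty

    # Rule 1: the category must be one of Ll, Lu, Lo, Lt, Nl.
    if category not in NAME_START_CATEGORIES:
        return False
    # Rule 2: a compatibility formatting tag (field 5 starting with "<")
    # disqualifies the character.
    if decomposition.startswith("<"):
        return False
    return True


def check_records(text):
    """Map 'U+XXXX' to True/False for every non-empty record line."""
    results = {}
    for line in text.splitlines():
        if line.strip():
            code = line.split(";")[0]
            results["U+" + code] = passes_quoted_rules(line)
    return results


for code_point, allowed in check_records(RECORDS).items():
    print(code_point, "allowed" if allowed else "not allowed")
```

Run against a complete copy of the database, the same two checks would
only give a first approximation, since Annex B lists further conditions
that are not reproduced here.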

Regards,
Martin

Received on Tuesday, 27 March 2001 11:50:30 UTC