Consistency of 2007doc entities and Unicode Technical Report #25

There appear to be a few differences between the updated set of
entity definitions
http://www.w3.org/2003/entities/2007doc/overview.html#diff-xhtml1
and UTR #25
http://www.unicode.org/reports/tr25/tr25-9.html

I notice that several mathematical bracket entities have changed
from using CJK punctuation characters to the new mathematical
characters, consistent with the UTR #25 recommendation here:
http://www.unicode.org/reports/tr25/tr25-9.html#_TocDelimiters

However, there appear to be a few similar entities that have not
been changed.


lang, LeftAngleBracket, langle still refer to U+2329, and
rang, RightAngleBracket, rangle to U+232A.

"U+2329 LEFT-POINTING ANGLE BRACKET and U+232A RIGHT-POINTING
ANGLE BRACKET, are now deprecated for use with mathematics because
their canonical equivalence to CJK angle brackets is likely to
result in unintended spacing problems when used in mathematical
formulae."

"Unicode 3.2 added two new mathematical angle bracket characters
(U+27E8 and U+27E9) that are unequivocally intended for
mathematical use."
http://www.unicode.org/reports/tr25/tr25-9.html#_Toc25
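
The canonical equivalence mentioned in the first quotation can be
observed with Python's standard unicodedata module. This is only an
illustrative sketch (the helper name nfc_codepoints is made up for
this message): NFC normalization replaces U+2329 and U+232A with CJK
angle brackets, while U+27E8 and U+27E9 should come through unchanged.

```python
import unicodedata

def nfc_codepoints(ch):
    """Return the code points of ch after NFC normalization as 'U+XXXX' strings."""
    return ["U+%04X" % ord(c) for c in unicodedata.normalize("NFC", ch)]

# Old angle brackets first, then the mathematical ones added in Unicode 3.2.
for cp in (0x2329, 0x232A, 0x27E8, 0x27E9):
    print("U+%04X -> %s" % (cp, " ".join(nfc_codepoints(chr(cp)))))
```

Any tool that normalizes its input would therefore silently turn the
lang/rang entity values into CJK punctuation.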


OverParenthesis, UnderParenthesis, OverBrace, UnderBrace still use
CJK compatibility forms U+FE35 to U+FE38, but there are now
mathematical forms available.

U+23DC TOP PARENTHESIS
U+23DD BOTTOM PARENTHESIS
U+23DE TOP CURLY BRACKET
U+23DF BOTTOM CURLY BRACKET
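
To make the suggestion concrete, here is a small sketch that prints
the character names on both sides of the proposed change. The table
SUGGESTED is hypothetical: the pairing of each entity with a current
and a replacement code point is my assumption from the order in which
the entities and characters are listed above, not a confirmed
definition.

```python
import unicodedata

# Hypothetical table: (entity, current code point, suggested code point).
# The pairing is what this message proposes, not a confirmed definition.
SUGGESTED = [
    ("OverParenthesis",  0xFE35, 0x23DC),
    ("UnderParenthesis", 0xFE36, 0x23DD),
    ("OverBrace",        0xFE37, 0x23DE),
    ("UnderBrace",       0xFE38, 0x23DF),
]

for entity, old, new in SUGGESTED:
    print("%-16s U+%04X %s" % (entity, old, unicodedata.name(chr(old))))
    print("%-16s U+%04X %s" % ("  suggested:", new, unicodedata.name(chr(new))))
```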


What do lbbrk and rbbrk (left/right broken bracket) represent?
Are they plain tortoise shell brackets or filled tortoise shell
brackets or something else?

The updated entity definitions have changed these from CJK
punctuation characters U+3014 LEFT TORTOISE SHELL BRACKET and
U+3015 RIGHT TORTOISE SHELL BRACKET (which are usually unfilled?)
to U+2997 LEFT BLACK TORTOISE SHELL BRACKET and U+2998 RIGHT BLACK
TORTOISE SHELL BRACKET (which are wider).

"For ordinary tortoise-shell brackets, the use of U+2772 LIGHT LEFT
TORTOISE SHELL BRACKET ORNAMENT and U+2773 LIGHT RIGHT TORTOISE
SHELL BRACKET ORNAMENT is recommended for mathematical use."
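
For comparison, the three candidate pairs can be listed side by side
with their names and East Asian Width property. This is a sketch
only; the labels in CANDIDATES and the helper describe are my own
informal names, and the width property is just one plausible
indicator of why the CJK characters may space differently.

```python
import unicodedata

# Candidate pairs discussed above; the labels are informal descriptions.
CANDIDATES = {
    "CJK punctuation": (0x3014, 0x3015),
    "black (filled)": (0x2997, 0x2998),
    "light ornament": (0x2772, 0x2773),
}

def describe(cp):
    """Return (character name, East Asian Width) for a code point."""
    ch = chr(cp)
    return (unicodedata.name(ch), unicodedata.east_asian_width(ch))

for label, pair in CANDIDATES.items():
    for cp in pair:
        name, width = describe(cp)
        print("%-16s U+%04X %-45s width=%s" % (label, cp, name, width))
```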


jmath has been changed to U+0237 LATIN SMALL LETTER DOTLESS J (as
there was no dotless j before Unicode 4.1).  Is there a reason why
U+1D6A5 MATHEMATICAL ITALIC SMALL DOTLESS J was not used?

"the Unicode Standard provides the explicitly dotless characters
U+1D6A4 MATHEMATICAL ITALIC DOTLESS I and U+1D6A5 MATHEMATICAL
ITALIC DOTLESS J. They map to the ISOAMSO entities imath and jmath
or the [TeX] macros \imath and \jmath which by default are always
italic."

Perhaps imath should also be changed from U+0131 LATIN SMALL
LETTER DOTLESS I to U+1D6A4 MATHEMATICAL ITALIC SMALL DOTLESS I?
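
One relevant difference between the two choices is that the
mathematical characters carry a compatibility (font) decomposition to
the Latin letters, so they survive NFC but not NFKC. A minimal sketch
using Python's unicodedata module (the helper name nfkc is made up
here, and it assumes a Python build whose Unicode data includes these
4.1 characters):

```python
import unicodedata

def nfkc(cp):
    """Return the single code point that cp maps to under NFKC."""
    return ord(unicodedata.normalize("NFKC", chr(cp)))

for cp in (0x1D6A4, 0x1D6A5):
    ch = chr(cp)
    print("U+%04X %s" % (cp, unicodedata.name(ch)))
    print("  decomposition: %s" % unicodedata.decomposition(ch))
    print("  NFKC: U+%04X" % nfkc(cp))
```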

Received on Tuesday, 29 January 2008 03:06:00 UTC