Re: consistency of 2007doc entities and Unicode Technical Report#25

Karl, thanks for your comments.


> There appear to be a few differences between the updated set of
> entity definitions and UTR #25

In general there are two competing forces at play on the entity sets.

On the one hand, one wants to use the "correct" Unicode character wherever
possible, and where previously there was no character, or the character
was "clearly" wrong, it makes sense to change the definition. On the
other hand, if the old definition worked but is sub-optimal in some way,
then changing it is risky, especially if the "better" character is very
new (Unicode x for x > a number to be decided...): existing documents
typeset by existing applications and fonts could suddenly change from
showing an acceptable character to showing a missing glyph symbol, as it
takes quite some while for deployed fonts to catch up.

These forces act with different strengths on TR 25, as that is a
Unicode document essentially documenting how to make best use of the
characters available in the most up to date version of Unicode.
So, as a general principle I don't think it can be guaranteed that all
differences can be eliminated, but obviously eliminating as many as
possible would be a good thing....


> I notice that several mathematical bracket entities have changed
> from using CJK punctuation character to new mathematical
> characters ... However there appear to be a few similar entities that
> have not been changed.

Yes, I did a specific sweep through to remove things in the 3xxx
block but

> OverParenthesis, UnderParenthesis, OverBrace, UnderBrace still use
> CJK compatibility forms U+FE35 to U+FE39, but there are now
> mathematical forms available.

Hmm yes, we should look again at removing the FExx characters.
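
A sweep of this kind can be sketched in a few lines of Python. This is
only an illustration: the SAMPLE_ENTITIES table is a hand-picked sample
(not the real entity set), and cjk_mapped is a hypothetical helper name.
The two block ranges are the Unicode "CJK Symbols and Punctuation" and
"CJK Compatibility Forms" blocks.

```python
# Unicode blocks that the sweep is meant to flag (end of range is exclusive).
CJK_BLOCKS = {
    "CJK Symbols and Punctuation": range(0x3000, 0x3040),
    "CJK Compatibility Forms": range(0xFE30, 0xFE50),
}

# Illustrative sample only, NOT the actual entity definitions.
SAMPLE_ENTITIES = {
    "OverParenthesis": 0xFE35,
    "UnderParenthesis": 0xFE36,
    "OverBrace": 0xFE37,
    "UnderBrace": 0xFE38,
    "lang": 0x27E8,  # already a mathematical bracket, should not be flagged
}


def cjk_mapped(entities):
    """Return {entity name: block name} for entities mapped into a CJK block."""
    hits = {}
    for name, code_point in entities.items():
        for block_name, code_points in CJK_BLOCKS.items():
            if code_point in code_points:
                hits[name] = block_name
    return hits


if __name__ == "__main__":
    for name, block in sorted(cjk_mapped(SAMPLE_ENTITIES).items()):
        print(f"{name}: U+{SAMPLE_ENTITIES[name]:04X} ({block})")
```

Run against a full entity table, the same check would list every
remaining 3xxx or FExx mapping in one pass.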


>  TORTOISE SHELL 

Thanks for flagging these, I'll try to trawl over my various sources
again and see if I can construct any consensus on what previous systems
have (a) thought these characters should look like and (b) what Unicode
slot they should have. I think I'll refrain from trying to give any
instant gut reaction on these ones.

> jmath

This is mapped to the base plane partly for compatibility with imath
and partly for better functionality with MathML's mathvariant attribute.
imath has mapped to U+0131 since forever and I'd be quite worried to
change this. But even if I could be persuaded that it could be safely
changed (that is, existing documents wouldn't start getting missing glyph
symbols because some global catalogue updated the entity set used),
I think there are some benefits in mapping it to the base plane.
<mi mathvariant="bold">x</mi> (for any variant)
is defined as a Unicode to Unicode mapping: you take the base character
and define the construct as being equivalent to the Unicode specified
mathematical bold character. If you put a styled character in the mi in
the first place, mathvariant isn't supposed to have any effect.
So my understanding is that with jmath defined as it is,
<mi mathvariant="bold">&jmath;</mi>
is a bold dotless j, but if jmath mapped to U+1D6A5 then it would not be.
(Same argument for imath of course.) Now MathML3 could change the rules,
but that's my understanding of the rules for MathML2.
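
To make the mathvariant argument concrete, here is a rough Python sketch
of the behaviour as described above. It is a simplification under stated
assumptions, not the MathML2 algorithm itself: apply_bold is a
hypothetical helper, only ASCII letters are remapped, and "needs a bold
font" stands in for whatever a renderer does when Unicode has no
precomposed bold form of the base character.

```python
from typing import Tuple

# Mathematical Alphanumeric Symbols block, U+1D400..U+1D7FF.
MATH_ALNUM = range(0x1D400, 0x1D800)

BOLD_CAPITAL_A = 0x1D400  # MATHEMATICAL BOLD CAPITAL A
BOLD_SMALL_A = 0x1D41A    # MATHEMATICAL BOLD SMALL A


def apply_bold(ch: str) -> Tuple[str, bool]:
    """Model <mi mathvariant="bold">ch</mi>.

    Returns (character to render, whether a bold font is still needed).
    """
    code_point = ord(ch)
    if code_point in MATH_ALNUM:
        # Already a styled character: mathvariant is assumed to have no effect.
        return ch, False
    if "A" <= ch <= "Z":
        return chr(BOLD_CAPITAL_A + code_point - ord("A")), False
    if "a" <= ch <= "z":
        return chr(BOLD_SMALL_A + code_point - ord("a")), False
    # Base-plane character with no precomposed bold form in this sketch:
    # keep the character and ask for a bold font.
    return ch, True


if __name__ == "__main__":
    for label, ch in [("x", "x"),
                      ("jmath at U+0237", "\u0237"),
                      ("jmath at U+1D6A5", "\U0001D6A5")]:
        out, bold_font = apply_bold(ch)
        print(f"{label}: U+{ord(out):04X}, bold font needed: {bold_font}")
```

With jmath at U+0237 the sketch still requests bold styling, whereas
with jmath at U+1D6A5 the character is passed through unstyled, which is
the difference the paragraph above is pointing at.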


Note these are all personal responses; any changes would require working
group agreement.

Thanks again for your careful review.

David


Received on Tuesday, 29 January 2008 17:26:41 UTC