Re: Update to unicode.xml

Frédéric,

Thanks again for your comments,

a somewhat belated reply...

>
> - Some characters have mathclass="R?" (with a question mark)... I guess
> that's because the mathclass is not clear, but it is unexpected for
> someone who wants to process the file automatically...
>

Fixed at the time, when updating from version 11 to 13 of the Unicode data.
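
For anyone who wants to check a local copy for leftover uncertain classes, here is a minimal sketch. It assumes each `character` element carries a `unicodedata` child with a `mathclass` attribute; the inline sample is made up for illustration and is not taken from the real file.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the assumed shape of unicode.xml (not real data).
SAMPLE = """<unicode>
  <character id="U0002B"><unicodedata mathclass="V"/></character>
  <character id="U0221F"><unicodedata mathclass="R?"/></character>
  <character id="U00041"><unicodedata/></character>
</unicode>"""


def uncertain_mathclasses(root):
    """Return (id, mathclass) pairs whose mathclass contains a '?'."""
    hits = []
    for ch in root.iter("character"):
        data = ch.find("unicodedata")
        if data is None:
            continue
        mathclass = data.get("mathclass")
        if mathclass is not None and "?" in mathclass:
            hits.append((ch.get("id"), mathclass))
    return hits


hits = uncertain_mathclasses(ET.fromstring(SAMPLE))
for char_id, mathclass in hits:
    print(char_id, mathclass)
```

To run it against a real file, `ET.parse("unicode.xml").getroot()` would replace the `ET.fromstring(SAMPLE)` call, provided the element and attribute names match the assumption above.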

> - Some Arabic Letters (U+0627-U+063A and Arabic mathematical alphabetic
> symbol) should probably have mathclass="A" since they are used as
> mathematical variables.

The Unicode data at
ftp://ftp.unicode.org/Public/math/
is still at revision 13, so this hasn't changed (unless I decide to
break from exact matching of mathclass data with the Unicode data).


>
> - Some LaTeX commands map to different Unicode code points. For example
> \mathsfbf{\Alpha} maps to both the capital and small (bold sans-serif)
> alpha, which is clearly wrong (one should be \mathsfbf{\alpha}). See the
> attached diff files for details. They were generated using the attached
> XSLT stylesheet and the Unix command  "xsltproc extract.xsl unicode.xml
> | sort --key=2,2 > commands1.txt; cat commands1.txt | uniq
> --skip-chars=7 > commands2.txt ; diff -U8 commands1.txt commands2.txt >
> commands.diff".


That one is clearly wrong, but it isn't necessarily wrong that you get
duplicate mappings in the LaTeX to Unicode direction. Classic TeX fonts
do not, for example, have a full Greek alphabet, so both U+006F and
U+03BF (Latin o and Greek omicron) map to o. All the entries in your
MathLaTeX-commands.diff file are of this form as far as I can see: Greek
letters with tonos being overloaded with Latin letters with acute,
which is not exactly a brilliant mapping but agrees with classical TeX
usage.
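
As an illustration of the kind of check involved, here is a minimal Python sketch that groups code points by their LaTeX mapping and reports commands shared by more than one character. It assumes each `character` element has a `latex` (or `mathlatex`) child holding the command; the sample entries are hypothetical and only mirror the o/omicron case described above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample in the assumed shape of unicode.xml.
# Raw string so the backslash in \alpha is kept literally.
SAMPLE = r"""<unicode>
  <character id="U0006F"><latex>o</latex></character>
  <character id="U003BF"><latex>o</latex></character>
  <character id="U003B1"><latex>\alpha</latex></character>
</unicode>"""


def duplicate_latex(root, tag="latex"):
    """Map each LaTeX command to its character ids, keeping only duplicates."""
    by_command = defaultdict(list)
    for ch in root.iter("character"):
        command = ch.findtext(tag)
        if command:
            by_command[command.strip()].append(ch.get("id"))
    return {cmd: ids for cmd, ids in by_command.items() if len(ids) > 1}


dups = duplicate_latex(ET.fromstring(SAMPLE))
for command, ids in dups.items():
    print(command, " ".join(ids))
```

A report like this cannot tell an intended overload (o and omicron) from a genuine error, so each group still needs a look by hand.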


By far the biggest set of incorrect mappings were essentially all the
lowercase Greek mathematical alphabet blocks having uppercase names.
I fixed those, thanks.

Comments on some others in your LaTeX-commands-diff file below.


  the first group is

  U0002D -
-U02010 -
-U02212 -


that is ASCII - and the explicit hyphen and minus characters all mapping
to - in LaTeX; that again is the best you can do really if mapping to
classic 7-bit fonts
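
For reference, the three code points in this group can be named with Python's standard `unicodedata` module; this is only a lookup aid and not part of how unicode.xml is built.

```python
import unicodedata

# The three code points from the first group of the diff.
CODE_POINTS = (0x002D, 0x2010, 0x2212)

names = [unicodedata.name(chr(cp)) for cp in CODE_POINTS]
for cp, name in zip(CODE_POINTS, names):
    print(f"U+{cp:04X} {name}")
```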

-U0201A ,  Similarly, mapping the low quotation mark to a comma matches
classical TeX usage

-U02024 .  ONE DOT LEADER, I suppose this could be \ldotp rather than .
(although that's the same character in the default setup)

-U02019 ' again using the ASCII ' for the right quotation mark is normal
TeX usage

-U00386 \'{A}  as for the mathlatex usage, this is the best approximation
you can do in the classic TeX font distribution

-U0212B \AA   angstrom and A ring being same character, this seems correct


-U0219D \arrowwaveright
oops thanks, that's the left arrow....


-U025EF \bigcirc
Size mapping to classical fonts is a bit strained: this is
"WHITE CIRCLE" and "LARGE CIRCLE". Not sure I can do better, but
suggestions welcome.

-U022C5 \cdot  "DOT OPERATOR" and "MIDDLE DOT" the latter is a bit of 
abuse of the math font but again accords with 7bit cm font usage.

-U02662 \diamond  diamond/lozenge

-U02729 \ding{73} two different white stars

-U02A63 \ElsevierGlyph{225A} This macro name is not actually usable to 
anyone other than Sebastian I guess

-U00192 f f and script f.
-U00261 g same for g

-U1D716 \in
-U1D750 \in
-U1D78A \in
-U1D7C4 \in
Yes, I guess I could wrap some of those in a math alphabet
(although classic TeX math alphabet commands do not affect the standard
definition of \in)


-U0039C M
Mu=M.
-U1D6B3 M
-U1D6CD M
-U1D6ED M
-U1D707 M
-U1D727 M
-U1D741 M
-U1D761 M
-U1D77B M
-U1D79B M
-U1D7B5 M
Mu=M but I added the math alphabets \mathbf{M} etc


U02306 \perspcorrespond
-U02A5E \perspcorrespond
  Not sure either of these are really defined in classic tex setups,
I made one var... (to match unicode-math)


$ cvs commit -m "lowercase latex greek and other fixes (FW)" unicode.xml
/w3ccvs/WWW/2003/entities/2007xml/unicode.xml,v  <--  unicode.xml
new revision: 1.78; previous revision: 1.77



David

Received on Sunday, 7 September 2014 22:36:07 UTC