- From: David Carlisle <davidc@nag.co.uk>
- Date: Sun, 7 Sep 2014 23:35:37 +0100
- To: <www-math@w3.org>
Frédéric,
Thanks again for your comments; here is a somewhat belated reply...
>
> - Some characters have mathclass="R?" (with a question mark)... I guess
> that's because the mathclass is not clear, but it is unexpected for
> someone who wants to process the file automatically...
>
Fixed at the time, by updating from version 11 to version 13 of the Unicode data.
> - Some Arabic Letters (U+0627-U+063A and Arabic mathematical alphabetic
> symbol) should probably have mathclass="A" since they are used as
> mathematical variables.
The Unicode data at ftp://ftp.unicode.org/Public/math/ is still at
revision 13, so this hasn't changed (unless I decide to stop matching
the mathclass data exactly to the Unicode data).
>
> - Some LaTeX commands map to different Unicode code points. For example
> \mathsfbf{\Alpha} maps to both the capital and small (bold sans-serif)
> alpha, which is clearly wrong (one should be \mathsfbf{\alpha}). See the
> attached diff files for details. They were generated using the attached
> XSLT stylesheet and the Unix command "xsltproc extract.xsl unicode.xml
> | sort --key=2,2 > commands1.txt; cat commands1.txt | uniq
> --skip-chars=7 > commands2.txt ; diff -U8 commands1.txt commands2.txt >
> commands.diff".
That one is clearly wrong, but it isn't necessarily wrong to get
duplicate mappings in the LaTeX-to-Unicode direction. Classic TeX fonts
do not, for example, have a full Greek alphabet, so both U+006F and
U+03BF (Latin o and Greek omicron) map to o. As far as I can see, all the
entries in your MathLaTeX-commands.diff file are of this form: Greek
letters with tonos overloaded with Latin letters with acute, which is
not exactly a brilliant mapping but agrees with classic TeX usage.
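The same duplicate check can be sketched without the XSLT step. This is a minimal Python sketch, assuming (hypothetically) that each entry in unicode.xml is a <character> element with an id such as U003BF and a <latex> child; the sample data below is illustrative, not copied from the file. It groups code points by LaTeX command, so a many-to-one mapping such as Latin o and Greek omicron both becoming o shows up as one group. Whether a given group is a bug or an accepted 7-bit approximation still needs a human judgement.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical excerpt: assumes <character id="U..."> entries with a
# <latex> child, roughly as in the unicode.xml discussed in this thread.
SAMPLE = """<charlist>
  <character id="U0006F"><latex>o</latex></character>
  <character id="U003BF"><latex>o</latex></character>
  <character id="U0002D"><latex>-</latex></character>
  <character id="U02010"><latex>-</latex></character>
  <character id="U02212"><latex>-</latex></character>
  <character id="U02200"><latex>\\forall</latex></character>
</charlist>"""


def shared_latex(xml_text):
    """Map each LaTeX command to the character ids that use it,
    keeping only commands shared by two or more characters."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for ch in root.iter("character"):
        latex = ch.findtext("latex")
        if latex is not None:
            groups[latex.strip()].append(ch.get("id"))
    return {cmd: ids for cmd, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    # Print each shared command followed by the code points that use it.
    for cmd, ids in sorted(shared_latex(SAMPLE).items()):
        print(cmd, ids)
```

Grouping by command rather than diffing sorted text makes the direction of the ambiguity explicit: each key is one LaTeX string, each value the Unicode characters that collapse onto it.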
By far the biggest set of incorrect mappings was essentially all the
lowercase Greek mathematical alphabet blocks having uppercase names.
I fixed those, thanks.
Comments on some of the others in your LaTeX-commands.diff file follow.
The first group is
U0002D -
-U02010 -
-U02212 -
That is, ASCII - and the explicit hyphen and minus characters all map
to - in LaTeX; again, that is really the best you can do when mapping to
classic 7-bit fonts.
-U0201A ,
Similarly, mapping the single low quotation mark to a comma matches
classic TeX usage.
-U02024 .
ONE DOT LEADER: I suppose this could be \ldotp rather than .
(although that's the same character in the default setup).
-U02019 '
Again, using the ASCII ' for the right quotation mark is normal TeX usage.
-U00386 \'{A}
As with the MathLaTeX mapping, this is the best approximation you can
get with the classic TeX font distribution.
-U0212B \AA
The Angstrom sign and A with ring are the same character, so this seems correct.
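A quick way to confirm that U+212B and U+00C5 really are the same character, as far as Unicode is concerned, is Python's standard unicodedata module. This small sketch relies only on that module: U+212B has a singleton canonical decomposition to U+00C5, so NFC normalisation replaces it outright, which is why sharing \AA between them is safe.

```python
import unicodedata

ANGSTROM_SIGN = "\u212B"  # U+212B ANGSTROM SIGN
A_RING = "\u00C5"         # U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE

# The canonical decomposition field of U+212B is just "00C5" (no tag),
# and NFC turns the Angstrom sign into the ordinary A with ring.
print(unicodedata.decomposition(ANGSTROM_SIGN))
print(unicodedata.normalize("NFC", ANGSTROM_SIGN) == A_RING)
```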
-U0219D \arrowwaveright
Oops, thanks: that's the left arrow.
-U025EF \bigcirc
Size mapping to classic fonts is a bit strained. This is
"WHITE CIRCLE" versus "LARGE CIRCLE"; I'm not sure I can do better, but
suggestions are welcome.
-U022C5 \cdot
"DOT OPERATOR" and "MIDDLE DOT": the latter is a bit of an abuse of the
math font, but again accords with 7-bit CM font usage.
-U02662 \diamond diamond/lozenge
-U02729 \ding{73} two different white stars
-U02A63 \ElsevierGlyph{225A}
This macro name is not actually usable by anyone other than Sebastian, I guess.
-U00192 f f and script f.
-U00261 g same for g
-U1D716 \in
-U1D750 \in
-U1D78A \in
-U1D7C4 \in
Yes, I guess I could wrap some of those in a math alphabet
(although the classic TeX math alphabet commands do not affect the
standard definition of \in).
-U0039C M
Mu=M.
-U1D6B3 M
-U1D6CD M
-U1D6ED M
-U1D707 M
-U1D727 M
-U1D741 M
-U1D761 M
-U1D77B M
-U1D79B M
-U1D7B5 M
Mu=M, but I added the math alphabets (\mathbf{M} etc.).
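Not every code point in the M list is a capital: the math-alphanumeric entries alternate between capital and small mu. This sketch uses Python's standard unicodedata module to print each character's Unicode name and flag the small letters, which presumably fall under the lowercase Greek fix mentioned above. The flagging rule (any name without CAPITAL) is an illustrative assumption, not part of unicode.xml.

```python
import unicodedata

# U+039C plus the ten math-alphabet code points from the diff mapped to M.
M_MAPPED = [0x0039C, 0x1D6B3, 0x1D6CD, 0x1D6ED, 0x1D707, 0x1D727,
            0x1D741, 0x1D761, 0x1D77B, 0x1D79B, 0x1D7B5]

for cp in M_MAPPED:
    name = unicodedata.name(chr(cp))
    # Hypothetical check: a small mu should not share the LaTeX "M".
    flag = "" if "CAPITAL" in name else "  <- small mu"
    print(f"U+{cp:05X} {name}{flag}")
```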
U02306 \perspcorrespond
-U02A5E \perspcorrespond
I'm not sure either of these is really defined in classic TeX setups;
I made one var... (to match unicode-math).
$ cvs commit -m "lowercase latex greek and other fixes (FW)" unicode.xml
/w3ccvs/WWW/2003/entities/2007xml/unicode.xml,v <-- unicode.xml
new revision: 1.78; previous revision: 1.77
David
Received on Sunday, 7 September 2014 22:36:07 UTC