- From: David Carlisle <davidc@nag.co.uk>
- Date: Sun, 7 Sep 2014 23:35:37 +0100
- To: <www-math@w3.org>
Frédéric,

Thanks again for your comments, a somewhat belated reply...

> - Some characters have mathclass="R?" (with a question mark)... I guess
> that's because the mathclass is not clear, but it is unexpected for
> someone who wants to process the file automatically...

Fixed at the time, updating from version 11 to 13 of the Unicode data.

> - Some Arabic Letters (U+0627-U+063A and Arabic mathematical alphabetic
> symbol) should probably have mathclass="A" since they are used as
> mathematical variables.

ftp://ftp.unicode.org/Public/math/

Unicode is still at revision 13, so this hasn't changed (unless I decide to break from exact matching of mathclass data with the Unicode data).

> - Some LaTeX commands map to different Unicode code points. For example
> \mathsfbf{\Alpha} maps to both the capital and small (bold sans-serif)
> alpha, which is clearly wrong (one should be \mathsfbf{\alpha}). See the
> attached diff files for details. They were generated using the attached
> XSLT stylesheet and the Unix command "xsltproc extract.xsl unicode.xml
> | sort --key=2,2 > commands1.txt; cat commands1.txt | uniq
> --skip-chars=7 > commands2.txt ; diff -U8 commands1.txt commands2.txt
> > commands.diff".

That one is clearly wrong, but it isn't necessarily wrong that you get duplicate mappings in the LaTeX to Unicode direction. Classic TeX fonts do not, for example, have a full Greek alphabet, so both U+006F and U+03BF (Latin o and Greek omicron) map to o.

All the entries in your MathLaTeX-commands.diff file are of this form as far as I can see: Greek letters with tonos being overloaded with Latin letters with acute, which is not exactly a brilliant mapping but agrees with classical TeX usage.

By far the biggest set of incorrect mappings were essentially all the lowercase Greek mathematical alphabet blocks having uppercase names. I fixed those, thanks.

Comments on some others in your LaTeX-commands-diff file below.
the first group is

 U0002D -
-U02010 -
-U02212 -

that is ascii - and the explicit hyphen and minus characters all mapping to - in latex, that again is the best you can do really if mapping to classic 7bit fonts

-U0201A ,

similar, mapping the low quotation mark to a comma matches classical TeX usage

-U02024 .

One DOT leader, I suppose this could be \ldotp rather than . (although that's the same character in the default setup)

-U02019 '

again using the ascii ' for the right quotation mark is normal tex usage

-U00386 \'{A}

as for the mathlatex usage this is the best approximation you can do in classic tex font distribution

-U0212B \AA

angstrom and A ring being same character, this seems correct

-U0219D \arrowwaveright

oops thanks, that's the left arrow....

-U025EF \bigcirc

size mapping to classical fonts is a bit strained, this is "WHITE CIRCLE" and "LARGE CIRCLE", not sure I can do better but suggestions welcome

-U022C5 \cdot

"DOT OPERATOR" and "MIDDLE DOT", the latter is a bit of abuse of the math font but again accords with 7bit cm font usage.

-U02662 \diamond

diamond/lozenge

-U02729 \ding{73}

two different white stars

-U02A63 \ElsevierGlyph{225A}

This macro name is not actually usable to anyone other than Sebastian I guess

-U00192 f

f and script f.

-U00261 g

same for g

-U1D716 \in
-U1D750 \in
-U1D78A \in
-U1D7C4 \in

Yes I guess I could wrap some of those in a math alphabet (although classic tex mathalphabet commands do not affect the standard definition of \in)

-U0039C M

Mu=M.

-U1D6B3 M
-U1D6CD M
-U1D6ED M
-U1D707 M
-U1D727 M
-U1D741 M
-U1D761 M
-U1D77B M
-U1D79B M
-U1D7B5 M

Mu=M but I added the math alphabets \mathbf{M} etc

 U02306 \perspcorrespond
-U02A5E \perspcorrespond

Not sure either of these are really defined in classic tex setups, I made one var... (to match unicode-math)

$ cvs commit -m "lowercase latex greek and other fixes (FW)" unicode.xml
/w3ccvs/WWW/2003/entities/2007xml/unicode.xml,v <-- unicode.xml
new revision: 1.78; previous revision: 1.77

David
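
For readers who want to reproduce the kind of duplicate check discussed in this message, here is a minimal sketch, not the stylesheet or pipeline actually used in the thread. It assumes each input line has the form "U" plus hex digits, one space, then the LaTeX command, as the diff entries quoted above suggest. The function name duplicate_commands and the sample list are hypothetical; the \alpha row and the U+00C5 row are added only for illustration.

```python
from collections import defaultdict


def duplicate_commands(lines):
    """Group code points by LaTeX command and keep commands seen more than once.

    Assumed line format (inferred from the diff entries, not confirmed):
    "U" + hex digits, a single space, then the command text.
    """
    by_command = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        codepoint, _, command = line.partition(" ")
        # Drop the leading "U" and parse the rest as hexadecimal.
        by_command[command].append(int(codepoint[1:], 16))
    return {cmd: cps for cmd, cps in by_command.items() if len(cps) > 1}


# Hypothetical sample data in the assumed format.
sample = [
    "U0002D -",
    "U02010 -",
    "U02212 -",
    "U0212B \\AA",
    "U000C5 \\AA",
    "U003B1 \\alpha",
]

dups = duplicate_commands(sample)
for cmd, cps in sorted(dups.items()):
    print(cmd, " ".join(f"U+{cp:04X}" for cp in cps))
```

Unlike sort followed by uniq, this reports every code point that shares a command, which makes it easier to tell an intentional overload (hyphen and minus both mapping to -) from a genuine error.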
Received on Sunday, 7 September 2014 22:36:07 UTC