RE: [MathML 4] Add rules to map from non-combining to combining accents

FWIW, Microsoft Office math has always used combining marks in the U+0300..U+036F and U+20D0..U+20F1 ranges, but the MathML converters translate them to the popular spacing accents. Here’s part of a table used for UnicodeMath:

    {0x0300, lsmservbcAccentAbove},   //grave   # COMBINING GRAVE ACCENT
    {0x0301, lsmservbcAccentAbove},   //acute   # COMBINING ACUTE ACCENT
    {0x0302, lsmservbcAccentAbove},   //flex    # COMBINING CIRCUMFLEX ACCENT
    {0x0303, lsmservbcAccentAbove},   //tilde   # COMBINING TILDE
    {0x0304, lsmservbcAccentAbove},   //macron  # COMBINING MACRON
    {0x0305, lsmservbcAccentAbove},   //overline # COMBINING OVERLINE
    {0x0306, lsmservbcAccentAbove},   //breve   # COMBINING BREVE
    {0x0307, lsmservbcAccentAbove},   //dot     # COMBINING DOT ABOVE
    {0x0308, lsmservbcAccentAbove},   //        # COMBINING DIAERESIS
    {0x0309, lsmservbcAccentAbove},   //        # COMBINING HOOK ABOVE
    {0x030A, lsmservbcAccentAbove},   //        # COMBINING RING ABOVE
    {0x030B, lsmservbcAccentAbove},   //        # COMBINING DOUBLE ACUTE ACCENT
    {0x030C, lsmservbcAccentAbove},   //        # COMBINING CARON
    {0x030D, lsmservbcAccentAbove},   //        # COMBINING VERTICAL LINE ABOVE
    {0x030E, lsmservbcAccentAbove},   //        # COMBINING DOUBLE VERTICAL LINE ABOVE
    {0x030F, lsmservbcAccentAbove},   //        # COMBINING DOUBLE GRAVE ACCENT
    {0x0310, lsmservbcAccentAbove},   //        # COMBINING CANDRABINDU
    {0x0311, lsmservbcAccentAbove},   //        # COMBINING INVERTED BREVE
    {0x0312, lsmservbcAccentAbove},   //        # COMBINING TURNED COMMA ABOVE
    {0x0313, lsmservbcAccentAbove},   //        # COMBINING COMMA ABOVE
    {0x0314, lsmservbcAccentAbove},   //        # COMBINING REVERSED COMMA ABOVE
    {0x0315, lsmservbcAccentAbove},   //        # COMBINING COMMA ABOVE RIGHT
    {0x0316, lsmservbcAccentBelow},   //        # COMBINING GRAVE ACCENT BELOW
    {0x0317, lsmservbcAccentBelow},   //        # COMBINING ACUTE ACCENT BELOW
    {0x0318, lsmservbcAccentBelow},   //        # COMBINING LEFT TACK BELOW
    {0x0319, lsmservbcAccentBelow},   //        # COMBINING RIGHT TACK BELOW
    {0x031A, lsmservbcAccentAbove},   //        # COMBINING LEFT ANGLE ABOVE
    {0x031B, lsmservbcAccentAbove},   //        # COMBINING HORN
    {0x031C, lsmservbcAccentBelow},   //        # COMBINING LEFT HALF RING BELOW
    {0x031D, lsmservbcAccentBelow},   //        # COMBINING UP TACK BELOW
    {0x031E, lsmservbcAccentBelow},   //        # COMBINING DOWN TACK BELOW
    {0x031F, lsmservbcAccentBelow},   //        # COMBINING PLUS SIGN BELOW
    {0x0320, lsmservbcAccentBelow},   //        # COMBINING MINUS SIGN BELOW
    {0x0321, lsmservbcAccentBelow},   //        # COMBINING PALATALIZED HOOK BELOW
    {0x0322, lsmservbcAccentBelow},   //        # COMBINING RETROFLEX HOOK BELOW
    {0x0323, lsmservbcAccentBelow},   //        # COMBINING DOT BELOW
    {0x0324, lsmservbcAccentBelow},   //        # COMBINING DIAERESIS BELOW
    {0x0325, lsmservbcAccentBelow},   //        # COMBINING RING BELOW
    {0x0326, lsmservbcAccentBelow},   //        # COMBINING COMMA BELOW
    {0x0327, lsmservbcAccentBelow},   //        # COMBINING CEDILLA
    {0x0328, lsmservbcAccentBelow},   //        # COMBINING OGONEK
    {0x0329, lsmservbcAccentBelow},   //        # COMBINING VERTICAL LINE BELOW
    {0x032A, lsmservbcAccentBelow},   //        # COMBINING BRIDGE BELOW
    {0x032B, lsmservbcAccentBelow},   //        # COMBINING INVERTED DOUBLE ARCH BELOW
    {0x032C, lsmservbcAccentBelow},   //        # COMBINING CARON BELOW
    {0x032D, lsmservbcAccentBelow},   //        # COMBINING CIRCUMFLEX ACCENT BELOW
    {0x032E, lsmservbcAccentBelow},   //        # COMBINING BREVE BELOW
    {0x032F, lsmservbcAccentBelow},   //        # COMBINING INVERTED BREVE BELOW
    {0x0330, lsmservbcAccentBelow},   //        # COMBINING TILDE BELOW
    {0x0331, lsmservbcAccentBelow},   //        # COMBINING MACRON BELOW
    {0x0332, lsmservbcAccentBelow},   //        # COMBINING LOW LINE
    {0x0333, lsmservbcAccentBelow},   //        # COMBINING DOUBLE LOW LINE
    {0x0337, lsmservbcAccentAbove},   //        # COMBINING SHORT SOLIDUS OVERLAY
    {0x0338, lsmservbcAccentAbove},   //        # COMBINING LONG SOLIDUS OVERLAY
    {0x0339, lsmservbcAccentBelow},   //        # COMBINING RIGHT HALF RING BELOW
    {0x033A, lsmservbcAccentBelow},   //        # COMBINING INVERTED BRIDGE BELOW
    {0x033B, lsmservbcAccentBelow},   //        # COMBINING SQUARE BELOW
    {0x033C, lsmservbcAccentBelow},   //        # COMBINING SEAGULL BELOW
    {0x033D, lsmservbcAccentAbove},   //        # COMBINING X ABOVE
    {0x033E, lsmservbcAccentAbove},   //        # COMBINING VERTICAL TILDE
    {0x033F, lsmservbcAccentAbove},   //        # COMBINING DOUBLE OVERLINE
    {0x0340, lsmservbcAccentAbove},   //        # COMBINING GRAVE TONE MARK
    {0x0341, lsmservbcAccentAbove},   //        # COMBINING ACUTE TONE MARK
    {0x0342, lsmservbcAccentAbove},   //        # COMBINING GREEK PERISPOMENI
    {0x0343, lsmservbcAccentAbove},   //        # COMBINING GREEK KORONIS
    {0x0344, lsmservbcAccentAbove},   //        # COMBINING GREEK DIALYTIKA TONOS
    {0x0345, lsmservbcAccentBelow},   //        # COMBINING GREEK YPOGEGRAMMENI
    {0x0346, lsmservbcAccentAbove},   //        # COMBINING BRIDGE ABOVE
    {0x0347, lsmservbcAccentBelow},   //        # COMBINING EQUALS SIGN BELOW
    {0x0348, lsmservbcAccentBelow},   //        # COMBINING DOUBLE VERTICAL LINE BELOW
    {0x0349, lsmservbcAccentBelow},   //        # COMBINING LEFT ANGLE BELOW
    {0x034A, lsmservbcAccentAbove},   //        # COMBINING NOT TILDE ABOVE
    {0x034B, lsmservbcAccentAbove},   //        # COMBINING HOMOTHETIC ABOVE
    {0x034C, lsmservbcAccentAbove},   //        # COMBINING ALMOST EQUAL TO ABOVE
    {0x034D, lsmservbcAccentBelow},   //        # COMBINING LEFT RIGHT ARROW BELOW
    {0x034E, lsmservbcAccentBelow},   //        # COMBINING UPWARDS ARROW BELOW
    {0x0350, lsmservbcAccentAbove},   //        # COMBINING RIGHT ARROWHEAD ABOVE
    {0x0351, lsmservbcAccentAbove},   //        # COMBINING LEFT HALF RING ABOVE
    {0x0352, lsmservbcAccentAbove},   //        # COMBINING FERMATA
    {0x0353, lsmservbcAccentBelow},   //        # COMBINING X BELOW
    {0x0354, lsmservbcAccentBelow},   //        # COMBINING LEFT ARROWHEAD BELOW
    {0x0355, lsmservbcAccentBelow},   //        # COMBINING RIGHT ARROWHEAD BELOW
    {0x0356, lsmservbcAccentBelow},   //        # COMBINING RIGHT ARROWHEAD AND UP ARROWHEAD BELOW
    {0x0357, lsmservbcAccentAbove},   //        # COMBINING RIGHT HALF RING ABOVE
    {0x0358, lsmservbcAccentAbove},   //        # COMBINING DOT ABOVE RIGHT
    {0x0359, lsmservbcAccentBelow},   //        # COMBINING ASTERISK BELOW
    {0x035A, lsmservbcAccentBelow},   //        # COMBINING DOUBLE RING BELOW
    {0x035B, lsmservbcAccentAbove},   //        # COMBINING ZIGZAG ABOVE
    {0x0363, lsmservbcAccentAbove},   //        # COMBINING LATIN SMALL LETTER A
    {0x0364, lsmservbcAccentAbove},   //        # COMBINING LATIN SMALL LETTER E
    {0x0365, lsmservbcAccentAbove},   //        # COMBINING LATIN SMALL LETTER I
    {0x0366, lsmservbcAccentAbove},   //        # COMBINING LATIN SMALL LETTER O
    {0x0367, lsmservbcAccentAbove},   //        # COMBINING LATIN SMALL LETTER U
    {0x0368, lsmservbcAccentAbove},   //        # COMBINING LATIN SMALL LETTER C
    {0x0369, lsmservbcAccentAbove},   //        # COMBINING LATIN SMALL LETTER D
    {0x036A, lsmservbcAccentAbove},   //        # COMBINING LATIN SMALL LETTER H
    {0x036B, lsmservbcAccentAbove},   //        # COMBINING LATIN SMALL LETTER M
    {0x036C, lsmservbcAccentAbove},   //        # COMBINING LATIN SMALL LETTER R
    {0x036D, lsmservbcAccentAbove},   //        # COMBINING LATIN SMALL LETTER T
    {0x036E, lsmservbcAccentAbove},   //        # COMBINING LATIN SMALL LETTER V
    {0x036F, lsmservbcAccentAbove},   //        # COMBINING LATIN SMALL LETTER X

    {0x20D0, lsmservbcAccentAbove},   //        # COMBINING LEFT HARPOON ABOVE
    {0x20D1, lsmservbcAccentAbove},   //        # COMBINING RIGHT HARPOON ABOVE
    {0x20D4, lsmservbcAccentAbove},   //        # COMBINING ANTICLOCKWISE ARROW ABOVE
    {0x20D5, lsmservbcAccentAbove},   //        # COMBINING CLOCKWISE ARROW ABOVE
    {0x20D6, lsmservbcAccentAbove},   //        # COMBINING LEFT ARROW ABOVE
    {0x20D7, lsmservbcAccentAbove},   //        # COMBINING RIGHT ARROW ABOVE
    {0x20DB, lsmservbcAccentAbove},   //        # COMBINING THREE DOTS ABOVE
    {0x20DC, lsmservbcAccentAbove},   //        # COMBINING FOUR DOTS ABOVE
    {0x20E1, lsmservbcAccentAbove},   //        # COMBINING LEFT RIGHT ARROW ABOVE
    {0x20E8, lsmservbcAccentBelow},   //        # COMBINING TRIPLE UNDERDOT
    {0x20E9, lsmservbcAccentAbove},   //        # COMBINING WIDE BRIDGE ABOVE
    {0x20EC, lsmservbcAccentBelow},   //        # COMBINING RIGHTWARDS HARPOON WITH BARB DOWNWARDS
    {0x20ED, lsmservbcAccentBelow},   //        # COMBINING LEFTWARDS HARPOON WITH BARB DOWNWARDS
    {0x20EE, lsmservbcAccentBelow},   //        # COMBINING LEFT ARROW BELOW
    {0x20EF, lsmservbcAccentBelow},   //        # COMBINING RIGHT ARROW BELOW
    {0x20F0, lsmservbcAccentAbove},   //        # COMBINING ASTERISK ABOVE
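
To illustrate how a table like this might be consulted, here is a minimal C++ sketch. It is not the Office code: the AccentPlacement enum, the AccentEntry struct, the accentNone value, and the function names are hypothetical, since the real declarations are not shown above. Only the lsmservbcAccentAbove/lsmservbcAccentBelow names and the few sample rows come from the table. It assumes the entries are kept sorted by code point so that a binary search works.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// Hypothetical stand-in: the real type of these constants is not shown above.
enum AccentPlacement { lsmservbcAccentAbove, lsmservbcAccentBelow, accentNone };

// Hypothetical row type: a code point and where its accent is placed.
struct AccentEntry {
    char32_t ch;
    AccentPlacement placement;
};

// A few rows copied from the table above; they must stay sorted by code point.
static const AccentEntry kAccents[] = {
    {0x0302, lsmservbcAccentAbove},   // COMBINING CIRCUMFLEX ACCENT
    {0x0303, lsmservbcAccentAbove},   // COMBINING TILDE
    {0x0323, lsmservbcAccentBelow},   // COMBINING DOT BELOW
    {0x0332, lsmservbcAccentBelow},   // COMBINING LOW LINE
    {0x20D7, lsmservbcAccentAbove},   // COMBINING RIGHT ARROW ABOVE
    {0x20EF, lsmservbcAccentBelow},   // COMBINING RIGHT ARROW BELOW
};

// Debug check that the binary-search precondition (sorted table) holds.
void CheckAccentTableSorted() {
    assert(std::is_sorted(
        std::begin(kAccents), std::end(kAccents),
        [](const AccentEntry& a, const AccentEntry& b) { return a.ch < b.ch; }));
}

// Binary search by code point; returns accentNone if ch is not in the table.
AccentPlacement GetAccentPlacement(char32_t ch) {
    const AccentEntry* end = std::end(kAccents);
    const AccentEntry* it = std::lower_bound(
        std::begin(kAccents), end, ch,
        [](const AccentEntry& e, char32_t c) { return e.ch < c; });
    return (it != end && it->ch == ch) ? it->placement : accentNone;
}
```

With this sketch, GetAccentPlacement(0x0303) yields lsmservbcAccentAbove, GetAccentPlacement(0x0332) yields lsmservbcAccentBelow, and a non-accent such as U+0041 falls through to accentNone.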

From: David Carlisle <davidc@nag.co.uk>
Sent: Friday, March 23, 2018 6:46 AM
To: www-math@w3.org
Subject: Re: [MathML 4] Add rules to map from non-combining to combining accents

On 23/03/2018 13:38, Frédéric WANG wrote:
>
>>
>> Yes we should say something (as it happens TeX also has difficulties
>> with combining characters)
>>
>> somewhere around
>>
>> https://w3c.github.io/mathml/chapter7.html#chars.comb-chars
>>
>> I guess is the place to add something.
>>
>> As you hint, I may need to add some extra data to unicode.xml to specify
>> which characters are related in this way. I don't think the existing
>> Unicode data reliably says which are equivalent combining/non-combining
>> forms, although obviously taking the character name and deleting
>> "Combining" gives a first approximation of the mapping.
>>
>>
>> David
>
> Hi David,
>
> Indeed, I would prefer an explicit list for better interoperability. Are
> you able to come up with one list, maybe doing the change in the repo of
> unicode.xml? For your information, below are the horizontal
> stretchy constructions available in some popular OpenType MATH fonts. I
> believe other fonts also only provide constructions for combining
> versions of accents.
>
> ...


I'll see what I can do.....


David



Received on Friday, 23 March 2018 23:06:40 UTC