[MathML4] Whitespace and attributes canonicalization in MathML VS HTML5/CSS

Hi Math WG,

Continuing my feedback for a future MathML specification, here is a
(probably non-exhaustive) list of inconsistencies between MathML and
HTML5/CSS regarding whitespace and attribute canonicalization. As a
rule of thumb, it would be better for web engines if MathML aligned with
HTML5, so that we can reuse as much code as possible and avoid extra
code to handle MathML special cases. Also, people familiar with HTML5
will be less surprised when handling MathML.

1) Whitespace collapsing/trimming
   https://www.w3.org/TR/MathML/chapter2.html#fund.collapse

   Whitespace collapsing is consistent with the default value of the CSS
property "white-space" and people are familiar with it.

   Removing "whitespace at the beginning and end of the content" is less
expected. Gecko has some code to handle this, but it would be very
helpful to avoid this additional complexity. WebKit does not handle it
at the moment and it's not clear that it's worth doing. Outside the
MathML spec and tests, everybody seems to just write <mo>(</mo> and not
<mo> ( </mo>. Can we deprecate this behavior in MathML4? Or maybe you
should work with the HTML5 WG to define such collapsing rules during
document parsing, so that the MathML rendering code no longer needs to
handle it?
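
   To make the two behaviors concrete, here is a minimal Python sketch.
The function names are mine (not taken from any spec or engine), and
collapse_only is a simplification of what CSS "white-space: normal"
does, not a full implementation of CSS text processing.

```python
import re

# Whitespace characters as listed in the MathML 3 section linked above:
# space, tab, line feed, carriage return.
MATHML_SPACE = " \t\n\r"


def collapse_only(text: str) -> str:
    """Replace each run of whitespace with one space; keep the ends."""
    return re.sub(f"[{MATHML_SPACE}]+", " ", text)


def collapse_and_trim(text: str) -> str:
    """Collapse runs, then also drop leading and trailing whitespace."""
    return collapse_only(text).strip(MATHML_SPACE)


if __name__ == "__main__":
    content = " \n( \t"
    print(repr(collapse_only(content)))      # leading/trailing space kept
    print(repr(collapse_and_trim(content)))  # only "(" remains
```

   The extra strip() step is the part that an engine relying on generic
CSS collapsing would have to add specifically for MathML token content.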

2) In MathML, whitespace means the XML whitespace characters: space
(U+0020), tab (U+0009), line feed (U+000A), and carriage return
(U+000D), while HTML5 also includes "form feed" (U+000C).

    https://www.w3.org/TR/html5/infrastructure.html#space-character
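
   As a small illustration of that one-character difference, here is a
Python sketch. The constant and function names are hypothetical and
only serve the example.

```python
# Whitespace as understood by MathML (the XML whitespace characters).
MATHML_WHITESPACE = frozenset("\u0020\u0009\u000A\u000D")

# HTML5 "space characters": the same set plus form feed (U+000C).
HTML5_SPACE_CHARACTERS = MATHML_WHITESPACE | {"\u000C"}


def strip_with(text: str, space_chars: frozenset) -> str:
    """Strip only the given characters from both ends of text."""
    return text.strip("".join(space_chars))


if __name__ == "__main__":
    value = "\u000C true \u000C"
    print(repr(strip_with(value, MATHML_WHITESPACE)))       # form feeds stay
    print(repr(strip_with(value, HTML5_SPACE_CHARACTERS)))  # form feeds go
```

   A value padded with form feeds is therefore trimmed by shared HTML5
helpers but not by code that follows the MathML definition.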
   
3) MathML attributes are case-sensitive while HTML5 attributes are
case-insensitive. Case sensitivity is probably not a problem for users
and it makes parsing easier. However, WebKit developers writing or
reviewing patches have often considered doing case-insensitive
comparisons, as that's consistent with the rest of the code base.
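
   Assuming the comparisons in question are on attribute values such as
"true", the two options look roughly like this in Python. The helper
names are hypothetical and this is not WebKit code.

```python
import string

# Translation table that lowercases ASCII letters only, which is what an
# "ASCII case-insensitive" comparison needs.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def equals_case_sensitive(value: str, keyword: str) -> bool:
    """Exact match, as case-sensitive MathML attributes require."""
    return value == keyword


def equals_ascii_case_insensitive(value: str, keyword: str) -> bool:
    """Match that ignores the case of ASCII letters only."""
    return value.translate(_ASCII_LOWER) == keyword.translate(_ASCII_LOWER)


if __name__ == "__main__":
    print(equals_case_sensitive("True", "true"))          # False
    print(equals_ascii_case_insensitive("True", "true"))  # True
```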

4) MathML boolean attributes take the values "true" and "false". In
HTML5, the boolean value is given by the presence/absence of the
attribute, and the only allowed values are the empty string and the name
of the attribute. This allows a more compact syntax like
<mo largeop stretchy> instead of <mo largeop="true" stretchy="true">.
However, web engines and authoring tools will continue to support the
true/false syntax anyway, so it's probably not worth adding complexity
here...

   https://www.w3.org/TR/html5/infrastructure.html#boolean-attributes
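
   The two conventions can be sketched in Python as follows. The
function names and the dict-based attribute map are assumptions made for
the example; a real engine would also decide what to do with invalid
values, which I leave to the caller here.

```python
from typing import Mapping, Optional


def mathml_boolean(attrs: Mapping[str, str], name: str) -> Optional[bool]:
    """MathML style: the value must be exactly "true" or "false".

    Returns None when the attribute is absent or has another value, so
    the caller can fall back to the default.
    """
    value = attrs.get(name)
    if value == "true":
        return True
    if value == "false":
        return False
    return None


def html5_boolean(attrs: Mapping[str, str], name: str) -> bool:
    """HTML5 style: presence means true, absence means false."""
    return name in attrs


if __name__ == "__main__":
    attrs = {"stretchy": "false"}
    print(mathml_boolean(attrs, "stretchy"))  # False
    print(html5_boolean(attrs, "stretchy"))   # True: the value is ignored
```

   The last case shows why the two syntaxes cannot simply be merged:
under presence-based rules, existing stretchy="false" markup would flip
its meaning.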

5) As I said in a previous message, the values "small", "normal", "big"
of mathsize do not match the keywords of CSS font-size. Removing them
would simplify the parsing code a bit.

6) The definition of numbers is also not very precise in the MathML
recommendation compared to HTML5. One has to check the RelaxNG schemas
and the predefined RelaxNG types to know the exact syntax. Again, I
think it would be best to rely on the HTML5 definitions. For example,
<math><mspace width="1E1em" height="10em" mathbackground="red"/></math>
draws a red square in WebKit but Gecko says "1E1em" is invalid.

   https://www.w3.org/TR/html5/infrastructure.html#numbers
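
   To show where the two syntaxes diverge, here is a Python sketch with
one regular expression per definition. Both patterns are my own
transcription: the MathML one follows my reading of the prose in the
recommendation (optional "-", digits, at most one decimal point, no
exponent) and may not match the RelaxNG schema exactly; the HTML5 one
follows the "valid floating-point number" rules at the link above. Only
the number part is checked, so the "em" unit is left out.

```python
import re

# Assumed MathML 3 number syntax: optional "-", then decimal digits with
# at most one decimal point. There is no exponent part.
MATHML_NUMBER = re.compile(r"-?(?:[0-9]+\.?[0-9]*|\.[0-9]+)")

# HTML5 valid floating-point number: optional "-", digits and/or a
# fractional part, then an optional exponent introduced by "e" or "E".
HTML5_FLOAT = re.compile(
    r"-?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)(?:[eE][-+]?[0-9]+)?"
)


def is_mathml_number(text: str) -> bool:
    """True if the whole string matches the assumed MathML syntax."""
    return MATHML_NUMBER.fullmatch(text) is not None


def is_html5_float(text: str) -> bool:
    """True if the whole string is an HTML5 valid floating-point number."""
    return HTML5_FLOAT.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ("10", "1E1", "-.5"):
        print(candidate, is_mathml_number(candidate), is_html5_float(candidate))
```

   Under these assumptions, "1E1" passes the HTML5 check but not the
MathML one, which would be consistent with the two engines disagreeing
on the example above.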

Frédéric

Received on Monday, 1 August 2016 15:31:45 UTC