Re: [MathML4] Whitespace and attributes canonicalization in MathML VS HTML5/CSS

On 01/08/2016 16:31, Frédéric Wang wrote:
> Hi Math WG,

Some personal "first thought" replies ...

> Continuing on feedback for a future MathML specification, here is a
> (probably non-exhaustive) list of inconsistencies between MathML and
> HTML5/CSS regarding whitespace and attributes canonicalization. As a
> rule of thumb, it would be better for web engines if MathML can align on
> HTML5 so that we can reuse as much code as possible and avoid extra code
> to handle MathML special cases. Also people familiar with HTML5 will be
> less surprised when handling MathML.
> 1) Whitespace collapsing/trimming
>    Whitespace collapsing is consistent with the default CSS property
> "white-space" and people are familiar with it.
>    Removing "whitespace at the beginning and end of the content" is less
> expected. Gecko has some code to handle this but it would be very
> helpful to avoid this additional complexity. WebKit does not handle it
> at the moment and it's not clear it's worth doing it... Except in the
> MathML spec/test, everybody seems to just write <mo>(</mo> and not <mo>
> ( </mo>. Can we deprecate this behavior in MathML4? Or maybe you should
> work with the HTML5 WG to define such collapsing rules during document
> parsing, so that the MathML rendering code no longer needs to handle it?

White space is always a problem :-) but I'd be sorry to drop this
completely. It's a well-established feature of math typesetting (in TeX
and elsewhere) that user white space is ignored and the math typesetter
re-adds white space as needed. That said, I agree that the fact that
TeX treats 1+2 like 1 + 2 doesn't necessarily mean that MathML should
treat <mo>+</mo> like <mo> + </mo>. If the trimming could happen during
text/html parsing, that would simplify some things.
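
To make the behaviour under discussion concrete, here is a minimal
sketch of the trim-and-collapse step as I read it from the MathML 3
text. The function name normalize_token_text is purely illustrative
and is not taken from any engine.

```python
import re

# White space as MathML 3 defines it: space, tab, line feed, carriage return.
MATHML_WS = " \t\n\r"


def normalize_token_text(text: str) -> str:
    """Trim leading/trailing white space, then collapse internal runs.

    Illustrative only: a sketch of the rule for token element content,
    not the code of any particular implementation.
    """
    trimmed = text.strip(MATHML_WS)
    # Any run of white space inside the content becomes a single space.
    return re.sub(r"[ \t\n\r]+", " ", trimmed)
```

With this, the content of <mo> ( </mo> and of <mo>(</mo> both normalise
to the single character "(", which is the step a renderer has to carry
out itself if the parser does not.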

> 2) In MathML, white spaces are understood as XML spaces (U+0020), tabs
> (U+0009), line feeds (U+000A), and carriage returns (U+000D) while HTML5
> also includes "form feed" (U+000C).

Probably we should just change that. Either always include U+000C, or
specify that white space characters are XML white space in
application/xml parsing and HTML white space in text/html parsing, or
something ...
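
The second option could be sketched roughly as follows. The names
whitespace_for and trim are hypothetical, and keying the choice on the
media type string is just one assumed way of telling the two parsers
apart.

```python
# XML white space: space, tab, line feed, carriage return.
XML_WS = frozenset(" \t\n\r")
# HTML white space additionally includes form feed (U+000C).
HTML_WS = XML_WS | {"\f"}


def whitespace_for(media_type: str) -> frozenset:
    """Pick the white space set by how the document was parsed (assumed rule)."""
    return HTML_WS if media_type == "text/html" else XML_WS


def trim(text: str, media_type: str) -> str:
    """Strip leading and trailing white space using the selected set."""
    return text.strip("".join(whitespace_for(media_type)))
```

Under this sketch a form feed around "+" is trimmed in text/html but
kept in application/xml, which is exactly the observable difference
that "always include U+000C" would remove.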

> 3) MathML attributes are case-sensitive while HTML5 attributes are
> case-insensitive. Case sensitivity is probably not a problem for users
> and it's easier for the parsing. However, WebKit developers writing or
> reviewing patches have often considered doing case-insensitive
> comparisons as that's consistent with the rest of the code base.

Do you mean the attribute values or the attribute names? For the latter,
my understanding is that it's the same as (X)HTML, in that the text/html
parser will normalise the case of the attribute name (to lower case,
except for definitionURL), so giving an appearance of case insensitivity.
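
As a rough illustration of that normalisation (a sketch of my
understanding, not the parser algorithm itself; the function name is
made up for the example):

```python
import string

# ASCII-only lowercasing, so non-ASCII characters are left untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# The one MathML attribute name whose mixed case is restored after lowercasing.
_MATHML_FIXUPS = {"definitionurl": "definitionURL"}


def html_parsed_mathml_attribute_name(name: str) -> str:
    """Return the attribute name as it would appear after text/html parsing."""
    lowered = name.translate(_ASCII_LOWER)
    return _MATHML_FIXUPS.get(lowered, lowered)
```

So MathVariant and mathvariant end up as the same name in the DOM and
a case-sensitive comparison against "mathvariant" still succeeds; in
application/xml parsing no such step happens and MathVariant stays a
different (unknown) attribute.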

> 4) MathML boolean attributes take value "true" and "false". In HTML5,
> the boolean value is given by the presence/absence of the attribute and
> the only allowed value is the name of the attribute. This allows a
> more compact syntax like <mo largeop stretchy> instead of <mo
> largeop="true" stretchy="true">. However, Web engines and authoring
> tools will continue to support the true/false syntax anyway, so it's
> probably not worth adding complexity here...

I don't think allowing stretchy=stretchy as an alternative to
stretchy=true would break anything on the XML side of things, and it
would potentially, as you say, allow just stretchy in text/html using its
version of the old SGML shorttag feature. You could say better than I
could whether that would simplify or complicate things at the
implementation level.
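
For what it's worth, accepting both spellings might look something
like this. It is a hypothetical sketch: parse_boolean_attribute and its
default parameter are assumptions for the example, and None stands for
"attribute absent".

```python
from typing import Optional


def parse_boolean_attribute(name: str, value: Optional[str],
                            default: bool = False) -> bool:
    """Interpret a boolean attribute given in MathML or HTML style.

    value is None when the attribute is absent. The default is a
    parameter because the real default may come from elsewhere (for
    stretchy, from the operator dictionary).
    """
    if value is None:
        return default
    if value == "true":      # existing MathML syntax
        return True
    if value == "false":     # existing MathML syntax
        return False
    # HTML-style: bare attribute (empty value) or value equal to its name.
    if value == "" or value.lower() == name.lower():
        return True
    return default           # anything else is treated as invalid
```

The extra cost over the current true/false check is the one HTML-style
branch, so <mo stretchy>, <mo stretchy="stretchy"> and
<mo stretchy="true"> would all be read the same way.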

> 5) As I said in a previous message, the values "small", "normal", "big"
> of mathsize do not exist for CSS font-size. Removing them will simplify
> a bit the parsing code.

Are these conceptually more difficult than CSS names like
small, medium, large, x-large? (just asking :-)

> 6) The definition of numbers is also not very accurate in the MathML
> recommendation compared to HTML5. One has to check the RelaxNG schemas
> and the predefined RelaxNG types to know the exact syntax.

Well hopefully section
is reasonably exact (but the main point, that it's not exactly the same
as HTML5, is of course undeniable).

> Again, I
> think it would be best to rely on the HTML5 definitions. For example,
> <math><mspace width="1E1em" height="10em" mathbackground="red"/></math>
> draws a red square in WebKit but Gecko says "1E1em" is invalid.

Certainly there is scope for documenting the syntaxes there and seeing
whether any differences give extra functionality or are just historical.
I suspect that we should be able to specify a profile of MathML for
text/html parsing that brings things more into line with HTML/CSS
numeric syntax, if that's needed.
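
As a starting point for that comparison, here is a small sketch that
checks a length value against two number grammars. Both patterns are my
own approximations (a MathML-style number with no exponent, and a
CSS-style number with an optional exponent), and the helper names are
invented for the example; neither is copied from a schema or an engine.

```python
import re

# Approximation of a MathML-style number: optional "-", digits, at most one ".".
MATHML_NUMBER = re.compile(r"-?(?:[0-9]+\.?[0-9]*|\.[0-9]+)")
# Approximation of a CSS-style number: optional sign and optional exponent.
CSS_NUMBER = re.compile(
    r"[+-]?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"
)
# Units considered by this sketch only.
UNITS = ("em", "ex", "px", "in", "cm", "mm", "pt", "pc", "%")


def split_length(value: str):
    """Split a length into (number text, unit); unit is "" if none matches."""
    value = value.strip(" \t\n\r")
    for unit in UNITS:
        if value.endswith(unit):
            return value[: -len(unit)], unit
    return value, ""


def accepts(value: str, number_re) -> bool:
    """True if the numeric part of the length matches the given grammar."""
    number, _unit = split_length(value)
    return number_re.fullmatch(number) is not None
```

With these patterns "10em" is accepted by both, while "1E1em" is
accepted only by the CSS-style one, which is the kind of difference a
text/html profile would have to settle one way or the other.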

> Frédéric



Received on Monday, 1 August 2016 17:03:24 UTC