Re: several messages about New Vocabularies in text/html

On Apr 3, 2008, at 02:56, Ian Hickson wrote:
> 50000 pages containing only <math> and none of the above (or at most 1
>       from the Content MathML list above).

It would be interesting to see a breakdown of what these cases *did*  
contain. In particular, we now know (see below) that if they contain  
only text content or <sup>, they are harmless.

> This is an interesting proposal, far more concrete than anything anyone
> else has proposed so far. Thank you.
>
> It doesn't work because it breaks the handling of pages that exist today
> that use the elements you list above. For example, take this page:
>
>   http://www.cip.es/aecan/ver_anuncio.asp?idioma=Aleman&cod_anuncio=ARC100&acceso=Busqueda
>
> ...which contains this markup:
>
>   <td width="27%" bgcolor="#FFFFFF">0 <math>m<sup>2</sup></td>

This works when converted to XHTML+MathML with a combination of my  
proposal and Simon Pieters' proposal and fed to Firefox 3b5.
http://hsivonen.iki.fi/test/moz/bad-math.xhtml

> It also fails in the case where someone (author A) using a new browser
> writes a page that uses this feature, and then someone (author B) using an
> old browser copies and pastes from A's page into his page, accidentally
> including a stray <svg> tag or <math> tag. His page looks fine to most
> users, but to the users of the new browser, the page is now horked.

You could use that cargo-cult scenario to stop *any* proposal that
allows MathML or SVG in text/html and has the property that the markup
looks enough like the markup people are used to that XML-MathML or
XML-SVG can be pasted.

> On Wed, 2 Apr 2008, Simon Pieters wrote:
>>
>> Until I see actual pages that contain non-MathML in <math> or non-SVG in
>> <svg>, I'm not convinced that Henri's scoped parsing proposal[1] doesn't
>> work. Do you perhaps have such data at hand so I can take a look and be
>> convinced? :-)
>
> Most pages that use <math> when not using MathML seem to put LaTeX-like
> markup inside the element. Here are some that put elements in <math>,
> though:
>
>   http://www.emis.de/journals/FPM/eng/k00/k001/k00126h.htm

We don't need to worry about Breaking the Web here, because Unicode  
correctness has already "broken" pages that try to use the Symbol font  
in this legacy way. (On Mac, Firefox 3b5, Safari 3.1 and Opera 9.5  
beta all "break" this page compared to the clearly intended rendering  
in Mac IE 5.)

>   http://www.freepatentsonline.com/EP0693743.html

This page contains "<p>[lots of text]<MATH></p>". The <math> element  
is *totally* harmless here.

>   http://apmath.kku.ac.kr/~seokko/notes/mathcon.htm

This page contains very simple stuff like:

  <MATH>
  <P><FONT FACE="바탕"> ` epsilon, `` delta ``` </FONT>
  </MATH>

This is harmless when converted for Gecko either applying just the  
rule I proposed or applying a combination of my rule and Simon's rule.
http://hsivonen.iki.fi/test/moz/bad-math.xhtml

>   http://www.ioffe.rssi.ru/cp866/journals/jtf/2003/12/page-1.html.ru

This page uses sub and sup. Both are harmless when the combination of
what Simon and I proposed is applied.
http://hsivonen.iki.fi/test/moz/bad-math.xhtml

>   http://www.kougensha.net/blosxom/blosxom.cgi/tech/freebsd/index.html

This page is already broken. The rendered view clearly isn't showing  
what it is supposed to show due to lack of escaping.

Sorry, but I'm now *really* not convinced that what I suggested,
augmented with what Simon suggested, is fatally flawed.

>> If there is a non-trivial number of pages that have HTML elements in
>> <math> or <svg> (not nested in <foreignObject>/<annotation-xml>), then
>> wouldn't it be possible to special-case HTML elements in <math>/<svg>
>> and let the rest be handled as "unknown" elements in the MathML/SVG
>> namespaces (so that, e.g., <math><foo><b> is interpreted as
>> <mml:math><mml:foo><html:b>)?
>
> This wouldn't work well for SVG, where we have name clashes already (i.e.
> where some element names are used in both SVG and HTML).

That's no reason not to do it for MathML, particularly since <math> is
more likely to be used in a random or cargo-cult way.
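For the MathML side, the namespace rule Simon describes can be sketched in a few lines. This is a hypothetical illustration, not code from any parser: the function names are made up, and the short HTML element list is an illustrative subset where a real implementation would need the complete list of HTML element names.

```python
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
HTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical, deliberately incomplete set of HTML element names (illustration only).
HTML_ELEMENT_NAMES = frozenset({"b", "i", "p", "font", "sub", "sup", "br", "span", "div"})


def namespace_in_math_subtree(tag_name: str) -> str:
    """Namespace for a start tag inside a <math> subtree under the sketched rule:
    known HTML names stay HTML; everything else, including unknown names, is MathML."""
    name = tag_name.lower()  # tag names in text/html are case-insensitive
    return HTML_NS if name in HTML_ELEMENT_NAMES else MATHML_NS


def prefixed(tag_names):
    """Render each tag as prefix:name, as in the <math><foo><b> example."""
    prefixes = {HTML_NS: "html", MATHML_NS: "mml"}
    return [f"{prefixes[namespace_in_math_subtree(t)]}:{t.lower()}" for t in tag_names]


print(prefixed(["math", "foo", "b"]))  # ['mml:math', 'mml:foo', 'html:b']
```

Under this rule the <sup> in the "0 <math>m<sup>2</sup>" example above would stay an HTML element.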

>> Also, on a slightly different note, I think that for copy-pastability of
>> SVG in text/html, the parser needs to make /> self-close elements, since
>> e.g. <circle> can have contents (e.g. animation stuff, I think) and Sam
>> Ruby said that some tools emit <defs/> and <g/>. [2]
>
> Yes. I'm not convinced that we'll be able to get the ability to copy and
> paste image/svg+xml content into text/html.

Why are you not convinced that the subsequent end tags would mitigate  
damage in existing browsers and, therefore, make /> workable in SVG  
subtrees?
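To make the /> question concrete, here is a toy tree builder that honors a self-closing slash only inside an <svg> subtree and ignores it elsewhere, as legacy HTML parsing does for ordinary elements. It is a hypothetical sketch: it works on pre-tokenized (kind, name, self_closing) tuples rather than real text/html, and its end-tag recovery (close back to the nearest open element with the same name) is a simplifying assumption, not the actual parsing rules.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    children: list = field(default_factory=list)


def build_tree(tokens):
    """Build a tree from (kind, name, self_closing) tuples, kind being "start" or "end".

    Hypothetical rule: a trailing "/>" closes the element only inside an <svg>
    subtree; elsewhere the slash is ignored and the element stays open.
    """
    root = Element("#root")
    stack = [root]
    for kind, name, self_closing in tokens:
        if kind == "start":
            element = Element(name)
            stack[-1].children.append(element)
            in_svg = name == "svg" or any(e.name == "svg" for e in stack)
            if not (self_closing and in_svg):
                stack.append(element)  # element stays open for later content
        else:
            # Simplified recovery: close back to the nearest open element with this name.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i].name == name:
                    del stack[i:]
                    break
    return root


def shape(element):
    """Compact outline such as svg(defs,g,circle)."""
    if not element.children:
        return element.name
    return element.name + "(" + ",".join(shape(c) for c in element.children) + ")"


svg_tokens = [("start", "svg", False), ("start", "defs", True), ("start", "g", True),
              ("start", "circle", True), ("end", "svg", False)]
html_tokens = [("start", "div", False), ("start", "span", True), ("start", "b", False),
               ("end", "b", False), ("end", "div", False)]
print(shape(build_tree(svg_tokens).children[0]))   # svg(defs,g,circle)
print(shape(build_tree(html_tokens).children[0]))  # div(span(b))
```

Inside <svg>, the <defs/>, <g/> and <circle/> siblings stay siblings; outside it, the ignored slash leaves <span> open, so the following <b> becomes its child.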

> On Wed, 2 Apr 2008, Henri Sivonen wrote:
>>
>> The existing content landscape for <svg> may be very different from
>> random junk in <math> out there, since cargo-cult semanticists may come
>> up with <math> own but <svg> is more unlikely to occur without trying to
>> do SVG. So while scope plus HTML blacklist may be the best option for
>> MathML subtrees, scope plus camelCase-fixing whitelist may be the most
>> robust solution for SVG subtrees.
>
> I'm not sure exactly what you mean here.

s/own/on their own/

People are more likely to go "Hmm. I have some math here, so I'll be a
really good semantic author and wrap it in <math>." than "Hmm. I have
content about St. Vincent & The Grenadines, so I'm going to wrap it in
<svg> for the benefit of the Semantic Web intelligent agents." Thus, the
expected level of cargo-cult deployment is different. The
case-insensitivity issue is also different: SVG has camelCase element
names such as foreignObject, while MathML element names are all
lowercase. So MathML and SVG don't necessarily call for exactly the
same solution.
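The camelCase-fixing whitelist mentioned above could be as simple as a lookup table applied to the lowercased tag names the text/html tokenizer produces inside an <svg> subtree. A minimal sketch, where the function name is hypothetical and the name list is an illustrative subset rather than a complete list of camelCase SVG elements:

```python
# Illustrative subset of SVG element names that contain capitals; not a complete list.
SVG_CAMEL_CASE_NAMES = [
    "clipPath", "foreignObject", "linearGradient",
    "radialGradient", "textPath", "animateTransform",
]

# Whitelist keyed by the lowercased form the text/html tokenizer would produce.
_CAMEL_CASE_FIXES = {name.lower(): name for name in SVG_CAMEL_CASE_NAMES}


def fix_svg_tag_name(lowercased_name: str) -> str:
    """Restore camelCase for whitelisted SVG names; leave every other name unchanged."""
    return _CAMEL_CASE_FIXES.get(lowercased_name, lowercased_name)


print(fix_svg_tag_name("lineargradient"))  # linearGradient
print(fix_svg_tag_name("circle"))          # circle
```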

> I will see about doing a more detailed study to examine the feasibility of
> what you propose (especially for the SVG side).

Great!

>> Finally, breaking a handful of legacy pages isn't yet a "fatal" flaw.
>
> I believe it is.

If, out of billions of pages, you find fewer pages whose <math> my
suggestion plus Simon's suggestion would break than pages broken by the
decision to no longer special-case the Symbol font, I think the result
isn't fatal.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Thursday, 3 April 2008 08:28:01 UTC