Re: Supporting MathML and SVG in text/html, and related topics from Philip Taylor on 2008-04-10 (public-html@w3.org from April 2008)

From: Philip Taylor <pjt47@cam.ac.uk>
Date: Thu, 10 Apr 2008 15:29:30 +0100
To: Ian Hickson <ian@hixie.ch>
CC: public-html@w3.org, www-math@w3.org, www-svg@w3.org
Message-ID: <47FE244A.4000806@cam.ac.uk>

Ian Hickson wrote:
> ... it should be possible to just drop MathML from most MathML-capable 
> equation editors, like Microsoft Word, straight into HTML, assuming that 
> they don't use namespace prefixes.

Word 2007 does use namespace prefixes. After selecting MathML in the 
option "When copying an equation, copy {MathML,Linear Format} to the 
clipboard as plain text", copying an equation puts the following plain 
text on the clipboard:

<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" 
xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"><mml:msup><mml:mrow><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mi 
mathvariant="italic">iπ</mml:mi></mml:mrow></mml:msup></mml:math>

So I would probably copy from the equation editor into HTML, then do a 
global search-and-replace from "mml:" to "", then delete the xmlns:* 
attributes if I care about conformance, and hope <m:*> isn't used.
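
A minimal sketch of that clean-up, assuming the pasted markup only uses 
the "mml" prefix for MathML elements and double-quoted xmlns:* 
declarations, as in the Word output above. The function name 
stripMathMLPrefix is hypothetical, and the regular expressions are a 
rough text-level substitute for real XML processing. Any <m:*> elements 
would be left untouched.

```javascript
// Hypothetical helper; not part of Word or of any library.
function stripMathMLPrefix(markup) {
  return markup
    // "<mml:math" -> "<math", "</mml:mi>" -> "</mi>"
    .replace(/<(\/?)mml:/g, '<$1')
    // Drop prefixed namespace declarations such as xmlns:mml="..."
    .replace(/\s+xmlns:[A-Za-z_][\w.-]*\s*=\s*"[^"]*"/g, '');
}

// The clipboard text quoted above, joined into one string.
const wordOutput =
  '<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" ' +
  'xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math">' +
  '<mml:msup><mml:mrow><mml:mi>e</mml:mi></mml:mrow><mml:mrow>' +
  '<mml:mi mathvariant="italic">iπ</mml:mi></mml:mrow></mml:msup>' +
  '</mml:math>';

const cleaned = stripMathMLPrefix(wordOutput);
console.log(cleaned);
```

For this input the result is a plain <math> element with unprefixed 
children and no xmlns:* attributes.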

(When pasting a plain text representation of XML into Word, unprefixed 
XML works correctly. As far as I can tell, Word just requires 
well-formed XML whose root element is <math> in the MathML namespace, 
with two exceptions: doctypes aren't supported (the clipboard content 
gets pasted as plain text, not as an equation), and named entities 
aren't supported (except the standard XML ones).)


> On Mon, 29 May 2006, James Graham wrote:
>> I would also argue that the difficulty of providing suitable 
>> image-based fallback content is a massive hindrance to the adoption of 
>> mathematical markup.
> 
> The SVG and MathML solutions I've added do not support fallback 
> explicitly, but you can use the language's own features along with a 
> knowledge of legacy HTML parsing, as follows:
> 
>    <math>
>     <semantics>
>      <mrow>
>       <mn><![CDATA[2]]></mn>
>       <mo><![CDATA[+]]></mo>
>       <mn><![CDATA[2]]></mn>
>       <mo><![CDATA[=]]></mo>
>       <mn><![CDATA[4]]></mn>
>      </mrow>
>      <annotation-xml>
>       <mtext><img src="2p2e4.png" alt=2+2=4"></mtext>

(That alt text is wrong, because it's treated as an unquoted attribute.)

>      </annotation-xml>
>     </semantics>
>    </math>

That doesn't work in all legacy browsers - Opera (at least 9.2) tries to 
parse CDATA in an XML-compatible way and displays the content. That 
behaviour is non-standard and causes compatibility problems with some 
HTML pages, so Opera should change it anyway, and it may no longer be a 
problem by the time anyone starts seriously using <math> in text/html. 
Still, it's not an ideal legacy fallback mechanism.


> [about data-* attributes ...] we can let authors deal 
> with collision avoidance by just having informal (from our point of view) 
> rules about naming. For example, dojo could call all their attributes 
> data-dojo-*="", for various values of *.

This seems likely to encourage bugs when using the dataset API:

   <li data-foo="1" data-dojo-foo="2">
   ...
   li.dataset.foo; /* okay */
   li.dataset.dojo-foo; /* incorrect; author gets quite confused */
   li.dataset['dojo-foo']; /* okay, but more verbose, and inconsistent
      with all the examples that use the . syntax instead, and most
      authors won't realise that x['y'] is equivalent to x.y */

It would be much better to encourage authors to use the feature in a way 
that will always work, i.e. either never using characters like '-' in 
the attribute names, or always using e.dataset['a'] (or 
e.getAttribute('data-a')).
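
To make the failure mode concrete, here is a small sketch that runs 
without a browser. It assumes a plain object standing in for li.dataset 
and an unrelated variable named foo that happens to be in scope; both 
are illustrative assumptions, and the sketch only shows how JavaScript 
parses the expressions, not how any particular dataset implementation 
maps attribute names.

```javascript
// Plain object standing in for li.dataset (assumption: no DOM here).
const dataset = { 'foo': '1', 'dojo-foo': '2' };

// An unrelated variable that happens to share the name "foo".
const foo = 10;

const viaDot = dataset.foo;              // property "foo"
const viaBracket = dataset['dojo-foo'];  // property "dojo-foo"

// Parsed as (dataset.dojo) - foo: a subtraction, not a property lookup.
// dataset.dojo is undefined, so the result is NaN.
const viaDotWithHyphen = dataset.dojo-foo;

console.log(viaDot, viaBracket, viaDotWithHyphen);
```

If no variable called foo is in scope, the last expression throws a 
ReferenceError instead of silently producing NaN, which is at least 
easier to notice.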


> On Sun, 9 Mar 2008, Henri Sivonen wrote:
>>  * Writing a similar document into text/html in a text editor
>>    copying and pasting the SVG figures from Inkscape XML output.
> 
> Addressed, though Inkscape really needs to stop outputting so much 
> proprietary crap, sheesh.

You can make it stop by saving as "Plain SVG" instead of "Inkscape SVG" 
- it's similar to the problem of saving a word processing document as 
.doc then serving it as text/html, i.e. the author just selected the 
wrong output format, which isn't the editing program's fault. (The only 
difference is that Inkscape SVG still works in SVG UAs, whereas .doc is 
barely readable in HTML UAs, so authors are less likely to notice their 
error.)


> On Tue, 11 Mar 2008, Jeff Schiller wrote:
>> But what about outside the "open web" platform?  Can we safely say that 
>> we don't care about name-collisions?
> 
> I am concerned with what happens outside the open Web platform as far as 
> HTML5 goes. (That also applies to general WHATWG policy, I don't know 
> about the World Wide Web consortium's opinion on things outside the World 
> Wide Web but one might assume, from the name, that it shouldn't be a top 
> priority there either.)

Do you mean "I am *not* concerned"?


> On Tue, 11 Mar 2008, Jeff Schiller wrote:
>> Not just your opinion.  There are many good reasons for not deviating 
>> from the syntax used in "canonical SVG".  Pragmatic compatibility 
>> concerns would be enough for me, but another is that I honestly think 
>> that unquoted attribute values would be ultimately more confusing to 
>> authors.
> 
> It hasn't been that confusing to text/html authors, all things considered.

It has been confusing enough that <meta charset=...> has to be supported, 
because people write <meta http-equiv=content-type content=text/html; 
charset=iso-8859-1>, and confusing enough to cause all sorts of bogus 
attributes on <meta> when people write <meta name=description 
content=some long unquoted description of the page>. (I see the latter 
problem on roughly one in every three hundred pages.)
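
As a rough illustration of why those tags go wrong, the sketch below 
tokenizes the attributes of a single start tag the way an HTML parser 
broadly does: an unquoted value ends at the first whitespace. The 
function parseAttributes is hypothetical and heavily simplified (it is 
not a real HTML parser and ignores error handling and duplicate 
attributes), but it is enough to show the extra attributes appearing. 
The same rule explains the alt=2+2=4" case mentioned earlier.

```javascript
// Simplified sketch of HTML attribute tokenization for one start tag.
// parseAttributes is a hypothetical helper, not a real HTML parser.
function parseAttributes(startTag) {
  // Drop the leading "<tagname" and the trailing ">".
  const body = startTag.replace(/^<\s*[^\s>]+/, '').replace(/>$/, '');
  // A name, optionally followed by "=" and a double-quoted,
  // single-quoted, or unquoted value. An unquoted value ends at the
  // first whitespace character.
  const re = /([^\s=]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|(\S+)))?/g;
  const attrs = [];
  let m;
  while ((m = re.exec(body)) !== null) {
    const value = [m[2], m[3], m[4]].find((v) => v !== undefined) || '';
    attrs.push([m[1].toLowerCase(), value]);
  }
  return attrs;
}

const contentType = parseAttributes(
  '<meta http-equiv=content-type content=text/html; charset=iso-8859-1>');
const description = parseAttributes(
  '<meta name=description content=some long unquoted description of the page>');
const fallbackImg = parseAttributes('<img src="2p2e4.png" alt=2+2=4">');

console.log(contentType);
console.log(description);
console.log(fallbackImg);
```

In the first tag, charset ends up as a separate attribute rather than 
part of content. In the second, content is just "some" and every 
following word becomes an empty attribute. In the third, the alt value 
includes the stray trailing quote.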


> Cheers,

-- 
Philip Taylor
pjt47@cam.ac.uk
Received on Thursday, 10 April 2008 14:31:04 UTC