Supporting MathML and SVG in text/html, and related topics

Over 620 e-mails were sent on the topic of mixing different vocabularies 
into text/html documents over the last couple of years. Most were 
advocating particular solutions, and just assumed that the problem being 
addressed was known, so it was quite difficult to actually determine what 
the problems and use cases were.
 
After carefully reading all these e-mails, I found the following use 
cases described:

* The ability for an author to unilaterally extend the language to address 
  problems we are currently unaware of and that therefore are not covered 
  by existing functionality.

* Putting an equation in a Web page.

* Writing a document by hand, with inline diagrams imported from a 
  graphics package.

* Writing a document by hand, with vector graphic icons.

* Mixing vector graphics, mathematics, and other features.

* Writing applications that contain graphics that represent custom data, 
  while including that data for script manipulation.

* Writing highly interactive, graphically intensive sites.

* Animating Web page content (hypertext, vector graphics).

* Migrating from LaTeX to HTML.

* Including XBL2 inline in a text/html document the same way CSS and JS 
  can be included inline.

I respond in detail to the e-mails received below, but first here is a 
quick overview of each of the above use cases and whether, and how, they 
were addressed:

* The ability for an author to unilaterally extend the language to address 
  problems we are currently unaware of and that therefore are not covered 
  by existing functionality.

There are two aspects to this -- extending the language for private use, 
e.g. to encode data for a script to manage, and extending the language 
in a way intended for use by multiple groups, e.g. to provide new features 
that user agents can implement.

The former has historically been solved by adding custom non-standard 
attributes to elements in the DOM, or by abusing other attributes 
(typically title="") or by encoding the data in painful ways in the class 
attribute (the closest thing to a legitimate solution).

I've added a "standard" way to support this: the data-* attributes.

You can now add any attribute starting with "data-" to any element in the 
HTML namespace, with any value, and it'll be valid:

  <ul class="contactList">
   <li data-username="ian@hixie.ch" data-network="Jabber">Hixie</li>
   ...
  </ul>

You can then access this data using the new "dataset" DOM API:

   // get data from the element
   network = li.dataset.network;
   username = li.dataset.username;

   // set data on the element
   li.dataset.status = getStatus(username, network);

   // remove one of the attributes
   delete li.dataset.pending;
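
For illustration only, here is a minimal sketch of the relationship between 
data-* attribute names and the keys a script ends up reading. The helper 
name collectDataAttributes and the plain-object stand-in for an element's 
attributes are assumptions made for this example; they are not part of the 
spec or of the dataset API itself.

```javascript
// Hypothetical helper (not part of any specification): given a plain object
// mapping attribute names to values, return only the data-* entries, keyed
// by the part of the name that follows the "data-" prefix.
function collectDataAttributes(attributes) {
  const prefix = "data-";
  const result = {};
  for (const [name, value] of Object.entries(attributes)) {
    // Skip attributes without the prefix, and the bare name "data-".
    if (name.startsWith(prefix) && name.length > prefix.length) {
      result[name.slice(prefix.length)] = value;
    }
  }
  return result;
}

// Stand-in for the attributes of the <li> in the markup example above.
const contactAttributes = {
  "class": "contact",
  "data-username": "ian@hixie.ch",
  "data-network": "Jabber",
};

const contactData = collectDataAttributes(contactAttributes);
console.log(contactData.username, contactData.network);
```

In a browser the dataset object does this work directly on the element; the 
sketch only shows which attributes take part and which are ignored.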


The second reason to extend the language is to extend the language in a 
manner that is intended to be used by many people, be implemented by user 
agents, and so forth. For this, though, we actively want to make sure that 
people can't willy nilly extend the language without coordination with 
anyone interested in the development of the language, and therefore I 
haven't provided any syntax mechanism to address this. The right way to 
extend the language is with the involvement of the wider community, as we 
have been doing for HTML5.

In any case, the syntax part would be a very minor part of any such 
effort. Adding new user-agent-supported vocabularies requires significant 
investment:

 * studying existing authoring behaviours
 * determining what problem needs fixing
 * designing a solution that addresses the problem
 * describing the syntax
 * describing the semantics
 * describing the behaviour
 * defining the error handling for syntax errors
 * defining the error handling for semantic errors
 * implementing the defined features
 * testing the implementations
 * writing tutorials

Only one small part of that has anything to do with the syntax. Defining a 
generic system is optimising a small part that doesn't need optimising 
(adding SVG and MathML to text/html, as noted below, only took a few days 
when we got down to it).

Now, even given all this, some people still want an extensibility 
mechanism for proprietary extensions. For this, we have XML, which is 
intended to just be a generic syntax. It is quite difficult to make a 
generic syntax out of text/html, due to the legacy content out there. I go 
into more detail on this in some of the replies below.



* Putting an equation in a Web page.

I considered a number of ways of doing this, including for example just 
having LaTeX, or some other simple ASCII model for describing mathematics. 
These have the advantage of being easy(ish) to write. However, in the end 
the benefits of having a well-defined DOM were too great, and the only 
choice for the DOM became MathML. This posed a problem with the LaTeX and 
other ASCII syntax ideas, because then we'd have to define some complex 
parsing to convert from one to the other. It also seemed like if we were 
going to support MathML, we should have some way to natively support the 
MathML syntax, and therefore I settled on just using the tags. I 
considered several ways of making even that simpler, e.g. making certain 
tags optional, but the complexity of such solutions proved too much to 
really justify.

So the text/html HTML5 language now has a way to support raw MathML 
natively in the markup. There are several aspects of this that are worth 
noting:

 * the tag names are case-insensitive
 * XML well-formedness is not enforced
 * the /> self-closing syntax _is_ supported (on all MathML elements)
 * prefixes don't work
 * xmlns="" attributes have no effect
 * Content MathML is parsed faithfully

Namespace boundaries exist on the following elements: <math> <mo> <mi> 
<ms> <mn> <mtext>. A <math> element in an HTML context automatically 
starts a MathML subtree; the other elements in a MathML context allow HTML 
elements to be embedded at those points (and indeed all elements that are 
children of those elements, other than <mglyph> and <malignmark>, will 
automatically be put in the HTML namespace). At any other time, an HTML 
element will just close the currently open <math> element. As noted below, 
the SVG element <svg> is also allowed inside <mo>, <mi>, <ms>, <mn>, and 
<mtext>, as well as inside <annotation-xml>. No other namespace (e.g. 
OpenMath) can be embedded in MathML in text/html for now.
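
As a rough sketch of the boundary rule just described (a simplification for 
illustration, not the parsing algorithm in the spec), the decision for a 
child of one of the five token elements could be modelled like this. The 
function name namespaceForChild and its string return values are 
assumptions of the example, as is the handling of a nested <math>.

```javascript
// MathML elements whose children are parsed as HTML, per the text above.
const MATHML_BOUNDARY_ELEMENTS = new Set(["mo", "mi", "ms", "mn", "mtext"]);

// Children that stay in the MathML namespace even at a boundary.
const STAYS_IN_MATHML = new Set(["mglyph", "malignmark"]);

// Hypothetical helper: returns "html", "mathml" or "svg" for a child of a
// boundary element, or null when the parent is not a boundary element
// (that case is not modelled here).
function namespaceForChild(parentName, childName) {
  const parent = parentName.toLowerCase(); // tag names are case-insensitive
  const child = childName.toLowerCase();
  if (!MATHML_BOUNDARY_ELEMENTS.has(parent)) {
    return null;
  }
  if (STAYS_IN_MATHML.has(child)) {
    return "mathml";
  }
  if (child === "svg") {
    return "svg"; // <svg> starts an SVG subtree
  }
  if (child === "math") {
    return "mathml"; // assumption: <math> starts a MathML subtree, as in HTML content
  }
  return "html";
}

console.log(namespaceForChild("mtext", "img"));
console.log(namespaceForChild("mi", "mglyph"));
```

The annotation-xml case and the rule that an HTML element elsewhere closes 
the open <math> element are deliberately left out of this sketch.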

Unfortunately, MathML is very, very verbose, and nothing has been done to 
alleviate this here. I'm not sure what to do about this. On the plus side, 
it should be possible to just drop MathML from most MathML-capable 
equation editors, like Microsoft Word, straight into HTML, assuming that 
they don't use namespace prefixes.

The biggest issue with MathML in text/html is that so far only one vendor 
supports it. While some of the other vendors have unofficially told me 
they may consider implementing MathML, others are less likely to do so 
unless MathML becomes widely used, presenting a chicken-and-egg problem 
that the MathML community is all too familiar with. I recommend that the 
MathML community make available large, comprehensive, high-quality test 
suites for presentational MathML.



* Writing a document by hand, with inline diagrams imported from a 
  graphics package.

After briefly considering various formats, I concluded that the only 
reasonable choice for a vector graphics format in text/html was SVG. It is 
already supported to some extent by most major browsers, and it has 
reasonably well-defined DOM semantics.

A similar approach to the MathML solution was used. One major difference 
is that while MathML was carefully designed to avoid name clashes with 
HTML, SVG was not. Because of this, at least <font> elements aren't 
supported in SVG in text/html. Also, while MathML elements and attributes 
are conveniently all in one namespace and all lowercase, SVG is all over 
the map, using at least three namespaces (four if you count the xmlns="" 
attributes!) and has camelCase attributes. For this reason, SVG has a set 
of fixups to support specific elements and attributes that aren't 
lowercase in the SVG namespace. As part of this I've also made xml:base, 
xml:space, and XLink be supported on both SVG and MathML elements in 
text/html.
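
To make the idea of fixups concrete, here is a small sketch of a lookup 
that restores the mixed-case spelling after a case-insensitive parser has 
lowercased a name. The tables are a short illustrative subset chosen for 
this example, not the list in the spec, and the function names are 
hypothetical.

```javascript
// Illustrative subset of SVG element names that are not all-lowercase.
const SVG_ELEMENT_FIXUPS = new Map([
  ["foreignobject", "foreignObject"],
  ["lineargradient", "linearGradient"],
  ["radialgradient", "radialGradient"],
  ["clippath", "clipPath"],
  ["textpath", "textPath"],
]);

// Illustrative subset of SVG attribute names that are not all-lowercase.
const SVG_ATTRIBUTE_FIXUPS = new Map([
  ["viewbox", "viewBox"],
  ["preserveaspectratio", "preserveAspectRatio"],
  ["gradientunits", "gradientUnits"],
]);

// Lowercase the name as the tokenizer would, then restore the camelCase
// form when the name is in the table; otherwise keep the lowercase form.
function fixSvgElementName(name) {
  const lower = name.toLowerCase();
  return SVG_ELEMENT_FIXUPS.get(lower) || lower;
}

function fixSvgAttributeName(name) {
  const lower = name.toLowerCase();
  return SVG_ATTRIBUTE_FIXUPS.get(lower) || lower;
}

console.log(fixSvgElementName("FOREIGNOBJECT"));
console.log(fixSvgAttributeName("viewbox"));
```

A table lookup keeps the tokenizer itself case-insensitive while still 
producing the names that SVG's DOM expects.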

For SVG, <desc>, <title>, and <foreignObject> are the namespace boundary 
points -- anything inside those is treated as HTML (except <math> and 
<svg>, of course, which start new namespaced subtrees).



* Mixing vector graphics, mathematics, and other features.

As noted in the previous two bits, the support for SVG and MathML is 
designed to allow the vocabularies, and HTML, to be mixed at well-defined 
points.



* Writing a document by hand, with vector graphic icons.

This has long been possible with <img>, so I haven't changed anything to 
support this use case.



* Writing applications that contain graphics that represent custom data, 
  while including that data for script manipulation.

I haven't yet addressed this, since it requires coordination with the SVG 
(and MathML) groups, but if the data-* / dataset idea pans out, then it 
would be interesting to see if the SVG and MathML groups like it and can 
adopt it too, getting us a consistent API across the vocabularies. For 
now, we'll have to see how it works in HTML.



* Writing highly interactive, graphically intensive sites.

SVG and SVG's SMIL and scripting capabilities, I am assured, can handle 
this.



* Animating Web page content (hypertext, vector graphics).

This is something more for the SMIL and CSS groups, so I haven't directly 
addressed it.



* Migrating from LaTeX to HTML.

Tools exist to do this somewhat. I'm not sure what else can be done. As 
mentioned above, I did consider using LaTeX syntax for maths, but that 
turned out to not really be workable.



* Including XBL2 inline in a text/html document the same way CSS and JS 
  can be included inline.

I haven't addressed this. It's premature to consider XBL2 in text/html, as 
it has not been implemented in browsers yet.



A common theme across some of the problems listed above, and discussed in 
detail in the e-mails below, is the issue of a generic syntax for non-HTML 
namespaces in text/html. It turns out to be _exceedingly_ hard to handle 
the giant mass of weirdness found in legacy content on the Web while 
adding new features. For example, this page:

   http://www.laroseweb.com/calcs/fans.php

      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
      <html xmlns="http://www.w3.org/1999/xhtml" >
      <math xmlns="http://www.w3.org/1998/Math/MathML">
      <math xmlns="&mathml;">

      <head>
      [... continues with just a normal XHTML page]

...or this page:

   http://www.cocopahrv.com/map.html

      [...]
      <table width="592" border="0" cellspacing="0" cellpadding="0">
        <tr align="left" valign="top"> 
          <td width="592"><svg width="200" height="200" viewBox="0 0 
      200 200"><img name="r1_c1" src="map/resortmap.jpg" width="600" 
      height="465" border="0" alt="Cocopah RV and Golf Resort" 
      usemap="#r1_c1Map"><map name="r1_c1Map"><area shape="rect" 
      coords="4,2,298,230" href="map/lmp1a.jpg"><area shape="rect" 
      coords="1,230,296,460" href="map/lmp1b.jpg"><area shape="rect" 
      coords="297,4,598,231" href="map/lmp2a.jpg"><area shape="rect" 
      coords="297,231,599,462" href="map/lmp2b.jpg"></map><rect 
      x="100" y="100" width="10" height="20" style="fill: green"/> 
          </svg> </td>
        </tr>
      </table>
      [...]

Both of these pages work fine today. Sadly the second one above will break 
(well, it'll have a big white gap) in browsers that support the new 
feature. The biggest problem in developing the syntax for MathML and SVG 
(adding hundreds of elements to text/html) was doing so in a way that 
limited the number of pages that would fail to a bare minimum.

In my research, most text/html pages that I have found that use SVG in a 
way that doesn't immediately improve the rendering if we support the new 
proposal are pages that include the markup in a discussion (e.g. forum 
posts, blog comments, etc) without escaping it. This will lead to all 
kinds of XSS problems, which might be interesting. For example:

   http://kr.blog.yahoo.com/yunneo2000/1111673
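
The escaping that such pages omit can be sketched as follows. This is a 
generic illustration with a hypothetical function name, not something the 
spec defines, and real sites would normally rely on their templating 
system's own escaping.

```javascript
// Hypothetical helper: escape user-submitted text so that markup such as
// "<svg>" is displayed as text rather than parsed as elements.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const comment = '<svg onload="alert(1)">';
console.log(escapeHtml(comment));
```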

There are also cases where SVG is used in a way that will affect the page, 
but where the HTML elements bailing out of SVG will make the page render 
far better than if we just waited for </svg> or some such. For example:

   http://puysl.com/view.htm
   http://albren.blogspot.com/2007/02/l05-traductor.html

There are basically two types of pages using <math> that I found -- those 
using MathML, which will improve when browsers support MathML in 
text/html, and those that use <math> just as a wrapper around some LaTeX 
or other random text.

Assuming <math>foo</math> doesn't show an error in MathML-in-text/html, we 
should be ok with backwards-compatibility. I've added text to the spec to 
imply that such text should be shown (as if it was wrapped in an <mtext>).



And now, replies to the e-mail flood.

I have omitted from this e-mail all but the first e-mail in any series 
that repeated facts, opinions, or arguments. If I have omitted an e-mail 
that you think is _not_ redundant with one of those below, and that 
therefore should get its own reply, please let me know. I have also 
omitted comments that were orthogonal to the discussion, for example 
arguing about particular user agent bugs, or the details of how 
mathematical fonts might handle the PUA, as well as comments that have 
been overtaken by events in an obvious way, e.g. the relationship between 
the WHATWG and the W3C, or the likelihood of validators existing without 
DTDs. Even though I have omitted much from the responses below, I assure 
you I did read all 30000+ lines of e-mail on the subject. Again, if you 
think I missed something material to the discussion, please let me know.


On Fri, 26 May 2006, Henri Sivonen wrote:
> 
> I am pretty convinced that the granularity of markup needed for math and 
> the verbosity of XML necessarily lead to an XML syntax for math that is 
> not suitable for direct human authoring. However, I think it does *not* 
> necessarily follow that an XML syntax for math is an inherently bad 
> idea.
> 
> For example, the XML syntax for RELAX NG is rather inconvenient for 
> human authoring. You really want to write the Compact Syntax instead. 
> However, being able to process RELAX NG as XML can be useful in some 
> situations.
> 
> Likewise, one could consider LaTeX or the Mathematica language as 
> compact authoring syntaxes for MathML. (Though that could work better if 
> MathML was a straight bijective XML mapping of the default LaTeX math 
> macros.)

I considered this, but in the end decided this approach was not tractable. 
Introducing yet another syntax style would likely not make things easier 
for authors overall, especially considering they would have to work out 
what it meant for CSS and scripting.


> XML entities on the Web are b0rked. Since MathML is not human-writable 
> anyway, let's get rid of the entities.

I've actually just added all the thousands of MathML entities to 
text/html. It seemed easy enough to do.


On Mon, 29 May 2006, Michel Fortin wrote:
> 
> One thing I know however is that the next time I'll have to put an 
> equation on a web page, I won't go looking for a MathML editor just to 
> be able to generate the markup, convert the page to XHTML served as 
> application/xhtml+xml (so that it works with MathML) and ask my users to 
> install the required plugin or web browser just to see my equation. I'll 
> use an image: it'll be a lot simpler.

Would you use an equation editor if you could just plop the markup into 
the Web page and stop there?


> What Juan propose, about adding a limited number of elements to HTML for 
> maths, actually makes sense to me, especially if you can get not-too-bad 
> results with CSS. HTML is designed to be easy to learn and write; if we 
> had a markup like that for mathematics which integrates easily in HTML 
> it'd be much more used than MathML, I'm sure.

Unfortunately designing yet another language for maths seems like 
reinventing a wheel that has been invented plenty of times already. I'm 
not convinced we have the expertise to invent a mathematical markup 
language that's enough better than MathML to be worth it.


> But I think it would be better to develop that as a microformat[1] 
> first, then, once it works and is well defined, see if the WhatWG is 
> interested in integrating the microformat into HTML5 by giving it 
> specific elements and attributes.

Well, that was in 2006, and it's 2008 now and there's no sign of such a 
microformat, so that's not an option.


On Mon, 29 May 2006, James Graham wrote:
> 
> In this situation, I imagine most scientists will simply write LaTeX and 
> use a tool to produce the output format that they desire. (La)TeX has 
> the advantage of being a well designed, domain-specific language that 
> allows for a very compact representation of most mathematical 
> constructs. So, in that sense, the complexity of the language is of 
> secondary importance to the ability to map between LaTeX and the 
> language. This is just as well because /any/ XML based language for 
> maths is going to suffer from substantial verbosity. For MathML, there 
> is already a reasonable story here since Itex2MML exists, although it 
> really needs to be integrated with tools like hyperlatex if it is ever 
> to be widely used.


> I would also argue that the difficulty of providing suitable 
> imaged-based fallback content is a massive hindrance to the adoption of 
> mathematical markup.

The SVG and MathML solutions I've added do not support fallback 
explicitly, but you can use the languages' own features along with a 
knowledge of legacy HTML parsing, as follows:

   <math>
    <semantics>
     <mrow>
      <mn><![CDATA[2]]></mn>
      <mo><![CDATA[+]]></mo>
      <mn><![CDATA[2]]></mn>
      <mo><![CDATA[=]]></mo>
      <mn><![CDATA[4]]></mn>
     </mrow>
     <annotation-xml>
      <mtext><img src="2p2e4.png" alt="2+2=4"></mtext>
     </annotation-xml>
    </semantics>
   </math>

...or:

   <svg viewBox="0 0 20 20">
    <desc>
     <img src="circle-blue.png" alt="o">
    </desc>
    <circle cx="10" cy="10" r="10" fill="blue"/>
   </svg>


> The problem is that it's not nearly so easy to do as he suggests. Look 
> at the test page - some of the rendering is awful (the radical signs in 
> particular stand out here). And, despite being sold as a simpler 
> solution than a MathML implementation, it works in about 1% of UAs (by 
> number of users) compared to > 95% that have a story for native or 
> plugin-based MathML. The language that they have used is also overly 
> simplistic. For example one would expect most text in a formula to be in 
> italics except where actual words were being used in which case the text 
> should be roman. So you need an additional element to distinguish text 
> from ordinary numbers. Add a few more considerations like that and you 
> soon have a language that's just as painful to hand-author as MathML 
> (which, I agree, is far from perfect) and little support among end 
> users.

Indeed.


On Mon, 29 May 2006, Mihai Sucan wrote:
> 
> IMHO, a good implementation for any math-related web technology must not 
> ask the user to download fonts, to install some plugin or anything 
> similar. I do not like Gecko for the fact it asks me to download 
> mathematical fonts. It should just do as Opera does with their voice 
> interactivity module: it asks for permission to download the required 
> files for that specific functionality. It does automatically download 
> and install everything required. Same should be done by Gecko.

Yeah, I agree. That's rather out of scope here though.


> Math WebSearch - A semantic search engine
> http://search.mathweb.org/

I sure hope that search gets more usable. I don't think you could get a 
high school kid to find Maxwell's equations using that. Meanwhile:

   http://search.yahoo.com/search?p=maxwell%27s+equations
   http://www.google.com/search?q=maxwell's+equations

...seems to work ok.


On Tue, 30 May 2006, Henri Sivonen wrote:
> 
> I think the following could be technically feasible:
> 
> 1) Author writes iTeX code as the text content of an <f> element for 
> inline formulae (and <df> for display formulae; two elements to cut down 
> on verbosity of attributes).
> 
> 2) The browser takes the textContent of the <f> element and the computed 
> style for the list of fonts, the text color and the font size.
> 
> 3) The browser feeds this data to a sandboxed, pruned and heavily 
> macro-deprived pdfLaTeX engine that renders the formula using the font 
> properties as if the formula was a $...$ formula on an infinite line. 
> (This means that line breaks cannot occur in formulae. Consider this a 
> good enough approach for feasibility.)
> 
> 4) The pdfLaTeX engine hands back a PDF with a bounding box indicating 
> the bounds of the rendering and the position of the baseline.
> 
> 5) The browser's CSS formatter replaces the box of the <f> element with 
> a replaced element box of the size of the PDF bounding box and aligns 
> the baseline of the replacement box with the baseline of the surrounding 
> CSS line box.
> 
> 6) When it is time to draw, the browser draws the drawing operations 
> encapsulated in the PDF into its underlying vector graphics pipeline.
> 
> Obviously, architecturally this would be a departure from DOM and CSS, 
> but it could work. And because I guess the implementation would take 
> more than a couple of days, I am not volunteering to prototype this. :-)

I think this would qualify as a "hack". And not in the good sense. :-)


On Mon, 29 May 2006, Henri Sivonen wrote:
> 
> Adding math to HTML5 would broaden the scope of HTML5. A broader scope 
> translates to more editorial work and more implementation work.

From the spec's point of view, that's one of the advantages of reusing 
MathML.


On Thu, 1 Jun 2006, James Graham wrote:
>
> The majority of
> people who publish mathematical content know one or zero languages for 
> typesetting mathematics. Invariably that one language is LaTeX (they may 
> [know] zero if they rely on a tool such as Microsoft Word). For example 
> Blackwell publishing will accept submissions only in LaTeX or Microsoft 
> Word format. arXiv.org will only accept submissions in LaTeX format. The 
> Astrophysics Journal will accept LaTeX (recommended), MS Word or 
> Wordperfect. I could go on but at least in academic fields, LaTeX is 
> either the only format accepted for publication or the preferred format.
> 
> Note also that very very few people have the slightest interest in the 
> publishing process itself. They simply wish to achieve high quality 
> results at a minimum of effort. This means that they will not be 
> prepared to invest any time in learning a new language, particularly one 
> that is not already widely accepted (chicken and egg problem) or is 
> harder to use than the language they are familiar with. You may think I 
> am overstating this but I disagree - bear in mind that a significant 
> fraction of astronomical (chosen merely because it is the field I know 
> best) software is written in Fortran 77. For many of these people almost 
> 30 years of language design has never happened.
> 
> So, in general the people likely to be publishing mathematical content 
> to the internet have _no_ interest in writing their content in any 
> format other than LaTeX and especially not to a verbose format of the 
> type that fits the XML data model. This is why the web is liberally 
> sprinkled with the ugly gif output of latex2html. If we want this 
> situation to change, the _only_ solution is to allow LaTeX as a document 
> creation format. If, for whatever reason MathML is a poor target language 
> for TeX->foo converters then maybe we should talk about improving the 
> situation. But authors _will_not_ learn anything other than LaTeX.

Since we cannot make these authors use anything but LaTeX, and since HTML 
isn't LaTeX, we have to assume the existence of a tool to solve this at 
some point anyway. Is relying on tools for LaTeX to MathML conversion 
acceptable? Do such tools exist? Are they good?


> Given that we cannot expect authors to create mathematical content 
> directly in an XML language, the verbosity of the output language is 
> almost irrelevant - it should be easy to process by the computer rather 
> than easy to read (in the same way that postscript, say, is suboptimal 
> for direct authoring but is still a useful output format).

Fair enough.


On Wed, 7 Jun 2006, Martin Atkins wrote:
> 
> It seems to me that a good path would be to fix up CSS's shortcomings 
> (which have been discussed at length in this thread) so that it is 
> possible to specify math rendering with CSS.

That's out of the scope of this group, and unfortunately the CSS group has 
failed to make any progress in this area in the past few years.


> I agree that it'd be worthwhile to do tag implication though, which 
> admittedly is not an option if MathML support is just a stylesheet.

I tried this, but couldn't really get a good result without obscene levels 
of complexity.


On Fri, 9 Jun 2006, Henri Sivonen wrote:
> 
> If the WHAT WG defines a way for serializing MathML as text/html, I 
> think it should be (at least for conforming cases) a pure alternative 
> infoset serialization for a subset of possible infosets (dirty words, I 
> know) as opposed to just being something similar but subtly different. 
> That is, I think conforming documents should be losslessly 
> reserializable as XML *1.0* and the DOM nodes for the math stuff should 
> report the uncolonified element name in lower case as localName and the 
> MathML namespace URI as namespaceURI. (I guess tagName should do what is 
> most compatible with MathPlayer.)

Does what the spec says satisfy this?


On Sun, 4 Jun 2006 juanrgonzaleza@canonicalscience.com wrote:
> 
> Not so simple if you need add maintenance, search, storage, printing, 
> and accessibility items to the list of requirements.

MathML can be searched, stored, printed, and is apparently accessible. It 
isn't clear what can be done to address maintainability in any math 
language that is tag-based.


> With more sophisticated design of CSS stylesheet and with more powerful 
> CSS engines and good Unicode fonts one could achieve TeX quality output.

I do not believe this is realistic, and a number of experts in math 
typography have reported similar concerns.


> I proven that MathML code generated by IteX tool is very bad in several 
> occasions (in my weblog and in the official MathML mailing list).

Are these bugs, or design failings? i.e. could this be fixed?


> The approach was designed to be minimalist. Of course it can be 
> improved. Moreover, radicals (looking better than in Firefox with native 
> MathML support) could be best rendered via future CSS embellishments for 
> math.

Unfortunately such CSS improvements are not forthcoming.


> Original approach works in many rendering engines including off-line 
> engines as Prince. The approach has been recently generalized to work 
> also with several XSL-FO formatters (MathML does not work in FO).

XSL-FO is a non-issue on the Web, and Prince supports MathML natively. The 
concern of the CSS approach working in fewer deployed seats than native 
MathML support is very real.


> Prince developers fixed bug in some few days, whereas they were unable 
> to integrate MathML in the rendering engine in despite of many efforts.

This is apparently no longer an issue.


On Sun, 4 Jun 2006 juanrgonzaleza@canonicalscience.com wrote:
> 
> 1) MathML is incompatible with HTML, CSS, and DOM and do not achieve 
> none of original goals. Support in browsers is weak because 
> specification is not solid.

These statements appear to be untrue, as far as I can tell.


> 2) We can generate alternative mathematical markup. What one? XML-MAIDEN 
> is one option, try to offer a HTML version of SGML 12083 math is other, 
> a mixture of both, et cetera. That is to be debated.

None of those options have fared any better than MathML, it seems. 
Certainly MathML seems to be the most common tag-based mathematical markup 
language on the Web.


> After 10 years most of browsers ignore MathML because difficulties with 
> specification.

Actually it seems to be more a matter of prioritisation than anything 
else. I see no reason to believe any other markup language or set of CSS 
properties for maths would fare any better.


> Moreover, MathML is not completely DOM compatible, needs of namespaces, 
> and of xml and deal WS in a different way that rest of XML parsing 
> engines.

This doesn't appear to be the case; if you could clarify your concerns it 
would be very helpful.


On Thu, 8 Jun 2006, Øistein E. Andersen wrote:
>
> Let me highlight a few requirements that probably must be satisfied in 
> order to avoid a new failure.
> 
> 1) Do not encode every tiny semantic detail explicitly
> 
> As Henri Sivonen put it: «[I]t is futile to insist on semantics that 
> you can't pull out of LaTeX as it is normally authored.» I would like 
> to use a slightly different wording: It is futile to insist on encoding 
> anything that does not change the appearance of a formula as it is 
> written on a blackboard or printed in a book.
> 
> This point can be illustrated by the two similar-looking formulae a) 
> $\sin^2 (2p+1)x$ and b) $f^2(2p+1)x$. Mathematicians whom I know would 
> evaluate a) as {sin[(2p+1)*x]}^2 and b) as f[f(2p+1)]*x.
> 
> Rules to map between meaning and form cannot be made to work reliably in 
> all cases. Brevity is queen, and conventions may differ between 
> different fields of mathematics.
> 
> The point already made about $dt^2$ meaning (dt)^2 being encoded to mean 
> d(t^2) in MathML is just another example of what happens when there is 
> lack of agreement between what the author wants (i.e. to make the 
> formula look nice on the web so that other people can read it) and what 
> the format tries to enforce (i.e. to assure that encoding of a formula 
> be sufficiently semantical for a computer theoretically to be able to 
> evaluate it).
> 
> No semantics is clearly better than wrong semantics, and correct 
> semantics combined with nice presentation is probably not feasible 
> without encoding everything twice. After all, most authors would clearly 
> want to use the same encoding for the superscripts in e.g. $x^2 = 
> x\times x$, $f^2(x) = f[f(x)]$, $f^{(4)}(x) = \frac{d^4}{dx^4}f(x)$, and 
> $x_i^{(j)} = x_{i,j}$. Anything else will undoubtedly lead to erratic 
> encoding, and what is a poor author supposed to do when he wants to use 
> e.g. a superscript in a way for which no encoding exists?
> 
> Finally, the encoding of semantics in mathematical formulae probably 
> does not feel more necessary to most people who write and read them than 
> the encoding of the particular meaning of an ambiguous word like `can', 
> albeit a differentiation between the modal verb meaning `be able to', 
> the noun denoting a container, and the transitive verb meaning `put into 
> a can' could potentially be helpful for applications like grammar 
> checking and text retrieval.
>
> 2) Fight verbosity
> 
> The reasons for TeX's undeniable success are many, but one of them might 
> be its concise syntax. People who appreciate the aesthetics of 
> mathematics and are used to make the distinction between $f$ and $F$ or 
> between $\phi$, $\varphi$ and $\Phi$ certainly have no difficulty 
> distinguishing \big from \Big. More importantly, the amount of mark-up 
> needed to encode a line of mathematics is enormous compared to what is 
> necessary for a line of running text. Consequently, each mark-up element 
> must be kept as short as possible.
> 
> It may be true that there will always be more <p>'s than <formula>'e out 
> there; however, those using mathematics are likely to use it quite 
> heavily, which makes <m> (for mathematics, or <f> if the <m> tag cannot 
> be reassigned), <frac>2<den>3</frac> and <root>3<of>125</root> clearly 
> better suited than <formula>, <fraction>2<denominator>3</fraction> and 
> <radical>3<radicand>125</radical>.

While I agree with the sentiment here, I'm not sure what to do about it. I 
think that the disadvantages of MathML are outweighed by its benefits 
relative to the alternatives, and I don't think we should profile MathML to 
resolve this particular issue, since that would bring costs of its own.


On Fri, 2 Jun 2006, Håkon Wium Lie wrote:
>
> Also sprach White Lynx:
> 
> > Making decision is up to WHAT WG, you can follow W3C line that so far 
> > brought nothing good to scientific web (which turned into bunch of 
> > PDF/PS/DJVU files) or (even without much afforts) you can solve 
> > longstanding problem of embedding mathematics in HTML. If WHAT WG will 
> > pay attention to interests of mathematical community, we are ready to 
> > do essential part of technical work needed to incorporate mathematical 
> > markup in HTML 5, like writing DTDs, default style sheets, 
> > documentation, test cases etc.
> 
> I think you make a compelling case for adding math to HTML the simple 
> way. Personally, I'm open to adding it to HTML5. How much would it add 
> to the specification?

A whole new math language would probably add quite a lot. Adding MathML, 
as it happens, adds very little to the spec. :-)

The same cannot be said of implementations, of course.


On Fri, 2 Jun 2006, Håkon Wium Lie wrote:
>
> The goal is not to replace MathML. At this point there isn't much to 
> replace -- MathML isn't found in HTML in the wild. Aslo, at the spec 
> level, I believe the two can co-exist just like WebForms and XForms can 
> co-exist.

I think MathML is actually found more than you might expect.


On Fri, 2 Jun 2006, White Lynx wrote:
> 
> In ISO 12083 Electronic Manuscript Format description of mathematical 
> markup occupies about 7 pages, in our case I expect specification to be 
> slightly larger (let's say about 10-15 pages). It should not take too 
> much space as presentational burden is passed to style sheets, so 
> detailed formatting instructions will not be part of specification. Size 
> of extra DTD will be about 4K in case of plain DTD, if modularized in 
> XHTML 1.1 fashion it will be larger (about 10K).

Sadly, even if we went this route, we could not simply leave it up to 
the style sheet for the rendering. We would still need to define the 
detailed rendering semantics for non-CSS UAs, and for ATs. We also have to 
define error handling. I think "15 pages" is optimistic.


On Fri, 2 Jun 2006, White Lynx wrote:
>
> It is the matter of implementation, we could draw radical signs with SVG 
> but using borders only is more reliable at the moment. In any case 
> presentation should not be hardcoded in specification.

SVG radicals aren't typographically acceptable either. You really want to 
use fonts for this.


On Fri, 2 Jun 2006, Michel Fortin wrote:
>
> But it'd certainly be a lot easier for browser implementors to add some 
> math-specific CSS properties for the missing parts than to create a full 
> MathML implementation.

It's not clear that this is indeed the case. CSS is a generic language 
that would end up applying to all vocabularies, whereas MathML is a 
specific language with a specific domain. CSS properties interact with 
each other in ways that make new CSS features far more complex than MathML 
features need be.


On Fri, 9 Jun 2006, Stefan Gössner wrote:
>
> I would highly appreciate a lightweight, pragamatic solution for doing 
> math on the web in a convenient way.

We now have MathML in text/html. Does this resolve your desire?


> This solution could parallel MathML the same way as Canvas parallels 
> SVG. And that does not necessarily mean, it should be javascript or 
> Latex based -- though it might be.

I'm not sure what an immediate-mode math API would look like!


> Personally I like a minimal vocabulary, which allows declarative math markup
> and integrates well with HTML.

I'm not sure that it is possible to have a declarative, DOM-based language 
for maths that is minimal.


> The proposal from George, refined by Michel currently being discussed 
> here, is definitely a good start.

It isn't clear that that proposal was any more minimal than MathML, 
really.



On Fri, 9 Jun 2006 juanrgonzaleza@canonicalscience.com wrote:
> Ian Hickson wrote:
> 
> > I would be very cautious about introducing an entirely new language to 
> > do this (even if it is "just" an extension of HTML4). For something as 
> > big as Mathematics, we want to simply re-use an existing language, not 
> > invent a new one. Inventing a new language for encoding content with 
> > as wide a problem-space as mathematics would require months, as well 
> > as the time of domain experts, etc. This work has already been done, 
> > e.g. in ISO12083, MathML, LaTeX, and other such languages.
> 
> Nobody want reinvent the wheel, but people reuse languages when these 
> *work*.

It seems to me that MathML has had more success than your own proposal. I 
wouldn't argue against MathML on the basis of success in the market if I 
were you. :-) After all, even Microsoft Office supports MathML.



On Fri, 9 Jun 2006 juanrgonzaleza@canonicalscience.com wrote:
> Ian Hickson wrote:
> > I am not at all convinced that it makes any sense to rely on CSS to 
> > render mathematics. CSS simply doesn't have the expressive power to 
> > obtain acceptably good mathematical typography, and adding features to 
> > CSS to obtain this level of expressiveness would require a huge 
> > specification with such a small target domain than nobody would 
> > implement it.
> 
> MathML versions changed to embrace more and more CSS code whereas 
> deprecating early MathML markup. A pair of months ago, one of big guys 
> of MathML asked that changes would be done in MathML for CSS compliance, 
> and exists interest in a special mathematical module in CSS.
> 
> So far as I know the Math CSS module has got problems due to incorrect 
> design of MathML and apparently has been stopped (maybe waiting a mayor 
> third revision of MathML markup?).
> 
> However, for you, CSS do not makes any sense...

I stand by my comments above, yes. Your statements above don't rebut them.


> You claim that does not has expressive power, but you do not support 
> your claims with facts, and you simply fails to show us where IS the 
> MathML good typography.

MathML doesn't intrinsically have good or bad mathematical typography, as 
it describes presentational intent, not precise rendering rules. While 
relying on CSS cannot achieve good results, a native MathML renderer would 
be technically capable of good results.


> I have provided you a sample of many test (even the most basic) from the 
> official MathML test suite are not passed by Mozilla Firefox 1.0, the 
> only with native suport after of 10 years.

As others have pointed out, your comparison was based on a flawed setup.


> The adding feature are still needed and there is planned a special CSS 
> math module for usage with MathML. You worry about adding feature to CSS 
> for math but this has been scheduled in MathML but you simply ignore 
> it...

When I wrote those comments, I was aware of the plans for a CSS module, as 
well as the likelihood of those plans ever bearing fruit. I think the past 
two years with zero progress speak for themselves on this matter.


But this is all moot. The whole point is that we _cannot_ rely on CSS, as 
not all browsers support CSS. CSS is intended to be an optional 
technology, one that can be overridden, disabled, changed by users. We 
need a language that works whether CSS is supported or not.


> I do not see tools rejecting it masively, authors ignoring it. I see 
> browser developers implementing it. I see it working in websites. Could 
> you point a single site using MathML (and I said "using" not 
> "misusing").

Free Patents Online is probably the biggest user.


On Sat, 10 Jun 2006 juanrgonzaleza@canonicalscience.com wrote:
> 
> Also never Microsoft did an official position regarding his lack of 
> interest in mathematics, but many folks are claiming the contrary as 
> excuse on why MathML is not being popular.

Microsoft seems to be supporting MathML, at least as an interchange 
format on the Windows clipboard.


On Mon, 19 Jun 2006 juanrgonzaleza@canonicalscience.com wrote:
>
> [Problems with using MathML:]
> 
> 1) One recovers all difficulties and errors of MathML desing.

These seem to have mostly been resolved, and MathML undergoes continuous 
maintenance, so it will likely continue to get better.


> 2) One recovers a CSS, DOM, and even XML unfriendly markup language.

I don't see how MathML is CSS, DOM, or XML unfriendly. It fits well 
within the platform, as far as I can tell.


> 3) The main official guidelines of the WHATWG are violated.

I don't see how. Indeed, compatibility was a key component of the design 
decisions, as described above.


> 4) Ian specific proposal is MathML incompatible and probably would add a 
> exponential difficulty to realistic implementation in browsers than 
> original MathML has been. Just for illustration let me do some 
> annotations:
> 
> What when 1 is not <mn>1</mn>?
> 
> What about 0xFFEF, MCMLXIX, and twenty one? are parsed to numbers or not?
> 
> What when + or = are not operators?
> 
> How would <mrow>d x</mrow> be parsed? And <mrow>dx</mrow> and <mrow>D
> x</mrow>?
> 
> How is 3,14 parsed? And 3 , 14? And 3, 14?
> 
> How is supposed that we encode \dot{q} in Ian proposal?
> 
> How is parsed the string "maps to"?
> 
> <msqrt>- 1</msqrt>
> 
> and
> 
> <msqrt><mrow>- 1</mrow></msqrt>
> 
> are to be treated as equivalent or no?
> 
> "- 1" and "-1" are equivalent or no?
> 
> What about <mrow>a b</mrow> and <mrow>a <mi>b</mi></mrow>? And
> <mrow>a<mi>b</mi></mrow>?
> 
> How would one deal with " T" vs. "&nbsp;T"?
> 
> Ian said "Maybe we could imply one of [&#x2062;] between pairs of terms in
> <mrow>s that don't have any <mo>s." Then would <mrow>L &rho;</mrow> work
> as <mrow>L &#x2062; &rho;</mrow>?
> 
> and <mrow> 5 </mrow>? Equivalent to <mn>5</mn> or to <mrow><mn>5</mn></mrow>?

These problems are all resolved in the current proposal.


> How is the problem of entities solved?

They are supported in the current proposal.


On Mon, 19 Jun 2006 juanrgonzaleza@canonicalscience.com wrote:
> 
> Ok, people would freely choose languages in function of strengs and 
> weakness. XML-MAIDEN, MathML, OpenMath, and others would compete in an 
> equal footing (my experience is many people think that MathML is the 
> _only_ approach to mathematical markup).

Well, they've had the past two years to compete... and MathML came in 
second, after LaTeX. So. MathML it is (given that LaTeX doesn't really 
work in the Web platform environment).


On Mon, 19 Jun 2006, Michel Fortin wrote:
> 
> So I propose that HTML 5 adds fractions, and only fractions. I think 
> there is a good consensus on how to markup a fraction. I believe 
> fractions can also be somewhat useful outside the realm of mathematical 
> formulas. And a fraction construct would encourage implementors to fix 
> their inline-block and vertical alignment CSS bugs, opening the door to 
> more CSS-based mathematical markup in the future.

This became:

   <math><mfrac><mn>1</mn><mn>2</mn></mfrac></math>

...which isn't as simple as it could be, but it's not as bad as it could 
be either (e.g. it doesn't have ugly namespace declarations).


On Tue, 20 Jun 2006, White Lynx wrote:
> Ian Hickson wrote:
> > 
> > There is nothing to read between my lines. I am being as honest and 
> > candid as possible. There is no conspiracy here. I have given you the 
> > exact reasoning I have used, I have suggested how you can move 
> > forward. I am being quite sincere.
> 
> Very well. In this case please make one small step to show that WHATWG 
> HTML is open to scientific content and add just four elements to HTML5 
> (feel free to use different element names if nesessary): "formula", 
> "fraction", "num", "den". It will not take much time and will work even 
> in MSIE:
>  http://www.geocities.com/chavchan/frac/fractions.xml
> 
> If this step will be made we will assume that WHATWG is ready for 
> constructive dialog, if however we will hear arguments like automated 
> splitting of long fractions across lines (never seen something like that 
> even in TeX) is not supported therefore markup for fractions is useless 
> or something similar then I think discussing issue with WHATWG further 
> does not make any sense.

We ended up with full MathML support, so the above is now redundant.


On Tue, 20 Jun 2006, Robert O'Callahan wrote:
>
> Since we already have a MathML implementation --- which works fairly 
> well in my experience --- I think it makes more sense from our point of 
> view to fix/improve MathML than to deal with new CSS extensions to get 
> decent rendering. MathML's purported incompatibilities with DOM and CSS 
> are not serious from an implementor's point of view, at least no worse 
> than lots of other CSS-unfriendly content we have to deal with. I hope 
> that the fonts issue gets better when comprehensive STIX fonts are 
> freely available online and we can automatically download them whenever 
> they're needed.

This looks like it is going to be ready relatively soon, which will be 
great.


> I strongly agree with Hixie and others that a new math dialect for HTML 
> needs to be proven before it can be standardized as *the* preferred 
> solution for mathematics in HTML. If MathML is as bad --- and CSS2.1 as 
> adequate --- as some say, then it should be easy to create a microformat 
> that becomes more popular than MathML. At that point there is a much 
> stronger case for inclusion.

Indeed. Sadly it seems that this was not carried out, and so we have 
nothing to base an evaluation of that proposal on. It's somewhat of an 
argument against the proposal, in fact. This is one reason I have 


On Tue, 20 Jun 2006, Alexey Feldgendler wrote:
> 
> For math, verbosity of a language is very important. Because a 
> microformat will be significantly more verbose than an HTML extension 
> would be, it's doubtful whether such microformat can become popular, 
> even if an HTML extension would.

Sure, but that could be taken into account in the evaluation.


On Tue, 20 Jun 2006 juanrgonzaleza@canonicalscience.com wrote:
> >
> > Yes. Canvas was only added once Apple proved it was implementable and 
> > popular, and had documented it.
> 
> Popular in the web?

Popular with vendors, in this particular case.


> Ok, but I was able to read betweeen lines. For example, your initial 
> silence when the thread on math was launched indicated to me that maybe 
> you were waiting it automatically closed, but did not and obtained mayor 
> audience obligating to you to add messages.

Since I reply to every e-mail sent to the list, ignoring e-mails doesn't 
work very well for me. :-)


On Wed, 21 Jun 2006, White Lynx wrote:
> 
> Basically there is nothing in proposal that could be difficult to 
> implement, morover on first stage XHTML with fallback style sheets can 
> work without any kind of native support. So implementation costs, that 
> in any case are much lower then alterenatives, are unlikely to be an 
> obstackle.
> 
> Look at proposal once again:
> 
> 1. formula, dformula, dformgrp - just containers, no problems.
> 2. sub, sup - already exist nothing to add
> 3. stack - requires support for inline-blocks. No problems in MSIE, Opera, Prince. 
> Safari and Mozilla will have to fix bug affecting inline-blocks.
> 4. fraction, num, den - the same as stack.
> 5. over, obase, top, overbrace - the same as stack
> 6. under, ubase, bottom, underbrace - if content of ubase is restricted to PCDATA then the same as stack,
> otherwise either inline-tables (work in Opera, Prince, require small bug fix in Safari) or 
> inline-blocks and CSS3 inline-block-align properties are needed. So in case inline-tables will 
> be considered unrealistic elements still can be retained provided that content of ubase is restricted
> to PCDATA, in future this constraint can be easily eliminated.
> 7. opgrp, op, uli, llim, limits - the same as stack
> 8. radical, radix, radicand - requires inline-tables. Element is safe to omit as equivalent fuctionality
> is available in power notations. So in case inline-tables will be considered unrealistic element may be omitted.
> 9. sqrt - requires inline-blocks, maybe image borders, or SVG backrgound image. Element is safe to omit as equivalent fuctionality
> is available in power notations.
> 10. fence, fenced, marker, submark - the same as stack. Support for image borders or SVG backrgound images would be useful.
> 11. matrix, det, choose, cases, case, row, entry, cell, value, scope - formally it requires inline-tables, but necessary functionality 
> exists in all browsers in a form of (X)HTML table with display set to inline, inline-table or -moz-inline-box.

This is no simpler than MathML for the same level of support, as far as I 
can tell.


On Mon, 9 Oct 2006, Simon Pieters wrote:
> 
> If I understand it correctly Ian's proposal is to have a list of 
> elements that are put in the MathML namespace when parsing. I have a 
> slightly different proposal which I'll describe below:
> 
> Instead of having a list of elements that are put in the MathML 
> namespace, <math> is a sort of namespace- and parsing scoping element so 
> that it and all of its descendants are in the MathML namespace, and also 
> that all tags inside <math> are parsed as any other <xyz> tags except 
> for start tags for empty MathML elements. E.g., <none> inside <math> is 
> an empty element but outside it is not; <br> is not an empty element 
> inside <math>.
> 
> For setting .innerHTML on MathML elements, "root" would be a new <math> 
> instead of a new <html>.

With two exceptions (supporting /> syntax instead of hardcoding empty 
elements; having a list of HTML elements bail out of <math>) this is what 
the spec has now ended up saying.
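
To illustrate the scoping idea, here is a rough sketch in Python. It is 
not the spec's algorithm: it uses the standard library's html.parser 
tokeniser (which is not an HTML5 parser), it does not model the list of 
HTML elements that bail out of <math>, and it would mishandle void 
elements such as <br>. The names NamespaceAnnotator and annotate are made 
up for this example. All it shows is that a <math> or <svg> start tag can 
switch the namespace for itself and its descendants without any xmlns 
syntax on the wire.

```python
from html.parser import HTMLParser

XHTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"
SVG = "http://www.w3.org/2000/svg"

# Elements that start a new namespace scope in this toy model.
SCOPING_ROOTS = {"math": MATHML, "svg": SVG}


class NamespaceAnnotator(HTMLParser):
    """Toy tree walker: records (tag, namespace) for every start tag."""

    def __init__(self):
        super().__init__()
        self.stack = []     # (tag, namespace) of the currently open elements
        self.elements = []  # every (tag, namespace) seen, in document order

    def handle_starttag(self, tag, attrs):
        # Inherit the parent's namespace unless this tag opens a new scope.
        # Any xmlns attributes in attrs are deliberately ignored.
        inherited = self.stack[-1][1] if self.stack else XHTML
        namespace = SCOPING_ROOTS.get(tag, inherited)
        self.stack.append((tag, namespace))
        self.elements.append((tag, namespace))

    def handle_endtag(self, tag):
        # Only well-nested end tags are handled in this sketch.
        if self.stack and self.stack[-1][0] == tag:
            self.stack.pop()


def annotate(html_text):
    """Return the (tag, namespace) pairs for a text/html string."""
    parser = NamespaceAnnotator()
    parser.feed(html_text)
    parser.close()
    return parser.elements


for tag, namespace in annotate("<p><math><mfrac><mn>1</mn><mn>2</mn></mfrac></math></p>"):
    print(tag, namespace)
```

In this model the <p> ends up in the XHTML namespace and <math>, <mfrac> 
and both <mn> elements end up in the MathML namespace.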


On Sat, 4 Nov 2006, Elliotte Harold wrote:
> Ian Hickson wrote:
> > > > I'm not saying don't add MathML to HTML. I'm saying don't add namespace
> > > > syntax to HTML.
> 
> As I've said elsewhere, I find this viewpoint simply incomprehensible. 
> Namespaces are ugly, but they're not that ugly or that problematic.

I guess we disagree on the extent of the ugliness and the problems. :-)


> They are also at the core of XML processing. Throwing out namespaces 
> makes it nearly impossible to process HTML with the very large and 
> powerful set of XML tools.

The namespaces would still exist in the DOM. They would just not be 
visible at the syntax level.


On Sat, 4 Nov 2006, Elliotte Harold wrote:
> Henri Sivonen wrote:
> > Note that what you quoted above was not about throwing away namespaces 
> > but about not introducing namespace *syntax* to the text/html 
> > serialization. In fact, HTML5 requires UAs to put HTML elements in the 
> > XHTML namespace.
> 
> If all we're doing is HTML, fine. However people are now in this thread 
> talking about putting MathML into this. In other forums people are 
> discussing adding XForms. If I'm wrong and people are not suggesting 
> that MathML and XForms and similar non-HTML things end up in the HTML 
> namespace, then please tell me that. I;'d be very happy to know that.
> 
> However what I'm hearing is that people do want to mix in different 
> vocabularies such as MathML and XForms without using namespaces, and 
> that seems to me to be very unwise.

Again, you are confusing namespace *syntax* (the bits on the wire) with 
namespaces in the DOM.


On Sat, 4 Nov 2006, Elliotte Harold wrote:
> 
> Then this is what I feared, and it's not sensible. Enabling mixed 
> HTML/MathML to be processed with generic tools and MathML specific tools 
> requires that it have the namespace, including all the necessary 
> namespace attributes.

Yes, in the DOM it would have the namespace. Stick an HTML5 parser on the 
front of your XML pipeline, and your XML tools would be none the wiser.


> See
> 
> http://www.elharo.com/blog/software-development/xml/2006/10/26/chameleon-schemas-considered-harmful/
> 
> for some more thoughts on this.

We are not proposing the chameleon namespace idea, which I agree is a very 
misguided concept that completely misses the point.


> Folks, adding a few xmlns:math or xmlns attributes to a document just 
> isn't that bad. It is in fact one of the least complex things anyone 
> trying to write MathML will have to deal with. MathML has a lot worse 
> than this.

We don't have to add the attributes to switch the namespace, though.


> This is like making the driver's seat in a car a little more comfortable by
> ripping out the steering wheel. It makes no sense. It is a phobia that leads
> to misjudgement the relative problems with different options.

No, it's like ripping off the label on a 100% cotton sweater. It's no less 
cotton for having lost its label. The molecules still know they're cotton, 
the laws of physics still apply to the cotton in the way they did before 
the label was removed, the only difference is that the label is no longer 
there to itch the wearer.


On Sat, 4 Nov 2006, Elliotte Harold wrote:
>
> A DOM doesn't travel over the network. The serialized form does. The DOM 
> is one possible local model used to process the document on one system. 
> My tools may or may not be based on the DOM, but they're going to start 
> by receiving an actual XML instance. We can use TagSoup to fudge HTML, 
> but if you I want to handle MathML, SVG, and other things in it, That 
> instance had better be namespace well-formed, and it had better use the 
> right names for the right things, both local and qualified.

We're not talking about putting XML in text/html. It's still a custom 
text/html syntax, it's just that it maps to XML syntax that does use 
namespaces. That is, this text/html document:

   <!DOCTYPE HTML>
   <html><head><title>Test</title></head><body>
   <p><math></math><svg></svg></p>
   </body></html>

...has the exact same namespaces as this application/xml document:

   <html xmlns="http://www.w3.org/1999/xhtml"
   ><head><title>Test</title></head><body>
   <p><math xmlns="http://www.w3.org/1998/Math/MathML"
   ></math><svg xmlns="http://www.w3.org/2000/svg"></svg></p>
   </body></html>

...which has the exact same namespaces as this text/x-mythical-lang:

   SKGHJQ#$KJTGWWEGH43Yq34YKJGRWAKGVHJSDFKGHJ@3423tWQ#KJGSDB

All three of these documents (the last being in a mythical language for 
demonstrative purposes only) are (with some minor and non-namespace- 
related exceptions) identical. They all have a <math> element in the 
MathML namespace. They all have an <svg> element in the SVG namespace. The 
fact that the _syntax_ doesn't show that explicitly doesn't in any way 
change the fact that the document has to have that interpretation. In the 
"text/x-mythical-lang" you can't even see the word "math" but it still has 
a <math> element. It's just a matter of how the parser is defined.
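
For the XML side of this, a small sketch using Python's standard 
xml.etree.ElementTree module (my choice of tool for illustration only) 
shows what a namespace-aware XML parser reports for the application/xml 
document. The helper name element_namespaces is made up for this example.

```python
import xml.etree.ElementTree as ET

XML_DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
><head><title>Test</title></head><body>
<p><math xmlns="http://www.w3.org/1998/Math/MathML"
></math><svg xmlns="http://www.w3.org/2000/svg"></svg></p>
</body></html>"""


def element_namespaces(xml_text):
    """Return (namespace, local name) pairs for every element, in document order."""
    pairs = []
    for element in ET.fromstring(xml_text).iter():
        # ElementTree spells a qualified name as "{namespace}localname".
        # Every element in XML_DOC is in some namespace, so the brace is present.
        namespace, _, local = element.tag[1:].partition("}")
        pairs.append((namespace, local))
    return pairs


for namespace, local in element_namespaces(XML_DOC):
    print(f"{local:6} {namespace}")
```

A conforming text/html parser is required to produce the same 
element-to-namespace assignment for the first document, even though none 
of these URIs appear in its source.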

Similarly, in text/html, the following valid HTML4 document fragment:

   <p>a<ol><li>b</ol>

...is identical to this valid HTML4 document fragment:

   <p>a</p><ol><li>b</li></ol>

That is, the </p> and </li> tags, despite not being present in the syntax 
of the first fragment, are still effectively present. The </p> is implied 
by the <ol>, and the </li> is implied by the </ol>.
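
As a toy illustration of implied end tags, the following Python sketch 
hardcodes just the two rules mentioned above and writes the implied tags 
back out. The rule tables and the function name explicit_tags are 
assumptions made for this example; the real rules are far more extensive.

```python
import re

# Toy rules covering only this example (not the real HTML rules):
# a start tag listed here first closes the named element if it is open.
CLOSED_BY_START = {"ol": "p"}
# An end tag listed here first closes the named element if it is open.
CLOSED_BY_END = {"ol": "li"}


def explicit_tags(fragment):
    """Return the fragment with the implied </p> and </li> tags written out."""
    out, open_elements = [], []
    for token in re.findall(r"<[^>]+>|[^<]+", fragment):
        if token.startswith("</"):
            name = token[2:-1]
            implied = CLOSED_BY_END.get(name)
            if implied and open_elements and open_elements[-1] == implied:
                out.append(f"</{open_elements.pop()}>")
            if open_elements and open_elements[-1] == name:
                open_elements.pop()
            out.append(token)
        elif token.startswith("<"):
            name = token[1:-1]
            implied = CLOSED_BY_START.get(name)
            if implied and open_elements and open_elements[-1] == implied:
                out.append(f"</{open_elements.pop()}>")
            open_elements.append(name)
            out.append(token)
        else:
            out.append(token)  # character data passes through unchanged
    return "".join(out)


print(explicit_tags("<p>a<ol><li>b</ol>"))
```

Feeding it a fragment that already has every end tag leaves that fragment 
unchanged, which is the sense in which the two forms are the same document.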


On Sat, 4 Nov 2006, Elliotte Harold wrote:
> > 
> > Anne is talking about the text/html serialization, which is supposed 
> > to be parsed using an HTML5 parser. It is a special-purpose 
> > alternative serialization for a subset of possible infosets--like 
> > RELAX NG Compact Syntax. Please ignore the superficial syntactic 
> > similarity to XML 1.0.
> 
> Does that subset include MathML?

At this point, yes.


> I can accept a simple, subset designed purely for backwards 
> compatibility an that can be processed with a special purpose parser.
> 
> However if the plan is to mix in entire additional languages, then I 
> think this is driving off a cliff. MathML and MathML tools are designed 
> under the assumption that they can rely on well-formedness and 
> namespaces.

The concept of "well-formedness" doesn't apply to text/html, it's an XML 
technical term.

The namespaces are present in text/html content.

MathML tools will need new parsers (which will likely be available 
off-the-shelf just like XML parsers) to handle MathML in text/html, yes. 
However, the rest of MathML is unaffected -- the parsers return 
predictable, coherent DOMs with full namespace annotation, with all the 
error handling done in a manner consistent between conforming 
implementations of the spec.


On Sun, 5 Nov 2006, Henri Sivonen wrote:
> 
> You wouldn't be able to feed MathML-enabled HTML5 to MathML tools that 
> use an XML parser. You'd either have to use an HTML5 to XHTML5 converter 
> for creating an intermediate XML 1.0 serialization that can be fed to an 
> XML parser or you could optimize away the serialization and plug an 
> HTML5 parser into the XML processing pipeline the way TagSoup is used.

Right.


On Sun, 5 Nov 2006, Elliotte Harold wrote:
>
> Anne van Kesteren wrote:
> 
> > Well, the problem is that they would mean different things. Consider the
> > following fragment:
> 
> Meaning is in the eye of the beholder.

Not with HTML5. We define the meaning for any byte stream sent as 
text/html, conforming or not.


> The syntax matters. If you give me the right well-formed syntax, I can 
> do what I need to do with it, as can others. If you give me malformed 
> syntax, working with the document gets a lot more complicated.

The concepts of "well-formed" and "malformed" don't apply to text/html. 
HTML5 has "conforming" and "non-conforming" byte streams, and has defined 
processing for both, including how you go from bytes to a 
namespace-annotated infoset representation (though we do it in terms of a 
DOM, not an infoset, but the idea is the same).

Given any byte stream sent as text/html, valid or not, you can do what you 
need to do with it, as can anyone else using a conforming HTML5 parser. 
Consistent text/html processing is not limited to valid text/html 
documents the way that consistent XML processing is limited to well-formed 
XML documents.


On Sun, 5 Nov 2006, Elliotte Harold wrote:
> 
> The specific syntax is important because there's a huge, useful 
> toolchain for processing XML and there's essentially zilch for 
> processing this strange HTML 5 thing.

This is rapidly changing.


> If there ever is any software to process it, I expect it will just be an 
> adapter that feeds the HTML 5 into the XML tools. Why not ditch the 
> HTML5 layer completely and simply allow the XML tools direct access?

You are welcome to do so, HTML5 defines an XML serialisation too.

However, it seems many people actually don't want to use XML, so that's 
why we're continuing to define a variant of HTML for text/html.


On Sun, 5 Nov 2006, Elliotte Harold wrote:
>
> You're getting this backwards. There's no reason for HTML 5 to be 
> compatible with existing *documents*, existing browsers and tools sure; 
> but other documents can be handled on their own.

Browsers have to handle existing documents. Browsers, by and large, don't 
want to implement more than one HTML processor. Thus whatever browsers do 
for processing of HTML5 documents has to work for _any_ HTML document. 
Thus HTML5 has to be compatible with the processing of legacy HTML.


> > Anyone for whom interoperability in processing real world content is 
> > important.  This includes, among others:
> > 
> > * Browser vendors that have to deal with real world content.
> 
> Browser vendors can handle XHTML now. It's a non-issue for them.

This doesn't match what browser vendors are telling me.


> > * Users who like to surf the web in any browser they choose.
> 
> As long as any browser they choose is some browser released in this 
> millennium, XHTML is fine.

IE7 proves you wrong. :-)


> I'm sorry. The use cases so far just don't hold water. I reput the 
> question: who does HTML serialization help? What problems does this 
> solve?

Whether you agree with us or not doesn't really change matters... the use 
cases you were given are indeed the ones we are targeting, whether you 
believe we are right to do so or not.


On Sat, 4 Nov 2006, William F Hammond wrote:
>
> >> Ian Hickson wrote:
> >>>>> I'm not saying don't add MathML to HTML. I'm saying don't add 
> >>>>> namespace syntax to HTML.
> 
> Yes, I believe Ian Hixie's main point was not to use namespace prefixes 
> in html5.

Or default namespace declarations, for that matter (though if they are 
present, they should not be considered errors, for ease of copying and 
pasting from XML to text/html).


> If a content provider attempts a namespace declaration using something 
> like 'xmlns="http://www.w3.org/1998/Math/MathML"', it should "work" in a 
> user agent that handles the document under the current xhtml+mathml 
> regime (perhaps cued by an xml declaration [an unknown PI for html5] or 
> an xmlns attribute on the root 'html' element) and also in a user agent 
> that follows the new html5 way (which presumably would ignore 'unknown' 
> attributes).

Unfortunately I haven't seen any proposal for how to handle xmlns="" 
attributes in text/html in a way that is compatible with the legacy of Web 
content (much of which is strewn with xmlns="" attributes with bogus 
values on any manner of elements, both known and unknown), and I have 
totally failed to come to any workable solution myself that supports 
namespaces in a generic fashion.

Without a working model, we really can't support namespaces in HTML 
syntax.


> This is the way to keep as much as possible of presently circulated 
> html-with-MathML content alive.

True, but it's not the only way, as demonstrated by the new spec text.


On Sat, 4 Nov 2006, Elliotte Harold wrote:
>
> Ian Hickson wrote:
> 
> > > > Some pages even have completely bogus namespaces on the root 
> > > > <html> element, which would make the entire page screw up.
> 
> Let the page screw up. The author will notice it and fix it.

In some cases, yes. In most cases, the author is long gone and the content 
is no longer maintained, or the author has no idea how to fix the problem. 
Browsers don't make changes that break significant numbers of pages.


> That's like saying some people mistype table as tabel and therefore we 
> shoudl accept both spellings.

Ironically no, because <tabel> doesn't do anything today so misspellings 
of it are actually relied upon to do nothing.


> > > > Even worse, Office HTML, of which there is a LOT on the Web, uses 
> > > > namespaces in a way to trigger IE to do one thing, but relies on 
> > > > the other browsers *not* handling the namespaces to make sure it 
> > > > all works everywhere.
> 
> Can you elaborate on the specifics?

If you look at the source of any HTML document saved by Word in the past 
decade or so, it should be clear.


> If it's their custom office namespaces, I'm not too surprised.

It is.


> > > Are there any reasons besides ease of use and misuse in tag-soup 
> > > content that XML's namespace syntax shouldn't be added to HTML?
> > 
> > I can't think of any other reasons off-hand, no. But those reasons are 
> > so big that I find it difficult to think of anything but those 
> > problems when I consider namespaces, so it might just be that I'm not 
> > thinking clearly enough to see the other problems.
> 
> I'm sorry. Those reasons are *TRIVIAL*. They are easily handled, and easily
> fixed. 

How? Please do elaborate on this.


On Sat, 4 Nov 2006, Paul Topping wrote:
>
> Elements whose namespaces aren't known should be handled like any other 
> unknown HTML element. I believe the common way for user agents to handle 
> an unknown element is basically to ignore the tag and its attributes and 
> treat any text between start and end tags as if the tags weren't there. 
> Namespaces do not present any new challenge in this area. "Bogus 
> namespaces" are no more of a security risk than bogus HTML tags. It is 
> only the ones that ARE processed by the user agent that represent 
> potential security risks.

The problem is legacy content like:

   <html>
    <foo xmlns="bogus namespace">
     ...rest of HTML document...

We don't want to make the whole document get ignored.
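
To make the failure mode concrete, here is a minimal sketch of XML-style 
default-namespace scoping applied to a toy model of the markup above. The 
tree shape, the element names inside <foo>, and the resolveNamespaces 
helper are all made up for illustration; no real parser is structured 
like this.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";

// Toy model of the legacy markup above (the <h1> and <p> stand in for
// "...rest of HTML document..." and are illustrative only).
const legacyDoc = {
  name: "html",
  children: [
    {
      name: "foo",
      xmlns: "bogus namespace",
      children: [
        { name: "h1", children: [] },
        { name: "p", children: [] },
      ],
    },
  ],
};

// XML-style scoping: an xmlns declaration sets the default namespace for
// the element and all of its descendants until overridden.
function resolveNamespaces(node, inherited = HTML_NS, out = []) {
  const ns = node.xmlns !== undefined ? node.xmlns : inherited;
  out.push({ name: node.name, namespace: ns });
  for (const child of node.children) resolveNamespaces(child, ns, out);
  return out;
}

const resolved = resolveNamespaces(legacyDoc);
const stillHtml = resolved
  .filter((e) => e.namespace === HTML_NS)
  .map((e) => e.name);
console.log(stillHtml); // [ 'html' ]
```

Under that rule only the root is still an HTML element; everything inside 
<foo> lands in the bogus namespace and would be treated as unknown.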


On Mon, 6 Nov 2006, Chris Chiasson wrote:
>
> If there are no element name collisions between html, mathml, and 
> whatever else is proposed to be added to html 5, then all elements could 
> just be included into one schema/DTD without prefixing.

Sadly, there are collisions.


> The browsers could then do the job of filtering all the elements into 
> the appropriate namespaces.

That's basically what the spec now does.


On Mon, 6 Nov 2006, Bruce Miller wrote:
> 
> Proposal: Include MathML as xml "islands" with XML rules for namespaces, 
> but within sgml-ish html.
>
> Based on my current understanding:
> This works in IE _today_ (provided MathPlayer is installed). Roger 
> seemed to say that it is just as easy to implement in mozilla as w/o 
> namespaces, and had perhaps already implemented it.

Could you elaborate on this? I don't understand what you mean by "xml 
islands" in terms of precise changes to the HTML5 parser model.


On Mon, 6 Nov 2006, Neil Soiffer wrote:
>
> I have been reading the discussion and at times I am perplexed.  Being a 
> user and implementer of MathML software, the key point to me is that 
> HTML5 accept valid MathML.  Without trying to restart the argument, 
> MathML tools produce text, not DOM trees, so it is crucial for 
> compatibility that HTML5 accepts (ie, does not reject/error on) MathML 
> syntax.

Indeed. This is (with some caveats) now possible.


On Sat, 24 Feb 2007, Henri Sivonen wrote:
>
> Where should the svg element from the SVG namespace be allowed in an 
> XHTML5 host document? (My expectation: It should be allowed at least 
> everywhere where the img element would be allowed. There may be good 
> arguments for allowing svg as block as well.)

Specified.


> Where should the math element from the MathML namespace be allowed in an 
> XHTML5 host document? (My expectation: It should be allowed where 
> strictly inline level content is allowed. My understanding is that also 
> display math has inline semantics even though it has a blockish 
> presentation.)

Specified.


> Where should the RDF element from the RDF namespace be allowed in an 
> XHTML5 host document?

Specified (though the RDF is just mentioned as an example in a more 
generic statement).


> Should an XHTML5 conformance checker allow arbitrary foreign elements as 
> children of the head element in order to allow free experimentation with 
> invisible non-RDF metadata in a way that doesn't encourage experimenters 
> to put their stuff inside comments or something equally ugly?

Arbitrary namespaces shouldn't be allowed, as they aren't well-known 
vocabularies that all users can expect to handle properly.

On Mon, 9 Apr 2007, Maciej Stachowiak wrote:
> On Apr 8, 2007, at 11:12 AM, Henri Sivonen wrote:
> > 
> > I think it would be worthwhile to add an attribute for script-private 
> > data to common attributes, so that scripters who need one and want to 
> > be conforming don't need to abuse e.g. title.
> 
> The class attribute can already be used for script-private data. I think 
> the time script authors go for made-up attributes is when they need a 
> set of key-value pairs. Class is not so great for that, but I'm not sure 
> any new attribute would be either, unless it provided some sort of 
> built-in key-value parsing.

Now available, using data-*="", for most values of *.

On Mon, 9 Apr 2007, Jon Barnett wrote:
>
> I can think of two possibilities.
> 
> One would be to allow the param element as a child of any element (or 
> any block level element?) 
> http://www.whatwg.org/specs/web-apps/current-work/#param

I considered this but it seems exceedingly verbose, and has parsing 
implications for some elements (e.g. how do you annotate <br> in a 
scripted editor application?)


> And then make an attribute of HTMLElement called params
> readonly attribute HTMLCollection params;

I've provided a dataset API which basically handles this.

If anyone is actually reading this 3363 line e-mail, I'm
impressed. Please do let me know that you read this.


> The only other possibility I can think of would be an HTML attribute 
> called "params" that would be a list of tokenized name value pairs, but 
> that sounds even hairier to implement.

Agreed.


> This would have simplified something I did last week involving the 
> Google Maps API, where I did, as mentioned, make up a fake attribute.  
> There may be better ways to do this.

Making up attributes, all prefixed with data-, seems like the simplest 
solution.
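
As a rough sketch of how a script could consume such attributes: given an 
element's attributes as name/value pairs, pick out the data-* ones and 
expose them as a plain object, which is roughly what the dataset API 
does. The toDataset helper and the sample attribute names are 
hypothetical. The hyphen-to-camelCase step follows the name conversion in 
the HTML standard as it exists today, which may differ from the draft 
this e-mail refers to.

```javascript
// Hypothetical helper: build a dataset-like object from attribute pairs.
function toDataset(attributes) {
  const dataset = {};
  for (const [name, value] of attributes) {
    if (!name.startsWith("data-")) continue; // not a private attribute
    const key = name
      .slice("data-".length)
      // "-x" becomes "X", so data-map-zoom is exposed as mapZoom.
      .replace(/-([a-z])/g, (_, letter) => letter.toUpperCase());
    dataset[key] = value; // values stay strings
  }
  return dataset;
}

const attrs = [
  ["id", "foo"],
  ["data-answer", "42"],
  ["data-map-zoom", "7"],
];
console.log(toDataset(attrs)); // { answer: '42', mapZoom: '7' }
```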


On Tue, 10 Apr 2007, Simon Pieters wrote:
> 
> Or allow any attribute that starts with "x_" or something (to prevent 
> clashing with future revisions of HTML), as private attributes.
> 
>    <div id="foo" x_answer="42">Some more content</div>
> 
>    var foo = document.getElementById("foo");
>    if(foo.getAttribute("x_answer") == 42) {
>      // it is!!
>    }

That's basically what is now in the spec.


On Tue, 10 Apr 2007, Sam Ruby wrote:
> 
> Instead of "starts with x_", how about "contains a colon"?

I considered that, but unfortunately, it forms a dichotomy of semantics, 
where when used in XML the DOM shows one set of names, and when used in 
text/html, it shows another (at least in legacy UAs, though we could 
hard-code behaviour for new UAs).


On Tue, 10 Apr 2007, Maciej Stachowiak wrote:
> 
> I think the problem here isn't compatibility with existing content, but 
> rather ability to use the feature in new web content while still 
> gracefully handling existing user agents. We wrote up some design 
> principles for the HTML WG based on the WHATWG's working assumptions 
> which might make this point more clear: 
> <http://esw.w3.org/topic/HTML/ProposedDesignPrinciples>. While "Don't 
> Break The Web" is a goal, so is "Degrade Gracefully".
> 
> To give a specific example: say I make my own "mjsml" prefix with 
> namespace "http://example.org/mjsml". In HTML4 UAs, I'd look up an 
> "mjsml:extension" attribute using getAttribute("mjsml:extension"). In 
> HTML5 UAs, I'd have to use getAttributeNS("http://example.org/mjsml", 
> "extension"). And neither technique would work on both (at least as I 
> understand your proposal).

Indeed.


> Now, we could extend getAttribute in HTML to do namespace lookup when 
> given a name containing a colon and when namespace declarations are 
> present, but then we would want to do it in XHTML as well. And using the 
> short getAttribute call instead of a longer getAttributeNS with a 
> namespace prefix might be unacceptable to XML fans.

Changing DOM Core is somewhat outside of HTML5's scope.


On Wed, 11 Apr 2007, Sam Ruby wrote:
> > > 
> > > http://intertwingly.net/stories/2007/04/10/test.html
> > 
> > In Safari 2.0.4: Processed as HTML, it says "data" and then "". Processed as
> > XHTML, it says "null" and then "data".
> > In Opera 9.00: Processed as HTML, it says "data" and then "null". Processed
> > as XHTML, it says "null" and then "data".
> > In Firefox 2.0.0.3: Processed as HTML, it says "data" and then "". Processed
> > as XHTML, it says "data" and then "data".
> > In IE/Mac 5.2: Processed as HTML, it says "data" and the second alert does
> > not appear. Processed as XHTML, neither alert appears.
> 
> The first thing that is apparent to me is that, when processed as HTML, 
> element.getAttribute('mjsml:extension') works everywhere.  So it is 
> probably fair to say that allowing it does not run afoul of either the 
> "Don't Break the Web" or "Degrade Gracefully" design principles.

...except that that would change if the parser annotated the DOM with the 
namespaces properly (as in XML).

(Also, it's architecturally bad to make the prefix values matter. That is, 
in XML, you'd want <foo xmlns:a="x" xmlns:b="x" a:q="" b:r=""/> to be an 
element with two attributes in the same namespace. You wouldn't want 
getAttribute('a:q') to pass but getAttribute('a:r') to fail.)
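
A minimal sketch of that point, using a toy attribute list rather than a 
real DOM. The resolveAttributes and lookupNS helpers are hypothetical 
stand-ins (lookupNS only mimics the idea behind getAttributeNS), and the 
attribute values "1" and "2" are added so the output is visible.

```javascript
// Hypothetical helper: resolve prefixed attributes against the xmlns:*
// declarations on the same element (no inheritance, for brevity).
function resolveAttributes(rawAttributes) {
  const prefixes = new Map();
  for (const [name, value] of rawAttributes) {
    if (name.startsWith("xmlns:")) prefixes.set(name.slice(6), value);
  }
  const resolved = [];
  for (const [name, value] of rawAttributes) {
    if (name === "xmlns" || name.startsWith("xmlns:")) continue;
    const colon = name.indexOf(":");
    const prefix = colon === -1 ? null : name.slice(0, colon);
    const namespace =
      prefix !== null && prefixes.has(prefix) ? prefixes.get(prefix) : null;
    const localName = colon === -1 ? name : name.slice(colon + 1);
    resolved.push({ namespace, localName, value });
  }
  return resolved;
}

// Lookup by (namespace, local name): the prefix used in the source is
// irrelevant.
function lookupNS(attributes, namespace, localName) {
  const hit = attributes.find(
    (a) => a.namespace === namespace && a.localName === localName
  );
  return hit ? hit.value : null;
}

// <foo xmlns:a="x" xmlns:b="x" a:q="1" b:r="2"/>
const attrs = resolveAttributes([
  ["xmlns:a", "x"],
  ["xmlns:b", "x"],
  ["a:q", "1"],
  ["b:r", "2"],
]);
console.log(lookupNS(attrs, "x", "q")); // 1
console.log(lookupNS(attrs, "x", "r")); // 2
```

Both lookups succeed because they never mention a prefix; a lookup keyed 
on the literal string 'a:r' would have nothing to match.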


> Per HTML5 section 8.1.2.3, however, such an attribute name would not be 
> considered conformant.  Despite this, later in document, in the 
> description of "Attribute name state", no parse error is produced for 
> this condition.  Nor does the current html5lib parser produce a parse 
> error with this data.

I think the spec changed since you wrote this, but I'm not sure.


On Wed, 11 Apr 2007, Sam Ruby wrote:
> 
> 1) re: "prefix_name" - how are prefixes registered?  Henri is free to 
> correct me if I am wrong, but I gathered that the requirement was for a 
> bit of decentralized extensibility, i.e., the notion that anybody for 
> any reason could define an extension for holding private data; and 
> furthermore could do so without undue fear of collision.

Well since collisions aren't much of a concern (this is for scripting, 
primarily, which is rarely syndicated, and when syndicated, there is a 
bigger problem just with the scripts themselves) we can let authors deal 
with collision avoidance by just having informal (from our point of view) 
rules about naming. For example, dojo could call all their attributes 
data-dojo-*="", for various values of *. Certainly I wouldn't say there's 
a need for formal registration of prefixes beyond just the general "data-" 
prefix for the feature as a whole.


> 2) I assert that the existing DOM standard already defines a mechanism 
> for decentralized extensibility.  Most relevant to the discussion at 
> hand is the getAttributeNS method.  It may not be defined as clearly as 
> it could be, but there do seem to be some clues which suggest what the 
> original intent was, and the beginnings of an agreement that if more 
> browsers were to conform to that intent, that would be a GOOD THING(TM).

I agree that if all we wanted to deal with was XML, and that we didn't 
have a concern over ease of authoring, that using namespaced attributes 
would be the most obvious and clear solution here.


On Wed, 11 Apr 2007, Michael A. Puls II wrote:
> 
> What would happen with <embed script-private="something">? Would the 
> data be passed to the plug-in as a script-private param or would 
> script-private be reserved; causing any plug-in using script-private not 
> to get the data? (A prefix could possibly avoid this.)

This problem still exists with data-*. As written, the spec makes the 
data-* attributes be sent to the plugin as well as being available to 
script. Does this cause any problems? I'm not especially concerned with 
supporting proprietary plugins, but it seems like authors could just 
avoid setting the data-* attributes on <embed>, or avoid breaking on 
them in their plugins, if this was an issue.


On Wed, 11 Apr 2007, Kevin Marks wrote:
>
> How about defining an attribute that is the name of the js variable for 
> use with that element? Then you can define the variable in a <script> 
> tag, and use pure JSON cleanly.

String-only support seems good enough... what are the use cases that 
involve more comprehensive data?

As far as mapping straight to a variable goes, though, that's basically 
what the .dataset object now does.


On Wed, 11 Apr 2007, Kevin Marks wrote:
> 
> No, what I'm suggesting is that you have, say, a 'localdata' attribute
> that names the associated variable:
> <script>myparams={"foo":"bar","bish":"bash"};</script>
> <div class="mydiv" localdata="myparams">
> 
> mydiv.localdata.foo =="bar"; // it is
> 
> I think making this work in current browsers would be doable by having
> a script that creates the DOM elements by looking for the 'localdata'
> parameters.

That's close to what the spec does now, except with multiple attributes.


On Fri, 6 Jul 2007, Henri Sivonen wrote:
>
> [The Namespaces] section is fine as far as HTML elements go. For reasons 
> of backwards compatibility, we have only one namespace we can use 
> and this section correctly designates exactly that namespace.
> 
> However, the spec should give guidance on integration of elements from certain
> well-known foreign namespaces into the XML serialization. This guidance
> doesn't need to be in this particular section, though.

Done.


> For further opinion about foreign namespaces in the XML serialization, 
> please refer to 
> http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2007-February/009631.html

Sending it was enough. :-) (Replied to that one above.)


On Sun, 9 Mar 2008, Henri Sivonen wrote:
>
> Use cases:
> 
>  * Converting a typical LaTeX paper to text/html such that
>    everything that wouldn’t get bitmapped in a pdfLaTeX workflow
>    does not get bitmapped.

Addressed.

>  * Writing a similar document into text/html in a text editor
>    copying and pasting the SVG figures from Inkscape XML output.

Addressed, though Inkscape really needs to stop outputting so much 
proprietary crap, sheesh.


>  * Making Flash-like visually “high-impact” (sorry about the
>    marketing BS term) sites using the openly specified Web
>    platform but without the Draconianness of XML in such a
>    way that the whole thing uses retained-mode graphics and
>    lives in one DOM for easy scripting (i.e. no need for
>    scripts to deal with object or iframe sub-DOMs).

If HTML+SVG+SMIL+JS+CSS addresses this, then, addressed.


>  * Publishing the kind of content that is published on
>    http://golem.ph.utexas.edu/~distler/blog/ using a
>    legacy PHP content management system that is not XML-ready.

Addressed.


> The technical requirements that arise out of the above use cases are:
> 
>  * Establishing a pseudo-XML parsing scope for <svg> and math.
> 
>  * Putting elements in the SVG and MathML namespaces in the
>    DOM in <svg> and math scopes, respectively.
> 
>  * Establishing a nested "in body"-like parsing scope in
>    foreignObject and annotation-xml.
> 
>  * CDATA sections in the pseudo-XML parsing scopes (have the
>    tree builder toggle a flag in the tokenizer).
> 
>  * Special-case XLink attributes.

The solution basically has these, yes.


>  * MathML entities in the pseudo-XML parsing scope (have
>    the tree builder toggle a flag in the tokenizer).

I just added them to all modes. It would be really confusing to be able to 
do <math><mi>&entity;</mi></math> but not <var>&entity;</var>.


> For the use cases I mentioned above, it wouldn’t be necessary to able to 
> bind prefixes with an explicit syntax. Magic scoping on svg, math, 
> foreignObject and annotation-xml would be enough. However, for the use 
> cases to be satisfied, pasting in XMLNS-style default namespace 
> declarations for svg and math should be allowed.

Agreed.


> > Do we need to actually support arbitrary namespaces, or can we satisfy 
> > all our use cases by providing explicit support for a finite set of 
> > elements?).
> 
> For the use cases I mentioned above, support for arbitrary namespaces is 
> not necessary. However, for forward compatibility, I think the mechanism 
> should be scope-based rather than based on a finite list in order to 
> handle future expansions of MathML and SVG.

Agreed.
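
A simplified sketch of what "scope-based" means here, using a toy tree 
instead of a real tokenizer and tree builder. The assignNamespaces 
helper, the tree shape and the element name someFutureElement are all 
hypothetical, and the actual parsing rules have more conditions than 
this.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";
const SVG_NS = "http://www.w3.org/2000/svg";
const MATHML_NS = "http://www.w3.org/1998/Math/MathML";

// Elements that open a foreign scope when seen in HTML content.
const OPENS_SCOPE = new Map([
  ["svg", SVG_NS],
  ["math", MATHML_NS],
]);
// Elements whose children go back to HTML-like parsing (simplified).
const BACK_TO_HTML = new Set(["foreignObject", "annotation-xml"]);

// Label every element with a namespace based only on the scope it is in.
function assignNamespaces(node, scope = HTML_NS, out = []) {
  const ns =
    scope === HTML_NS && OPENS_SCOPE.has(node.name)
      ? OPENS_SCOPE.get(node.name)
      : scope;
  out.push([node.name, ns]);
  const childScope =
    ns !== HTML_NS && BACK_TO_HTML.has(node.name) ? HTML_NS : ns;
  for (const child of node.children || []) {
    assignNamespaces(child, childScope, out);
  }
  return out;
}

const tree = {
  name: "body",
  children: [
    {
      name: "svg",
      children: [
        { name: "circle" },
        { name: "someFutureElement" }, // not in any list, still SVG
        { name: "foreignObject", children: [{ name: "p" }] },
      ],
    },
    { name: "svg:svg" }, // prefixed name: no scope is opened
  ],
};

for (const [name, ns] of assignNamespaces(tree)) {
  console.log(name, ns);
}
```

Because the namespace comes from the enclosing scope rather than from a 
list of element names, an element added to SVG later is handled without 
touching the parser.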


On Sun, 9 Mar 2008, Sam Ruby wrote:
>
> > Establishing a pseudo-XML parsing scope for <svg> and math.
> 
> IE8’s approach seems to be “establish a pseudo-XML parsing scope for 
> unknown elements which contain an attribute named xmlns that happens to 
> match a list of known values”, where the list of known values may vary 
> by user agent or installation.  IMHO, that approach merits exploration.

Unfortunately, exploration basically shows that IE8 doesn't do anything 
with xmlns="" (at least, I couldn't reproduce it), other than screw up the 
attribute parsing, and a quick study of existing documents suggests that 
there's a great deal of content that wouldn't be handled very well at all 
by anything triggering off xmlns="", even if it were limited to unknown 
elements or known namespaces (e.g. see the examples at the top of this 
e-mail).


> Secondly, if you look at a typical inkscape produced document[3], you 
> will see a number of other namespaces defined and used.  At a minimum, 
> such should cause no harm.  It would also be nice if such elements were 
> not to inhibit the potential future evolution of SVG.

I think the solution proposed handles them safely, if not especially much 
as they were originally intended.


On Mon, 10 Mar 2008, Julian Reschke wrote:
> 
> For the record: I disagree with that direction. Trying to integrate SVG 
> and MathML into HTML seems to be inferior to having a generic solution.

As noted in the introduction to this e-mail, a generic solution seems 
elusive for text/html, and furthermore seems unnecessary given the actual 
cost distribution for supporting new vocabularies. That is, a generic 
syntax would be a premature optimisation. Indeed, a generic syntax might 
actively harm future development, since it would strongly suggest that 
future developments be based on that syntax instead of having syntax 
designed for them, which in some cases may be better.


On Mon, 10 Mar 2008, Henri Sivonen wrote:
> 
> Building above-DOM support for SVG or MathML takes more effort than 
> building the below-DOM part. Chances are that if there ever is a need to 
> add another substantial chunk of element vocabulary to the Web platform 
> so that it doesn't fall under XHTML, SVG or MathML, implementing the 
> above-DOM part will take more effort than tweaking the below-DOM parsing 
> algorithm. That's why generalizing away the need to tweak the below-DOM 
> part in the future seems like the wrong optimization, especially if 
> the generalization requires the SVG and MathML cases to carry more 
> boilerplate.

Exactly.


On Mon, 10 Mar 2008, Jeff Schiller wrote:
>
> - Inclusion of SVG in HTML should not require a change to the SVG
> language.

(I assume you mean language as in syntax, as opposed to language as in the 
vocabulary and associated semantics.)

> Specifically:
> 
>   a) I should be able to copy & paste the inline SVG document into a
> new standalone document and it be valid SVG
> 
>   b) I should be able to copy & paste the inline SVG document into a
> XHTML document and it still be valid XHTML+SVG

I don't think we can ever guarantee those invariants without somehow 
preventing people from making typos in their SVG in text/html, which is 
something we've already failed to do, even before adding SVG support. So 
this may be a battle in vain.

Still, in theory, if you are careful, you can do this now, yes.

If the content isn't under your control, then using Firefox's "copy XML" 
feature, or equivalents in other UAs, is probably the best way to solve 
this.


> My preference:
> 
> <html ...>
> <body>
>   <svg xmlns="http://www.w3.org/2000/svg"
> xmlns:xlink="http://www.w3.org/1999/xlink" ...>
>     <a xlink:href="foo.svg"><circle .../></a>
>   </svg>
> </body>
> </html>

This now works, per spec.


> Another option:
> 
> <html ... xmlns:svg="http://www.w3.org/2000/svg"
> xmlns:xlink="http://www.w3.org/1999/xlink">
> <body>
>   <svg:svg ...>
>     <svg:a xlink:href="foo.svg"><svg:circle .../></svg:a>
>   </svg:svg>
> </body>
> </html>

This does not (no prefix support).


On Tue, 11 Mar 2008, Henri Sivonen wrote:
> 
> Use case: Including a contextual icon whose height is synced with text 
> size and that degrades into nothingness.

HTML5 already supported that: <img src="icon.svg" alt="">


On Tue, 11 Mar 2008, Chasen Le Hara wrote:
> 
> Henri, I don't understand this part of the use case. As an author, I 
> *would* want to be able to have a backup icon or text displayed if SVG 
> is not supported. I can't think of a circumstance in which I wouldn't 
> want at least a backup image, if not backup text (at the least) 
> displayed if SVG wasn't supported (at all, or in text/html).

<object> is what you would need if you want multilevel fallback.


On Wed, 12 Mar 2008, Chasen Le Hara wrote:
> 
> Ian, is this a fitting use case? I want to be able to include MathML in 
> a text/html document (instead of a text/xml document because the syntax 
> may have errors) (such as a blog with HTML comments allowed).

Yup, though I'd question why MathML is explicitly decided on as the 
solution in the use case. :-)


On Tue, 11 Mar 2008, Jeff Schiller wrote:
>
> But what about outside the "open web" platform?  Can we safely say that 
> we don't care about name-collisions?

I am not concerned with what happens outside the open Web platform as far 
as HTML5 goes. (That also applies to general WHATWG policy. I don't know 
about the World Wide Web Consortium's opinion on things outside the World 
Wide Web, but one might assume, from the name, that it shouldn't be a top 
priority there either.)


> See the XAML <Path/> element:
> http://www.longhorncorner.com/UploadFile/mahesh/XamlPath06102005084852AM/XamlPath.aspx
> 
> Case insensitivity would be a problem because of that.  Microsoft would 
> have had to rename their element to something else because SVG got there 
> first?

I don't understand how Windows Presentation Foundation object field names 
serialised to XML are relevant to text/html.


> HTML and XUL and XAML and OpenLaszlo and maybe others have a <button>
> element too...

(XAML doesn't, technically, though WPF has a Button class that can be 
serialised to XAML.) I don't think these proprietary frameworks should be 
a concern when developing an open standard for the Web.


> Not that you'd necessarily be inter-mixing these, and maybe I'm 
> "cheating" by using standards that aren't considered part of the "open 
> web" strata - but namespaces do provide a way to specify in a 
> non-ambiguous way which vocabulary you're using.

Sure. But we don't need them in text/html -- we have the MIME type to do 
that for us.


On Tue, 11 Mar 2008, Adam van den Hoven wrote:
> 
> It seems that there is no easy way to include other vocabularies into
> text/html assuming that:
> 1) You want the resulting docs to be conforming
> 2) You want the resulting docs to be testable for conformance (using code, not
> manually)
> 3) You want to be able to support emerging technologies

Right -- that, to a large extent, is intentional. We don't want to 
encourage people to send non-well-known vocabularies over the wire, as it 
is liable to result in lack of interoperability.


>  Henri's point that we've only seen two additional element sets is not
> entirely useful. In the broader web, I agree that new languages are likely to
> be infrequent. However in corporate settings, where all the browsers are
> controlled, other technologies could very well become common.

Corporate settings are not the concern of a Web specification (or at 
least, not an important concern).



On Tue, 11 Mar 2008, Adam van den Hoven wrote:
>
> > On 11-Mar-08, at 8:37 AM, Jeff Schiller wrote:
> > 
> > 
> > The way I understand the problem is that the HTML5 parser would
> > require too many "special cases" to handle XML 'islands' like this.
> 
> Really?
> Is it not simply a matter of:
> Do I recognize the namespace that is declared?
>  If -Yes- then parse that using my known code
>  If -No- then ignore it

Sadly not, due to the mess that is the legacy content on the Web.


> It strikes me as the only way to handle the situation with anything like
> scalability.

I agree (though in this case, I don't think we need the solution to scale 
much, as, as others have pointed out, the vocabulary space of the Web 
grows only very slowly).


On Tue, 11 Mar 2008, Doug Schepers wrote:
> 
> I also want to emphasize, however, that the same situation obtains in 
> reverse here.  SVG already has a wide deployment base on mobiles, and in 
> legacy viewers, that demands strict content (with the odd exception in 
> the case of namespace declaration in Adobe's viewer).  The network 
> effect would be critically lessened if an incompatible serialization of 
> SVG were deployed, as I said before.

I don't see any way to leverage the installed base of SVG renderers in 
handset interfaces to introduce vector graphics to text/html. Could you 
elaborate on how you would see that working?


On Tue, 11 Mar 2008, Jeff Schiller wrote:
>
> In fairness, unlike XHTML, Henri's suggestion allows the existing 
> deployed SVG content (i.e. conforming XML) to be used directly in HTML5, 
> it just doesn't condemn the content if it's non-conforming XML. My only 
> fear is that this will eventually allow tutorials out there to drop some 
> of the XML aspects and end up with non-conforming SVG fragments that 
> cannot be read into tools as standalone documents.
> 
> If all SVG viewers and editors could be magically updated to support a 
> text/html serialization of SVG then I'd be less concerned.

I don't think it would take magic -- it's not like SVG UAs implement their 
own XML parsers either, so it's just a matter of plugging in a new backend 
parser library, replacing, or rather augmenting, the XML one with an HTML 
one. Off-the-shelf text/html HTML5 parsers are already being written and 
deployed.


On Tue, 11 Mar 2008, Anne van Kesteren wrote:
>
> SVG fragments in XHTML can already not be imported in SVG tools without 
> issues. Consider references to elsewhere in the document (outside the 
> fragment), dependencies on ECMAScript functions defined in an XHTML 
> <script> block that resides in the <head> element, et cetera. In the end 
> you want tools that support both and not just the graphic. After all, if 
> all you care about is the graphic, you might as well use <object> or 
> <img>.

This is also true and is another reason to not consider this issue a 
design constraint.


On Mon, 10 Mar 2008, Jeff Schiller wrote:
>
> Sorry, I mis-understood something earlier.  Can you please help me
> understand why the following:
> 
> <html ...>
>   <body>
>   <svg xmlns="http://www.w3.org/2000/svg"
> xmlns:xlink="http://www.w3.org/1999/xlink" ...>
>     <a xlink:href="foo.svg"><circle .../></a>
>   </svg>
> </body>
> </html>
> 
> wouldn't work (i.e. copy-paste the SVG section into a <?xml 
> version="1.0"?> doc)?  To my knowledge it would work just fine (and has).

If you manually extract the SVG part, yes. I think Henri is saying that 
raw text/html wouldn't work (whatever we end up doing).


On Mon, 10 Mar 2008, Henri Sivonen wrote:
> 
> We cannot break output from Microsoft Office foremost. Also, as far as
> colonless xmlns processing goes, we cannot let an xmlns on root have an
> effect because there are all sorts of bogus declarations out there.

Indeed.


On Tue, 11 Mar 2008, Doug Schepers wrote:
> 
> The fact is, there is a very large deployed base of SVG UAs on mobile 
> devices, many hundreds of millions, that cannot easily be upgraded or 
> revised.

If they can't be changed at all, they aren't really relevant, since they 
will never support text/html anyway.


On Tue, 11 Mar 2008, Henri Sivonen wrote:
> 
> The conformance definition is not an effective way to achieve it, 
> because with non-Draconian error handling, there *will* be a non-trivial 
> proportion of instances where the embedded SVG island does not adhere to 
> the conformance criteria and, therefore, won't parse as XML if extracted 
> on the source text level.

Indeed, such content already existed even before we got involved!


> I am assuming non-Draconian error handling for two reasons:
> 1) Experience with XML has shown that Draconian error handling is worse than
> defined error recovery for Web formats.

Indeed, defined graceful error handling is a given for anything in HTML5 
or any WHATWG spec.


> 2) Draconian error handling would be bad strategy since it would work as 
> an implementation deterrent: If a new browser can throw a fatal error on 
> a page that legacy browsers parse to the end (even if the SVG parts 
> don't render), the new browser risks losing market share, and browser 
> vendors generally avert changes that would lead to market share loss.

Indeed. And this is especially relevant here given the existing bogus 
SVG-like content in text/html documents.


> I think there are two ways of addressing the editing use case more
> effectively:
> 1) Making the browser serialize a DOM range into XML (roughly like View
> Selection Source serializes from the DOM in Firefox) and copying and pasting
> the result as opposed to copying and pasting source.
> 2) Adding a text/html parser to the SVG editor.

Both seem reasonable, indeed.
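
A rough sketch of the first option, assuming a toy tree in place of a 
real DOM range. The serialize and escapeXml helpers and the node shape 
are hypothetical; the point is only that output generated from the tree 
is well-formed even when the original source was not.

```javascript
// Escape text and attribute values for XML output.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Serialise a toy node ({ name, attributes, children } or a string) as
// well-formed XML: every attribute quoted, every element closed.
function serialize(node) {
  if (typeof node === "string") return escapeXml(node);
  const attrs = Object.entries(node.attributes || {})
    .map(([name, value]) => ` ${name}="${escapeXml(value)}"`)
    .join("");
  const children = (node.children || []).map(serialize).join("");
  return children === ""
    ? `<${node.name}${attrs}/>`
    : `<${node.name}${attrs}>${children}</${node.name}>`;
}

// Roughly what a parser might have built from sloppy source such as
// <svg><circle r=10 fill=red></svg>. The xmlns attribute is included
// here so that the output stands alone.
const tree = {
  name: "svg",
  attributes: { xmlns: "http://www.w3.org/2000/svg" },
  children: [{ name: "circle", attributes: { r: "10", fill: "red" } }],
};

console.log(serialize(tree));
// <svg xmlns="http://www.w3.org/2000/svg"><circle r="10" fill="red"/></svg>
```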


On Tue, 11 Mar 2008, Jeff Schiller wrote:
> 
> FWIW, I wouldn't mind giving up the xmlns declarations on SVG.  Since 
> I've been using it so long I have the namespaces memorized.  It's really 
> not that hard to remember: XHTML, XLink are 1999, SVG is 2000.

...and MathML is 1998, but has an extra bit and uppercase letters...


> I end up looking up spec details on properties and attributes more often 
> than looking those up.  I do recognize that not everyone will be able to 
> do that.  What would be required to make the years optional in the 
> namespace? :)

Actually for new namespaces that was resolved already. e.g. XBL2 has the 
namespace "http://www.w3.org/ns/xbl".
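
For reference, the namespace URIs mentioned in this exchange, written out 
as a small lookup table. The NAMESPACES object and the namespaceYear 
helper are just illustrative names.

```javascript
// Namespace URIs mentioned in this thread.
const NAMESPACES = {
  xhtml: "http://www.w3.org/1999/xhtml",
  xlink: "http://www.w3.org/1999/xlink",
  svg: "http://www.w3.org/2000/svg",
  mathml: "http://www.w3.org/1998/Math/MathML", // extra bit, uppercase
  xbl: "http://www.w3.org/ns/xbl", // newer style, no year
};

// Pull out the year component, if the URI has one.
function namespaceYear(uri) {
  const match = /^http:\/\/www\.w3\.org\/(\d{4})\//.exec(uri);
  return match ? Number(match[1]) : null;
}

for (const [name, uri] of Object.entries(NAMESPACES)) {
  console.log(name, namespaceYear(uri));
}
```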


> On the other hand, I don't want to give up case-sensitive 
> elements/attributes nor quoted attributes since those are much harder to 
> correct when importing SVG snippets into a standalone document for 
> editing purposes...

Well, the case, as well as correct end tags, etc, can be kept if the 
author is disciplined. But we can't guarantee it would be correct, and it 
would end up being quite complicated to be case-sensitive for SVG but not 
for HTML. It could be done, though... The question is, is it worth keeping 
the case, when things like quote marks on attributes aren't going to be 
required? (I did end up making <![CDATA[]]> case-sensitive.)


> Not just your opinion.  There are many good reasons for not deviating 
> from the syntax used in "canonical SVG".  Pragmatic compatibility 
> concerns would be enough for me, but another is that I honestly think 
> that unquoted attribute values would be ultimately more confusing to 
> authors.

It hasn't been that confusing to text/html authors, all things considered.


On Sun, 16 Mar 2008, Ben Boyle wrote:
> 
> Scenario:
> 
> I am using text/html. You've sold me on not deploying web content as 
> xhtml. I've embedded that html snippet, and I've made that error in the 
> code. Unlike me to forget quotes of all things, but there you go. I'm 
> seeing the red circle stroked blue, it's all cool.
> 
> Suddenly I am enlightened that I should use an SVG file and link to it, 
> rather than embed and serve this same snippet of SVG on every page. I 
> dutifully move the SVG code into logo.svg, link it with an img tag 
> and... what the? The circle went black!
> 
> These things are authoring nightmares. Just don't do it. Consistency 
> please!

Actually it wouldn't go black, it would just say "syntax error", and if 
you used an <img> element, would display the alt text instead.


On Sun, 16 Mar 2008, Ben Boyle wrote:
>
> You want to make a best effort to render SVG, even broken SVG. You know, 
> I'm not against that. It's how XHTML should be handled (in the browser, 
> for the end-user). If you are going to render *something* when that SVG 
> is embedded in the page, then yes the red circle with blue stroke is the 
> obvious intent of the author. Can you render the same when that SVG is 
> placed in logo.svg and loaded in via <img>, for all the same reasons 
> (what user wants to see a broken image? none!), and for the sake of 
> consistency.

Changing XML parsing rules is out of scope for this group.


On Sat, 15 Mar 2008, Doug Schepers wrote:
>
> No, there's a third way.  Have non-draconian error handling that does 
> not cause the parser to halt, but which does ensure that SVG that 
> wouldn't work in existing SVG UAs doesn't render in HTML5 UAs.  It would 
> still be parsed, put into the DOM, but attributes with unquoted values 
> (and the rest of that element) aren't rendered.  That way SVG isn't 
> fractured, and it doesn't break the error recovery of HTML for non-SVG 
> elements.

The SVG group tried that with SVG 1.0 and 1.1, and nobody implemented it 
(SVG used to require that erroneous content not be rendered without an 
error message; SVG 1.2 changes this to an "ignore errors" model more like 
CSS and, to some extent, HTML). Why would we be any more able to get 
people to implement it in text/html?


On Sat, 15 Mar 2008, Maciej Stachowiak wrote:
>
> HTML has the feature of two serializations: a classic serialization that 
> is error-tolerant, and an XML-based serialization that has draconian 
> error handling. These have different costs and benefits, ultimately it 
> is a benefit to HTML authors that they have a choice. I think SVG 
> deserves to have this feature as well, there's no reason it should fall 
> short of HTML in this regard. Supporting SVG inline in text/html seems 
> like a good opportunity to add this feature to SVG.

That seems like a reasonable request.


On Sun, 16 Mar 2008, Charles McCathieNevile wrote:
>
> On Sat, 15 Mar 2008 22:12:34 -0700, Maciej Stachowiak <mjs@apple.com> wrote:
> >
> > HTML has the feature of two serializations: a classic serialization 
> > that is error-tolerant, and an XML-based serialization that has 
> > draconian error handling. These have different costs and benefits, 
> > ultimately it is a benefit to HTML authors that they have a choice. I 
> > think SVG deserves to have this feature as well, there's no reason it 
> > should fall short of HTML in this regard. Supporting SVG inline in 
> > text/html seems like a good opportunity to add this feature to SVG.
> 
> Perhaps. The cans of worms are different though. HTML elements are 
> basically content - in principle, the text tree is reasonably useful 
> (unless you have images). SVG is about images - having parts of an image 
> not render can drastically alter the semantics of the image.

More so than having part of a text document not render?


> SVG has a mechanism for handling broken subtrees, which involves showing 
> that it is broken.

This was changed in SVG 1.2; errors are now just ignored, or handled in 
some other default way, except for XML syntax errors.


On Sat, 15 Mar 2008, Boris Zbarsky wrote:
>
> Doug Schepers wrote:
> >  so a new edition of XLink could, in theory, define an additional
> > "human-friendly" namespace name, like "http://w3.org/xlink".
> 
> This would be a huge step in usability for all the W3C specs.
> 
> Heck, it could even be: "w3c:xlink", right?  And "w3c:xhtml", "w3c:svg", 
> etc. As an occasional XML author, that would save me from having to 
> google things like "xhtml namespace" every time I need to write an XHTML 
> document (and similar for svg, etc) or having to find some document to 
> copy-paste from.
> 
> I would love for this to happen.

For text/html, I've sidestepped this by making the declarations implicit 
and thus optional in the serialisation.


On Mon, 10 Mar 2008, Sam Ruby wrote:
> 
> There are two ways to get a generic solution: top down and bottom up.  
> By the former, I mean trying to specify a syntax for a generic solution, 
> and then seeing if it applies to a number of specific use cases.  By the 
> latter, I mean taking a number of use cases, addressing them, and then 
> seeing what (if anything) can be generalized.
> 
> SVG has mixed case attributes, an element named title, and makes use of 
> xlink.  MathML has foreign object.  Both continue to evolve, and any 
> HTML5 solution should be prepared for such evolution.  Anything that 
> completely (or even mostly) satisfies the above constraints likely can 
> be generalized. If focusing first on such a set of requirements in a 
> bottoms-up fashion enables progress to be made faster, then I'm all for 
> it.

Unfortunately I couldn't find a top-down approach that was compatible with 
existing documents, and the bottom-up approach ended up being specific to 
the vocabularies we were considering (and still wasn't perfect, especially 
for handling of broken <svg> fragments today, though the problem at least 
isn't as fatal as losing the remainder of the page).


> Microsoft's implementation is also reasonably general (albeit with a 
> number of restrictions).  Since compatibility with IE to the extent it 
> makes sense is a goal, this may also help.

I wasn't able to reproduce anything useful with Microsoft's 
implementation.


On Tue, 11 Mar 2008, Doug Schepers wrote:
> 
> I'm very glad to see this subject receiving serious attention in this 
> WG. I'll have more to say in subsequent emails, but I just wanted to 
> note here that the SVG WG has also discussed this quite a lot, and we 
> prepared a list of use cases and possible solutions that we intended as 
> a conversation starter. I put them in the HTML Wiki, for reference. 
> [1][2]
> 
> [1] http://lists.w3.org/Archives/Public/public-html/2007Dec/0013.html
> [2] http://esw.w3.org/topic/HTML/SVGInTextHTML

Thanks for this, it was useful to check that I hadn't missed anything 
unintentionally.


On Sat, 15 Mar 2008, Chasen Le Hara wrote:
> 
> I agree that inline SVG in text/html could be a great opportunity to 
> bring non-draconian parsing to SVG. But, I agree with Ben that a 
> "higher" group should give us the go-ahead to allow for error-tolerance 
> authoring.

I'm not clear why we would need go-ahead for this... the strategy of "take 
initiative, ask for forgiveness" is generally more productive than "ask 
for permission".


On Wed, 12 Mar 2008, Anne van Kesteren wrote:
>
> Below are some use cases for SVG in text/html from Erik Dahlström. He's 
> one of Opera's SVG monkeys.. implementors and is also on the SVG WG. I 
> reworded some of them in the process so you can blame me for typos. 
> (FWIW, I agree with the use cases :-))

They're not really use cases, more reasons why a particular solution ("it") 
is a good thing. :-)


> Use case #1
> -----------
> It makes it easier to write scripts that operate on the document as a whole,
> for example so that you don't have to worry about getElementById() being
> called in the right content document.
> 
> Use case #2
> -----------
> It makes it possible to have selfcontained SVG+HTML documents without the need
> for data: URIs. data: URIs are not sufficient as they can be limited in size,
> require additional processing, and will be in a separate frame (see also #1).
> 
> Use case #3
> -----------
> It would eliminate the need for using XHTML when using SVG. This is important
> as it lowers the bar for deploying SVG on the Web significantly. The different
> media type and well-formedness constraints are proving to be problematic.
> 
> Use case #4
> -----------
> It makes it possible to do retained-mode graphics without using external
> files.
> 
> Use case #5
> -----------
> It makes it easier to do interactive graphics than with using <canvas>. SVG
> uses the DOM so paths, shapes, etc. can have event listeners registered and
> can be moved/interacted with/etc. on their own.
> 
> Use case #6
> -----------
> It makes it possible to reference resources such as gradients, arbitrary
> clip-paths, symbols etc, from other SVG fragments in HTML. This makes
> documents more compact.
> 
> Example:
> <html>
>   <svg>
>     <linearGradient id="somegradient" ...>
>     ...
>     </linearGradient>
>   </svg>
>   more html
>   <svg>
>     <rect fill="url(#somegradient)" .../>
>   </svg>
> </html>

The above are all now supported.


On Sat, 15 Mar 2008, Ben Boyle wrote:
>
> I expect the HTML document to be recovered, and that's my top priority. 
> I expect the SVG/MathML to be rendered as an error, like a broken 
> image/broken object. If I referenced an image (jpeg, gif, etc.) that was 
> broken, the browser wouldn't attempt to render it. I'd like to see 
> SVG/MathML handled in the same way.

Actually, browsers ignore errors in JPEGs, GIFs, even SVG images already, 
for many classes of errors.


> Let's say I create logo.svg to use on a website. I've made a mistake in 
> my logo.svg code, but I've gone ahead and used it anyway (I know, this 
> highly unrealistic scenario almost never happens in the real world, but 
> bear with me!) Let's say I included it in my HTML using the img 
> element... when I view my page in the browser -- oops! I see a broken 
> image. I will have to fix my SVG.

That depends on the error. A syntax error that makes the document 
malformed will cause a different error than an error in the syntax of an 
attribute or with a stray or misnamed element or the use of the wrong 
namespace for some of the elements or attributes.


> Let's say I embed that SVG in the HTML code, rather than use the img 
> tag. I want the exact same thing to happen: browser shows a broken 
> image, but the rest of the HTML document "works". Are the implementers 
> groaning over my naivety here? I know the situation is different from a 
> programming perspective ...

Making the image display as a broken image would be relatively hard, since 
SVG doesn't have that as a concept. We would have to extend SVG itself to 
support that state.

I'm not sure it makes sense to port the XML error handling to SVG in HTML. 
That's a syntax-level error handling concern, it seems like the HTML 
syntax should keep its error handling behaviour. (Indeed, Maciej 
explicitly requested that as a feature.) SVG itself doesn't have draconian 
error handling, it has error handling of the "ignore and continue" 
variety.


On Wed, 19 Mar 2008, Erik Dahlström wrote:
>
> I find it an acceptable trade-off that not every single SVG ever created 
> can be copy-pasted into an HTML document. However, any valid SVG 
> document fragment that doesn't depend on having XML PI:s or DOCTYPEs 
> prior to the root svg element to be a fully XML-wellformed SVG document, 
> should be copy-pastable into HTML and should work IMHO. Or in other 
> words: if you can copy-paste the svg root (and its children) into a new 
> empty document and that is still a valid SVG document, then it should 
> also work when pasted into an HTML document.
> 
> This is similar to how SVG inline in any other XML markup would work 
> anyway, that is: you can't put XML PI:s in the middle of the document. I 
> think it's a non-goal to allow that. If you need that then you should 
> use an external file, plain and simple. The same argument holds for 
> DOCTYPE as well.
> 
> So, given these constraints, is "nothing" still too strong? :)

Well, do you accept the lack of support for prefixes? i.e. that the SVG 
tag names have to be without colons?


On Wed, 19 Mar 2008, Sam Ruby wrote:
>
> Examples of a few things worthy of further discussion:
> 
> a) Psychotic use of namespaces[1]
> [1] http://lists.xml.org/archives/xml-dev/200204/msg00170.html

Not supported in the proposal.

> b) <![CDATA[ ]]> (I'd say nice to have)

Supported.

> c) Attribute Value Normalization[2]
> [2] http://www.w3.org/TR/REC-xml/#AVNormalize

Not supported.

> d) xml:space

Supported.


On Sun, 16 Mar 2008, Philip Taylor wrote:
>
> I have a Python script that provides a web interface to some particular 
> application.
> 
> It responds to HTTP requests by executing the application (with certain 
> input parameters coming from the request) to generate some data. It 
> creates an HTML table displaying the data, and executes Graphviz to 
> produce a pretty picture of the same data.
> 
> To avoid executing the application more times than necessary, and to 
> avoid adding some complex caching logic into the script, it has to 
> return the table and graph in a single HTTP response.
> 
> To allow decent-quality zooming of the graph, it is generated with 
> Graphviz's SVG output option. (This also keeps the file size down - the 
> uncompressed SVG is about 20% of the size of an equivalent PNG.)
> 
> To have the SVG automatically take up the right amount of space on the 
> page, and to allow easy 'view source' debugging of the SVG, it is output 
> inline in the HTML response (instead of in a base64 data: URI).
> 
> The Graphviz SVG output is first parsed into a DOM in Python. The <svg> 
> element is extracted (hence removing the <!DOCTYPE> etc), then modified 
> a bit (removing some links and styles, changing some title text, etc), 
> then serialised (using the DOM library's standard XML serialiser) and 
> concatenated into the XHTML response string.
> 
> 
> 
> A later version of this system added some client-side interactivity. The 
> HTML code contains something like:
> 
>     <script><![CDATA[
>     function mouseover() {
>         this.childNodes[1].setAttribute('class', 'hovered');
>     }
>     window.onload = function () {
>         var a = document.getElementsByTagNameNS(
>             'http://www.w3.org/2000/svg', 'a');
>         for (var i = 0; i < a.length; ++i) {
>             a[i].onmouseover = mouseover; // plus similar for mouseout
>         }
>     }
>     ]]></script>
>     <style>
>     ellipse.hovered { fill: #ff6 !important; }
>     path.hovered { stroke: #f00 !important; stroke-width: 2px; }
>     </style>
> 
> For SVG in text/html, it would be good if the same code still worked - I don't
> want to have to remember two different ways of processing DOMs, and I don't
> want to learn about CSS namespaces.
> 
> 
> The HTML response can contain multiple SVG graphs. Since they are all 
> generated independently by Graphviz, they all have a <g id="graph0" ...> 
> (plus lots of other common IDs). It would take far too much effort to 
> uniquify the IDs and fix up all the references (particularly since the 
> references can be in style="url(#...)" too).
> 
> Currently I never use the IDs, so that doesn't matter, but this will 
> break if the SVG starts containing ID references (e.g. for <defs>) or if 
> I start using getElementById, so it would be good if this overlapping ID 
> situation could be handled well. (E.g. references inside an <svg>-rooted 
> subtree could preferentially match IDs in the same subtree, before 
> looking outside to the whole HTML document.)
> 
> 
> The interactive script also needs to extract some data from graph nodes (a
> list of paths leading to that node) when the user selects them, to display in
> an HTML table. Currently it does something very much like:
> 
>   <svg xmlns:custom="custom-data" ...>
>     ...
>     <g custom:paths="[[0,1,2],[3,4,2]]" .../>
>     ...
>   </svg>
> 
> where the Python script adds the custom:paths attribute, and the onmouseover
> script does getAttributeNS('custom-data', 'paths') and parses and processes
> the data. I don't much care about conformance, but I need some way to attach
> arbitrary data to elements, and it shouldn't be harder than adding an
> attribute.
> 
> 
> This would all be slightly nicer if I didn't have to spend time looking 
> up the XHTML namespace and making sure the whole HTML part of the 
> response was well-formed XML. It isn't a compelling case for supporting 
> SVG in text/html, since it works alright as application/xhtml+xml, but 
> if SVG is supported to some extent then it would be nice to support this 
> case too, rather than only supporting simple cases and telling people to 
> go away and use XHTML if they want something fancier.

I didn't find a solution for the ID problem -- use unique IDs e.g. using 
GUIDs.

The custom data thing on SVG won't work in HTML5 right now, but if the 
data-* concept (the same but for HTML) takes off, then I think we should 
ask the SVG and MathML groups if they mind supporting the same syntax.

I don't think we should ask them until it is a proven technology, though, 
as that would risk polluting those languages with no benefit.
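
For the HTML side, a minimal Python sketch of the data-* idea might look 
like the following: the server-side script serialises the data as JSON 
into an attribute, and a consumer reads it back. The helper names and the 
use of JSON are assumptions for the example, and it deliberately puts the 
attribute on an HTML element (a table cell), since the SVG case isn't 
covered.

```python
import json
from html import escape
from html.parser import HTMLParser


def data_attribute(name, value):
    """Serialise a value as JSON into a data-* attribute string."""
    return 'data-%s="%s"' % (name, escape(json.dumps(value), quote=True))


class DataReader(HTMLParser):
    """Collects (tag, name, decoded value) for every data-* attribute."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name.startswith("data-"):
                self.found.append((tag, name[len("data-"):], json.loads(value)))


def read_data_attributes(markup):
    """Parse markup and return the decoded data-* attributes it contains."""
    reader = DataReader()
    reader.feed(markup)
    reader.close()
    return reader.found
```

For example, '<td %s>node 2</td>' % data_attribute("paths", [[0, 1, 2], 
[3, 4, 2]]) produces a cell whose data can be recovered with 
read_data_attributes(), which is no harder than adding an attribute.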


On Sun, 16 Mar 2008, Henri Sivonen wrote:
>
> Here's one way of solving this:
>  * Removing the requirement for ID uniqueness in XML and HTML.
>  * Making a change in parent vs. child element namespace establish a scope.
>  * Making the presence of the xml:base attribute establish a scope.
>  * Making ID matching take two arguments: the ID and a context node (i.e.
> moving gEBI on Node so that 'this' becomes the context node).
>  * Making ID references prefer matches in the same scope moving up the scope
> tree if there is no match in the inner scope.
> 
> All in all, solving this problem would require some drastic changes. The 
> problem may be too costly to solve. :-(
> 
> It seems to me that the least expensive fix is to change SVG generators 
> to use GUID-style IDs to make the probability of an ID clash negligible.

I agree with your conclusion (too costly, use GUIDs).
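
For generators that can't easily be changed, a post-processing step is 
one option. The following Python sketch prefixes every ID in an SVG 
fragment with a GUID-style string and fixes up url(#...) and "#..." 
references to match. It is only a sketch under simple assumptions: the 
function name is made up, the fragment must be well-formed XML, and 
references inside <style> element content or script are not handled.

```python
import re
import uuid
from xml.dom import minidom

URL_REFERENCE = re.compile(r"url\(#([^)]+)\)")


def uniquify_ids(svg_source, prefix=None):
    """Return svg_source with every ID, and each reference to it, prefixed."""
    if prefix is None:
        # A leading letter keeps the new ID a valid XML name.
        prefix = "g" + uuid.uuid4().hex + "-"
    document = minidom.parseString(svg_source)
    elements = document.getElementsByTagName("*")
    known_ids = {el.getAttribute("id") for el in elements
                 if el.hasAttribute("id")}

    def rewrite_url(match):
        # Only touch references to IDs defined in this fragment.
        if match.group(1) in known_ids:
            return "url(#" + prefix + match.group(1) + ")"
        return match.group(0)

    for el in elements:
        for name, value in list(el.attributes.items()):
            if name == "id":
                new_value = prefix + value
            elif (name in ("href", "xlink:href") and value.startswith("#")
                    and value[1:] in known_ids):
                new_value = "#" + prefix + value[1:]
            else:
                # Covers fill="url(#x)" as well as style="...url(#x)...".
                new_value = URL_REFERENCE.sub(rewrite_url, value)
            if new_value != value:
                el.setAttribute(name, new_value)
    return document.documentElement.toxml()
```

Running each independently generated graph through uniquify_ids() before 
concatenating it into the response makes the probability of a clash 
negligible, at the cost of one extra parse per graph.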


On Tue, 25 Mar 2008, Jeff Schiller wrote:
> 
> Here's another use case:
> 
> 5. The ability to animate elements in a web page (hypertext, vector
> graphics) without using script.
>   * sizes, positions, opacities, colors, transforms (basically most
> attributes and properties)
>   * time-based and DOM event triggering for begin/end
>   * linear, spline interpolation
>   * inlined (for simple web pages) and non-inlined animations (to promote
> separation of content and presentation)

Do the SMIL parts of SVG handle this satisfactorily?


On Tue, 25 Mar 2008, T.V Raman wrote:
>
> To animating vectors, add the same for math equations.
> 
> the cases written down so far give you static mathematics --- but
> it would be nice to 
> be able use the live hypertext features of the Web to
> expand/collapse equations to explain a complex proof. This can be
> done today with careful tweaking of CSS bits if you really know
> what you're doing, but writing it is difficult. 

Does MathML's <maction> address this enough?


On Tue, 25 Mar 2008, Erik Dahlström wrote:
> >      Priorities:
> >       * Compatibility with existing graphics packages
> 
> Reading that sounds to me like that should cover the inclusion of 
> namespaces, since many/most svg editors put data in custom namespaces in 
> the svg files, and that data should not be lost when parsed (doctype and 
> XML PI:s to be excluded). Also for things like <svg:metadata> I'd say 
> that it's pretty much a requirement that data inside it isn't lost when 
> it's parsed to DOM. Furthermore it's not that uncommon that people want 
> to put custom data in namespaced attributes in svg, and I would expect 
> that to continue to work even if used inline in HTML.

<svg:metadata> isn't supported by this proposal, and the namespaced 
attributes end up in the wrong namespace in HTML UAs (and are 
non-conforming anyway, so they'll get flagged as errors).

Do graphics really depend on this data?


>   Use-case: enable new svg/mathml features as they become available, so that
> people can use them even if inline in HTML.

Supported.


>   Use-case: ability to access custom (namespaced) data specified in e.g.
> svg:metadata.

Could you elaborate on this use case? You have it more phrased as a 
requirement than a realistic use case. I don't understand what graphics 
would rely on proprietary namespaces like this.


On Tue, 25 Mar 2008, James Graham wrote:
> 
> Just to complete the circle, I should mention the use cases for 
> html-inside-maths and html-inside graphics. As an example of the first 
> consider a mathematics tutor with questions like
> 
> 4/9 = ?/3
> 
> Where the ? is to be filled in by the student. This seems like a good 
> use for a html input element inside maths content.

Supported (though MathML maybe should be explicit about how to handle 
HTML in the various elements that text/html now supports HTML in).


On Tue, 25 Mar 2008, Maciej Stachowiak wrote:
> 
> * Include "bindings" in the style of XBL, XBL2 or Windows HTCs inline in 
> a text/html document, much as scripts and stylesheets can be included 
> inline.
> 
> I'm not sure this is very strong (external files only for bindings in 
> text/html doesn't seem like a huge problem) but XBL2 can be supported 
> inline in application/xhtml+xml, so presumably this isn't considered 
> completely pointless. I can imagine this being handy during exploratory 
> programming, or in cases where it's desirable to deliver as much as 
> possible in a single resource so everything goes inline in the HTML.

I haven't tried to support this. Supporting XBL2 in HTML5 seems premature, 
since XBL2 hasn't yet been proved in the market.


On Sun, 30 Mar 2008, David Carlisle wrote:
> 
> Unlike the html case where you can try to specify full application 
> behaviour even in error situations, mathml is intended primarily to be 
> hosted by some other language (most mathematical expressions live in 
> some wider context) and the application behaviour of xyz+mathml has to 
> be mainly influenced by the application behaviour of the host language 
> xyz.
> 
> So basically the current situation is that the above isn't MathML so if 
> you give it to a MathML (only) system it will generate an error, but if 
> you give it to a system that defines some language (such as html+mathml) 
> that isn't defined by the mathml spec, it may do something else such as 
> silently ignore the error.
> 
> In an HTML5 context you are not going to want (the equivalent of) a 
> validity error on parsing which kills the entire document, that is 
> clear. But the fixup should only be, that an implied merror (or mtext, 
> perhaps) is inserted
> 
> <math>1+2</math>
> 
> could perhaps parse as (preferably)
> 
> <math><merror><mtext>1+2</mtext></merror></math>
> 
> rendering typically as 1+2 in a red border
> 
> or perhaps we could consider whether it should parse as
> 
> <math><mtext>1+2</mtext></math>
> 
> rendering as 1+2 with no mathematical spacing refinements.

For compatibility with existing Web content (a lot of which just has 
<math>...</math> with text) I have made HTML5 specify that text runs in 
MathML fragments must be treated like <mtext> would be.


On Sun, 30 Mar 2008, David Carlisle wrote:
> > 
> > What are the error handling rules for standalone MathML? Wouldn't they 
> > want to be the same as for MathML in HTML?
> 
> Well first we need to make sure that "MathML in HTML" bears some 
> resemblance to MathML, which is something that you seem to be suggesting 
> is most definitely not the case. If the Markup language is 
> unrecognisable as MathML it makes no sense to ask if the error behaviour 
> is the same.

The proposal is that they be the same.


> But assuming that the final markup is recognisably MathML, you may still 
> want different error behaviour. In particular the case of embedded html 
> elements. If you stick an html element in the middle of a mathml 
> expression, that clearly is an error condition in a pure mathml 
> application. However you may (or may not) want to make this a defined 
> non-error behaviour in html+mathml.

I haven't defined anything at the semantics level, though the parse level 
does allow HTML in <mn>, <mo>, etc, and the rendering level will likely 
define this more (possibly by reference to CSS).


On Mon, 31 Mar 2008, David Carlisle wrote:
> 
> Silently fixing up a three argument fraction so that the second two 
> arguments are arbitrarily concatenated into a denominator is simply the 
> wrong thing to do whatever syntax is chosen.  In an HTML context you 
> want to be able to fix things up so you can complete the parse of the 
> document, but the result should be flagged as an error either visually 
> or in the dom or somewhere.

I've made this a requirement of HTML5+MathML processors.


On Mon, 31 Mar 2008, Justin James wrote:
> 
> Right now, we are only accounting for MathML. Even when we discuss this 
> issue as not MathML-specific, we are thinking only in that context. 
> What's to happen if I come up with my own DOM and want to embed it? Do I 
> also have to account for, in my DOM, these kinds of issues? Do I submit 
> it to the W3C and hope that the handling rules make it into HTML 6?

If you want browsers to support your language, then yes, you need to get 
the buy-in from the wider Web community. That's the case regardless of 
whether we have a generic syntax system like XML, or a language-specific 
syntax like text/html's.


> Allow the HTML author to embed an OBJECT reference to a parser that 
> handles their format. Problem solved.

In what form does this parser come? Will it work on all platforms?


> In other words, what makes MathML (or *ML) so special that we feel the need
> to cater to it in HTML 5?

Mathematics is an important part of humanity's legacy and culture, and is 
absolutely key to almost any future progress in technology, science, 
and any art that relies on new technology. Music, video, printing, space 
travel, medicine, it all relies on mathematics. Given the importance of 
mathematics to our society, we should absolutely support it on the Web, 
and the Web is 99.8% HTML.


On Mon, 31 Mar 2008, Justin James wrote:
> 
> Which is really my stance. :) If people really want to be able to embed 
> MathML (or any other format), they can write browser plugins that do 
> this.

How would I view a MyMathPluginML-enabled Web page on my iPod Touch? I 
cannot install Web plugins on that platform, and it is highly likely that 
even if I could, no plugin would be available for that platform anyway.

Maths is a key part of our heritage, it deserves to be a first-class 
citizen on the Web.


On Mon, 31 Mar 2008, Bruce Miller wrote:
> 
> My concern was that the browser, after parsing whatever form into a DOM, 
> would be required (or _very_ strongly encouraged) to allow exporting 
> that DOM as XML.  Then, _any_ MathML (ditto SVG) application could use 
> the result, including old, strict MathML applications on the one hand, 
> to HTML5 browsers on the other.

I've added encouragement to this effect.


On Tue, 1 Apr 2008, Sam Ruby wrote:
> Ian Hickson wrote on 03/31/2008 08:43:16 PM:
> >
> > I would expect that we would allow the xmlns="" attribute on <math> to 
> > have the MathML namespace, in the same way as we allow xmlns="" on 
> > <html> to contain the XHTML namespace. It wouldn't have any effect, 
> > though.
> 
> Such attributes will have an effect in IE8; and, in fact, as I 
> understand it be necessary for proper interoperation (with what for now 
> is hypothetical) IE8 plugins that might support MathML.

Actually I haven't been able to reproduce this IE8 behaviour.


On Sat, 29 Mar 2008, William F Hammond wrote:
> > 
> > One of the use cases is the mixing of graphics and form controls into 
> > equations. Is it possible to extend MathML to allow specific HTML5 
> > phrasing-level elements (like <em>, <img>, <input>, also maybe the 
> > <svg> element) wherever the <mglyph> element is currently allowed, or 
> > something along those lines?
> 
> Not anywhere <mglyph> is allowed, but maybe a small number of these 
> inside <mtext> where the recursive content is presumably ignored under 
> pasting into computer algebra systems.

Right now I've allowed it anywhere <mglyph> is allowed, so that, e.g., SVG 
can be used to draw an operator, or HTML can be used to put an input 
control instead of a number in an equation solving question.


On Tue, 1 Apr 2008, Neil Soiffer wrote:
>
> That's a nice list of references.  As David said in a separate email 
> thread about &phi -- you can't win no matter which mapping you choose.

Well, at the moment we're doing whatever MathML3 does, by using the same 
source file for the entity table.


On Sun, 30 Mar 2008, Dailey, David P. wrote:
> 
> I don't think anybody is saying that HTML5 should disallow VML.

Well, it does, and always has, so I guess they don't have to...


On Tue, 1 Apr 2008, Julian Reschke wrote:
> Philip Taylor wrote:
> > 
> > Lots of existing content uses attributes named "xmlns" -- see e.g. 
> > <http://philip.html5.org/data/xmlns.txt>. (Only 45 pages in the data 
> > set had XML content-types, so this is pretty much all text/html). It 
> > seems that compatibility requirements will make it impossible to base 
> > anything on the "xmlns" attribute.
> 
> Well, a similar argument can probably be applied to *any* change in the 
> language, such as adding new elements.

Not really. There are some proposals that lead to much worse handling of 
older pages than others.


On Thu, 3 Apr 2008, Eirik Mikkelsen wrote:
> 
> What I'm suggesting is: Standardize and extend IE's conditional comments 
> syntax, where we introduce a new "operator" (accept), that takes a 
> mime-type string as operand

That's some pretty ugly syntax. :-)


On Tue, 1 Apr 2008, Jim Jewett wrote:
>
> I think everyone agrees that MathML benefits from the strictness 
> requirements in XML.

Actually no, many of us think that the strictness of XML is a bug. (Whence 
the "XML5" effort.) See e.g. Maciej's e-mail above.



On Tue, 1 Apr 2008, Julian Reschke wrote:
>
> Jim Jewett wrote:
> > ...
> > The decision needs to be specified.  And in html, the short "invalid"
> > form should probably be acceptable, representing the simpler case.
> > (You can always be explicit to get the other case.)  If mathml needs
> > to keep it invalid (to ensure quality), then maybe the two languages
> > aren't ready for tight integration.
> > ...
> 
> For the record: I totally disagree that it needs to be specified. If it's not
> specified in MathML, so be it.
> 
> All of this smells like an attempt to impose "WHATWG" design on MathML.

If MathML is going to be in text/html, it has to play by the text/html 
rules, yes. That means well-defined behaviour, including error handling. 
Now having said that, I would encourage any serious spec writer today to 
define error handling; it's clear that leaving error handling undefined 
just leads to poor interoperability and shows bad editorship.


On Tue, 1 Apr 2008, Robert Miner wrote:
>
> My own preference would be to do the repair in place with an merror (as 
> opposed to fixing things up with mtext and then wrapping the whole 
> equation).  However, I could live with a user agent choosing to render 
> merror as a standard mrow.  In other words, merror would be there in the 
> DOM, but there wouldn't be any visual indication of the error in the 
> rendering.
> 
> Example:
> 
> <math>
>   <mfrac> <mn>1</mn>  <mn>2</mn> <mn>3</mn> </mfrac>
> </math>
> 
> would appear in the DOM as
> 
> <math>
>   <mfrac> 
>     <mn>1</mn>  
>     <merror> <mn>2</mn> <mn>3</mn> </merror> 
>   </mfrac>
> </math>
> 
> and render on screen as 1/23.  (Obviously it would really render as a 
> case fraction, and 1/23 is just my ASCII art approximation of it.)
> 
> I presume that would also allow one to add a <style> .merror 
> {background: red} </style> or whatever you want to the document and have 
> errors highlighted in red if you want to see them.

The HTML5 spec requires it to appear in the DOM as:

  <math>
    <mfrac> <mn>1</mn>  <mn>2</mn> <mn>3</mn> </mfrac>
  </math>

...but to render as if it was really:

  <math>
    <merror> (something) </merror>
  </math>
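
In other words, the DOM is left alone and the error is a matter for the 
layer that processes the MathML. A minimal Python sketch of how such a 
layer might detect the condition is below. The table of required child 
counts reflects MathML's definitions of these elements; the function 
names are made up for the example, and what to do with the result 
(e.g. rendering an error) is left to the caller.

```python
import xml.etree.ElementTree as ET

# Number of child elements each of these MathML elements requires.
REQUIRED_CHILDREN = {
    "mfrac": 2, "mroot": 2, "msub": 2, "msup": 2,
    "munder": 2, "mover": 2, "msubsup": 3, "munderover": 3,
}


def local_name(tag):
    """Strip ElementTree's "{namespace}" prefix from a tag, if present."""
    return tag.rsplit("}", 1)[-1]


def arity_errors(mathml_source):
    """Return (name, expected, found) for each element with a wrong child count."""
    root = ET.fromstring(mathml_source)
    errors = []
    for element in root.iter():
        name = local_name(element.tag)
        expected = REQUIRED_CHILDREN.get(name)
        if expected is not None and len(element) != expected:
            errors.append((name, expected, len(element)))
    return errors
```

For the three-argument fraction above this reports one problem, an mfrac 
with three children where two are required, without touching the tree.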


On Wed, 2 Apr 2008, Bruce Miller wrote:
> 
> _Surely_, no one out there is writing HTML using <whatevertag/>
> when they _dont_ mean to close the element?!?!?!
> (rolling my eyes :> )

It's actually really common, in particular e.g. with <a>:

   <a href="hello.html"/>Hello</a>

...is quite common.
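
A quick way to see why that markup can't simply be read with XML's 
meaning of "/>" is to feed it to an XML parser: the <a/> is closed 
immediately, so the later </a> no longer matches anything. This Python 
sketch (the helper name and the wrapping <p> are assumptions for the 
example) checks exactly that.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(source):
    """True if the source parses as a well-formed XML document."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


# Under XML rules the first form is malformed; the second is fine.
COMMON_MARKUP = '<p><a href="hello.html"/>Hello</a></p>'
INTENDED_MARKUP = '<p><a href="hello.html">Hello</a></p>'
```

Here is_well_formed_xml(COMMON_MARKUP) is false while 
is_well_formed_xml(INTENDED_MARKUP) is true, even though authors of the 
first form clearly meant the second.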


On Wed, 2 Apr 2008, David Carlisle wrote:
> 
> But I think you need to look at that again, that rule makes it virtually 
> impossible to embed other languages (even as annotations) without having 
> to have the language known in advance to the editor of the html spec.
> 
> Even in mathml, it's not at all uncommon to see <mrow/> for example, 
> just as in TeX you often see {} when generating MathML (or TeX) you 
> often add a group "just in case" and end up with nothing there.

MathML and SVG elements now support /> syntax; HTML doesn't. Other 
languages aren't supported at all in text/html.
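
As a rough sketch of that rule (my own simplification, not the parser 
algorithm itself), the decision could be written like this in Python. 
The function name, its parameters, and the partial list of void elements 
are assumptions made for illustration:

```python
# Partial list of HTML void elements, for illustration only.
VOID_HTML_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


def leaves_element_open(tag_name, has_trailing_slash, is_foreign):
    """Return True if this start tag leaves an element open.

    is_foreign is True for MathML and SVG elements.
    """
    if is_foreign:
        # MathML and SVG: "/>" closes the element immediately.
        return not has_trailing_slash
    # HTML: the trailing slash is ignored. Only void elements never
    # stay open, with or without the slash.
    return tag_name not in VOID_HTML_ELEMENTS
```

So <mrow/> inside <math> is closed straight away, while the <a .../> 
example above still leaves the <a> element open.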

On Wed, 2 Apr 2008, David Carlisle wrote:
> 
> It's odd that earlier in the thread we were told that proper 
> handling of html5 would require a real html5 parser (of which several 
> ought to be available) but in the same thread there is the repeated 
> requirement that html5 "work" with the existing html4 parsers. (Which 
> presumably doesn't go as far as saying what the HTML spec (by reference 
> to sgml) says it should do for /> which is to treat the > as character 
> data.)

Parsers in real deployed user agents have far more in common with the 
HTML5 parser spec than with the requirements imposed by the HTML4 spec in 
terms of parsing.


On Thu, 3 Apr 2008, Jeff Schiller wrote:
> On Wed, Apr 2, 2008 at 10:07 PM, Ian Hickson <ian@hixie.ch> wrote:
> > 
> > Say the trigger is <newsyntax>. Now assume someone writes:
> >
> >  <p>foo <newsyntax> ... </newsyntax> bar </p>
> >
> > ...and that such a page works well in new browsers. Given how people 
> > copy and paste content on the Web, especially how people copy and 
> > paste _new_ syntax on the Web, even before it is implemented, it is 
> > very likely that someone will copy just the "foo" part, accidentally 
> > including the <newsyntax> bit:
> >
> >  <p>bla bla foo <newsyntax> bla bla </p>
> >
> > This will now effectively "poison" the <newsyntax> idea, since the 
> > pages that result from this cargo-cult copy-and-paste attitude will 
> > render badly in browsers that support the new syntax.
> 
> Now I understand where you are coming from.  I don't think there's any 
> way to avoid the 'rendering badly' for all cases, I'm sorry.

Well, the proposal given handles it for many cases.


> <!DOCTYPE HTML>
> <html><body>
> <video ...>
>   <p>This is fallback content.</p>
> </video>
> </body></html>
> 
> Now if somebody copies only part of this document into their own 
> document (and somehow gets the DOCTYPE right):
> 
> <!DOCTYPE HTML>
> <html>
> <p>I am teh HTML genius
> <video ...>
>   <p>This is fallback content
> <p>And don't you forget it
> </html>
> 
> Is there any browser that won't render the above 'badly'?

Getting the DOCTYPE right doesn't matter (it has no effect).

The difference between this and the more generic namespaced-syntax issue 
is that the single-element case is easier to fix if it becomes a problem, 
so we can take more risks. With the namespace thing, there's no way to fix 
it if we do run into the problem (which we have; see examples earlier).


On Fri, 4 Apr 2008, William F Hammond wrote:
> 
> So as things are now for xhtml+mathml in some user agents, with the 
> markup
> 
>           <p>The value is <math><mn>5</mn></math>.</p>
> 
> the author risks having the period stranded at the beginning of the next 
> line.

I think this is a bug with those UAs' line wrapping algorithms. It should 
be filed as such.


On Fri, 4 Apr 2008, David Carlisle wrote:
> > 
> > <image>
> > 
> > It's not valid in HTML, but existing content requires it to be magic 
> > in parsing.
> 
> Hmm we should probably take that up internally in the Math WG.
> 
> In MathML2 <image/> can (informally) be replaced by 
> <csymbol>image</csymbol> and in MathML3 we're working on a formally 
> specified correspondence between these forms (for all MathML content 
> elements not just image), so it would be possible to add a note to the 
> MathML3 spec deprecating <image/> or warning not to use it in text/html 
> contexts (with no loss of functionality) if it's going to cause 
> problems. (Personal response, not a considered WG position)

I think that would be great. Unfortunately, I was unable to determine 
whether it would be an issue, because my parser (which I used to study the 
existing Math-in-HTML content) already implements the image->img 
substitution, so any MathML <image> elements were treated as HTML <img> 
elements.

Congratulations and thanks to the MathML group on keeping the MathML 
vocabulary free of name clashes, by the way.


On Fri, 4 Apr 2008, Julian Reschke wrote:
> >
> > If we're willing to consider solutions that _don't_ take the existing 
> > legacy content into account, we're better off doing a more thorough 
> > job and going with something like XHTML2, XML, XML Namespaces, and so 
> > forth.
> 
> There's a gray area between "all" existing content and "most" existing 
> content.
> 
> If "all" content needs to be considered, we essentially can't do 
> anything new.

I don't buy that. But even if it's "most" (and I only said we had to 
consider the legacy content, not support "all" of it), that's still a 
_lot_ of legacy content.


On Sat, 5 Apr 2008, Simon Pieters wrote:
>
> SVG can contain HTML fragments in foreignObject, e.g. iframe. MathML can 
> also contain HTML, e.g. form controls.
> 
> In order to have source-level consistency between text/html and XML, the 
> xmlns talisman should be allowed on any HTML element that has an SVG or 
> MathML namespaced parent.

I have allowed it on any MathML or SVG content, and any HTML node that is 
a child of a node that isn't an HTML node, but it must have the right 
value.
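
Read literally, that rule could be sketched in Python as below. This 
follows the sentence above rather than the spec text; the function name 
and the "html"/"mathml"/"svg" labels are hypothetical, and I am assuming 
that "the right value" means the namespace of the element carrying the 
attribute:

```python
NAMESPACES = {
    "html": "http://www.w3.org/1999/xhtml",
    "mathml": "http://www.w3.org/1998/Math/MathML",
    "svg": "http://www.w3.org/2000/svg",
}


def xmlns_is_allowed(kind, parent_kind, xmlns_value):
    """Check an xmlns attribute against the rule described above.

    kind and parent_kind are "html", "mathml" or "svg"; parent_kind
    may be None for an element without a parent.
    """
    if kind == "html" and parent_kind == "html":
        # An HTML element with an HTML parent is not covered.
        return False
    # The value must match the element's own namespace exactly.
    return xmlns_value == NAMESPACES.get(kind)
```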


I've marked ISSUE-37 (html-svg-mathml, "Integration of SVG and MathML into 
text/html", HTML 5 spec) as closed. Please feel free to reopen it if 
further feedback is sent on this issue.

Cheers,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Thursday, 10 April 2008 09:52:16 UTC