Re: SVG and MathML in text/html

On Mar 10, 2008, at 19:03, Jeff Schiller wrote:

> Whatever direction is taken (generic extensibility or specific
> allowing of SVG and MathML inline with HTML 5), I would like to add
> the following point:
> - Inclusion of SVG in HTML should not require a change to the SVG
> language.

That kind of statement is certain to lead to people talking past each  
other when one set of people speaks of a DOM language and another set  
of people speaks of the character stream that appears in the  

I think we should require that the DOM representation doesn't  
undergo changes. However, I'd be ready to allow the serialization  
stream syntax to be both extended and limited as long as certain  
common cases appear to work as perceived by authors.

> Specifically:
>  a) I should be able to copy & paste the inline SVG document into a
> new standalone document and it be valid SVG
>  b) I should be able to copy & paste the inline SVG document into a
> XHTML document and it still be valid XHTML+SVG

Personally, I don't consider that kind of copy & paste a requirement.  
The kind of copy & paste I'm interested in is taking a standalone SVG  
document where elements don't use namespace prefixes and pasting it  
into text/html.

Note that enabling source-level copy & paste from text/html to XML  
would require a lot of special-casing in the tokenizer in conformance  
checkers and would make the syntax rules of SVG subtrees inconsistent  
with the surrounding HTML elements. Still, authors who don't use  
conformance checkers wouldn't notice any of it, since they could still  
violate the rules. (Non-Draconianness is a major feature of text/html  
compared to XML. Introducing Draconian error handling in the SVG bits  
in text/html would miss the point big time.)

> In both cases, I might need to copy & paste some xml namespace
> attributes from one element to another, but otherwise there is no
> change to the SVG language i.e. it's still XML with the requirements
> that:
>  - attributes be quoted

What should, in your opinion, happen when they aren't? (In my opinion,  
attribute values should be tokenized like HTML attribute values. Then  
it's only a matter of deciding which cases are parse errors and I'd  
prefer to be consistent with the surrounding HTML.)
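
To make "tokenized like HTML attribute values" concrete, here is a minimal sketch in Python. It is not the HTML5 tokenizer; `tokenize_attributes` and the `ATTR` pattern are hypothetical names of mine, and the sketch only assumes that double-quoted, single-quoted and unquoted values all yield a value, with the unquoted ones reported separately so that a conformance checker could decide whether to flag them.

```python
import re

# Hypothetical helper, not the HTML5 tokenizer: accepts double-quoted,
# single-quoted and unquoted attribute values, as text/html does.
ATTR = re.compile(
    r"""\s*([^\s=/>]+)          # attribute name
        (?:\s*=\s*
            (?:"([^"]*)"        # double-quoted value
              |'([^']*)'        # single-quoted value
              |([^\s>]+)        # unquoted value
            )
        )?""",
    re.VERBOSE,
)


def tokenize_attributes(text):
    """Return (attributes, unquoted_names) for the inside of a start tag."""
    attrs = {}
    unquoted = []
    pos = 0
    while pos < len(text):
        m = ATTR.match(text, pos)
        if not m or m.end() == pos:
            break  # nothing attribute-like left
        name = m.group(1)
        dq, sq, bare = m.group(2), m.group(3), m.group(4)
        if bare is not None:
            unquoted.append(name)  # candidate parse error, value still kept
        value = dq if dq is not None else sq if sq is not None else bare or ""
        attrs.setdefault(name, value)  # first occurrence of a name wins
        pos = m.end()
    return attrs, unquoted
```

Calling `tokenize_attributes('cx=40 cy="50"')` yields a value for both attributes and lists only `cx` as unquoted, so the decision about what counts as a parse error stays separate from tokenization.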

>  - elements be closed

I agree that this should be required for conformance. What should, in  
your opinion, happen when they aren't closed? (In my opinion, the  
stack of open elements should be searched for a matching element in  
SVG scope. If one is found, the stack should be popped until that  
element comes off the stack. If a matching element is not found, the  
end tag token should be ignored.)
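
As a rough sketch of the end tag handling I have in mind (Python; `handle_end_tag` is a hypothetical name, and treating the nearest `svg` element as the scope boundary is my assumption for illustration, not spec text):

```python
def handle_end_tag(open_elements, tag_name, scope_root="svg"):
    """Pop the stack up to and including the nearest matching element
    inside the SVG scope; ignore the end tag if none is found.

    open_elements: list of element names, innermost last (mutated in place).
    Returns True if the token was acted on, False if it was ignored.
    """
    # Walk from the innermost open element outward.
    for i in range(len(open_elements) - 1, -1, -1):
        if open_elements[i] == tag_name:
            del open_elements[i:]  # pop until the match comes off the stack
            return True
        if open_elements[i] == scope_root:
            break  # reached the scope boundary without finding a match
    return False  # stray end tag: ignore the token
```

With the stack `["html", "body", "svg", "g", "circle"]`, an end tag for `g` pops both `circle` and `g`, while an end tag for `body` is ignored because the search stops at `svg`.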

>  - elements are case-sensitive

I agree.

> The reason I think this is important is that, if we do not ensure it's
> valid XML SVG, then we'll have interoperability problems in existing
> browsers and viewers that support SVG and SVG content tools.

Allowing inline SVG in text/html is an interop problem with existing  
UAs no matter what. The justification for introducing such an interop  
problem is the assumption that the non-inlineability of SVG in text/ 
html is holding SVG back and enabling SVG in text/html would allow SVG  
(in the sense it exists as a DOM language) to take off.

> SGMLizing SVG would require more work from developers to ensure that
> that flavour of SVG allowed in HTML works in their tools.

(There's no SGML here.) And yes, SVG tools wishing to support SVG in  
text/html would need to include an HTML5 parser and a serializer.  
These are expected to be off-the-shelf components plugging into  
popular XML APIs.

> Further
> elaboration here:

(Quoting from blog post.)
> Lots of talk these days about allowing SVG inline with text/html  
> content.  I thought I’d try and put some thoughts down.
> Start with Doug’s excellent post on this topic.  I don’t have any  
> opinion on the aria-specific elements of the debate.  I’m fine with  
> either adding a namespace to these attributes when used in XML or  
> letting those attributes attach themselves to the SVG language  
> without a namespace if it simplifies things, I don’t see a need to  
> update the SVG specification for this though.

I disagree that the blog post casting me as a Capulet is on this  
topic. It was on the ARIA topic, which is a different topic (new  
attribute bits without parser changes vs. namespaced elements with  
parser changes).

> I also don’t see a need to reinvent or create a new namespacing  
> mechanism using underscores or dashes, this seems silly and/or  
> dangerous to me.

The whole point of the aria-* scheme is that it is *not* a namespacing  
mechanism as far as the parser and the DOM are concerned. It is a  
naming convention for people.

> But there’s something I’m not getting about the recent discussion  
> of allowing inline SVG in text/html (or HTML5).  Anne seems to be of  
> the opinion that it would be a good opportunity to simplify the SVG  
> language - maybe eliminate namespaces, allow upper-case SVG  
> elements.  Kind of an “if you want to play in the HTML playground,  
> you have to wear the right kind of sneakers” attitude.

The XML playground sure gets worked up if one doesn't wear the right  
namespace sneakers. :-)

> This is kind of like HTML abusing its monopoly, isn’t it?

I disagree. It's about making stuff work for authors. That's not  
abuse. (I disagree with Anne's case-insensitivity bit, though. I think  
SVG and MathML scopes should put the tokenizer into a case-preserving  
mode.)

> I don’t think this is a good idea.  If you allow <CIRCLE CX=40> to  
> be the same thing as <circle cx="40"/> eventually we’ll start  
> to see people producing non-compliant SVG in the wild.

We can define it as compliant SVG in text/html.

> Then we’ll have people creating inline SVG for HTML that won’t  
> work in the many SVG tools and viewers that are already out there  
> and we’ll just have frustrated authors.

As pointed out above, SVG in text/html won't work in existing SVG  
tools and viewers even if the syntax inside HTML was a well-formed XML  
island. After realizing that it won't work anyway, there's no point in  
sticking strictly to XML tokenization.

> Then some tools might feel forced to accommodate the lax HTML-style  
> of SVG,

Indeed. If we enable SVG in text/html, this will be the case even if  
we insist that only XML-looking stuff is conforming.

> just like the mess we have now with browsers trying to understand as  
> much content as they can in order to compete.

It's not a mess where there's a detailed parser spec.

> Then we’ll have to rewrite the SVG spec so that SVG has two  
> serializations (like we’re having to do with HTML5/XHTML5).

No, we can spec the text/html serialization in HTML5 and leave the SVG  
spec as it defines stuff above the DOM unchanged.

> It just seems to be going at it backwards, since SVG was designed  
> from the ground up as an XML technology.

And, yet, XML is consistently failing on the Web. XML is succeeding in  
enterprise system integration. But the moment people try to produce  
XHTML or RSS, it is revealed that XML is too hard for the kind of mass  
authoring that text/html works for.

> Must we rewrite all XML specifications into HTML5-style languages in  
> order to get inlining?  I don’t think so.

No, we just need to define text/html parsing and serialization for SVG  
and MathML.

> I guess I don’t fully understand why the HTML5 parser can’t just  
> have the ability to hand off the character stream to another parser  
> when it encounters some “special” elements like <svg> or <math>.   
> Why does everything have to be in the hands of the HTML5 parser?
> Upon encountering the characters “<svg “, the parser should back  
> up five characters and send the bytes to the browser’s parser that  
> handles content with the MIME type image/svg+xml.  When that parser  
> is complete, those elements can be injected into the DOM in the  
> proper namespace and the HTML5 parser can pick up after </svg>

That would lose the non-Draconianness property of text/html. Draconian  
error handling, namespaces and DTDs are the three major failings of XML.
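
To illustrate the difference, here is a small Python sketch using only the standard library. Note the assumptions: `html.parser` is merely a tolerant tag-soup tokenizer and not an HTML5 parser, and `xml_accepts`, `StartTagCollector` and `html_start_tags` are names I made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

SNIPPET = "<svg><circle cx=40></svg>"  # unquoted attribute, unclosed element


def xml_accepts(markup):
    """True if a stock XML parser accepts the markup at all."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False  # Draconian: the first error rejects the whole input


class StartTagCollector(HTMLParser):
    """Records every start tag the tolerant tokenizer recovers."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, dict(attrs)))


def html_start_tags(markup):
    collector = StartTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.seen
```

The XML parser rejects `SNIPPET` outright, whereas the tolerant tokenizer still reports both start tags along with the `cx` value. Handing the stream to an image/svg+xml parser would give authors the former behavior inside a text/html document.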

> I see some problems, none of which seem insurmountable to me:
>    1. If the browser has no SVG parser, then what should happen?

SVG and MathML support should be made part of the HTML5 parsing  
algorithm, so the DOM output of the parser would be the same  
regardless of whether the layer above the DOM implements SVG or  
MathML. Obviously, we cannot change shipped browsers, so what happens  
in existing browsers can't be stated as "should happen".

> My proposal here is that all HTML5 browsers must also include a bog- 
> standard XML parser to handle inline content that is known to be  
> XML.  This should be pretty straightforward, since XML parsing is  
> actually much simpler than HTML5 parsing.

XML parsing when implemented fully (with DTD processing) is not  
simpler than HTML5 parsing. The XML tree builder is much simpler than  
the HTML5 tree builder but the XML tokenizer isn't simpler than the  
HTML5 tokenizer.

>    2. I hear that some browser don’t properly handle namespaced  
> content or colons or something.  Can someone clarify which  
> browsers?  Can someone further clarify what exactly the problems  
> are?  Can someone confirm if that browser will have fixed itself,  
> say, next year would we be good to go?

IE does its own thing in text/html with the colon. OTOH, other  
browsers don't do anything in particular with the colon in text/html  
and existing content expects that they don't. We cannot specify a  
colon-based mechanism that would cause spectacular breakage with  
existing content that expects browsers other than IE not to do  
anything special on the colon.

>    3. Maybe the biggest problem with this idea is defining what  
> happens in error scenarios - i.e. when the SVG is malformed, then at  
> what point does the SVG parser return the character stream back to  
> the HTML parser.  In other words, maybe the challenge here would be  
> in defining how parsers need to behave towards each other when  
> mixing MIME types.  Anybody have a suggestion here?  Is this the  
> deal-breaker?

Not a deal breaker. Suggestions already posted:

> As for namespace removal - why?

I recommend anyone who still thinks xmlns is a good idea to look at  
the 10th anniversary threads on xml-dev.
Even well-known XML people say that namespaces are
  * "controversial"
  * "done badly"
  * presumably needing "fixing"

Also check out the comment from David Megginson over at Tim Bray's blog:

Considering that even XML folks admit Namespaces in XML is bad, it  
would be silly for us not to try to shield hapless HTML authors from  
the badness to the extent possible.

> Seriously just because it’s hard to remember it?

It's cruft we could live without.

> If we’re trying to get to a “cut-and-paste” environment for  
> some web authors, then they can just cut and paste the whole thing  
> (namespace definitions in <svg> element and all).

We definitely should allow xmlns attributes as talismans. I'm saying  
we shouldn't require them.

> Maybe it’s because I’m used to writing SVG, but I really don’t  
> have a problem with the concept of mixed namespace content.

I have a problem with namespace URIs every single time I need to deal  
with XHTML, SVG, etc. I always have to waste time looking up a URI  
to copy and paste because trying to go by memory and getting it wrong  
(which year? trailing slash?) would waste even more time.

> Sam’s off-the-cuff solution seems to favor even skipping the <svg>  
> element,

Which solution is that?

> which would seem to me to cause a mess of problems.  Where would you  
> define the viewBox?

On <svg>.

> Where would you define the version of the SVG language?

Nowhere, preferably.

> It seems like the belief that XHTML being a failure is a reflection  
> on XML-on-the-web in general.

RSS, too.

> In fact, all browsers except for IE can handle application/xhtml+xml  
> MIME type these days, so it really seems to me that the verdict’s  
> still out on whether XHTML is a good technology or not.

Even if XHTML is good technology as far as pure technology value goes  
without network effect considerations, XHTML flunks Technology Strategy  
101 by failing to plug into the existing network of the text/html  
installed base.

> Some people out there still think that XML has a place on the web.   
> People like Shelley, who also shares her thoughts on SVG in text/ 
> html here.

I've seen the YSoD over at her site as well. (This isn't meant as a  
personal remark but as an anecdotal statistic; I've seen the YSoD in a  
lot of places, including

> I think we should explore relaxing the draconian error handling on  
> XML on the web, but I don’t agree with re-inventing changing XML  
> languages into HTML-style languages one after another.

We also need XML5.

(Quoting from email again.)
> Is what I'm asking at all possible?  Can I get a list of the problems
> this might cause ?

In general, it would introduce deliberate brittleness or inconsistency  
with the surrounding HTML where we could go for robustness and  
consistency with the surrounding HTML.

> My preference:
> <html ...>
> <body>
>  <svg xmlns=""
> xmlns:xlink="" ...>
>    <a xlink:href="foo.svg"><circle .../></a>
>  </svg>
> </body>
> </html>
> Another option:
> <html ... xmlns:svg=""
> xmlns:xlink="">
> <body>
>  <svg:svg ...>
>    <svg:a xlink:href="foo.svg"><svg:circle .../></svg:a>
>  </svg:svg>
> </body>
> </html>
> What other options are there?

<html ...>
    <a href="foo.svg"><circle .../></a>

(With xmlns and xlink: cruft permitted but not required.)
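
A minimal sketch of how a tree builder could assign namespaces without consulting xmlns at all (Python). The event format, `SCOPE_ROOTS` and `assign_namespaces` are hypothetical, and the rule that everything nested under `svg` or `math` inherits that namespace is a simplifying assumption of mine, not a worked-out proposal.

```python
HTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Element names that switch the namespace when seen in HTML content.
SCOPE_ROOTS = {"svg": SVG_NS, "math": MATHML_NS}


def assign_namespaces(events):
    """Map tag events to (namespace, name) pairs for each start tag.

    events: iterable of ("start", name) or ("end", name) tuples.
    xmlns attributes are never consulted: the element name decides.
    """
    stack = [HTML_NS]  # namespace of each open element, innermost last
    result = []
    for event in events:
        if event[0] == "start":
            name = event[1]
            parent_ns = stack[-1]
            if parent_ns == HTML_NS:
                ns = SCOPE_ROOTS.get(name, HTML_NS)
            else:
                ns = parent_ns  # inside an SVG/MathML scope: inherit
            stack.append(ns)
            result.append((ns, name))
        elif event[0] == "end" and len(stack) > 1:
            stack.pop()
    return result
```

Under these assumptions `a` and `circle` inside `svg` end up in the SVG namespace in the DOM even though the author never typed a namespace URI, and an element following the closed `svg` is back in the HTML namespace.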

Henri Sivonen

Received on Monday, 10 March 2008 19:03:59 UTC