Re: More on SVG within HTML pages

On Sep 6, 2009, at 23:27, Shelley Powers wrote:

> I had an interesting twitter exchange with Henri[1], about the  
> validator.nu's handling of an HTML5 document with inline SVG.
>
> I have two documents, both HTML5, one served up as HTML[2], the  
> other as application/xhtml+xml[3].
>
> The HTML document throws several errors in validator.nu. One is the  
> presence of an SVG element in a paragraph. According to Henri, this  
> error came about because it's some form of warning since no browser  
> currently supports SVG in HTML.

No, according to me, it's "to discourage authors from using them
before browsers are ready"
(http://twitter.com/hsivonen/status/3797858288).

> However, the Firefox nightly will support SVG in HTML5, if you set  
> the html5.enabled configuration option to true. Regardless, this  
> isn't an error, but it's also not what concerns me.

Yes, I'm aware of that pref. :-)

Having to have a pref means that HTML5 parsing in Firefox isn't "ready".

> The SVG is valid SVG/XML, copied as one would find SVG in the wild.  
> It references several vocabularies with given namespaces, including  
> Dublin Core, Creative Commons, etc. All of the RDF annotation is  
> within an SVG metadata element. Again, nothing in what I described  
> is unusual.

(Agreed.)

> Henri's validator ignores the RDF/XML in the XHTML document, which  
> is fine. But the validator throws several errors related to the RDF/ 
> XML in the HTML document. When I asked him about it in Twitter, he  
> responded with, "Also, the dc:foo stuff is not even supposed to be  
> valid in text/html".

This is indeed so.

> Yet there's nothing that I could find in the HTML5 specification  
> that states this.

It's HTML5--not RSS. Everything that isn't allowed is forbidden.  
(Contrast with http://archive.scripting.com/2003/06/13#When:8:12:30AM .)

HTML5 defines how to parse text/html into a DOM. It also specifies
above-DOM conformance requirements for elements in the
http://www.w3.org/1999/xhtml namespace. It doesn't specify conformance
requirements for elements in the http://www.w3.org/2000/svg or
http://www.w3.org/1998/Math/MathML namespaces, so those are
non-conforming as far as HTML5 itself goes. To make them conforming,
you have to create an HTML5 + SVG 1.x + MathML y.0 profile by invoking
the "other applicable specifications" extension point. Note that HTML5
defines requirements that apply to MathML and SVG if you do invoke
this extension point, but it does not require you to invoke this
extension point.

In Validator.nu, I have opted to invoke the extension point for XHTML5  
using SVG 1.1 and MathML 2.0, because out of the top four browser  
engines, three target supporting the SVG 1.1 feature set (Gecko,  
WebKit and Opera; Opera also targets 1.2 Tiny) and two target the  
presentational part of MathML 2.0 (Gecko and Opera). (I brought in
semantic MathML only because I was too lazy to split MathML according
to implementation reality once I was importing part of it using an
off-the-shelf third-party schema.)

For the time being, I have opted not to invoke the extension point for  
HTML5, although I intend to invoke it in due course.

Now, if one does invoke the extension point for HTML5 (the text/html
serialization) with SVG 1.1, the SVG 1.1 spec governs what's allowed
as descendants of the metadata element in the
http://www.w3.org/2000/svg namespace but HTML5 governs what's possible
to have there.

The SVG 1.1 spec says:
"The contents of the 'metadata' should be elements from other XML  
namespaces, with these elements from these namespaces expressed in a  
manner conforming with the "Namespaces in XML" Recommendation [XML-NS]."
(http://www.w3.org/TR/SVG/metadata.html#MetadataElement)

So the SVG 1.1 spec allows other specs to extend the content model of
<metadata> as long as the extension conforms to the Namespaces REC and
uses *other* namespaces (presumably other than
http://www.w3.org/2000/svg).

(Note that validator developers need to calibrate the validation- 
sensitive meaning of MUST and SHOULD on a per-spec basis, because  
different spec writers use them differently.)

Now, let's consider what HTML5 makes possible to have there given
that requirement. The Namespaces REC rules out non-NCName local names,
and SVG 1.1 rules out the http://www.w3.org/2000/svg namespace itself.
The HTML5 parsing algorithm doesn't make it possible to satisfy this
requirement.

If you write the tag <rdf:RDF> in text/html after the <metadata> start
tag, you get an element whose local name is "rdf:rdf" and whose
namespace is http://www.w3.org/2000/svg. This element has a non-NCName
local name, so it's not conforming per Namespaces and therefore not a
permitted extension per SVG 1.1. It's also in the
http://www.w3.org/2000/svg namespace, which isn't permitted per SVG 1.1.

QED.
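
The two conditions can be checked mechanically. What follows is only a
rough sketch in Python: the NCName test is an ASCII-only approximation
of the production in the Namespaces REC (the real one admits many
non-ASCII characters), and permitted_metadata_child is a hypothetical
helper name, not anything Validator.nu exposes. The RDF namespace URI
is the standard one; it does not appear in the text above.

```python
import re

SVG_NS = "http://www.w3.org/2000/svg"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# ASCII-only approximation of NCName. The point that matters here is
# that ':' is never allowed in an NCName.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ncname(local_name):
    """Return True if local_name matches the simplified NCName pattern."""
    return NCNAME_ASCII.fullmatch(local_name) is not None


def permitted_metadata_child(namespace, local_name):
    """Hypothetical check: NCName local name AND a non-SVG namespace."""
    return is_ncname(local_name) and namespace != SVG_NS


# (namespace, local name) as described above for <rdf:RDF> in text/html.
html_result = (SVG_NS, "rdf:rdf")
# (namespace, local name) an XML parser yields when xmlns:rdf is declared.
xml_result = (RDF_NS, "RDF")

print(permitted_metadata_child(*html_result))  # False
print(permitted_metadata_child(*xml_result))   # True
```

The text/html result fails both conditions at once, which is why no
choice of prefix can rescue it.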

> More importantly, this is a HTML5 failure in waiting, because if  
> people inline SVG, chances are they will inline whatever SVG they  
> find in the wild, which may or may not include RDF/XML. Validly  
> include, may I add, in fact recommended when it comes to annotating  
> Creative Commons license info.

I agree. This problem has no good solutions, as far as I can tell.

  1) Leave RDF/XML-looking stuff non-conforming. Bad because
copy-pasting leads to a lot of errors about stuff that browsers will
ignore--just like they ignore the contents of <metadata> in XML.
  2) Perform full Namespace processing in <metadata> subtrees. Bad
because this would introduce considerable complexity in order to
shuffle around namespaces of stuff that browsers (and so far even
validators!) end up ignoring. Adding a lot of complexity to tweak the
DOM only so that it can be ignored doesn't make sense.
  3) Leave the DOM building as-is but proclaim the RDF/XML-looking
stuff that infoset-wise isn't RDF/XML as conforming. Bad because it
would make authors believe that they are actually using RDF/XML, and
worse because if someone wanted to consume that data as RDF, they'd
need to have dual code paths for text/html and XML (and the DOM
Consistency Design Principle is all about avoiding that situation).

> I had assumed that the SVG would be turned over to the SVG parsing  
> engine for the browser, which would operate the same regardless of  
> whether the SVG is in an HTML document, or an XHTML document.

Nope.

> We know what's supposed to happen if crappy markup gets embedded in  
> SVG: the user agents are supposed to provide a facility that allows  
> the user to extract valid XML, which means they will have to correct  
> the crappy markup.

You'd apply the coercion-to-infoset algorithm, so <rdf:RDF> in
text/html would turn into
<rdfU00003Ardf xmlns="http://www.w3.org/2000/svg"> in XML. It's equally
useless in both, but the latter is namespace-well-formed.

> I decided to see what happens with a browser that actually supports  
> HTML5 with inline SVG. I downloaded the latest Firefox nightly and  
> enabled HTML5. I then added script to both the HTML and the XHTML  
> versions of the files that will access the red SVG circle in the  
> document, which doesn't use the default SVG namespace, and also  
> access all dc:title elements. I then printed out several key values  
> related to HTML/XHTML differences when it comes to elements/ 
> attributes and namespaces.
>
> If you load the XHTML page in any SVG enabled browser,  and click  
> the circle, you'll see that attributes such as namespaceURI et al  
> are set. To be expected. Load the HTML document, though, in the FF  
> nightly, and again, you see what is expected in an HTML document at  
> this time, which doesn't acknowledge namespaces: the namespaceURI  
> value is set to the SVG's default namespace, the prefix is null, and  
> the localName is set to "dc:title".
>
> This is the DOM difference that Henri discusses. At the same time,  
> though, you can use the same functionality to access "dc:title",  
> regardless of whether the document is XHTML or HTML. In fact, the  
> DOM differences are pretty trivial -- no more than what we've had to  
> deal with when it comes to Ajax applications and differences with  
> XMLHttpRequest, or how we still have to manage differences in event  
> listeners -- test for a value, and act accordingly.

The differences may seem trivial now. However, from experience
dealing with the {}lang, {}xml:lang,
{http://www.w3.org/XML/1998/namespace}lang mess, I can assure you that
such trivial differences lead to bugs. See point #3 in the above list
of bad solutions.

> My preferences would be to elegantly manage namespaces for both  
> XHTML and HTML in such a way that we don't have these DOM  
> differences. However, evidently this causes other problems, so I'm  
> not going to push the issue.

This is point #2 on the above list of bad solutions.

> Regardless, within the SVGDocument, we should allow XML without  
> throwing conformance errors.  Given that the Firefox nightly has no  
> problems with RDF/XML within the SVG element,

Firefox nightlies also have "no problems" with other fantastic cruft  
one could throw at them. (However, the dc:foo stuff does make the  
nightlies run slightly more code per tag and allocate slightly more  
memory than they run or allocate per useful tag, so "no" is an  
approximation but a correct one to a useful precision.)

> and given that differences in the DOM are not significant between  
> the two,

I disagree. What could be more significant than having totally  
different namespaces and totally different local names, too?

> I don't know why we would not support "dc:foo" in SVG.

Support != making validator silent.

> In fact, I think not supporting valid XML within the SVG is  
> sufficient to trigger very serious concerns about support for inline  
> SVG in HTML, and hence very serious concerns about HTML5.

I disagree. What really matters is that SVG elements cause vector  
graphics to be drawn on screen.

> Perhaps what the validator should do is throw an informative  
> warning, telling the person that there are DOM differences between  
> how namespace is handled in an HTML document as compared to XHTML  
> documents.

I expect this is where this will end up even though it's a violation  
of our Design Principles. :-( However, this is only a problem when  
someone tries to actually consume the stuff that looks like RDF/XML  
but isn't.

> Still, I am hesitant about this, because this really only matters to  
> those who need to access the DOM, and that's not the majority of web  
> authors.

Well, it doesn't matter to authors if the author expectation is that  
<metadata> is a talisman anyway. :-/ It does matter to authors who'd  
actually care about the robust processability of their markup. And in  
that case, the author needs to know if the target is a real triple  
processor or a regexp hack.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Monday, 7 September 2009 08:22:37 UTC