- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Mon, 7 Sep 2009 11:21:55 +0300
- To: Shelley Powers <shelleyp@burningbird.net>
- Cc: public-html@w3.org
On Sep 6, 2009, at 23:27, Shelley Powers wrote:

> I had an interesting twitter exchange with Henri[1], about the
> validator.nu's handling of an HTML5 document with inline SVG.
>
> I have two documents, both HTML5, one served up as HTML[2], the
> other as application/xhtml+xml[3].
>
> The HTML document throws several errors in validator.nu. One is the
> presence of an SVG element in a paragraph. According to Henri, this
> error came about because it's some form of warning since no browser
> currently supports SVG in HTML.

No, according to me, it's "to discourage authors from using them before
browsers are ready" (http://twitter.com/hsivonen/status/3797858288 ).

> However, the Firefox nightly will support SVG in HTML5, if you set
> the html5.enabled configuration option to true. Regardless, this
> isn't an error, but it's also not what concerns me.

Yes, I'm aware of that pref. :-) Having to have a pref means that HTML5
parsing in Firefox isn't "ready".

> The SVG is valid SVG/XML, copied as one would find SVG in the wild.
> It references several vocabularies with given namespaces, including
> Dublin Core, Creative Commons, etc. All of the RDF annotation is
> within an SVG metadata element. Again, nothing in what I described
> is unusual.

(Agreed.)

> Henri's validator ignores the RDF/XML in the XHTML document, which
> is fine. But the validator throws several errors related to the
> RDF/XML in the HTML document. When I asked him about it in Twitter, he
> responded with, "Also, the dc:foo stuff is not even supposed to be
> valid in text/html".

This is indeed so.

> Yet there's nothing that I could find in the HTML5 specification
> that states this.

It's HTML5--not RSS. Everything that isn't allowed is forbidden.
(Contrast with http://archive.scripting.com/2003/06/13#When:8:12:30AM .)

HTML5 defines how to parse text/html into a DOM. It also specifies
above-DOM conformance requirements for elements in the
http://www.w3.org/1999/xhtml namespace.
It doesn't specify conformance requirements for elements in the
http://www.w3.org/2000/svg or http://www.w3.org/1998/Math/MathML
namespaces, so those are non-conforming as far as HTML5 itself goes. To
make them conforming, you have to create an HTML5 + SVG 1.x + MathML y.0
profile by invoking the "other applicable specifications" extension
point. Note that HTML5 defines requirements that apply to MathML and SVG
if you do invoke this extension point but it does not require you to
invoke this extension point.

In Validator.nu, I have opted to invoke the extension point for XHTML5
using SVG 1.1 and MathML 2.0, because out of the top four browser
engines, three target supporting the SVG 1.1 feature set (Gecko, WebKit
and Opera; Opera also targets 1.2 Tiny) and two target the
presentational part of MathML 2.0 (Gecko and Opera). (I brought in
semantic MathML only because I was too lazy to split MathML according to
implementation reality once I was importing part of it using an
off-the-shelf third-party schema.) For the time being, I have opted not
to invoke the extension point for HTML5, although I intend to invoke it
in due course.

Now, if one does invoke the extension point for HTML5 (the text/html
serialization) with SVG 1.1, the SVG 1.1 spec governs what's allowed as
descendants of the metadata element in the http://www.w3.org/2000/svg
namespace but HTML5 governs what's possible to have there.

The SVG 1.1 spec says: "The contents of the 'metadata' should be
elements from other XML namespaces, with these elements from these
namespaces expressed in a manner conforming with the "Namespaces in XML"
Recommendation [XML-NS]."
(http://www.w3.org/TR/SVG/metadata.html#MetadataElement)

So the SVG 1.1 spec allows other specs to extend the content model of
<metadata> as long as the content model is extended in a manner
conforming with the Namespaces REC and as long as the content model
extensions use *other* namespaces (presumably other than
http://www.w3.org/2000/svg).
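To make the XML side of that rule concrete, here is a minimal Python sketch (not Validator.nu code) that parses an SVG document as XML and reports, for each child of <metadata>, whether it sits in a namespace other than the SVG one. The sample markup, the function names and the "Red circle" title are made up for illustration; only the two namespace URIs from the SVG and RDF specs are real.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical sample: RDF/XML inside <metadata>, as commonly found in the wild.
SAMPLE = """<svg xmlns="http://www.w3.org/2000/svg">
  <metadata>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description>
        <dc:title>Red circle</dc:title>
      </rdf:Description>
    </rdf:RDF>
  </metadata>
</svg>"""


def split_tag(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def check_metadata_children(svg_xml):
    """Return (namespace, local name, is_other_namespace) per <metadata> child.

    A namespace-aware XML parser resolves prefixes such as 'rdf:' itself,
    so the local name here never contains a colon.
    """
    root = ET.fromstring(svg_xml)
    results = []
    for metadata in root.iter("{%s}metadata" % SVG_NS):
        for child in metadata:
            namespace, local = split_tag(child.tag)
            results.append((namespace, local, namespace not in ("", SVG_NS)))
    return results


if __name__ == "__main__":
    for row in check_metadata_children(SAMPLE):
        print(row)
```

This only checks direct children and only the "other namespace" half of the SVG 1.1 sentence; a real schema would do more.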
(Note that validator developers need to calibrate the
validation-sensitive meaning of MUST and SHOULD on a per-spec basis,
because different spec writers use them differently.)

Now, let's consider what HTML5 makes possible to have there considering
the requirement. The Namespaces REC rules out non-NCName local names,
and SVG 1.1 calls for namespaces other than http://www.w3.org/2000/svg.
The HTML5 parsing algorithm doesn't make it possible to satisfy this
requirement. If you write the tag <rdf:RDF> in text/html after the
<metadata> start tag, you get an element whose local name is "rdf:rdf"
and whose namespace is http://www.w3.org/2000/svg. This element has a
non-NCName local name, so it's not conforming per Namespaces, so it's
not a permitted extension per SVG 1.1. It's also in the
http://www.w3.org/2000/svg namespace, which isn't permitted per SVG 1.1.
QED.

> More importantly, this is a HTML5 failure in waiting, because if
> people inline SVG, chances are they will inline whatever SVG they
> find in the wild, which may or may not include RDF/XML. Validly
> include, may I add, in fact recommended when it comes to annotating
> Creative Commons license info.

I agree. This problem has no good solutions, as far as I can tell.

1) Leave RDF/XML-looking stuff non-conforming. Bad because copy-pasting
leads to a lot of errors about stuff that browsers will ignore--just
like they ignore the contents of <metadata> in XML.

2) Perform full Namespace processing in <metadata> subtrees. Bad because
this would introduce considerable complexity in order to shuffle around
namespaces of stuff that browsers (and so far even validators!) end up
ignoring. Adding a lot of complexity to tweak the DOM only so that it
can be ignored doesn't make sense.

3) Leaving the DOM building as-is but proclaiming the RDF/XML-looking
stuff that infoset-wise isn't RDF/XML as conforming.
Bad because it would make authors believe that they are actually using
RDF/XML and worse because if someone wanted to consume that data as RDF,
they'd need to have dual code paths for text/html and XML (and the DOM
Consistency Design Principle is all about avoiding that situation).

> I had assumed that the SVG would be turned over to the SVG parsing
> engine for the browser, which would operate the same regardless of
> whether the SVG is in an HTML document, or an XHTML document.

Nope.

> We know what's supposed to happen if crappy markup gets embedded in
> SVG: the user agents are supposed to provide a facility that allows
> the user to extract valid XML, which means they will have to correct
> the crappy markup.

You'd apply the coercion to infoset algorithm, so <rdf:RDF> in text/html
would turn into <rdfU00003Ardf xmlns="http://www.w3.org/2000/svg"> in
XML. It's equally useless in both, but the latter is
namespace-well-formed.

> I decided to see what happens with a browser that actually supports
> HTML5 with inline SVG. I downloaded the latest Firefox nightly and
> enabled HTML5. I then added script to both the HTML and the XHTML
> versions of the files that will access the red SVG circle in the
> document, which doesn't use the default SVG namespace, and also
> access all dc:title elements. I then printed out several key values
> related to HTML/XHTML differences when it comes to
> elements/attributes and namespaces.
>
> If you load the XHTML page in any SVG enabled browser, and click
> the circle, you'll see that attributes such as namespaceURI et al
> are set. To be expected. Load the HTML document, though, in the FF
> nightly, and again, you see what is expected in an HTML document at
> this time, which doesn't acknowledge namespaces: the namespaceURI
> value is set to the SVG's default namespace, the prefix is null, and
> the localName is set to "dc:title".
>
> This is the DOM difference that Henri discusses.
> At the same time, though, you can use the same functionality to
> access "dc:title", regardless of whether the document is XHTML or
> HTML. In fact, the DOM differences are pretty trivial -- no more than
> what we've had to deal with when it comes to Ajax applications and
> differences with XMLHttpRequest, or how we still have to manage
> differences in event listeners -- test for a value, and act
> accordingly.

The differences may seem trivial now. However, from experience dealing
with the {}lang, {}xml:lang, {http://www.w3.org/XML/1998/namespace}lang
mess, I can assure you that such trivial differences lead to bugs. See
point #3 in the above list of bad solutions.

> My preferences would be to elegantly manage namespaces for both
> XHTML and HTML in such a way that we don't have these DOM
> differences. However, evidently this causes other problems, so I'm
> not going to push the issue.

This is point #2 on the above list of bad solutions.

> Regardless, within the SVGDocument, we should allow XML without
> throwing conformance errors. Given that the Firefox nightly has no
> problems with RDF/XML within the SVG element,

Firefox nightlies also have "no problems" with other fantastic cruft one
could throw at them. (However, the dc:foo stuff does make the nightlies
run slightly more code per tag and allocate slightly more memory than
they run or allocate per useful tag, so "no" is an approximation but a
correct one to a useful precision.)

> and given that differences in the DOM are not significant between
> the two,

I disagree. What could be more significant than having totally different
namespaces and totally different local names, too?

> I don't know why we would not support "dc:foo" in SVG.

Support != making validator silent.

> In fact, I think not supporting valid XML within the SVG is
> sufficient to trigger very serious concerns about support for inline
> SVG in HTML, and hence very serious concerns about HTML5.

I disagree.
What really matters is that SVG elements cause vector graphics to be
drawn on screen.

> Perhaps what the validator should do is throw an informative
> warning, telling the person that there are DOM differences between
> how namespace is handled in an HTML document as compared to XHTML
> documents.

I expect this is where this will end up even though it's a violation of
our Design Principles. :-( However, this is only a problem when someone
tries to actually consume the stuff that looks like RDF/XML but isn't.

> Still, I am hesitant about this, because this really only matters to
> those who need to access the DOM, and that's not the majority of web
> authors.

Well, it doesn't matter to authors if the author expectation is that
<metadata> is a talisman anyway. :-/ It does matter to authors who'd
actually care about the robust processability of their markup. And in
that case, the author needs to know if the target is a real triple
processor or a regexp hack.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
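As an illustration of the "rdf:rdf" local name and the coercion to infoset step discussed in this message, here is a rough Python sketch. It assumes an ASCII-only approximation of the NCName production (the real production admits many non-ASCII characters), and the function names are invented for this example. The escaping rule, the letter U followed by six uppercase hexadecimal digits of the code point, follows the HTML5 coercion algorithm.

```python
import re

# Assumption: ASCII-only approximation of NCName (no colon allowed).
_NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def is_ncname_ascii(name):
    """True if name looks like an NCName under the ASCII-only approximation."""
    return bool(_NCNAME_ASCII.match(name))


def coerce_local_name(name):
    """Escape characters that the approximation rejects as 'U' + six hex digits."""
    out = []
    for index, ch in enumerate(name):
        allowed = ch.isascii() and (
            ch.isalpha()
            or ch == "_"
            or (index > 0 and (ch.isdigit() or ch in ".-"))
        )
        out.append(ch if allowed else "U%06X" % ord(ch))
    return "".join(out)


if __name__ == "__main__":
    # The text/html tokenizer lowercases tag names, so <rdf:RDF> yields "rdf:rdf".
    local_name = "rdf:RDF".lower()
    print(local_name, is_ncname_ascii(local_name))
    print(coerce_local_name(local_name))
```

The coerced name is namespace-well-formed, but it is still an element in the SVG namespace, so an RDF consumer would not recognize it.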
Received on Monday, 7 September 2009 08:22:37 UTC