- From: Shelley Powers <shelleyp@burningbird.net>
- Date: Mon, 07 Sep 2009 09:21:36 -0500
- To: Henri Sivonen <hsivonen@iki.fi>
- CC: public-html@w3.org
Henri Sivonen wrote:
> On Sep 6, 2009, at 23:27, Shelley Powers wrote:
>
>> I had an interesting twitter exchange with Henri[1], about the
>> validator.nu's handling of an HTML5 document with inline SVG.
>>
>> I have two documents, both HTML5, one served up as HTML[2], the other
>> as application/xhtml+xml[3].
>>
>> The HTML document throws several errors in validator.nu. One is the
>> presence of an SVG element in a paragraph. According to Henri, this
>> error came about because it's some form of warning since no browser
>> currently supports SVG in HTML.
>
> No, according to me, it's "to discourage authors from using them
> before browsers are ready"
> (http://twitter.com/hsivonen/status/3797858288).
>

That's not the place of a validator. If people want to try a syntax before browsers are ready, it should be their prerogative. By providing an incorrect error message, you are confusing people, not protecting them.

A validator should be nothing more than a way to check syntax and usage. It most definitely should not be a nanny.

>> However, the Firefox nightly will support SVG in HTML5, if you set
>> the html5.enabled configuration option to true. Regardless, this
>> isn't an error, but it's also not what concerns me.
>
> Yes, I'm aware of that pref. :-)
>
> Having to have a pref means that HTML5 parsing in Firefox isn't "ready".
>

Does not matter.

>> The SVG is valid SVG/XML, copied as one would find SVG in the wild.
>> It references several vocabularies with given namespaces, including
>> Dublin Core, Creative Commons, etc. All of the RDF annotation is
>> within an SVG metadata element. Again, nothing in what I described is
>> unusual.
>
> (Agreed.)
>
>> Henri's validator ignores the RDF/XML in the XHTML document, which is
>> fine. But the validator throws several errors related to the RDF/XML
>> in the HTML document. When I asked him about it in Twitter, he
>> responded with, "Also, the dc:foo stuff is not even supposed to be
>> valid in text/html".
>
> This is indeed so.
>
>> Yet there's nothing that I could find in the HTML5 specification that
>> states this.
>
> It's HTML5--not RSS. Everything that isn't allowed is forbidden.
> (Contrast with http://archive.scripting.com/2003/06/13#When:8:12:30AM .)
>

And it's SVG, not HTML5. Validate the HTML5 all you want, but leave the contents of the SVG that are not specific to HTML5 alone. Just ignore it. Simple to do: anything between opening and closing SVG tags that isn't HTML5 doesn't exist to the HTML5 validator.

> HTML5 defines how to parse text/html into a DOM. It also specifies
> above-DOM conformance requirements for elements in the
> http://www.w3.org/1999/xhtml namespace. It doesn't specify conformance
> requirements for elements in the http://www.w3.org/2000/svg or
> http://www.w3.org/1998/Math/MathML namespaces, so those are
> non-conforming as far as HTML5 itself goes. To make them conforming,
> you have to create an HTML5 + SVG 1.x + MathML y.0 profile by invoking
> the "other applicable specifications" extension point. Note that HTML5
> defines requirements that apply to MathML and SVG if you do invoke
> this extension point but it does not require you to invoke this
> extension point.
>

The SVG specification defines requirements for SVG. It is not HTML5.

As for parsing into the DOM, how it was parsed in HTML5 is exactly as I would have expected it to be parsed: default namespace of container element, no prefix, each namespaced element treated as prefix:element, as in dc:title. I haven't tried it against _every_ browser, but this is a default, a de facto standard.

Yes, the DOM differs between the XHTML and HTML implementations, but that's because namespace handling differs between the two. It is not an insurmountable problem. In fact, dealing with such differences is old hat to JavaScript developers. It may violate design principle purity, but it maps to the real world. And according to you, this should trump design principle purity.
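As a rough illustration of the "test for a value, and act accordingly" approach, here is a minimal TypeScript sketch. It models only the three DOM properties discussed in this thread (namespaceURI, prefix, localName) as plain objects, so it runs without a browser. The helper name `isQualifiedElement` and the `ElementLike` type are hypothetical, and the HTML-side values are the ones reported for the Firefox nightly, not something guaranteed for every browser.

```typescript
// Only the three DOM properties compared in this thread are modelled here.
interface ElementLike {
  namespaceURI: string | null;
  prefix: string | null;
  localName: string;
}

const SVG_NS = "http://www.w3.org/2000/svg";
const DC_NS = "http://purl.org/dc/elements/1.1/";

// Hypothetical helper: does `el` correspond to markup written as <prefix:local>?
function isQualifiedElement(
  el: ElementLike,
  nsUri: string,
  prefix: string,
  local: string
): boolean {
  // XML parse: the prefix was resolved to a namespace and the local name is bare.
  const xmlShape = el.namespaceURI === nsUri && el.localName === local;
  // text/html parse: no prefix, and the colon stays inside the local name.
  const htmlShape = el.prefix === null && el.localName === `${prefix}:${local}`;
  return xmlShape || htmlShape;
}

// The same <dc:title> as seen through each parser, plus an ordinary SVG <title>.
const fromXhtml: ElementLike = { namespaceURI: DC_NS, prefix: "dc", localName: "title" };
const fromHtml: ElementLike = { namespaceURI: SVG_NS, prefix: null, localName: "dc:title" };
const svgTitle: ElementLike = { namespaceURI: SVG_NS, prefix: null, localName: "title" };

console.log(isQualifiedElement(fromXhtml, DC_NS, "dc", "title"));
console.log(isQualifiedElement(fromHtml, DC_NS, "dc", "title"));
console.log(isQualifiedElement(svgTitle, DC_NS, "dc", "title"));
```

The trade-off is visible in the second branch: on the text/html side the match depends on the prefix the author happened to type, not on the namespace it was bound to. That is the dual code path being argued about.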
> In Validator.nu, I have opted to invoke the extension point for XHTML5
> using SVG 1.1 and MathML 2.0, because out of the top four browser
> engines, three target supporting the SVG 1.1 feature set (Gecko,
> WebKit and Opera; Opera also targets 1.2 Tiny) and two target the
> presentational part of MathML 2.0 (Gecko and Opera). (I brought in
> semantic MathML only because I was too lazy to split MathML according
> to implementation reality once I was importing part of it using an
> off-the-shelf third-party schema.)
>

Again, I was not aware that the validator explicitly disables validity based on browser implementation. That, to me, violates everything I understand about a markup validator. It's your baby, but I hope that the W3C stops incorporating it.

> For the time being, I have opted not to invoke the extension point for
> HTML5, although I intend to invoke it in due course.
>
> Now, if one does invoke the extension point for HTML5 (the text/html
> serialization) with SVG 1.1, the SVG 1.1 spec governs what's allowed
> as descendants of the metadata element in the
> http://www.w3.org/2000/svg namespace but HTML5 governs what's possible
> to have there.
>

Everything between the beginning and ending SVG tags should be under the province of the SVG validators. It is not HTML5.

> The SVG 1.1 spec says:
> "The contents of the 'metadata' should be elements from other XML
> namespaces, with these elements from these namespaces expressed in a
> manner conforming with the "Namespaces in XML" Recommendation [XML-NS]."
> (http://www.w3.org/TR/SVG/metadata.html#MetadataElement)
>
> So the SVG 1.1 spec allows other specs to extend the content model of
> <metadata> as long as the content model is extended in a manner
> conforming with the Namespaces REC as long as the content model
> extensions use *other* namespaces (presumably other than
> http://www.w3.org/2000/svg).
>
> (Note that validator developers need to calibrate the
> validation-sensitive meaning of MUST and SHOULD on a per-spec basis,
> because different spec writers use them differently.)
>
> Now, let's consider what HTML5 makes possible to have there
> considering the requirement. The Namespaces REC rules out non-NC local
> names and namespaces other than http://www.w3.org/2000/svg. The HTML5
> parsing algorithm doesn't make it possible to satisfy this requirement.
>
> If you write the tag <rdf:RDF> in text/html after the <metadata> start
> tag, you get an element whose local name is "rdf:rdf" whose namespace
> is http://www.w3.org/2000/svg. This element has a non-NCName local
> name, so it's not conforming per Namespaces so it's not a permitted
> extension per SVG 1.1. It's also in the http://www.w3.org/2000/svg
> namespace, which isn't permitted per SVG 1.1.
>
> QED.
>

The HTML5 specification states in section 9.1.2 that allowable elements include the foreign elements for MathML and SVG. SVG, itself, via its own specification, allows other elements as long as they are properly namespaced.

The HTML5 specification needs to treat SVG like the truly foreign object that it is, which means allowing the SVG specification to control what happens between the opening and closing SVG tags. Either that or we cripple the SVG to the point of uselessness.

I am aware that how markup is handled when the container is HTML differs from how the markup is handled when the container is either the SVG only, or within XHTML. The resulting DOM differs. But the resulting DOM differing doesn't violate the namespace requirements for the content in SVG. That content is still properly namespaced, the contents still valid SVG.

The requirement that you linked in the SVG document is directed at the SVG author, not an SVG implementor. One could just as easily implement an SVG parser that turns all circles into rectangles, but that still does not impact on the validity of the original SVG.
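For concreteness, the two conditions in the quoted argument (an NCName local name, and a namespace other than SVG's) can be written down as a small TypeScript sketch. This models only the argument as quoted, not any validator's actual code. The names `isAsciiNCName`, `ParsedElement` and `allowedInMetadata` are hypothetical, and the NCName test is an ASCII-only simplification of the real production.

```typescript
const SVG_NAMESPACE = "http://www.w3.org/2000/svg";
const RDF_NAMESPACE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

// ASCII-only approximation of an XML NCName: a name that contains no colon.
// The real production also admits a large set of non-ASCII characters.
function isAsciiNCName(name: string): boolean {
  return /^[A-Za-z_][A-Za-z0-9._-]*$/.test(name);
}

// What a parser hands to the DOM for one element.
interface ParsedElement {
  namespaceURI: string;
  localName: string;
}

// The two conditions from the quoted reading of SVG 1.1 <metadata> content:
// the local name must be an NCName, and the namespace must not be SVG's own.
function allowedInMetadata(el: ParsedElement): boolean {
  return isAsciiNCName(el.localName) && el.namespaceURI !== SVG_NAMESPACE;
}

// <rdf:RDF> after <metadata>, as the quoted text says each parser reports it.
const viaHtmlParser: ParsedElement = { namespaceURI: SVG_NAMESPACE, localName: "rdf:rdf" };
const viaXmlParser: ParsedElement = { namespaceURI: RDF_NAMESPACE, localName: "RDF" };

console.log(allowedInMetadata(viaHtmlParser));
console.log(allowedInMetadata(viaXmlParser));
```

Note that the check is applied to the parsed element, not to the markup as the author wrote it, which is the distinction the disagreement here turns on.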
Nothing is violated.

>> More importantly, this is a HTML5 failure in waiting, because if
>> people inline SVG, chances are they will inline whatever SVG they
>> find in the wild, which may or may not include RDF/XML. Validly
>> include, may I add, in fact recommended when it comes to annotating
>> Creative Commons license info.
>
> I agree. This problem has no good solutions, as far as I can tell.
>
> 1) Leave RDF/XML-looking stuff non-conforming. Bad because
> copy-pasting leads to a lot of errors about stuff that browsers will
> ignore--just like they ignore the contents of <metadata> in XML.
> 2) Perform full Namespace processing in <metadata> subtrees. Bad
> because this would introduce considerable complexity in order to
> shuffle around namespaces of stuff that browsers (and so far even
> validators!) end up ignoring. Adding a lot of complexity to tweak the
> DOM only so that it can be ignored doesn't make sense.
> 3) Leaving the DOM building as-is but proclaiming the RDF/XML-looking
> stuff that infoset-wise isn't RDF/XML as conforming. Bad because it
> would make authors believe that they are actually using RDF/XML and
> worse because if someone wanted to consume that data as RDF, they'd
> need to have dual code paths for text/html and XML (and the DOM
> Consistency Design Principle is all about avoiding that situation).
>

Or a fourth option is that you focus on validating the HTML5, and let the SVG be validated on its own. Remember that crappy markup in a standalone SVG document is its own validation, as the contents won't display. As for the RDF/XML within the SVG, again, it's not HTML5. Focus on HTML5, leave the rest be.

>> I had assumed that the SVG would be turned over to the SVG parsing
>> engine for the browser, which would operate the same regardless of
>> whether the SVG is in an HTML document, or an XHTML document.
>
> Nope.
>
>> We know what's supposed to happen if crappy markup gets embedded in
>> SVG: the user agents are supposed to provide a facility that allows
>> the user to extract valid XML, which means they will have to correct
>> the crappy markup.
>
> You'd apply the coercion to infoset algorithm, so <rdf:RDF> in
> text/html would turn into <rdfU0003A0rdf
> xmlns="http://www.w3.org/2000/svg"> in XML. It's equally useless in
> both, but the latter is namespace-well-formed.
>
>> I decided to see what happens with a browser that actually supports
>> HTML5 with inline SVG. I downloaded the latest Firefox nightly and
>> enabled HTML5. I then added script to both the HTML and the XHTML
>> versions of the files that will access the red SVG circle in the
>> document, which doesn't use the default SVG namespace, and also
>> access all dc:title elements. I then printed out several key values
>> related to HTML/XHTML differences when it comes to
>> elements/attributes and namespaces.
>>
>> If you load the XHTML page in any SVG enabled browser, and click the
>> circle, you'll see that attributes such as namespaceURI et al are
>> set. To be expected. Load the HTML document, though, in the FF
>> nightly, and again, you see what is expected in an HTML document at
>> this time, which doesn't acknowledge namespaces: the namespaceURI
>> value is set to the SVG's default namespace, the prefix is null, and
>> the localName is set to "dc:title".
>>
>> This is the DOM difference that Henri discusses. At the same time,
>> though, you can use the same functionality to access "dc:title",
>> regardless of whether the document is XHTML or HTML. In fact, the DOM
>> differences are pretty trivial -- no more than what we've had to deal
>> with when it comes to Ajax applications and differences with
>> XMLHttpRequest, or how we still have to manage differences in event
>> listeners -- test for a value, and act accordingly.
>
> The differences may seem trivial now.
> However, from experience from
> dealing with the {}lang, {}xml:lang,
> {http://www.w3.org/XML/1998/namespace}lang mess, I can assure you that
> such trivial differences lead to bugs. See point #3 in the above list
> of bad solutions.

People probably file bugs with browser makers that no longer support BLINK. One can't be constrained by the possibility of erroneous bug submittals. We can't force perfection by eliminating every possible way people can do something incorrect. We document the differences, carefully, clearly, and then we release the spec into the wild, where people will do what they always do: get some things right, get some things wrong, and eventually figure out which is which.

>> My preferences would be to elegantly manage namespaces for both XHTML
>> and HTML in such a way that we don't have these DOM differences.
>> However, evidently this causes other problems, so I'm not going to
>> push the issue.
>
> This is point #2 on the above list of bad solutions.
>
>> Regardless, within the SVGDocument, we should allow XML without
>> throwing conformance errors. Given that the Firefox nightly has no
>> problems with RDF/XML within the SVG element,
>
> Firefox nightlies also have "no problems" with other fantastic cruft
> one could throw at them. (However, the dc:foo stuff does make the
> nightlies run slightly more code per tag and allocate slightly more
> memory than they run or allocate per useful tag, so "no" is an
> approximation but a correct one to a useful precision.)
>
>> and given that differences in the DOM are not significant between the
>> two,
>
> I disagree. What could be more significant than having totally
> different namespaces and totally different local names, too?
>

It is trivial. It is something easily tested, well known, and can be documented, and worked around in JS libraries and applications.
We deal with far worse issues with event handling in JS today than we ever would when it comes to how the DOM differs when SVG is embedded in HTML, as compared to XHTML.

More importantly: the author conformance requirements for namespaced elements in SVG are still met; it is still valid SVG. The fact that implementations handle it differently doesn't abrogate this.

>> I don't know why we would not support "dc:foo" in SVG.
>
> Support != making validator silent.
>

One simple elegant warning that differences will exist in the DOM is all that's needed to still fulfill the nanny function in the validator, without intimidating people into not using correct and valid SVG. One warning at the first occurrence, if you must, with perhaps a link to an SVG validator for the person to validate the SVG separate from the HTML5.

What you have now, basically, is worse than no support for SVG in HTML5 at all. It is erroneous handling. It is misleading. It is not correct. It is, basically, punitive.

>> In fact, I think not supporting valid XML within the SVG is
>> sufficient to trigger very serious concerns about support for inline
>> SVG in HTML, and hence very serious concerns about HTML5.
>
> I disagree. What really matters is that SVG elements cause vector
> graphics to be drawn on screen.
>

Then leave the SVG be. Let the browsers create the graphics. Treat the other content in the spec as white noise. Give the one simple warning about DOM differences, and then go on to other things.

>> Perhaps what the validator should do is throw an informative warning,
>> telling the person that there are DOM differences between how
>> namespace is handled in an HTML document as compared to XHTML documents.
>
> I expect this is where this will end up even though it's a violation
> of our Design Principles. :-( However, this is only a problem when
> someone tries to actually consume the stuff that looks like RDF/XML
> but isn't.
>

The design principles themselves are a violation of design principles. The inconsistent application of the so-called design principles undermined their usefulness a long time ago. Now, the design principles document is just so many words, and yet another point of contention.

As for the RDF/XML, aren't you glad it isn't HTML5 then? You don't have to worry about it, it's some other validator's problem.

>> Still, I am hesitant about this, because this really only matters to
>> those who need to access the DOM, and that's not the majority of web
>> authors.
>
> Well, it doesn't matter to authors if the author expectation is that
> <metadata> is a talisman anyway. :-/ It does matter to authors who'd
> actually care about the robust processability of their markup. And in
> that case, the author needs to know if the target is a real triple
> processor or a regexp hack.
>

We authors are quite adaptable. What you seem to perceive to be a mountain, we perceive to be a molehill. We have had to work through more challenging differences. What matters is careful noting of differences, and good documentation. Given this, we will contrive.

In the case of the metadata section, that is annotation anyway. It may contain the graphic artist's name, which should never be removed. It can also contain a Creative Commons license, which cannot be removed without violating that license. The only recourse then would be that the SVG could not be used in an HTML5 document. Or the poor soul who uses it gets possibly hundreds of intimidating, ugly warnings and errors in the validator.

Considering that most public domain and free to use SVG has this type of annotation, people are effectively blocked from using the majority of SVG in their HTML web pages. That, or face some pretty ugly results in the validator.

The same with Inkscape markup for drawings. That data is embedded in the document as information for Inkscape when the document is opened again in the tool.
It has no impact on HTML5, or the SVG. Except that as it stands now, it generates hundreds of intimidating and horrible looking error messages and warnings in the validator. And the irony is: it's perfectly valid in SVG.

A compromise: one simple, non-intimidating warning when the first namespaced entity is parsed, telling people that the HTML5 validator cannot validate the contents of the SVG that are not HTML5, and that some differences in the DOM could result (with a link to more detailed information). That should be sufficient to meet all needs. Just one per page -- people don't need any more, and providing more will just confuse them and bury the information you're trying to convey.

And no error when using SVG in an HTML document, please. Validate the markup, don't buffer people. Please.

Shelley
Received on Monday, 7 September 2009 14:22:29 UTC