- From: Maciej Stachowiak <mjs@apple.com>
- Date: Sun, 21 Mar 2010 06:53:28 -0700
- To: Sam Ruby <rubys@intertwingly.net>
- Cc: HTMLwg WG <public-html@w3.org>
On Mar 21, 2010, at 5:22 AM, Sam Ruby wrote:

> On 03/21/2010 03:20 AM, Maciej Stachowiak wrote:
>>
>> I've decided that it's worthwhile to review the HTML5 conformance errors
>> reported on notable sites in more detail. I started the following wiki
>> page to collect data:
>>
>> http://www.w3.org/html/wg/wiki/HTML5_Authoring_Conformance_Study
>>
>> Thanks to Aryeh Gregor and myself, we now have a full classification of
>> HTML5 conformance errors on the Alexa Top 10. Thanks also to Sam Ruby
>> for his blog post that inspired the set of sites chosen and links to
>> similar data in raw form. If anyone would like to help with gathering
>> the data for the remaining sites, it would be much appreciated. The
>> methodology is documented on the wiki page.
>
> "full"? Not hardly. <grin>

Note that this list (so far) is only an attempt to classify the categories that the validator errors fall into. It's not an attempt to justify them. I found that in a number of cases, I personally had no idea why something was disallowed.

> I still remain deeply concerned about a "Ready? Fire! Aim?"
> approach to solving these problems. The first thing that needs to
> be done is to decide what problems Author Conformance
> Requirements address, and how having them makes things
> better. In short, we would be best served by requiring a change
> proposal for such things.

If we were at the start of the project, that would be a fine approach. As things stand now, I would personally prefer not to spend several additional years on getting authoring conformance requirements just right.

It's true that if we could get consensus on removing them all and replacing them with nothing all the way to REC, then that might save time on the whole. But I would be highly surprised if we could quickly get consensus on such a radical approach. Consider: validator.w3.org would become a tautology machine. It seems like a tough sell to get that through Last Call.
In the course of reviewing these errors, I concluded that there is at least one other good reason for document conformance errors besides interoperability: namely, situations where it is likely the author has made a mistake that may have unintended consequences, even if those consequences are 100% consistent between user agents. For example, I think duplicate IDs are a legitimate error. Even if they don't break the page in an obvious way, they will have surprising effects the moment you call getElementById or attempt to use them as fragment identifiers. It seems reasonable to me that every conformance checker should be required to report that error, unless it is specifically configured to silence it.

> Meanwhile, I've selected one issue each from the top ten list to
> explore further here.

In your comments below, some point out errors in the wiki page, which I have endeavored to fix. Others question the motivation for particular conformance requirements, so I didn't change the classification for those, though in some cases I have added a comment about the likely reasoning.

>
> google.com:
>
> the script tag is not unclosed, the html and body tags are unclosed.
> HTML5 has many elements which do not require close tags. It even
> has many tags that are entirely optional. Both of these tags are
> entirely optional, but apparently if present must be explicitly
> closed. What operational interop problem does this solve?

Actually, the close tags for html and body are both optional. On closer review of the markup, I believe the unclosed tag is <center> (there are two <center> open tags but only one close). At the very least, the validator error message could clearly be improved here. Fixed in the wiki page.

>
> facebook.com:
>
> How is this a "bad doctype"? What operational interop problem does
> it solve to identify this doctype as non-conforming? I thought the
> HTML5 strategy was that the web is to be considered as non-versioned.
The XHTML 1.0 Strict doctype is actually allowed in general - it's not flagged as an error on other pages that use it. I believe the validator is complaining about the newline in the doctype string - that's the only difference I can find compared to the msn.com doctype which is not flagged as an error.

>
> yahoo.com:
>
> y-pkgid could arguably conform to "proposal Y" for issue-41.
> Allowing "modid" would both inhibit the ability of the validator to
> catch misspellings, and the ability for future versions of the spec
> to define new attributes.

This seems to hint at a possible third reason for conformance requirements besides interoperability and catching likely authoring errors: protecting future ability to evolve the language. Splitting out custom attributes with a hyphen seems reasonable if we are considering treating them differently. I updated the wiki page to reflect that. I believe that in this particular case, a data-* attribute would be an appropriate replacement for modid and y-pkgid though.

>
> youtube.com:
>
> What interop issues are solved by disallowing div elements inside of
> span elements?
>
> live.com:
>
> This issue has already been widely discussed. Additional
> information can be found here: http://philip.html5.org/data/xmlns-bindings.txt

Philip just pointed me to a newer data set: http://philip.html5.org/data/xmlns-attributes.txt

Added a link to the wiki page.

>
> wikipedia.org:
>
> While there are no errors, there is a warning, and getting the
> definition of IRI correct is definitely something that is relevant
> to HTML5.

The warning is not mandated by HTML5 as far as I know. So it seems irrelevant to discussion of HTML5 conformance requirements.

>
> blogger.com:
>
> What interop issues are solved by disallowing blank targets?

I don't think this is an interop issue, but it does seem like a likely author oversight, since target="" has no effect.
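To make the "likely author mistake" category more concrete, here is a minimal sketch in Python (standard library html.parser only) of a checker that reports three of the patterns mentioned in this message: duplicate id values, an empty target attribute, and hyphenated custom attributes such as y-pkgid that are not data-* attributes. The names check, LikelyMistakeChecker and KNOWN_HYPHENATED, and the message strings, are hypothetical; this is not how validator.w3.org or any real conformance checker is implemented, and the hyphenated-attribute allowlist is deliberately incomplete.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical, deliberately incomplete allowlist of hyphenated attribute
# names that HTML itself defines; a real checker would use the full spec.
KNOWN_HYPHENATED = {"http-equiv", "accept-charset"}


class LikelyMistakeChecker(HTMLParser):
    """Collects findings for a few 'likely author mistake' patterns."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()  # id value -> number of elements using it
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases attribute names; a bare attribute has value None.
        for name, value in attrs:
            if name == "id" and value:
                self.ids[value] += 1
            elif name == "target" and (value is None or value.strip() == ""):
                # An empty target has no effect, so it is probably an oversight.
                self.findings.append(f"<{tag}> has an empty target attribute")
            elif ("-" in name
                  and not name.startswith(("data-", "aria-"))
                  and name not in KNOWN_HYPHENATED):
                # Custom hyphenated attribute, e.g. y-pkgid.
                self.findings.append(
                    f"<{tag}> has custom attribute {name!r}; consider data-*")


def check(markup):
    """Return a list of human-readable findings for the given markup."""
    checker = LikelyMistakeChecker()
    checker.feed(markup)
    checker.close()
    for value, count in checker.ids.items():
        if count > 1:
            # getElementById and #fragment navigation only find the first
            # element in tree order, so the others are silently unreachable.
            checker.findings.append(f"id {value!r} is used {count} times")
    return checker.findings
```

For example, check('<a id="x"></a><a id="x"></a>') returns ["id 'x' is used 2 times"]. The sketch only reports findings and does not try to decide whether each one should be an error or a warning, which is the policy question being debated in this thread.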
>
> baidu.com:
>
> What interop issues are solved by requiring script elements to come
> before </body>?

I don't know about <script> itself, but for any element with visible rendering, putting it after </body> is likely not to give the results the author intended. This particular <script> uses document.write. In this case, it writes another <script> tag, but in general document.write could end up putting arbitrary content after </body>.

>
> msn.com:
>
> Separate issues: whitespaces within query and whitespace either
> before or after the IRI.

It looks like there is only one case of trailing whitespace (mysteriously reported as whitespace in query) and none of leading. The rest all seemed to be internal whitespace. I split the two categories. On qq.com all the URL errors were internal whitespace.

>
> qq.com:
>
> I realize that X-UA-compatible is controversial, but non-conforming?

It seems like this actually *does* meet your standard of creating an interop problem, in that it is a vendor-specific feature which invokes a nonstandard rendering mode. However, it is being used on 3 of the top 10 sites.

Regards,
Maciej
Received on Sunday, 21 March 2010 13:54:02 UTC