- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Sun, 31 Aug 2008 20:23:58 +0300
- To: HTML WG <public-html@w3.org>
On Aug 30, 2008, at 10:58, Henri Sivonen wrote:

> I reran the numbers for pages that
> 1) did not trigger the quirks mode
> AND
> 2) had zero parse errors (still ignoring tree builder-level doctype
> errors)
> AND
> 3) had at least one validation error

First, some philosophical assumptions underlying my conclusions:

1) Time is a precious resource for people. Therefore, wasting people's time is bad.
2) A validator should primarily be a tool that authors use to help themselves in their authoring task. The primary purpose of a validator is not imposing a particular code aesthetic onto other people.

With legacy language features it becomes problematic to help people not waste their time.

If an author is writing a new HTML page, the author's time is wasted if we make a useless piece of syntax conforming and pundits convince the author to use the useless syntax. In this sense, conforming no-op syntax isn't harmless.

On the other hand, if an author has existing HTML templates and then adds a new HTML5 feature (<video>) to his/her site and starts using an HTML5 validator as a quality assurance tool, since an HTML5 validator recognizes the new feature, the author's time is wasted if the validator spews a lot of errors about legacy language features that are interoperably implemented and don't really cause harm beyond wasted bandwidth (and perhaps slightly lower maintainability).

More concretely, it wastes people's time if experts advise people to write <style type=text/css> instead of <style>, but it also wastes people's time if a validator tells people to take type=text/css out when it has already been written.

I don't have a good solution for this problem at this time. However, I think it's safe to say that the approach HTML 4 and XHTML 1.0 took isn't the solution. Those specs defined two conformance targets: practical aka. Transitional and wishful aka. Strict. It turned out that most people who care about validation aim for the more permissive conformance target.
Also, some people spent a lot of time coding around the strictness of Strict, which, I'd argue, was in some cases a big waste of time and, therefore, bad. Yet, it appears that when the more permissive conformance target doesn't forbid the things that people want to do (with the notable exception of <embed>), in a decade part of the HTML output out there does converge towards the more permissive conformance target.

Anyway, I'd like to make the conformance definition of HTML5 not waste the time of people who are upgrading from a previous level of HTML. There will be authors using the <video> element pretty soon. For them, the <video> element will be a killer feature that matters more than HTML 4.01 or XHTML 1.x validation. At that point, they should be able to turn to an HTML5 tool so that the tool is useful for them and doesn't waste their time. For this scenario, it doesn't make sense to make the HTML5 conformance definition something that maybe 30% of HTML output has converged on after a decade.

(Things grouped together a bit below.)

> 0.1142 The internal character encoding declaration must be the first
> child of the “head” element.

I think we should go back to requiring the declaration to occur within the first 512 bytes. Whether it has non-ASCII before it doesn't matter in that case, even for streaming implementations that perform a prescan on the first 512 bytes. The old definition is theoretically ugly, but it seems to be more practical for everyone except validator writers, and for me as a validator writer it's sunk cost already.

> 0.1001 Attribute “border” not allowed on element “img” at this point.

It seems to me that Gecko's and Trident's default image border is extremely unpopular among authors, and making border=0 non-conforming is unhelpful, too. I reiterate my suggestion to make border=0 conforming.

> 0.1013 Attribute “cellspacing” not allowed on element “table” at
> this point.
> 0.0951 Attribute “cellpadding” not allowed on element “table” at
> this point.
> 0.0935 Attribute “border” not allowed on element “table” at this
> point.
> 0.0924 Attribute “width” not allowed on element “table” at this point.
> 0.0779 Attribute “valign” not allowed on element “td” at this point.
> 0.0759 Attribute “width” not allowed on element “td” at this point.
> 0.0451 Attribute “height” not allowed on element “td” at this point.
> 0.0365 Attribute “align” not allowed on element “table” at this point.
> 0.0273 Attribute “height” not allowed on element “table” at this
> point.

It's clear by now that the layout model offered by HTML tables is something that authors find useful. Using layout tables in HTML and using CSS is not an either-or choice. Since people who use CSS for some things still use layout tables, this is an indication that the CSS language or its incumbent implementations don't make it easy to create the kinds of layouts that authors use tables for.

Realistically, it will take many years for CSS grid layout to be as deployable by authors as HTML layout tables are today. Moreover, the current installed base of browsers doesn't make CSS table layout a viable alternative to HTML table layout. Chances are that this won't change until the computers that came with Windows XP pre-installed have been disposed of. Chances are that there will be demand for validating HTML5 language features before then.

Considering the above, it seems unhelpful for HTML5 to take the position that layout tables are not conforming.

(Aside: The accessibility argument against layout tables is moot. Layout tables are so abundant out there that accessibility technology must deal with them anyhow.)

> 0.0793 Attribute “language” not allowed on element “script” at this
> point.

<script language=JavaScript> is as harmless and useless as <script type=text/javascript>.

> 0.0638 Attribute “align” not allowed on element “td” at this point.

I think this one isn't like the other "presentational" table attributes.
The alignment of table cells is often tightly coupled with the kind of content the cells have. Moreover, if structure and presentation are truly separated, it should be possible to write a style sheet ahead of time for a given set of content features. Here a content feature can be something like "multi-paragraph blockquotes" or "tables with both numbers and text in them". However, intuitively, "tables with numbers in the fifth column" is too specific to be a generic content feature that a style sheet is written to support. If you need to tweak your CSS and class attributes whenever you make a table with a new column mix, structure and presentation are not really being separated. Once you get there, why not encode the alignment in HTML?

> 0.0609 Attribute “size” not allowed on element “input” at this point.

This HTML feature doesn't have a convenient CSS alternative that is deployable today considering the existing installed base of browsers. I think we should just make this attribute conforming.

> 0.0529 Attribute “align” not allowed on element “div” at this point.
> 0.0282 Attribute “align” not allowed on element “p” at this point.
> 0.0372 Attribute “align” not allowed on element “img” at this point.

Wow. It would be interesting to examine the use cases for aligning divs and paragraphs. I'd be interested to know if the popularity of the align attribute has something to do with legacy RTL authoring habits.

> 0.0401 Bad value (consolidated) for attribute “http-equiv” on
> element “meta”.

I don't know what values these are, but I hadn't implemented Content-Language yet.

> 0.0386 Attribute “name” not allowed on element “a” at this point.

That one just refuses to go away. :-(

> 0.0354 The “font” element is obsolete.
> 0.0208 Attribute “color” not allowed on element “font” at this point.

<font color> is the simplest way to map color-coded text from a WYSIWYG editor to HTML. Would <span style='color:red;'> be any better for color-based emphasis or annotations?
(Yeah, yeah, it's not good for accessibility, but neither of those is. Is it realistic to kill color UI in WYSIWYG editors?)

> 0.0279 Attribute “accesskey” not allowed on element “a” at this point.

The design of accesskey sucks, but the attribute seems relatively popular.

> 0.0236 Attribute “profile” not allowed on element “head” at this
> point.

The profile instances are mostly due to WordPress. The scheme of picking at most one page per *hostname* still picked a lot of username.wordpress.com blogs. Also, there are a lot of other WP instances out there. These could be knocked out by a single WP version update.

> 0.0224 Attribute “size” not allowed on element “font” at this point.
> 0.0202 Attribute “bgcolor” not allowed on element “td” at this point.

Presentationalism.

This is the cut-off for errors that would not have been errors in HTML 4.01 Transitional.

> 0.0194 Element “link” not allowed in this context. (The parent was
> element “div”.) Suppressing further errors from this subtree.

--
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Sunday, 31 August 2008 17:24:41 UTC