Re: Validation error frequencies

From: Sam Ruby <rubys@us.ibm.com>
Date: Thu, 31 Jan 2008 18:52:14 -0500
Message-ID: <47A25F2E.7080709@us.ibm.com>
To: Henri Sivonen <hsivonen@iki.fi>
CC: HTML Issue Tracking WG <public-html@w3.org>

Henri Sivonen wrote:
> 
> On Jan 31, 2008, at 14:26, Henri Sivonen wrote:
> 
>> I ran an analysis on recent error messages from Validator.nu.
>> http://hsivonen.iki.fi/test/moz/analysis.txt
> 
> I reran the numbers.

Preface: the idea of a grammar designed so that the errors reported by a 
validator are actually useful pleases me greatly.  :-)

[snip]

>> 0198 / 400    Bad value “Content-Type” for attribute “http-equiv” on 
>> element “meta” from namespace “http://www.w3.org/1999/xhtml”.
>> 0056 / 400    Bad value “content-type” for attribute “http-equiv” on 
>> element “meta” from namespace “http://www.w3.org/1999/xhtml”.
>> 0004 / 400    Bad value “Content-type” for attribute “http-equiv” on 
>> element “meta” from namespace “http://www.w3.org/1999/xhtml”.
>> 0002 / 400    Bad value “content-Type” for attribute “http-equiv” on 
>> element “meta” from namespace “http://www.w3.org/1999/xhtml”.
>> 0001 / 400    Bad value “CONTENT-TYPE” for attribute “http-equiv” on 
>> element “meta” from namespace “http://www.w3.org/1999/xhtml”.
> 
> I think we should allow the old internal encoding declaration syntax for 
> text/html as an alternative to the more elegant syntax. Not declaring 
> the encoding is bad, so we shouldn't send a negative message to the 
> authors who are declaring the encoding. Moreover, this is interoperable 
> stuff.
> 
> I think we shouldn't allow this for application/xhtml+xml, though, 
> because authors might think it has an effect.
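A minimal sketch of the check Henri proposes, assuming a validator that sees the `http-equiv` and `content` attribute values plus the document's media type. The function name `legacy_meta_charset` is hypothetical, and the regular expression is a simplified stand-in for the HTML spec's own algorithm for extracting an encoding from a `meta` element. Matching `http-equiv` case-insensitively accepts all five spellings counted above.

```python
import re
from typing import Optional

# Simplified charset extraction (an assumption); the HTML spec defines
# a more careful algorithm for parsing the content attribute.
_CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([^\s;\"']+)", re.IGNORECASE)


def legacy_meta_charset(http_equiv: str, content: str, media_type: str) -> Optional[str]:
    """Return the charset label declared by a legacy
    <meta http-equiv="Content-Type" content="..."> pragma, or None.

    Hypothetical rule: allowed for text/html, rejected for
    application/xhtml+xml, where it has no effect.
    """
    # ASCII case-insensitive match covers "Content-Type", "content-type",
    # "Content-type", "content-Type" and "CONTENT-TYPE".
    if not (http_equiv.isascii() and http_equiv.lower() == "content-type"):
        return None
    if media_type == "application/xhtml+xml":
        raise ValueError("legacy encoding pragma not allowed in application/xhtml+xml")
    match = _CHARSET_RE.search(content)
    return match.group(1).lower() if match else None
```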

By that reasoning, a meta charset encoding declaration should not be 
allowed if a charset is specified on the Content-Type HTTP header.  I 
ran into that very problem today:

http://lists.planetplanet.org/archives/devel/2008-January/001747.html

This content was XHTML, but was served as text/html, with a charset 
specified on the HTTP header, which overrode the charset on the meta 
declaration.

Realize that some tools don't know how the content will ultimately be 
served.

Serving XHTML as text/html, with BOTH a charset specified on the HTTP 
header AND a meta charset specified just in case is more common than you 
might think.  For example:

http://www.alistapart.com/articles/previewofhtml5

A much more useful restriction, spanning both the HTML5 and XHTML5 
serializations, would be to issue an error if encoding information is 
explicitly specified in more than one place and the values differ.
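A sketch of that restriction, assuming the charset parameter has already been pulled out of both the HTTP `Content-Type` header and the in-document declaration. `check_encoding_sources` is a hypothetical name, and Python's codec registry stands in for a proper encoding-label table, so aliases such as `utf8` and `UTF-8` compare equal. The header value takes precedence, as it did in the planetplanet case, and an error is reported only when both explicit sources are present and disagree.

```python
import codecs
from typing import List, Optional, Tuple


def _canonical(label: str) -> str:
    """Approximate label normalization via Python's codec registry.
    Note: this is NOT the WHATWG Encoding Standard's label table."""
    label = label.strip().lower()
    try:
        return codecs.lookup(label).name
    except LookupError:
        return label


def check_encoding_sources(http_charset: Optional[str],
                           meta_charset: Optional[str]) -> Tuple[Optional[str], List[str]]:
    """Return (effective encoding label, errors).

    The HTTP header wins over the in-document declaration; an error is
    reported only when both are given and name different encodings.
    """
    errors: List[str] = []
    if http_charset and meta_charset and _canonical(http_charset) != _canonical(meta_charset):
        errors.append(
            f"HTTP charset {http_charset!r} conflicts with meta charset {meta_charset!r}"
        )
    effective = http_charset or meta_charset
    return effective, errors
```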

>> 0120 / 400    Bad value (redacted) for attribute “href” on element “a” 
>> from namespace “http://www.w3.org/1999/xhtml”: Bad IRI reference: 
>> WHITESPACE in QUERY.
>> 0036 / 400    Bad value (redacted) for attribute “href” on element “a” 
>> from namespace “http://www.w3.org/1999/xhtml”: Bad IRI reference: 
>> DOUBLE_WHITESPACE in QUERY.
>> 0042 / 400    Bad value (redacted) for attribute “src” on element 
>> “img” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI 
>> reference: DOUBLE_WHITESPACE in PATH.
>> 0024 / 400    Bad value (redacted) for attribute “href” on element “a” 
>> from namespace “http://www.w3.org/1999/xhtml”: Bad IRI reference: 
>> WHITESPACE in PATH.
>> 0019 / 400    Bad value (redacted) for attribute “src” on element 
>> “img” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI 
>> reference: WHITESPACE in PATH.
>> 0019 / 400    Bad value (redacted) for attribute “href” on element “a” 
>> from namespace “http://www.w3.org/1999/xhtml”: Bad IRI reference: 
>> DOUBLE_WHITESPACE in HOST.
>> 0012 / 400    Bad value (redacted) for attribute “href” on element “a” 
>> from namespace “http://www.w3.org/1999/xhtml”: Bad IRI reference: 
>> DOUBLE_WHITESPACE in PATH.
>> 0007 / 400    Bad value (redacted) for attribute “href” on element “a” 
>> from namespace “http://www.w3.org/1999/xhtml”: Bad IRI reference: 
>> WHITESPACE in FRAGMENT.
>> 0003 / 400    Bad value (redacted) for attribute “href” on element 
>> “link” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI 
>> reference: WHITESPACE in PATH.
>> 0001 / 400    Bad value (redacted) for attribute “src” on element 
>> “script” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI 
>> reference: DOUBLE_WHITESPACE in PATH.
>> 0001 / 400    Bad value (redacted) for attribute “src” on element 
>> “input” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI 
>> reference: WHITESPACE in PATH.
>> 0001 / 400    Bad value (redacted) for attribute “src” on element 
>> “img” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI 
>> reference: WHITESPACE in QUERY.
>> 0001 / 400    Bad value (redacted) for attribute “href” on element 
>> “link” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI 
>> reference: WHITESPACE in QUERY.
>> 0001 / 400    Bad value (redacted) for attribute “href” on element 
>> “link” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI 
>> reference: WHITESPACE in FRAGMENT.
>> 0001 / 400    Bad value (redacted) for attribute “href” on element “a” 
>> from namespace “http://www.w3.org/1999/xhtml”: Bad IRI reference: 
>> DOUBLE_WHITESPACE in FRAGMENT.
> 
> Wow. The whitespace in IRI issues are far more common than I would have 
> thought. To the extent U+0020 is harmless and interoperably handled, we 
> should probably spec a pre-processing step that suppresses cases that 
> are harmless in practice.
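One possible shape for the pre-processing step Henri suggests, offered only as a hypothetical sketch: trim leading and trailing ASCII whitespace, percent-encode interior U+0020 as `%20`, and keep rejecting spaces in the host (an assumption about where U+0020 is not harmless). `preprocess_iri` is an invented name; `urllib.parse.urlsplit` is used only to locate the host.

```python
from urllib.parse import urlsplit

# ASCII whitespace as HTML defines it: tab, LF, FF, CR, space.
_ASCII_WS = "\t\n\f\r "


def preprocess_iri(value: str) -> str:
    """Hypothetical pre-processing: drop surrounding ASCII whitespace and
    percent-encode interior U+0020 characters. Spaces in the host are
    still treated as an error."""
    trimmed = value.strip(_ASCII_WS)
    if " " in urlsplit(trimmed).netloc:
        raise ValueError("whitespace in host")
    return trimmed.replace(" ", "%20")
```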

I see this all the time in feeds.  If you look more closely, the real 
cause is often mismatched quotes, which cause the parser to grab part of 
the next attribute as data.
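To illustrate that failure mode, the hypothetical markup below is missing the closing quote on `href`. Python's standard `html.parser` (a stand-in here for whatever parser reads the feed or page) then pulls `class=` into the URL, yielding the kind of whitespace-in-IRI error counted above. `HrefCollector` is an invented helper class.

```python
from html.parser import HTMLParser
from typing import List


class HrefCollector(HTMLParser):
    """Collect href values from <a> start tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)


# The quote that should close href="/page.html is missing, so the quoted
# value runs on until the next quote character, swallowing "class=".
markup = '<a href="/page.html class="nav">Next</a>'
collector = HrefCollector()
collector.feed(markup)
collector.close()
print(collector.hrefs)  # ['/page.html class=']
```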

A wise man once said to me "In XHTML5, your example parses unambiguously 
and does not cause interop problems in top 3 browsers that support 
XHTML. Yet, intuitively, it is clearly bogus. This suggests that the 
implicit line isn't quite at ambiguity or interop problems."

I believe that advice applies here.  Spaces in IRIs should be an error.

>> 0092 / 400    Attribute “xml:lang” not allowed on element “html” from 
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0018 / 400    Attribute “lang” not allowed on element “html” from 
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0002 / 400    Attribute “xml:lang” not allowed on element “q” from 
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0002 / 400    Attribute “xml:lang” not allowed on element “meta” from 
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0002 / 400    Attribute “xml:lang” not allowed on element “link” from 
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0002 / 400    Attribute “xml:lang” not allowed on element “em” from 
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0004 / 400    Attribute “xml:lang” not allowed on element “span” from 
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0004 / 400    Attribute “xml:lang” not allowed on element “a” from 
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0002 / 400    Attribute “lang” not allowed on element “span” from 
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0002 / 400    Attribute “lang” not allowed on element “div” from 
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0002 / 400    Attribute “lang” not allowed on element “a” from 
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0001 / 400    Attribute “xml:lang” not allowed on element “li” from 
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0001 / 400    Attribute “xml:lang” not allowed on element “h4” from 
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0001 / 400    Attribute “xml:lang” not allowed on element “h3” from 
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0001 / 400    Attribute “xml:lang” not allowed on element “div” from 
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0001 / 400    Attribute “xml:lang” not allowed on element “blockquote” 
>> from namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0001 / 400    Attribute “xml:lang” not allowed on element “abbr” from 
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0001 / 400    Attribute “lang” not allowed on element “p” from 
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0001 / 400    Attribute “lang” not allowed on element “h2” from 
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0001 / 400    Attribute “lang” not allowed on element “h1” from 
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0001 / 400    Attribute “lang” not allowed on element “em” from 
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0001 / 400    Attribute “lang” not allowed on element “body” from 
>> namespace “http://www.w3.org/1999/xhtml” at this point.
> 
> It seems that many people have copied XHTML boilerplate, but only few 
> docs use xml:lang on non-root elements.
> 
> Perhaps we should allow xml:lang as a talisman in text/html if lang is 
> present and they have the same value. This isn't going to be fun for libs 
> that map HTML5 to XML, though.
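A sketch of the talisman rule, assuming attributes arrive as a plain dictionary keyed by their literal names (`lang`, `xml:lang`); `check_lang_talisman` is hypothetical. Values are compared case-insensitively because language tags are case-insensitive; whether the rule should compare that way is an assumption.

```python
from typing import Dict, List


def check_lang_talisman(element: str, attrs: Dict[str, str]) -> List[str]:
    """Hypothetical text/html rule: xml:lang is tolerated only when lang
    is also present with the same value."""
    errors: List[str] = []
    if "xml:lang" in attrs:
        if "lang" not in attrs:
            errors.append(f"xml:lang on <{element}> without lang")
        # Language tags are case-insensitive, so compare lowercased values.
        elif attrs["xml:lang"].lower() != attrs["lang"].lower():
            errors.append(f"xml:lang and lang differ on <{element}>")
    return errors
```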

It should be an error if both are present on any given element.  That 
restriction would make the job of "libs" easier: they simply need to look 
for either "spelling", in both serializations.
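Under this rule, a library mapping either serialization to a tree needs only one lookup that accepts either spelling. A minimal sketch, assuming a dictionary of literal attribute names and ignoring inheritance of language from ancestor elements; `element_language` is a hypothetical helper.

```python
from typing import Dict, Optional, Tuple


def element_language(attrs: Dict[str, str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (language, error) under the proposed rule: an element may
    carry lang or xml:lang, but never both."""
    has_lang = "lang" in attrs
    has_xml_lang = "xml:lang" in attrs
    if has_lang and has_xml_lang:
        return None, "both lang and xml:lang specified"
    if has_lang:
        return attrs["lang"], None
    if has_xml_lang:
        return attrs["xml:lang"], None
    # Neither spelling present: a real library would consult ancestors.
    return None, None
```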

- Sam Ruby
Received on Thursday, 31 January 2008 23:52:29 UTC
