- From: Sam Ruby <rubys@us.ibm.com>
- Date: Thu, 31 Jan 2008 18:52:14 -0500
- To: Henri Sivonen <hsivonen@iki.fi>
- CC: HTML Issue Tracking WG <public-html@w3.org>
Henri Sivonen wrote:
>
> On Jan 31, 2008, at 14:26, Henri Sivonen wrote:
>
>> I ran an analysis on recent error messages from Validator.nu.
>> http://hsivonen.iki.fi/test/moz/analysis.txt
>
> I reran the numbers.

Preface: the idea of designing a grammar so that the errors a validator reports are actually useful pleases me greatly. :-)

[snip]

>> 0198 / 400 Bad value “Content-Type” for attribute “http-equiv” on
>> element “meta” from namespace “http://www.w3.org/1999/xhtml”.
>> 0056 / 400 Bad value “content-type” for attribute “http-equiv” on
>> element “meta” from namespace “http://www.w3.org/1999/xhtml”.
>> 0004 / 400 Bad value “Content-type” for attribute “http-equiv” on
>> element “meta” from namespace “http://www.w3.org/1999/xhtml”.
>> 0002 / 400 Bad value “content-Type” for attribute “http-equiv” on
>> element “meta” from namespace “http://www.w3.org/1999/xhtml”.
>> 0001 / 400 Bad value “CONTENT-TYPE” for attribute “http-equiv” on
>> element “meta” from namespace “http://www.w3.org/1999/xhtml”.
>
> I think we should allow the old internal encoding declaration syntax for
> text/html as an alternative to the more elegant syntax. Not declaring
> the encoding is bad, so we shouldn't send a negative message to the
> authors who are declaring the encoding. Moreover, this is interoperable
> stuff.
>
> I think we shouldn't allow this for application/xhtml+xml, though,
> because authors might think it has an effect.

By that reasoning, a meta charset encoding declaration should not be allowed if a charset is specified on the Content-Type HTTP header. I ran into that very problem today:

http://lists.planetplanet.org/archives/devel/2008-January/001747.html

This content was XHTML, but was served as text/html with a charset specified on the HTTP header, and that charset overrode the one in the meta declaration.

Keep in mind that some tools don't know how the content will ultimately be served.
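A minimal sketch of how a checker might catch the situation just described, where a charset on the HTTP Content-Type header silently overrides a different charset in a meta element. It assumes Python's standard html.parser and is not Validator.nu's implementation; the names charset_param, MetaCharsetCollector and check_encoding_sources are hypothetical. It matches http-equiv case-insensitively, since the statistics above show authors writing “Content-Type”, “content-type”, “CONTENT-TYPE” and so on, and it reports an error only when two explicitly declared sources disagree.

```python
from html.parser import HTMLParser


def charset_param(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    for part in content_type.split(";")[1:]:
        name, sep, value = part.partition("=")
        if sep and name.strip().lower() == "charset":
            return value.strip().strip("\"'").lower() or None
    return None


class MetaCharsetCollector(HTMLParser):
    """Collect charsets from <meta charset> and <meta http-equiv=Content-Type>."""

    def __init__(self):
        super().__init__()
        self.charsets = []

    def handle_starttag(self, tag, attrs):
        # A self-closing <meta ... /> also arrives here via the default
        # handle_startendtag.
        if tag != "meta":
            return
        a = {name: (value or "") for name, value in attrs}
        if "charset" in a:
            cs = a["charset"].strip().lower()
        # Match http-equiv case-insensitively: "Content-Type", "CONTENT-TYPE", ...
        elif a.get("http-equiv", "").strip().lower() == "content-type":
            cs = charset_param(a.get("content", ""))
        else:
            cs = None
        if cs:
            self.charsets.append(cs)


def check_encoding_sources(http_content_type, document):
    """Return error messages if explicitly declared encodings disagree.

    Labels are compared literally; aliases of the same encoding are not resolved.
    """
    sources = []
    if http_content_type:
        cs = charset_param(http_content_type)
        if cs:
            sources.append(("HTTP Content-Type header", cs))
    collector = MetaCharsetCollector()
    collector.feed(document)
    collector.close()
    sources += [("meta element", cs) for cs in collector.charsets]
    if len({cs for _, cs in sources}) > 1:
        detail = ", ".join(f"{src}: {cs}" for src, cs in sources)
        return [f"Conflicting encoding declarations ({detail})"]
    return []


print(check_encoding_sources(
    "text/html; charset=iso-8859-1",
    '<meta http-equiv="CONTENT-TYPE" content="text/html; charset=utf-8">'))
# ['Conflicting encoding declarations (HTTP Content-Type header: iso-8859-1, meta element: utf-8)']
```

Because it compares labels literally, two different labels for the same encoding would be reported as a conflict, and it ignores a byte order mark and the XML declaration; a real checker would need to resolve encoding labels and consider those sources too.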
Serving XHTML as text/html, with BOTH a charset specified on the HTTP header AND a meta charset specified just in case, is more common than you might think. For example:

http://www.alistapart.com/articles/previewofhtml5

A much more useful restriction, spanning both the HTML5 and XHTML5 serializations, would be to issue an error if encoding information is explicitly specified in more than one place and the sources differ.

>> 0120 / 400 Bad value (redacted) for attribute “href” on element “a”
>> from namespace “http://www.w3.org/1999/xhtml”: Bad IRI reference:
>> WHITESPACE in QUERY.
>> 0036 / 400 Bad value (redacted) for attribute “href” on element “a”
>> from namespace “http://www.w3.org/1999/xhtml”: Bad IRI reference:
>> DOUBLE_WHITESPACE in QUERY.
>> 0042 / 400 Bad value (redacted) for attribute “src” on element
>> “img” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI
>> reference: DOUBLE_WHITESPACE in PATH.
>> 0024 / 400 Bad value (redacted) for attribute “href” on element “a”
>> from namespace “http://www.w3.org/1999/xhtml”: Bad IRI reference:
>> WHITESPACE in PATH.
>> 0019 / 400 Bad value (redacted) for attribute “src” on element
>> “img” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI
>> reference: WHITESPACE in PATH.
>> 0019 / 400 Bad value (redacted) for attribute “href” on element “a”
>> from namespace “http://www.w3.org/1999/xhtml”: Bad IRI reference:
>> DOUBLE_WHITESPACE in HOST.
>> 0012 / 400 Bad value (redacted) for attribute “href” on element “a”
>> from namespace “http://www.w3.org/1999/xhtml”: Bad IRI reference:
>> DOUBLE_WHITESPACE in PATH.
>> 0007 / 400 Bad value (redacted) for attribute “href” on element “a”
>> from namespace “http://www.w3.org/1999/xhtml”: Bad IRI reference:
>> WHITESPACE in FRAGMENT.
>> 0003 / 400 Bad value (redacted) for attribute “href” on element
>> “link” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI
>> reference: WHITESPACE in PATH.
>> 0001 / 400 Bad value (redacted) for attribute “src” on element
>> “script” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI
>> reference: DOUBLE_WHITESPACE in PATH.
>> 0001 / 400 Bad value (redacted) for attribute “src” on element
>> “input” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI
>> reference: WHITESPACE in PATH.
>> 0001 / 400 Bad value (redacted) for attribute “src” on element
>> “img” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI
>> reference: WHITESPACE in QUERY.
>> 0001 / 400 Bad value (redacted) for attribute “href” on element
>> “link” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI
>> reference: WHITESPACE in QUERY.
>> 0001 / 400 Bad value (redacted) for attribute “href” on element
>> “link” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI
>> reference: WHITESPACE in FRAGMENT.
>> 0001 / 400 Bad value (redacted) for attribute “href” on element “a”
>> from namespace “http://www.w3.org/1999/xhtml”: Bad IRI reference:
>> DOUBLE_WHITESPACE in FRAGMENT.
>
> Wow. The whitespace in IRI issues are far more common than I would have
> thought. To the extent U+0020 is harmless and interoperably handled, we
> should probably spec a pre-processing step that suppresses cases that
> are harmless in practice.

I see this all the time in feeds. If you look closer, the real cause is often mismatched quotes, which make the parser grab part of the next attribute as data.

A wise man once said to me: "In XHTML5, your example parses unambiguously and does not cause interop problems in top 3 browsers that support XHTML. Yet, intuitively, it is clearly bogus. This suggests that the implicit line isn't quite at ambiguity or interop problems." I believe that advice applies here. Spaces in IRIs should be an error.

>> 0092 / 400 Attribute “xml:lang” not allowed on element “html” from
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0018 / 400 Attribute “lang” not allowed on element “html” from
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0002 / 400 Attribute “xml:lang” not allowed on element “q” from
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0002 / 400 Attribute “xml:lang” not allowed on element “meta” from
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0002 / 400 Attribute “xml:lang” not allowed on element “link” from
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0002 / 400 Attribute “xml:lang” not allowed on element “em” from
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0004 / 400 Attribute “xml:lang” not allowed on element “span” from
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0004 / 400 Attribute “xml:lang” not allowed on element “a” from
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0002 / 400 Attribute “lang” not allowed on element “span” from
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0002 / 400 Attribute “lang” not allowed on element “div” from
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0002 / 400 Attribute “lang” not allowed on element “a” from
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0001 / 400 Attribute “xml:lang” not allowed on element “li” from
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0001 / 400 Attribute “xml:lang” not allowed on element “h4” from
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0001 / 400 Attribute “xml:lang” not allowed on element “h3” from
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0001 / 400 Attribute “xml:lang” not allowed on element “div” from
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0001 / 400 Attribute “xml:lang” not allowed on element “blockquote”
>> from namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0001 / 400 Attribute “xml:lang” not allowed on element “abbr” from
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0001 / 400 Attribute “lang” not allowed on element “p” from
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0001 / 400 Attribute “lang” not allowed on element “h2” from
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0001 / 400 Attribute “lang” not allowed on element “h1” from
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0001 / 400 Attribute “lang” not allowed on element “em” from
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>> 0001 / 400 Attribute “lang” not allowed on element “body” from
>> namespace “http://www.w3.org/1999/xhtml” at this point.
>
> It seems that many people have copied XHTML boilerplate, but only few
> docs use xml:lang on non-root elements.
>
> Perhaps we should allow xml:lang as a talisman in text/html if lang is
> present and they have the same value. This isn't going to fun for libs
> that map HTML5 to XML, though.

It should be an error if both are present on any given element. That restriction would make the libs' job easier: they simply need to look for either spelling, in both serializations.

- Sam Ruby
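A minimal sketch of the rule proposed above: an element carrying both lang and xml:lang is an error, and a consumer simply reads whichever spelling is present. It assumes a text/html document parsed with Python's standard html.parser, which lowercases attribute names and passes xml:lang through as the literal name “xml:lang”. LangAttributeChecker is a hypothetical name; checking the XHTML serialization would need a namespace-aware XML parser instead.

```python
from html.parser import HTMLParser


class LangAttributeChecker(HTMLParser):
    """Hypothetical checker: an element may carry lang or xml:lang, but not both."""

    def __init__(self):
        super().__init__()
        self.errors = []     # one message per element that carries both spellings
        self.languages = []  # (tag, value) for elements that carry exactly one spelling

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases attribute names; "xml:lang" keeps its colon.
        names = dict(attrs)
        has_lang = "lang" in names
        has_xml_lang = "xml:lang" in names
        if has_lang and has_xml_lang:
            line, _ = self.getpos()
            self.errors.append(f"line {line}: both lang and xml:lang on <{tag}>")
        elif has_lang or has_xml_lang:
            # A consumer only needs to look for either spelling.
            value = names["lang"] if has_lang else names["xml:lang"]
            self.languages.append((tag, value or ""))


checker = LangAttributeChecker()
checker.feed('<html lang="en" xml:lang="en"><body>'
             '<p lang="fr">Bonjour</p><span xml:lang="de">Hallo</span>'
             '</body></html>')
checker.close()
print(checker.errors)     # ['line 1: both lang and xml:lang on <html>']
print(checker.languages)  # [('p', 'fr'), ('span', 'de')]
```

Under this rule the XHTML boilerplate `<html lang="en" xml:lang="en">` is flagged even when the two values agree, which is the opposite of the "talisman" allowance suggested in the quoted text.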
Received on Thursday, 31 January 2008 23:52:29 UTC