- From: Geoffrey Sneddon <foolistbar@googlemail.com>
- Date: Sat, 2 Feb 2008 14:19:04 +0000
- To: Sam Ruby <rubys@us.ibm.com>
- Cc: Henri Sivonen <hsivonen@iki.fi>, HTML Issue Tracking WG <public-html@w3.org>
On 31 Jan 2008, at 23:52, Sam Ruby wrote:

>>> 0120 / 400 Bad value (redacted) for attribute “href” on element
>>> “a” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI
>>> reference: WHITESPACE in QUERY.
>>> 0036 / 400 Bad value (redacted) for attribute “href” on element
>>> “a” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI
>>> reference: DOUBLE_WHITESPACE in QUERY.
>>> 0042 / 400 Bad value (redacted) for attribute “src” on element
>>> “img” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI
>>> reference: DOUBLE_WHITESPACE in PATH.
>>> 0024 / 400 Bad value (redacted) for attribute “href” on element
>>> “a” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI
>>> reference: WHITESPACE in PATH.
>>> 0019 / 400 Bad value (redacted) for attribute “src” on element
>>> “img” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI
>>> reference: WHITESPACE in PATH.
>>> 0019 / 400 Bad value (redacted) for attribute “href” on element
>>> “a” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI
>>> reference: DOUBLE_WHITESPACE in HOST.
>>> 0012 / 400 Bad value (redacted) for attribute “href” on element
>>> “a” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI
>>> reference: DOUBLE_WHITESPACE in PATH.
>>> 0007 / 400 Bad value (redacted) for attribute “href” on element
>>> “a” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI
>>> reference: WHITESPACE in FRAGMENT.
>>> 0003 / 400 Bad value (redacted) for attribute “href” on element
>>> “link” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI
>>> reference: WHITESPACE in PATH.
>>> 0001 / 400 Bad value (redacted) for attribute “src” on element
>>> “script” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI
>>> reference: DOUBLE_WHITESPACE in PATH.
>>> 0001 / 400 Bad value (redacted) for attribute “src” on element
>>> “input” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI
>>> reference: WHITESPACE in PATH.
>>> 0001 / 400 Bad value (redacted) for attribute “src” on element
>>> “img” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI
>>> reference: WHITESPACE in QUERY.
>>> 0001 / 400 Bad value (redacted) for attribute “href” on element
>>> “link” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI
>>> reference: WHITESPACE in QUERY.
>>> 0001 / 400 Bad value (redacted) for attribute “href” on element
>>> “link” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI
>>> reference: WHITESPACE in FRAGMENT.
>>> 0001 / 400 Bad value (redacted) for attribute “href” on element
>>> “a” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI
>>> reference: DOUBLE_WHITESPACE in FRAGMENT.
>>
>> Wow. The whitespace in IRI issues are far more common than I would
>> have thought. To the extent U+0020 is harmless and interoperably
>> handled, we should probably spec a pre-processing step that
>> suppresses cases that are harmless in practice.
>
> I see this all the time in feeds. If you look closer, often the
> real cause is mismatched quotes causing the parser to grab part of
> the next attribute as data.
>
> A wise man once said to me "In XHTML5, your example parses
> unambiguously and does not cause interop problems in top 3 browsers
> that support XHTML. Yet, intuitively, it is clearly bogus. This
> suggests that the implicit line isn't quite at ambiguity or interop
> problems."
>
> I believe that advice applies here. Spaces in IRI should be an error.

I also agree that it should be non-conforming, but I think we should
define behaviour for parsing invalid IRIs (even if we just point to
LEIRI for UAs, and IRI for documents); even XML defines error handling
for SYSTEM identifiers!

Henri, did you mean to make it conformant, or just to define behaviour
for spaces in IRIs? I read it as the latter, but I am asking to clear
up the matter.

--
Geoffrey Sneddon
<http://gsnedders.com/>
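To illustrate the distinction drawn above between document conformance (spaces are an error) and defined UA behaviour (spaces are still processed), here is a minimal Python sketch. It is not the validator's implementation: the function names `whitespace_errors` and `lenient_ua_form` are hypothetical, the split uses the non-validating regex from RFC 3986 Appendix B, the HOST label is approximated by the whole authority component, DOUBLE_WHITESPACE is assumed to mean two consecutive U+0020 characters, and percent-encoding the space is only one possible recovery, not something any spec in this thread mandates.

```python
import re

# Non-validating splitter from RFC 3986, Appendix B.
# Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
_SPLIT = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$",
    re.DOTALL,
)


def whitespace_errors(ref):
    """Conformance-checker side: report U+0020 per component.

    The labels imitate the messages quoted in the thread. HOST is
    approximated by the whole authority (assumption for this sketch).
    """
    m = _SPLIT.match(ref)
    parts = {
        "HOST": m.group(4),
        "PATH": m.group(5),
        "QUERY": m.group(7),
        "FRAGMENT": m.group(9),
    }
    errors = []
    for name, value in parts.items():
        if not value:
            continue  # component absent or empty
        if "  " in value:
            # Assumed meaning: two consecutive spaces.
            errors.append("DOUBLE_WHITESPACE in " + name)
        elif " " in value:
            errors.append("WHITESPACE in " + name)
    return errors


def lenient_ua_form(ref):
    """UA side (hypothetical recovery): percent-encode U+0020 only."""
    return ref.replace(" ", "%20")


if __name__ == "__main__":
    # Mismatched quotes can make a parser swallow the next attribute.
    bad = "page.html title=Home"
    print(whitespace_errors(bad))
    print(lenient_ua_form(bad))
```

A checker would report the errors for the document while a UA could still resolve the recovered form, which keeps the two questions (is it conforming, and what happens anyway) separate.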
Received on Saturday, 2 February 2008 14:19:16 UTC