Re: Validation error frequencies

On 31 Jan 2008, at 23:52, Sam Ruby wrote:

>>> 0120 / 400    Bad value (redacted) for attribute “href” on element  
>>> “a” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI  
>>> reference: WHITESPACE in QUERY.
>>> 0036 / 400    Bad value (redacted) for attribute “href” on element  
>>> “a” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI  
>>> reference: DOUBLE_WHITESPACE in QUERY.
>>> 0042 / 400    Bad value (redacted) for attribute “src” on element  
>>> “img” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI  
>>> reference: DOUBLE_WHITESPACE in PATH.
>>> 0024 / 400    Bad value (redacted) for attribute “href” on element  
>>> “a” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI  
>>> reference: WHITESPACE in PATH.
>>> 0019 / 400    Bad value (redacted) for attribute “src” on element  
>>> “img” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI  
>>> reference: WHITESPACE in PATH.
>>> 0019 / 400    Bad value (redacted) for attribute “href” on element  
>>> “a” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI  
>>> reference: DOUBLE_WHITESPACE in HOST.
>>> 0012 / 400    Bad value (redacted) for attribute “href” on element  
>>> “a” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI  
>>> reference: DOUBLE_WHITESPACE in PATH.
>>> 0007 / 400    Bad value (redacted) for attribute “href” on element  
>>> “a” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI  
>>> reference: WHITESPACE in FRAGMENT.
>>> 0003 / 400    Bad value (redacted) for attribute “href” on element  
>>> “link” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI  
>>> reference: WHITESPACE in PATH.
>>> 0001 / 400    Bad value (redacted) for attribute “src” on element  
>>> “script” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI  
>>> reference: DOUBLE_WHITESPACE in PATH.
>>> 0001 / 400    Bad value (redacted) for attribute “src” on element  
>>> “input” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI  
>>> reference: WHITESPACE in PATH.
>>> 0001 / 400    Bad value (redacted) for attribute “src” on element  
>>> “img” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI  
>>> reference: WHITESPACE in QUERY.
>>> 0001 / 400    Bad value (redacted) for attribute “href” on element  
>>> “link” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI  
>>> reference: WHITESPACE in QUERY.
>>> 0001 / 400    Bad value (redacted) for attribute “href” on element  
>>> “link” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI  
>>> reference: WHITESPACE in FRAGMENT.
>>> 0001 / 400    Bad value (redacted) for attribute “href” on element  
>>> “a” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI  
>>> reference: DOUBLE_WHITESPACE in FRAGMENT.
>> Wow. The whitespace in IRI issues are far more common than I would  
>> have thought. To the extent U+0020 is harmless and interoperably  
>> handled, we should probably spec a pre-processing step that  
>> suppresses cases that are harmless in practice.
>
> I see this all the time in feeds.  If you look closer, often the  
> real cause is mismatched quotes causing the parser to grab part of  
> the next attribute as data.
>
> A wise man once said to me "In XHTML5, your example parses  
> unambiguously and does not cause interop problems in top 3 browsers  
> that support XHTML. Yet, intuitively, it is clearly bogus. This  
> suggests that the implicit line isn't quite at ambiguity or interop  
> problems."
>
> I believe that advice applies here.  Spaces in IRI should be an error.

I also agree that it should be non-conforming, but I think we should
define behaviour for parsing invalid IRIs (even if we just point to
LEIRI for UAs, and IRI for documents). After all, even XML defines
error handling for SYSTEM identifiers!
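
To make the distinction concrete, here is a rough Python sketch of
"non-conforming, but with defined behaviour". It is only an
illustration under my own assumptions: the function name
recover_iri_reference and the error string are hypothetical, it only
deals with U+0020, and it is not the algorithm from LEIRI or any other
spec.

```python
def recover_iri_reference(value):
    """Return (recovered, errors) for an attribute value used as an IRI reference.

    Hypothetical helper: a U+0020 anywhere in the value is reported as a
    conformance error, but the value is still recovered by percent-encoding
    each space, so processing has a defined result.
    """
    errors = []
    if " " in value:
        # The document is non-conforming; report it once per value.
        errors.append("Bad IRI reference: U+0020 SPACE is not allowed")
    # Recovery step: percent-encode every space and carry on.
    recovered = value.replace(" ", "%20")
    return recovered, errors


recovered, errors = recover_iri_reference("my page.html?q=a b")
print(recovered)  # my%20page.html?q=a%20b
print(errors)
```

A checker would surface the errors list; a UA would ignore it and use
the recovered value.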

Henri, did you mean to make spaces in IRIs conforming, or just to
define behaviour for them? I read it as the latter, but I want to
clear up the matter.


--
Geoffrey Sneddon
<http://gsnedders.com/>

Received on Saturday, 2 February 2008 14:19:16 UTC