On 02/02/08 02:15, Sam Ruby wrote:
>
> Anne van Kesteren wrote:
>> On Fri, 01 Feb 2008 00:52:14 +0100, Sam Ruby <rubys@us.ibm.com> wrote:
>>> I believe that advice applies here. Spaces in IRI should be an error.
>>
>> You might want to have a look at the work on revising RFC 3987:
>>
>> https://datatracker.ietf.org/drafts/draft-duerst-iri-bis/
>>
>> It introduces a "Legacy Extended IRI" (LEIRI) syntax that allows
>> spaces and various other characters. This syntax is primarily designed
>> for markup languages.
>
> IMHO, that would be unfortunate. As I pointed out, a common error I see
> in feeds is when trying to detect a URI is relative reference (a common
> error in RSS feeds where such usage is ambiguous) is that URI can't be
> parsed as a URI at all. Digging deeper, the problem often is a missing
> close quote (a missing open quote is another common error). I would be
> interested to see if Henri were to dig deeper into the specific errors
> he sees if this is also the case in his data.

http://philip.html5.org/data/spaced-uris.txt shows some offending URIs.
The 6149 values can be grouped in various ways (a single value may fall
into several categories):

   936 only have spaces at the very beginning or end of the string.
   291 only have spaces after a '#'.
  2860 only have spaces after a '?'.
   576 start with "mailto:".
    73 contain a '<'.
    57 contain a '>'.
    23 contain a '"'.
    78 match / [A-Za-z]+=/.

So it looks like maybe 2-3% are accidentally missing quotes, and the
rest are intentionally using spaces in filenames, query strings, or
fragment identifiers.

-- 
Philip Taylor
pjt47@cam.ac.uk

Received on Saturday, 2 February 2008 14:12:03 UTC
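The grouping in the post can be approximated with a short classifier. What follows is a minimal Python sketch, not the script that produced the counts above. The category labels and the helper names `spaces_only_after`, `categorize` and `tally` are hypothetical, and reading "only have spaces after a '#'" as "no space before the first '#', at least one space after it" is an assumption. The regex is the one quoted in the post: a space followed by letters and '=', which looks like the next HTML attribute (e.g. ` alt=`) swallowed by a missing quote.

```python
import re
from collections import Counter
from typing import Iterable

# Regex quoted in the post: a space, then letters, then '=' (e.g. " alt="),
# suggesting a missing quote let the next attribute leak into the URI.
ATTR_AFTER_SPACE = re.compile(r" [A-Za-z]+=")


def spaces_only_after(uri: str, delimiter: str) -> bool:
    """Assumed reading: `uri` contains `delimiter`, has no space before its
    first occurrence, and has at least one space after it."""
    head, sep, tail = uri.partition(delimiter)
    return bool(sep) and " " not in head and " " in tail


def categorize(uri: str) -> set[str]:
    """Return every (hypothetically named) category a URI falls into.
    Categories overlap, so one URI can yield several labels."""
    cats: set[str] = set()
    # Spaces exist, but none remain once leading/trailing spaces are stripped.
    if " " in uri and " " not in uri.strip(" "):
        cats.add("edge-spaces-only")
    if spaces_only_after(uri, "#"):
        cats.add("spaces-after-hash")
    if spaces_only_after(uri, "?"):
        cats.add("spaces-after-query")
    if uri.startswith("mailto:"):
        cats.add("mailto")
    for ch in '<>"':
        if ch in uri:
            cats.add("contains " + ch)
    if ATTR_AFTER_SPACE.search(uri):
        cats.add("attribute-like")
    return cats


def tally(uris: Iterable[str]) -> Counter:
    """Count how many URIs fall into each category."""
    counts: Counter = Counter()
    for uri in uris:
        counts.update(categorize(uri))
    return counts


if __name__ == "__main__":
    # Made-up sample values for illustration, not taken from the data set.
    samples = [
        " http://example.com/ ",
        "/search?q=two words",
        "doc.html#part two",
        "/x?y=1#a b",
        "a.png alt=",
        "mailto:a b@example.com",
    ]
    for label, n in sorted(tally(samples).items()):
        print(f"{n:5} {label}")
```

Note that "/x?y=1#a b" lands in both the '#' and the '?' buckets, which is one way the categories in the post can overlap.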