Re: parsing URI (references) according to RFC 3986

On 6/20/2011 12:37 AM, Julian Reschke wrote:
> So, what *is* the set of interop problems here?
>
> 1) Extracting them from a/@href and friends (whitespace treatment)
>
> 2) Handling invalid ASCII characters (SP, "\", "<", ">"...)
>
> 3) Handling non-ASCII characters in query component
>
> 4) Handling non-ASCII characters in authority components
>
> 5) Handling non-ASCII characters everywhere else
>
> Anything else?
>
> Best regards, Julian


6) Handling percent-encoded values in various components

7) Handling the 'valid' but questionable ASCII characters in various
segments, such as "\", "|", and even ";"
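For reference, RFC 3986 (sections 2.2 and 2.3) sorts ASCII characters into unreserved, gen-delims, and sub-delims. ";" is a sub-delim. "\" and "|" belong to none of these sets, so the generic syntax only admits them percent-encoded. The minimal Python sketch below shows that classification. The helper name `rfc3986_class` is hypothetical, and the sketch looks at single characters only, not at whole components.

```python
import string

# Character classes from RFC 3986, sections 2.2 and 2.3.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")


def rfc3986_class(ch: str) -> str:
    """Return the RFC 3986 class of one ASCII character (hypothetical helper)."""
    if ch in UNRESERVED:
        return "unreserved"
    if ch in GEN_DELIMS:
        return "gen-delim"
    if ch in SUB_DELIMS:
        return "sub-delim"
    if ch == "%":
        # Legal only as the start of a pct-encoded triplet such as %7C.
        return "pct-encoding marker"
    # Anything else (SP, "\", "|", "<", ">", ...) must be percent-encoded.
    return "not allowed"


for ch in ["\\", "|", ";"]:
    print(repr(ch), rfc3986_class(ch))
```

Run as-is, this prints "not allowed" for "\" and "|" and "sub-delim" for ";".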



=======================================
Test Case: http://0028.iris.test.ing;g

The DOM parsing differs among Firefox, IE, and Opera, while Safari and
Chrome both error out.  Firefox drops the ";g", Opera puts it in the path,
and IE keeps it in the hostname...

Scheme  Hostname              Path   Browser
:                                    Chrome/12.0.742.100
http:   0028.iris.test.ing    /      Firefox/4.0.1
http:   0028.iris.test.ing    /;g    Opera/9.80
:                                    Safari/5.0.5
http:   0028.iris.test.ing;g         MSIE 7.0
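For comparison, RFC 3986 Appendix B gives a regular expression for splitting a URI reference into its generic components. Under that expression the authority runs up to the first "/", "?", or "#". The ";g" therefore stays in the authority, and because reg-name admits sub-delims such as ";", the host is "0028.iris.test.ing;g" with an empty path. Of the results above, only IE's matches that split. The short Python sketch below applies the Appendix B expression. The function name `split_uri` is illustrative only.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri: str) -> dict:
    """Split a URI reference into scheme, authority, path, query, fragment."""
    m = URI_RE.match(uri)  # always matches, since every group is optional
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


print(split_uri("http://0028.iris.test.ing;g"))
```

For the first test case this yields authority "0028.iris.test.ing;g" and an empty path. For "http://0029.iris.test.ing;./g" it yields authority "0029.iris.test.ing;." and path "/g".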

The raw HTTP request is interesting, though, because Firefox sends a
different path than its DOM parsing reports.  Chrome, Safari, and IE do
not make an HTTP request at all.

Path	Browser
/;g	Firefox/4.0.1
/;g	Opera/9.80
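Both Firefox and Opera send ";g" as part of the request path. A client that split the URI with the RFC 3986 generic syntax would behave differently. It would keep ";g" in the authority and send "/" as the request target, because HTTP requires "/" when the URI's path is empty. It would also carry the whole reg-name in the Host header. The minimal sketch below uses Python's `urllib.parse.urlsplit`, which also ends the netloc at the first "/", "?", or "#". The helper `request_target_and_host` is hypothetical and assumes there is no userinfo or port in the authority.

```python
from urllib.parse import urlsplit


def request_target_and_host(uri: str) -> tuple[str, str]:
    """Return (request-target, Host value) for an http URI (hypothetical helper).

    Assumes the authority has no userinfo or port.
    """
    parts = urlsplit(uri)
    # HTTP requires "/" when the URI's path is empty.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target, parts.netloc


for uri in ["http://0028.iris.test.ing;g", "http://0029.iris.test.ing;./g"]:
    target, host = request_target_and_host(uri)
    print(f"GET {target} HTTP/1.1  Host: {host}")
```

This gives "GET / HTTP/1.1" with Host "0028.iris.test.ing;g" for the first case and "GET /g" with Host "0029.iris.test.ing;." for the second. The two browsers instead send the paths "/;g" and "/;./g".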


=======================================
Test Case: http://0029.iris.test.ing;./g

In this slightly different case, Firefox changes its handling of the
";" trailing the hostname and treats it as part of the path instead.

Scheme  Hostname              Path   Browser
:                                    Chrome/12.0.742.100
http:   0029.iris.test.ing    /;./g  Firefox/4.0.1
http:   0029.iris.test.ing    /;./g  Opera/9.80
http:   0029.iris.test.ing;.  g      MSIE 7.0
:                                    Safari/5.0.5

The HTTP requests follow the same pattern as in the first case.

Path	Browser
/;./g	Firefox/4.0.1
/;./g	Opera/9.80
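In both test cases IE reports a hostname containing ";". Such a string is a syntactically valid RFC 3986 reg-name, since reg-name = *( unreserved / pct-encoded / sub-delims ). It is not a usable DNS host name, however, because DNS labels are limited to letters, digits, and hyphens (the LDH rule of RFC 952 as relaxed by RFC 1123). The sketch below contrasts the two checks. Both function names are hypothetical. The DNS check is deliberately rough and ignores IDNA and total-length limits.

```python
import re

# reg-name = *( unreserved / pct-encoded / sub-delims )   (RFC 3986, section 3.2.2)
REG_NAME_RE = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*")

# One LDH label: letters, digits, hyphen; no leading/trailing hyphen; 1-63 chars.
LDH_LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_reg_name(host: str) -> bool:
    """True if host matches the RFC 3986 reg-name production."""
    return REG_NAME_RE.fullmatch(host) is not None


def is_ldh_hostname(host: str) -> bool:
    """Rough DNS host-name check (hypothetical; ignores IDNA and total length)."""
    if host.endswith("."):
        host = host[:-1]  # tolerate a single trailing root dot
    return all(LDH_LABEL_RE.fullmatch(label) for label in host.split("."))


for host in ["0028.iris.test.ing", "0028.iris.test.ing;g", "0029.iris.test.ing;."]:
    print(host, is_reg_name(host), is_ldh_hostname(host))
```

All three strings pass the reg-name check, but only the plain "0028.iris.test.ing" passes the LDH check.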

-Chris

Received on Monday, 20 June 2011 08:25:23 UTC