Re: parsing URI (references) according to RFC 3986

On 2011-06-20 10:24, Chris Weber wrote:
> On 6/20/2011 12:37 AM, Julian Reschke wrote:
>> So, what *is* the set of interop problems here?
>>
>> 1) Extracting them from a/@href and friends (whitespace treatment)
>>
>> 2) Handling invalid ASCII characters (SP, "\", "<", ">"...)
>>
>> 3) Handling non-ASCII characters in query component
>>
>> 4) Handling non-ASCII characters in authority components
>>
>> 5) Handling non-ASCII characters everywhere else
>>
>> Anything else?
>>
>> Best regards, Julian
>
>
> 6) Handling percent-encoded values in various components

Is there a *problem* related to this?

I can see that the exposed DOM properties vary on how things are 
canonicalized, but that's a DOM issue, not a URI/IRI issue.
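
To illustrate what "canonicalized" can mean here: RFC 3986, Section 6.2.2 
treats the hex digits of a percent-encoded triplet as case-insensitive and 
treats a percent-encoded unreserved character as equivalent to the 
character itself. The following is only a rough sketch of that 
normalization; the function name normalize_pct is made up for this 
example, and it says nothing about what any particular DOM implementation 
actually does.

```python
import re
import string

# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"   (RFC 3986, Section 2.3)
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def normalize_pct(component: str) -> str:
    """Hypothetical helper: normalize percent-encoded triplets in one component.

    - triplets that encode an unreserved character are decoded
    - all other triplets are kept, with their hex digits uppercased
    """
    def repl(match: re.Match) -> str:
        hex_digits = match.group(1)
        char = chr(int(hex_digits, 16))
        if char in UNRESERVED:
            return char
        return "%" + hex_digits.upper()

    return re.sub(r"%([0-9A-Fa-f]{2})", repl, component)


print(normalize_pct("%7euser/a%2fb"))
```

Under these rules "%7euser" and "~user" compare equal after normalization, 
while "%2f" stays encoded (only its case changes), because "/" is a 
delimiter and decoding it would change the structure of the path.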

> 7) Handling the 'valid' but questionable ASCII characters in various
> segments. Like the "\", the "|", and even the ";"

If my ABNF math is correct, the invalid characters are:

DQUOTE / "#" / "%" / "/" / "<" / ">" / "?" / "[" / "\" / "]" / "^" / "`" 
/ "{" / "|" / "}"

So "|" and "\" aren't valid (and fall under 2).
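
The list above can be re-derived mechanically. This sketch assumes (the 
assumption is mine; the production is not named above) that "invalid" 
means: a visible US-ASCII character (%x21-7E) that may not appear 
literally in a path segment, i.e. is not matched by pchar from RFC 3986, 
Section 3.3. The variable names are made up for the example.

```python
import string

# Visible US-ASCII, %x21-7E (SP and control characters are left out).
VISIBLE = {chr(code) for code in range(0x21, 0x7F)}

# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
# sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
SUB_DELIMS = set("!$&'()*+,;=")
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
# pct-encoded is left out on purpose: a bare "%" is only valid as the
# start of a "%" HEXDIG HEXDIG triplet, so it counts as invalid here.
PCHAR_LITERALS = UNRESERVED | SUB_DELIMS | set(":@")

# Everything visible that pchar does not allow literally, in code point order.
INVALID_IN_SEGMENT = "".join(sorted(VISIBLE - PCHAR_LITERALS))

print(INVALID_IN_SEGMENT)
print(";" in INVALID_IN_SEGMENT)
```

Under that assumption the result is the same fifteen characters as listed 
above, and ";" is not among them, since it is one of the sub-delims.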

What's the problem with ";"? (I recall a thread about a Mozilla problem, 
but maybe we can just consider this a bug that needs to be fixed?)

Best regards, Julian

Received on Monday, 20 June 2011 08:35:30 UTC