[whatwg/url] Validating and escaping mismatch in pathname (#112)

https://url.spec.whatwg.org/#path-state says path segments are handled as follows, which means characters other than URL code points (and "%") are invalid:
> If c is not a URL code point and not "%", syntax violation.
>> The URL code points are ASCII alphanumeric, "!", "$", "&", "'", "(", ")", "*", "+", ",", "-", ".", "/", ":", ";", "=", "?", "@", "_", "~", and code points in the ranges U+00A0 to U+D7FF, U+E000 to U+FDCF, U+FDF0 to U+FFFD, U+10000 to U+1FFFD, U+20000 to U+2FFFD, U+30000 to U+3FFFD, U+40000 to U+4FFFD, U+50000 to U+5FFFD, U+60000 to U+6FFFD, U+70000 to U+7FFFD, U+80000 to U+8FFFD, U+90000 to U+9FFFD, U+A0000 to U+AFFFD, U+B0000 to U+BFFFD, U+C0000 to U+CFFFD, U+D0000 to U+DFFFD, U+E0000 to U+EFFFD, U+F0000 to U+FFFFD, U+100000 to U+10FFFD.
>>
>> A syntax violation indicates a non-fatal mismatch between input and syntax requirements. User agents, especially conformance checkers are encouraged to report them somewhere.
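A minimal Python sketch of the quoted check might look like this. The helper names `is_url_code_point` and `path_syntax_violation` are hypothetical, not spec or library APIs. The sketch only transcribes the 2016 definition quoted above.

```python
# Sketch of the quoted path-state validation rule (hypothetical helper names).

# ASCII punctuation that the definition lists as URL code points.
URL_ASCII_PUNCTUATION = set("!$&'()*+,-./:;=?@_~")

# Non-ASCII ranges from the definition, inclusive bounds.
# Planes 1..16 each contribute U+n0000 to U+nFFFD.
URL_NON_ASCII_RANGES = [
    (0x00A0, 0xD7FF),
    (0xE000, 0xFDCF),
    (0xFDF0, 0xFFFD),
] + [(plane * 0x10000, plane * 0x10000 + 0xFFFD) for plane in range(1, 0x11)]


def is_url_code_point(c: str) -> bool:
    cp = ord(c)
    if cp < 0x80:
        # ASCII alphanumeric or one of the listed punctuation characters.
        return (c.isascii() and c.isalnum()) or c in URL_ASCII_PUNCTUATION
    return any(lo <= cp <= hi for lo, hi in URL_NON_ASCII_RANGES)


def path_syntax_violation(c: str) -> bool:
    """'If c is not a URL code point and not "%", syntax violation.'"""
    return not is_url_code_point(c) and c != "%"


for ch in "[]^|":
    print(repr(ch), path_syntax_violation(ch))  # each one is a violation
```

Non-ASCII letters such as "é" pass this check, while noncharacters such as U+FDD0 do not.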

But it also says that only characters in the default encode set are escaped:
> Otherwise, UTF-8 percent encode c using the default encode set, and append the result to buffer.
>> The simple encode set are C0 controls and all code points greater than U+007E.
>> The default encode set is the simple encode set and code points U+0020, '"', "#", "<", ">", "?", "`", "{", and "}".
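The escaping side can be sketched the same way. `in_default_encode_set` and `utf8_percent_encode` are hypothetical names that mirror the quoted definitions. The hex digits are uppercase, as in the spec's percent-encoding.

```python
# Sketch of the quoted encode sets and UTF-8 percent-encoding (hypothetical names).

def in_simple_encode_set(c: str) -> bool:
    cp = ord(c)
    # C0 controls (U+0000..U+001F) and all code points greater than U+007E.
    return cp <= 0x1F or cp > 0x7E


# Extra code points the default encode set adds: space, '"', "#", "<", ">", "?", "`", "{", "}".
DEFAULT_ENCODE_EXTRA = set(' "#<>?`{}')


def in_default_encode_set(c: str) -> bool:
    return in_simple_encode_set(c) or c in DEFAULT_ENCODE_EXTRA


def utf8_percent_encode(c: str, encode_set) -> str:
    """Return c unchanged unless it is in encode_set; otherwise %XX each UTF-8 byte."""
    if not encode_set(c):
        return c
    return "".join(f"%{b:02X}" for b in c.encode("utf-8"))


print(utf8_percent_encode(" ", in_default_encode_set))  # %20
print(utf8_percent_encode("é", in_default_encode_set))  # %C3%A9
print([utf8_percent_encode(ch, in_default_encode_set) for ch in "[]^|"])  # left as-is
```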

This means "[", "]", "^", and "|" are invalid in a URL (they are not URL code points) but aren't escaped (they are not in the default encode set).
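To check which characters fall into this gap, here's a sketch that scans only U+0000 to U+007F. Everything above U+007E is in the simple encode set, so it is always escaped. The function names are hypothetical, and only the two quoted definitions are encoded. The scan also reports "\". Presumably it isn't listed above because the path state treats "\" like "/" for special schemes such as http, so "\" never reaches the code point check there.

```python
# Find ASCII characters that trigger the syntax violation but are NOT percent-encoded.

URL_ASCII_PUNCTUATION = set("!$&'()*+,-./:;=?@_~")
DEFAULT_ENCODE_EXTRA = set(' "#<>?`{}')


def is_url_code_point_ascii(c: str) -> bool:
    return (c.isascii() and c.isalnum()) or c in URL_ASCII_PUNCTUATION


def escaped_by_default_encode_set(c: str) -> bool:
    cp = ord(c)
    return cp <= 0x1F or cp > 0x7E or c in DEFAULT_ENCODE_EXTRA


def flagged_but_not_escaped() -> list:
    result = []
    for cp in range(0x80):
        c = chr(cp)
        if c == "%":
            continue  # "%" is explicitly exempt from the violation check
        if not is_url_code_point_ascii(c) and not escaped_by_default_encode_set(c):
            result.append(c)
    return result


print(flagged_but_not_escaped())  # ['[', '\\', ']', '^', '|']
```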

Chrome 49.0.2623.87 and Safari 9.1 (11601.5.17.1) don't escape "[" and "]" but do escape "^" and "|".
Firefox 39.0 escapes all of them, but Firefox 45.0 escapes only "^" (it sends "[", "]", and "|" as-is).

Given these results, I think "[" and "]" should be added to the URL code points, but I'm not sure about "^" and "|".

---
https://github.com/whatwg/url/issues/112

Received on Monday, 4 April 2016 09:19:25 UTC