Re: [url] Requests for Feedback (was Feedback from TPAC)

On 2015/01/06 02:16, Bjoern Hoehrmann wrote:
> * Martin J. Dürst wrote:
>> The URL spec, as far as I understand, allows Unicode as input, so in
>> that respect, it isn't ghettoizing. But it converts all output to ASCII,
>> and so essentially sends a message that Unicode is second-class.
>>
>> My understanding is that the reason for this is that current browser
>> interfaces are working that way, and I'm not against documenting that,
>> but I'd wish we could get away from that limitation for the general case
>> (i.e. parser results are still Unicode).
>
> There are a couple of conflicting requirements that make that difficult.
> If you make an API for resource identifiers, you don't want it to change
> behavior when new schemes are introduced; you probably also want that an
> input like `example:///ö` is handled the same as `example:///%c3%b6` and
> then also avoid turning `data:image/png,...%xx...` into a mix of random
> Unicode characters interspersed with %xx escapes that would not round-
> trip if decoded. If you want Unicode output, and a data-like scheme is
> introduced, you cannot satisfy all requirements.

This is indeed a theoretical problem, but one that in practice rarely 
shows up and is rather easily dealt with.

First, data:-like schemes are few and far between.

Second, there's no reason to convert a sequence of %xx escapes to
Unicode unless the whole sequence can be converted.

Third, the equivalence between "ö" and "%c3%b6" might be provided at a
higher level in the API rather than in the parser itself. "Is handled
the same" assumes a universal equivalence function for URIs and IRIs,
whereas the specs clearly explain that there is no such thing (see
http://tools.ietf.org/html/rfc3986#section-6 and
http://tools.ietf.org/html/rfc3987#section-5).
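
To make the second and third points concrete, here is a rough Python
sketch. It is not taken from the URL spec or from RFC 3987. The function
names decode_full_runs, to_uri_form and same_after_mapping are
hypothetical, and the rule of leaving any run that contains an ASCII
byte untouched is a deliberately conservative assumption of mine. The
further exclusions in RFC 3987 (for example bidi formatting characters)
are not handled.

```python
import re

# A run of one or more consecutive %xx escapes.
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")
_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def decode_full_runs(text: str) -> str:
    """Turn a run of %xx escapes into Unicode only if the whole run converts.

    Assumption: a run is left as-is if it contains any ASCII byte (which
    might be a reserved delimiter) or if it is not valid UTF-8.
    """
    def replace(match: "re.Match[str]") -> str:
        run = match.group(0)
        raw = bytes.fromhex(run.replace("%", ""))
        if any(byte < 0x80 for byte in raw):
            return run  # escaped ASCII stays escaped
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return run  # cannot be converted in full, so keep the escapes
    return _ESCAPE_RUN.sub(replace, text)


def to_uri_form(text: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8 and uppercase hex digits."""
    parts = []
    for ch in text:
        if ord(ch) < 0x80:
            parts.append(ch)
        else:
            parts.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
    return _ESCAPE.sub(lambda m: m.group(0).upper(), "".join(parts))


def same_after_mapping(a: str, b: str) -> bool:
    """One possible higher-level equivalence, chosen by the caller."""
    return to_uri_form(a) == to_uri_form(b)


if __name__ == "__main__":
    print(decode_full_runs("example:///%c3%b6"))            # converts in full
    print(decode_full_runs("data:image/png,%89PNG%0D%0A"))  # left unchanged
    print(same_after_mapping("example:///\u00f6", "example:///%c3%b6"))
```

With this split, the parser result can stay Unicode wherever the escapes
convert cleanly, data:-like payloads keep their %xx escapes and so still
round-trip, and the question of whether "ö" and "%c3%b6" count as the
same is answered by whichever comparison the caller picks, not by the
parser.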

Regards,   Martin.

Received on Tuesday, 6 January 2015 10:35:21 UTC