Re: draft-montenegro-httpbis-uri-encoding

On 2014/03/22 00:46, Zhong Yu wrote:
> On Fri, Mar 21, 2014 at 10:42 AM, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
>> * Zhong Yu wrote:
>>> 1. It's improbable that the origin server uses separate encoding
>>> schemes for path and query. If the encoding scheme for the query part
>>> is known, it can be assumed for the path part too.
>>
>> This is actually a common situation. One reason is that the server
>> software handles the path, and some independent script handles the
>> query, and you might well have a server system that uses UTF-8 in
>> paths, but a legacy script expects ISO-8859-1 query strings. If it
>> is necessary and possible to communicate the encoding, if any, then
>> these two components need separate labels for maximum compatibility.
>
> Hmm, OK.
>
> But the browser has no idea how the URI path was constructed, it would
> be presumptuous for the browser to brave a guess and mislead the
> intermediaries.

Björn described the situation from a server point of view. From a client 
point of view, the client just encodes the path part of a URI/IRI with 
UTF-8, and the query part with the encoding of the page.

This isn't exactly ideal, to say the least, but it's what happens in 
practice (unless there's an accept-charset attribute, and that only 
works on forms, see 
http://www.w3.org/TR/html5/forms.html#attr-form-accept-charset).

Therefore it does indeed make sense to have two different header fields, or 
some other way to distinguish path encoding and query encoding (assuming 
it makes sense to have such header fields in the first place).
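
To illustrate the client behaviour described above, here is a rough 
sketch in Python. It is only an approximation of what browsers do, not 
their actual algorithm; the function name encode_uri and the sample 
path and query are made up for the example.

```python
from urllib.parse import quote

def encode_uri(path: str, query: str, page_encoding: str) -> str:
    """Hypothetical helper: mimic the client behaviour sketched above."""
    # Path part: percent-encoded from UTF-8, regardless of the page.
    encoded_path = quote(path, safe="/", encoding="utf-8")
    # Query part: percent-encoded from the encoding of the page.
    encoded_query = quote(query, safe="=&", encoding=page_encoding)
    return f"{encoded_path}?{encoded_query}"

# Same character, two different byte sequences in one request target.
print(encode_uri("/café", "q=café", "iso-8859-1"))
# /caf%C3%A9?q=caf%E9
```

On an ISO-8859-1 page the "é" ends up as %C3%A9 in the path but as %E9 
in the query, so a single label could not describe both parts.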

Regards,   Martin.

Received on Monday, 24 March 2014 07:54:32 UTC