
RE: non ascii character in headers?

From: Larry Masinter <masinter@adobe.com>
Date: Wed, 3 Mar 2010 16:29:53 -0800
To: Joseph Holsten <joseph@josephholsten.com>, Reinier Post <rp@win.tue.nl>
CC: "www-talk@w3.org" <www-talk@w3.org>
Message-ID: <C68CB012D9182D408CED7B884F441D4DAD544E@nambxv01a.corp.adobe.com>
Internet Explorer has an option "Send UTF-8 URLs" in the
Internet Options preference panel.

I'm not sure what that does. Maybe this belongs in the
httpbis and/or public-iri working groups?

Larry
--
http://larry.masinter.net


-----Original Message-----
From: www-talk-request@w3.org [mailto:www-talk-request@w3.org] On Behalf Of Joseph Holsten
Sent: Wednesday, March 03, 2010 9:44 AM
To: Reinier Post
Cc: www-talk@w3.org
Subject: Re: non ascii character in headers?


On Mar 3, 2010, at 3:41 AM, Reinier Post wrote:

> On Tue, Mar 02, 2010 at 02:17:13PM +0100, Julian Reschke wrote:
>> On 02.03.2010 00:49, Brendan Miller wrote:
>>> I'm looking at a possible bug in my company's http handling library.
>>> The code seems to assume that there are no bytes with the higher order
>>> bit set in the http Location header. I'm thinking this will break if
>>> the Location header's URI contains non-ascii characters.
>> 
>> In which case it wouldn't be a valid URI.
>> 
>>> Is my thinking correct, or is there some rule that prohibits non-ascii
>>> chars in an http header?
>> 
>> Valid URIs never contain non-ASCII characters.
> 
> This is not true, see section 2.1 of the spec:
> 
>  http://www.ietf.org/rfc/rfc2396.txt

IRIs that contain non-ASCII characters need to define a way to be converted to a pure ASCII URI. Segments containing domain names will typically get punycoded; other segments are typically encoded as UTF-8 and then percent-encoded. Admittedly this stuff is confusing and spread across a number of specs.
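
As a rough illustration of that conversion, here is a minimal Python sketch. The function name `iri_to_uri` and the `_SAFE` character set are my own choices for this example, not anything taken from a spec. It relies on the standard library's `idna` codec (which implements the older IDNA 2003 rules) for the host, and it ignores userinfo and IPv6 literals, so treat it as a starting point rather than a complete RFC 3987 implementation.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched in path, query, and fragment (an assumption made
# for this sketch). '%' is included so that escapes already present in the
# input are not encoded a second time.
_SAFE = "/?:@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Map an IRI to an ASCII-only URI (simplified sketch)."""
    parts = urlsplit(iri)

    # Domain name: convert each label to its ASCII (Punycode) form.
    host = parts.hostname or ""
    if host:
        host = host.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"

    # Other components: quote() encodes text as UTF-8 by default and
    # percent-encodes every byte that is not unreserved or listed in _SAFE.
    path = quote(parts.path, safe=_SAFE)
    query = quote(parts.query, safe=_SAFE)
    fragment = quote(parts.fragment, safe=_SAFE)

    return urlunsplit((parts.scheme, netloc, path, query, fragment))
```

For example, `iri_to_uri("http://bücher.example/café")` gives `http://xn--bcher-kva.example/caf%C3%A9`.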

But you shouldn't be getting URIs with non-ASCII characters from across the wire. If you are getting high bits, you've got issues because you don't know how the characters are encoded. Maybe it's UTF-8, but who really knows? It's a non-standard response. You might have luck treating it as a UTF-8 encoded IRI and mapping the IRI to a URI as per RFC 3987, section 3.1.
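
If you do want to be lenient about such responses, one possible approach is sketched below in Python. The helper name `decode_location` is hypothetical, and the ISO-8859-1 fallback is my own assumption (it is HTTP's historical default for header text, but nothing guarantees the sender used it). The point is only to show the guesswork involved: pure ASCII is unambiguous, anything else is a guess.

```python
def decode_location(raw: bytes):
    """Guess the text of a raw Location header value.

    Returns (text, was_pure_ascii). Hypothetical helper for illustration.
    """
    # No byte has the high-order bit set: a well-formed, ASCII-only value.
    if all(b < 0x80 for b in raw):
        return raw.decode("ascii"), True

    # High bits present: non-standard. Try UTF-8 first, since a strict
    # decode fails loudly on most input that is not really UTF-8.
    try:
        return raw.decode("utf-8"), False
    except UnicodeDecodeError:
        # Assumed fallback: ISO-8859-1 maps every byte to a character,
        # so this never fails, but the result may be wrong.
        return raw.decode("latin-1"), False
```

A caller could log or reject the value whenever the second element is `False`, or pass the decoded text on to an IRI-to-URI mapping.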

If you are hitting this bug in real use, we might be able to help more by knowing what you're dealing with. Otherwise, it's perfectly fine that your code doesn't handle non-ASCII characters in a Location header.
--
Joseph Holsten
http://josephholsten.com
Received on Thursday, 4 March 2010 00:30:35 GMT
