Re: Content-Location: character set from Mark Nottingham on 2011-02-10 (ietf-http-wg@w3.org from January to March 2011)

From: Mark Nottingham <mnot@mnot.net>
Date: Thu, 10 Feb 2011 20:49:41 +1100
To: Anne van Kesteren <annevk@opera.com>
Cc: Martin J. Dürst <duerst@it.aoyama.ac.jp>, "httpbis Group" <ietf-http-wg@w3.org>, "Julian F. Reschke" <julian.reschke@gmx.de>
Message-Id: <DAF4A8B8-CF88-45E4-AB36-297FE2E999D9@mnot.net>

I was thinking that if you attempted to decode as ISO-8859-1, and the characters you get fall outside the range allowed by the BNF (token or quoted-string), that would be considered an error (and therefore a trigger for error handling).

However, Julian pointed out to me that quoted-string is defined in terms of TEXT, which is <any OCTET except CTLs, but including LWS>, so, as you and Martin point out, that doesn't help.
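
To make that concrete, here is a minimal Python sketch. The sample bytes and the helper names (is_ctl, text_allows, decodes_as_utf8) are made up for illustration, and line folding in LWS is ignored (only HT is let through). It suggests that UTF-8 bytes decoded as ISO-8859-1 still land inside TEXT, so the BNF gives no hook for detecting them:

```python
# Hypothetical header bytes: "é" as a sender would emit it in UTF-8.
raw = "\u00e9".encode("utf-8")  # b'\xc3\xa9'

# ISO-8859-1 assigns a character to each of the 256 byte values,
# so decoding every possible byte succeeds.
all_bytes = bytes(range(256)).decode("iso-8859-1")
assert len(all_bytes) == 256

# The UTF-8 bytes, read as ISO-8859-1, become two Latin-1 characters.
latin1_view = raw.decode("iso-8859-1")


def is_ctl(ch):
    """CTL as in RFC 2616: octets 0-31 and DEL (127)."""
    return ord(ch) < 32 or ord(ch) == 127


def text_allows(s):
    """Rough TEXT check: any octet except CTLs; HT is allowed as part of LWS.

    Simplification (an assumption of this sketch): folded CRLF is not handled.
    """
    return all(ch == "\t" or not is_ctl(ch) for ch in s)


def decodes_as_utf8(data):
    """UTF-8, unlike ISO-8859-1, can reject a byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(text_allows(latin1_view))    # the mis-decoded value still fits TEXT
print(decodes_as_utf8(b"\xe9"))    # a lone Latin-1 "é" byte is not valid UTF-8
```

The asymmetry runs the other way around: a UTF-8 decode can fail, an ISO-8859-1 decode cannot.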

Never mind, then -- back to your usual programming.

On 10/02/2011, at 7:40 PM, Anne van Kesteren wrote:

> On Thu, 10 Feb 2011 02:30:47 +0100, Mark Nottingham <mnot@mnot.net> wrote:
>> I suggest calling such headers explicitly invalid so that receivers who choose to implement error handling (as per the previous thread I started recently) have a "hook" to do so; i.e., if a string fails to decode as 8859-1, they can implement error handling to try it as UTF-8. It's not particularly elegant to do it this way, but it is workable given the constraints we have.
> 
> Decoding as ISO-8859-1 cannot fail. Each byte maps to a character.
> 
> I have not checked recently, but the Location header might be decoded as UTF-8 in some clients.
> 
> 
> -- 
> Anne van Kesteren
> http://annevankesteren.nl/

--
Mark Nottingham   http://www.mnot.net/

Received on Thursday, 10 February 2011 09:50:14 UTC