- From: Mike Brown <mike@skew.org>
- Date: Wed, 25 Jul 2007 00:56:26 -0600 (MDT)
- To: Sebastian Pipping <webmaster@hartwork.org>
- CC: Mike Brown <mike@skew.org>, uri@w3.org
Sebastian Pipping wrote:
> > 1. In each name and value, encode each CR, LF, or CR+LF to "%0D%0A".
> > -------------------------------------------------------------------------
>
> What I don't like about it besides the extra work is that the
> data is modified in an irreversible way. Is this optional
> or a must?

Well, look at it this way: There's what the spec says, and there's what
implementations do... you'll probably witness a fair amount of
variability in what senders send and what receivers expect. I'd say the
"be lenient in what you accept and strict in what you produce" maxim
applies here.

> Another thing: Do you have any recommendations how to handle
> "%00" when decoding? Should I cut it out? Should I cut it out and
> ignore everything behind it as if it was "\0"?

%00 wouldn't appear in UTF-8-based data (and if it did, it'd be an
error you could handle any number of ways: abort, ignore, replace)...
but %00 could show up if a different encoding were used. In 8-bit
encodings, it'd represent NUL, but in multibyte encodings other than
UTF-8 it could be part of a pair.

So, I'm hesitant to recommend anything. It really depends on what kind
of data you expect to be receiving and what you intend to do with it,
including whether you intend to treat it as characters or as bytes
(the %-encoded sequences represent bytes-that-represent-characters, so
your API might operate at either level of abstraction).

If it's a general-purpose decoder, I'd probably convert as naively and
gracefully as possible, and leave it to the caller to decide whether
the data is usable or not. I wouldn't treat %00 specially.

Mike
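[The two pieces of advice above — normalize line breaks to "%0D%0A" on the way out, and decode naively and gracefully on the way in, with no special handling of %00 — could be sketched like this. A Python illustration; the function names are mine, not from any spec or message in this thread.]

```python
import re
import string

HEX = set(string.hexdigits)

def normalize_newlines(s: str) -> str:
    """Encode-side step: replace each CR, LF, or CR+LF with "%0D%0A".

    The alternation tries CR+LF first so a Windows line break becomes
    one "%0D%0A", not two.
    """
    return re.sub(r"\r\n|\r|\n", "%0D%0A", s)

def percent_decode(s: str) -> bytes:
    """Decode-side step: convert %XX escapes to bytes, leniently.

    Malformed escapes (a trailing "%", or non-hex digits) are passed
    through literally instead of raising. %00 gets no special
    treatment: the result is bytes, and the caller decides whether
    the data is usable and which character encoding applies.
    """
    out = bytearray()
    i = 0
    while i < len(s):
        if s[i] == "%" and i + 3 <= len(s) and s[i + 1] in HEX and s[i + 2] in HEX:
            out.append(int(s[i + 1:i + 3], 16))
            i += 3
        else:
            out.extend(s[i].encode("utf-8"))
            i += 1
    return bytes(out)
```

For example, `normalize_newlines("a\r\nb")` gives `"a%0D%0Ab"`, `percent_decode("a%0D%0Ab")` gives back `b"a\r\nb"`, a stray `"100%"` survives as `b"100%"`, and `percent_decode("a%00b")` simply yields `b"a\x00b"` for the caller to judge.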
Received on Wednesday, 25 July 2007 06:57:02 UTC