
Re: [Need advice] When to decode '+' to ' '?

From: Mike Brown <mike@skew.org>
Date: Wed, 25 Jul 2007 00:56:26 -0600 (MDT)
Message-Id: <200707250656.l6P6uQ68033962@chilled.skew.org>
To: Sebastian Pipping <webmaster@hartwork.org>
CC: Mike Brown <mike@skew.org>, uri@w3.org

Sebastian Pipping wrote:
> > 1. In each name and value, encode each CR, LF, or CR+LF to "%0D%0A".
> 
> -------------------------------------------------------------------------
> What I don't like about it besides the extra work is that the
> data is modified in an irreversible way. Is this optional
> or a must?

Well, look at it this way:

There's what the spec says, and there's what implementations do... you'll 
probably witness a fair amount of variability in what senders send and what 
receivers expect.

I'd say the "be lenient in what you accept and strict in what you produce"
maxim applies here.
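
For the producing side, here is a minimal Python sketch of that step, assuming 
UTF-8 and form-style encoding where a space becomes "+". The helper names 
normalize_newlines and encode_form_component are made up for illustration; 
quote_plus comes from the standard library.

```python
import re
from urllib.parse import quote_plus


def normalize_newlines(text: str) -> str:
    """Turn each CR+LF, lone CR, or lone LF into CR+LF."""
    # The alternation tries "\r\n" first, so an existing CR+LF is matched
    # as one unit and is not doubled.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


def encode_form_component(text: str) -> str:
    """Percent-encode one name or value after newline normalization."""
    # quote_plus encodes CR as %0D and LF as %0A, and a space as "+".
    return quote_plus(normalize_newlines(text), encoding="utf-8")


if __name__ == "__main__":
    for sample in ("a\nb", "a\rb", "a\r\nb"):
        print(repr(sample), "->", encode_form_component(sample))
```

All three samples produce the same encoded string, which is exactly the 
irreversibility you noticed: the receiver cannot tell which line ending the 
sender started with.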

> Another thing: Do you have any recommendations how to handle
> "%00" when decoding? Should I cut it out? Should I cut it out and
> ignore everything behind it as if it was "\0"?

%00 normally wouldn't appear in UTF-8-based text data (it decodes to U+0000, 
NUL; if it did show up, you could treat it as an error and handle it any number 
of ways: abort, ignore, replace)... but %00 could show up if a different 
encoding were used. In 8-bit encodings, it'd represent NUL, but in multibyte 
encodings other than UTF-8 (UTF-16, for example) it could be one byte of a pair.

So, I'm hesitant to recommend anything. It really depends on what kind of data 
you expect to be receiving and what you intend to do with it, including 
whether you intend to treat it as characters or as bytes (the %-encoded 
sequences represent bytes-that-represent-characters, so your API might operate 
at either level of abstraction). If it's a general-purpose decoder, I'd 
probably convert as naively and gracefully as possible, and leave it to the 
caller to decide whether the data is usable or not. I wouldn't treat %00 
specially.
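
To make "naively and gracefully" a bit more concrete, here is a rough Python 
sketch of a byte-level decoder. It is only one possible reading of the idea, 
not a reference implementation; the name percent_decode and the plus_as_space 
flag are hypothetical. It leaves malformed escapes untouched, turns %00 into 
an ordinary 0x00 byte, and leaves character decoding to the caller.

```python
import re

# Matches one well-formed escape: "%" followed by exactly two hex digits.
_HEX_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def percent_decode(data: bytes, plus_as_space: bool = False) -> bytes:
    """Decode %XX escapes to raw bytes; no byte value is treated specially."""
    if plus_as_space:
        # Must happen before %XX decoding, so that an escaped "%2B"
        # still ends up as a literal "+".
        data = data.replace(b"+", b" ")
    # Malformed escapes such as "%zz" or a trailing "%" do not match
    # the pattern and are passed through unchanged.
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


if __name__ == "__main__":
    raw = percent_decode(b"caf%C3%A9+au+lait%00", plus_as_space=True)
    print(raw)
    # The caller picks the encoding and the error policy.
    print(repr(raw.decode("utf-8", errors="replace")))
```

Whether "+" means a space depends on where the string came from (form data 
versus a generic URI component), which is why the sketch makes it an explicit 
option rather than a default.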

Mike
Received on Wednesday, 25 July 2007 06:57:02 GMT
