
Re: [Need advice] When to decode '+' to ' '?

From: Sebastian Pipping <webmaster@hartwork.org>
Date: Wed, 25 Jul 2007 15:32:45 +0200
Message-ID: <46A750FD.1020405@hartwork.org>
To: Mike Brown <mike@skew.org>
CC: uri@w3.org

Mike Brown wrote:
> Well, look at it this way:
> There's what the spec says, and there's what implementations do... you'll 
> probably witness a fair amount of variability in what senders send and what 
> receivers expect.
> I'd say the "be lenient in what you accept and strict in what you produce"
> maxim applies here.

Will do.

>> Another thing: Do you have any recommendations on how to handle
>> "%00" when decoding? Should I cut it out? Should I cut it out and
>> ignore everything after it, as if it were "\0"?
> %00 wouldn't appear in UTF-8-based data (and if it did, it'd be an error you 
> could handle any number of ways: abort, ignore, replace)... but %00 could show 
> up if a different encoding were used. In 8-bit encodings, it'd represent NUL, 
> but in multibyte encodings other than UTF-8 it could be part of a pair.
> So, I'm hesitant to recommend anything. It really depends on what kind of data 
> you expect to be receiving and what you intend to do with it, including 
> whether you intend to treat it as characters or as bytes (the %-encoded 
> sequences represent bytes-that-represent-characters, so your API might operate 
> at either level of abstraction). If it's a general-purpose decoder, I'd 
> probably convert as naively and gracefully as possible, and leave it to the 
> caller to decide whether the data is usable or not. I wouldn't treat %00 
> specially.

The good thing is that I already store the string length
implicitly (char * first and char * afterLast), so a
NUL in between should not be a problem.

Thanks again.

Received on Wednesday, 25 July 2007 13:33:18 UTC
