- From: Sebastian Pipping <webmaster@hartwork.org>
- Date: Wed, 25 Jul 2007 15:32:45 +0200
- To: Mike Brown <mike@skew.org>
- CC: uri@w3.org
Mike Brown wrote:
> Well, look at it this way:
>
> There's what the spec says, and there's what implementations do... you'll
> probably witness a fair amount of variability in what senders send and what
> receivers expect.
>
> I'd say the "be lenient in what you accept and strict in what you produce"
> maxim applies here.

------------------------------------------------------------------------------
Will do.
------------------------------------------------------------------------------

>> Another thing: Do you have any recommendations how to handle
>> "%00" when decoding? Should I cut it out? Should I cut it out and
>> ignore everything behind it as if it was "\0"?
>
> %00 wouldn't appear in UTF-8-based data (and if it did, it'd be an error you
> could handle any number of ways: abort, ignore, replace)... but %00 could show
> up if a different encoding were used. In 8-bit encodings, it'd represent NUL,
> but in multibyte encodings other than UTF-8 it could be part of a pair.
>
> So, I'm hesitant to recommend anything. It really depends on what kind of data
> you expect to be receiving and what you intend to do with it, including
> whether you intend to treat it as characters or as bytes (the %-encoded
> sequences represent bytes-that-represent-characters, so your API might operate
> at either level of abstraction). If it's a general-purpose decoder, I'd
> probably convert as naively and gracefully as possible, and leave it to the
> caller to decide whether the data is usable or not. I wouldn't treat %00
> specially.

---------------------------------------------------------------
Good thing is I already store the string length implicitly
(char * first and char * afterLast) so NUL in between should
not be a problem.

Thanks again.

Sebastian
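
For illustration, here is a rough sketch of what such a decoder might look like when the string is a pointer range rather than a NUL-terminated buffer. It is not code from this thread: the names percent_decode and hex_value are made up for the example, and copying malformed escapes through unchanged is only one possible reading of "lenient". The point is that %00 becomes an ordinary zero byte and the caller learns the decoded length from the returned end pointer.

```c
#include <stddef.h>

/* Returns the numeric value of a hex digit, or -1 if c is not one. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical helper: decodes the bytes in [first, afterLast) into out.
 * out must have room for at least (afterLast - first) bytes, since
 * decoding never grows the data.
 * Returns a pointer one past the last byte written, so the result is
 * again a (first, afterLast) style range and may contain NUL bytes.
 * Malformed escapes such as "%zz" or a trailing "%4" are copied as-is
 * (an assumption of this sketch, not a rule from any spec).
 */
static char *percent_decode(const char *first, const char *afterLast,
                            char *out) {
    while (first < afterLast) {
        if (*first == '%' && afterLast - first >= 3) {
            int hi = hex_value(first[1]);
            int lo = hex_value(first[2]);
            if (hi >= 0 && lo >= 0) {
                /* %00 is handled like any other byte value. */
                *out++ = (char)(unsigned char)((hi << 4) | lo);
                first += 3;
                continue;
            }
        }
        *out++ = *first++;
    }
    return out;
}
```

With the 12-byte input "a%00b%2Fc%zz", this should yield 8 bytes: 'a', a zero byte, 'b', '/', 'c', and the untouched "%zz". Whether an embedded zero byte is acceptable is then left to the caller, as suggested above.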
Received on Wednesday, 25 July 2007 13:33:18 UTC