- From: Mike Brown <mike@skew.org>
- Date: Wed, 25 Jul 2007 00:56:26 -0600 (MDT)
- To: Sebastian Pipping <webmaster@hartwork.org>
- CC: Mike Brown <mike@skew.org>, uri@w3.org
Sebastian Pipping wrote:
> > 1. In each name and value, encode each CR, LF, or CR+LF to "%0D%0A".
> > -------------------------------------------------------------------------
>
> What I don't like about it besides the extra work is that the
> data is modified in an irreversible way. Is this optional
> or a must?

Well, look at it this way: There's what the spec says, and there's what
implementations do... you'll probably witness a fair amount of
variability in what senders send and what receivers expect. I'd say the
"be lenient in what you accept and strict in what you produce" maxim
applies here.

> Another thing: Do you have any recommendations how to handle
> "%00" when decoding? Should I cut it out? Should I cut it out and
> ignore everything behind it as if it was "\0"?

%00 wouldn't appear in UTF-8-based data (and if it did, it'd be an
error you could handle any number of ways: abort, ignore, replace)...
but %00 could show up if a different encoding were used. In 8-bit
encodings, it'd represent NUL, but in multibyte encodings other than
UTF-8 it could be part of a pair.

So, I'm hesitant to recommend anything. It really depends on what kind
of data you expect to be receiving and what you intend to do with it,
including whether you intend to treat it as characters or as bytes
(the %-encoded sequences represent bytes-that-represent-characters, so
your API might operate at either level of abstraction).

If it's a general-purpose decoder, I'd probably convert as naively and
gracefully as possible, and leave it to the caller to decide whether
the data is usable or not. I wouldn't treat %00 specially.

Mike
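[The two pieces of advice above — normalize line breaks to "%0D%0A" on the way out, and decode naively and gracefully on the way in, with no special handling of %00 — could be sketched like this. A Python illustration; the function names are mine, not from any spec or message in this thread.]

```python
import re
import string

HEX = set(string.hexdigits)

def normalize_newlines(s: str) -> str:
    """Encode-side step: replace each CR, LF, or CR+LF with "%0D%0A".

    The alternation tries CR+LF first so a Windows line break becomes
    one "%0D%0A", not two.
    """
    return re.sub(r"\r\n|\r|\n", "%0D%0A", s)

def percent_decode(s: str) -> bytes:
    """Decode-side step: convert %XX escapes to bytes, leniently.

    Malformed escapes (a trailing "%", or non-hex digits) are passed
    through literally instead of raising. %00 gets no special
    treatment: the result is bytes, and the caller decides whether
    the data is usable and which character encoding applies.
    """
    out = bytearray()
    i = 0
    while i < len(s):
        if s[i] == "%" and i + 3 <= len(s) and s[i + 1] in HEX and s[i + 2] in HEX:
            out.append(int(s[i + 1:i + 3], 16))
            i += 3
        else:
            out.extend(s[i].encode("utf-8"))
            i += 1
    return bytes(out)
```

For example, `normalize_newlines("a\r\nb")` gives `"a%0D%0Ab"`, `percent_decode("a%0D%0Ab")` gives back `b"a\r\nb"`, a stray `"100%"` survives as `b"100%"`, and `percent_decode("a%00b")` simply yields `b"a\x00b"` for the caller to judge.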
Received on Wednesday, 25 July 2007 06:57:02 UTC