Thu 2007-05-24 at 10:31 -0700, Eric Lawrence wrote:
> I think the trick is distinguishing between a control character and a byte that's part of a multi-byte international character.
>
> Obviously, we'd need to escape any byte not valid in HTTP headers (e.g. 0x0d, 0x0a) to ensure the integrity of the headers.

The quoting productions in RFC 2616 aren't very obvious, but technically
the syntax allows any character 0-127 to be quoted (quoted-pair is a
backslash followed by any CHAR). 8-bit characters are not allowed
anywhere in HTTP, which more or less rules out the use of most
multi-byte characters unless they are recoded first.
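
As a rough illustration of the quoting rule above, here is a minimal
Python sketch that follows the RFC 2616 grammar literally (quoted-string
with quoted-pair = "\" CHAR). The function names quote_rfc2616 and
unquote_rfc2616 are made up for this example and do not come from any
library. Note that it will happily emit a backslash followed by CR or
LF, because the grammar permits it; a real producer would most likely
want to refuse those outright.

```python
def quote_rfc2616(value: str) -> str:
    """Wrap value in a quoted-string, using quoted-pair where needed.

    Hypothetical helper: follows the RFC 2616 grammar literally.
    """
    out = []
    for ch in value:
        code = ord(ch)
        if code > 127:
            # CHAR covers only US-ASCII (0-127); anything else must be
            # recoded by the caller before it can be quoted.
            raise ValueError(f"non-ASCII character {ch!r} cannot be quoted")
        if ch in '"\\' or code < 32 or code == 127:
            # Escape the quote, the backslash, and all control characters.
            out.append("\\" + ch)
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


def unquote_rfc2616(quoted: str) -> str:
    """Reverse quote_rfc2616: strip the quotes and resolve quoted-pairs."""
    if len(quoted) < 2 or quoted[0] != '"' or quoted[-1] != '"':
        raise ValueError("not a quoted-string")
    body = quoted[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            i += 1
            if i >= len(body):
                raise ValueError("dangling backslash")
            ch = body[i]  # the escaped character is taken as-is
        out.append(ch)
        i += 1
    return "".join(out)
```

Round-tripping a value through both functions should give back the
original string, while a value such as "é" is rejected until it has been
recoded into something that fits in 0-127.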

HTTP builds on MIME, which builds on RFC 822. Its successor, RFC 2822,
is good reading on these matters, and is an example where the BNF has
been constructed so that it clearly separates producer rules from
parser rules: strict rules for what may be produced, and relaxed parser
rules that accept many "obsolete" forms.

But yes, in HTTP it's a bit of a mess. I do not think many
implementations parse HTTP entirely correctly, nor am I sure that
parsing HTTP fully to the spec is desirable, as it requires the parser
to allow a great deal of crap that nobody expects, because nobody is
allowed to produce it.

To be honest, I don't think many MIME parsers meet the full RFC 2822
requirements either.

Regards
Henrik