- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Mon, 17 Mar 2008 06:21:17 +0100
- To: ietf-http-wg@w3.org
Martin Duerst wrote:

> It's all those protocol fields where you need human-readable
> text. The Subject: of an email very clearly qualifies as
> text, so it's not only body.

Using RFC 2047 for <unstructured> header field bodies is straightforward, and there is no conflict with RFC 2277. Apparently HTTP has no unstructured fields, so that might be irrelevant.

Brian has already mentioned comments. Using RFC 2047 "as is" (no RFC 2231 parameter folding) should be no issue for 2616bis comments. I've no idea why anybody would want RFC 4646 language tags in a comment, but that part of RFC 2231 would also be straightforward. And if the other side has no clue what an encoded comment with language tags is, nothing will break, or rather I'm not aware of scenarios where comments could be critical.

> Can you show how allowing e.g. new HTTP header fields to
> use UTF-8 would break anything in the installed base?

I'm sure that the mentioned HTTP/1.0 browser didn't support UTF-8, and I'm sure that I have not yet tested an IE6 plugin claiming to offer some kind of IDN support. It's hard to prove a negative, but FF2 didn't like non-UTF-8 <ipath> in your test suite, while <ihost> and UTF-8 worked; from that I *guess* UTF-8 can work for this browser.

2616bis is supposed to work with any server and UA since the times of RFC 2068. That's why I think we can at best get rid of the "default Latin-1", but not simply replace it by UTF-8.

Your heuristics to distinguish Latin-1 and UTF-8 depend on finding 0x80..0x9F (=> potential UTF-8 trail byte, likely no C1), 0xC0..0xC1 (=> can't be UTF-8), or 0xFE..0xFF (ditto), plus further fine-tuning opportunities for the STD 63 UTF-8. But they are not guaranteed to work for short strings in a header field, and there's no way to put them into old servers and UAs supporting only Latin-1.

> There is a big difference between MIME (ASCII+RFC2045) and
> HTTP (iso-8859-1+RFC2045).
That's why I'd prefer to get rid of "default Latin-1": going from US-ASCII to UTF-8 later (after 2616bis) is hopefully simpler than going from Latin-1 to UTF-8. Using a BOM for this magic is dubious, because 0xEFBBBF is also valid Latin-1 (it reads as "ï»¿").

> I have yet to see a case where the absence of language
> information in a header is a problem in practice. Do you
> know of any?

No. It would need to be something that's displayed, parsed, voice output, dunno, anything where the language or script helps. Excluding eur-EU, mis-EA, zxx-DG, und-IC, etc. <eg>

Frank
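For illustration, here is a rough sketch of the Latin-1 vs. UTF-8 heuristic discussed in this message. It is not from any HTTP specification or library: the function name `guess_header_charset` and its return labels are hypothetical. It relies on Python's strict UTF-8 decoder, which rejects malformed sequences, to stand in for the "STD 63 fine tuning". It also deliberately reports "ambiguous" when both readings are plausible, which is exactly the short-string weakness noted above. Requires Python 3.7+ for `bytes.isascii()`.

```python
def guess_header_charset(raw: bytes) -> str:
    """Hypothetical helper: guess whether a raw header value is Latin-1 or UTF-8.

    Returns "us-ascii", "iso-8859-1", "utf-8", or "ambiguous".
    """
    if raw.isascii():
        return "us-ascii"  # no octets >= 0x80, both charsets agree
    # 0xC0, 0xC1 and 0xF5..0xFF never occur in STD 63 UTF-8 (this covers the
    # 0xC0..0xC1 and 0xFE..0xFF cases from the message). The strict decoder
    # below would reject them too; the explicit check mirrors the heuristic.
    if any(b in (0xC0, 0xC1) or b >= 0xF5 for b in raw):
        return "iso-8859-1"
    try:
        raw.decode("utf-8")  # strict: malformed sequences raise an error
    except UnicodeDecodeError:
        return "iso-8859-1"
    # Well-formed UTF-8 that contains 0x80..0x9F: in Latin-1 these would be
    # C1 controls, which are unlikely in header text, so lean towards UTF-8.
    if any(0x80 <= b <= 0x9F for b in raw):
        return "utf-8"
    # Otherwise both readings are possible (e.g. UTF-8 "é" is also Latin-1 "Ã©").
    return "ambiguous"


if __name__ == "__main__":
    for sample in (b"hello", "café".encode("latin-1"),
                   "café".encode("utf-8"), "€".encode("utf-8"),
                   b"\xef\xbb\xbf"):
        print(sample, guess_header_charset(sample))
```

Note that the UTF-8 BOM (`b"\xef\xbb\xbf"`) comes out as "ambiguous", since it decodes both as U+FEFF and as the Latin-1 text "ï»¿". This is why a BOM is a poor marker here.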
Received on Monday, 17 March 2008 05:19:36 UTC