- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Fri, 14 Mar 2008 14:25:55 +0100
- To: ietf-http-wg@w3.org
Brian Smith wrote:

> Consider:
>   Content-Type: text/plain;charset="=?utf-8?q?utf-8?="
> (how do you compare this against 'text/plain;charset="utf-8"'?)
[...]
> the grammar seems to allow encoded-words to be mixed with unencoded
> words.

Yes.  FWS between encoded words is removed by decoders.  That is how
you get around the 75-character length limit: the encoder just splits
the text into several encoded-words separated by FWS.  Split between
characters, not between input octets; you can't "split" a UTF-8
sequence.

FWS between encoded and unencoded words is for real (= white space),
like FWS between unencoded words.  (FWS is "folding white space"; the
older MIME RFCs still use "linear white space", like RFC 2616.)

> multiple encodings (e.g. UTF-8 and UTF-7) to be mixed.

Yep, I like pc-multilingual-850+euro better than UTF-7, it arrives
faster at the critical "75", especially if combined with a *long*
RFC 4646 language tag as specified in RFC 2231.

Back to your original point, you don't 2047-encode quoted-strings,
RFC 2047 says:

| + An 'encoded-word' MUST NOT appear within a 'quoted-string'.
[...]
| + An 'encoded-word' MUST NOT be used in parameter of a MIME
|   Content-Type or Content-Disposition field, or in any structured
|   field body except within a 'comment' or 'phrase'.

Roughly the idea is that you can 2047-encode unstructured header
fields as you like (example: Subject in mail or news).  For any
structured header field you must not touch its structure: a comment
must still be a comment, a mail address must not be touched at all,
ditto quoted-string etc. (see above).

   Example: This (is) a test

For a structured header field with name "Example" and field body
"This (is) a test", with the four words actually in some interesting
charset, you can 2047-encode "This", "is", "a", "test", "a test".
You cannot encode anything that includes "(is)": it would obscure the
structure, here the comment delimiters "(" and ")".
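A minimal Python sketch of the decoder behaviour described above,
assuming the standard library's email.header.decode_header() is an
acceptable stand-in for a 2047 decoder.  The helper name decode_words
is made up for illustration.  This API works on the raw string and
knows nothing about field structure, so treat it as a demonstration
for unstructured field bodies only:

```python
from email.header import decode_header


def decode_words(value: str) -> str:
    """Decode RFC 2047 encoded-words in a field body (illustrative helper)."""
    parts = []
    for chunk, charset in decode_header(value):
        # decode_header() yields bytes for the pieces it processed, but
        # returns the input str unchanged if it finds no encoded-word.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii")
        parts.append(chunk)
    return "".join(parts)


# White space between two adjacent encoded-words is dropped by the decoder.
# The split is between characters: "=C3=BC" (one UTF-8 character) stays whole.
joined = decode_words("=?utf-8?Q?=C3=BC?= =?utf-8?Q?ber?=")

# White space between an encoded-word and unencoded text is kept.
mixed = decode_words("=?utf-8?Q?This?= (is) a test")

print(joined)
print(mixed)
```

The first call shows two encoded-words being joined without the
separating space; the second shows that the space before the plain
text survives.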
The goal is that an MTA or MUA knowing nothing about MIME at all can
simply treat (=?us-ascii*tlh?Q?is?=) or similar as some weird
ASCII-word in a comment; it doesn't need to know that it's a US-ASCII
Klingon "is", but it needs to know that it's a comment.

   Example: "umlauted gibberish"

You can't encode the gibberish within the quoted-string.  But in
structured fields a quoted-string is *always* (please check this) used
where unquoted words are also allowed.  In other words you cannot do
"=?utf-8?Q?umlauted_gibberish?=" within quotes.  But you can use
=?utf-8?Q?umlauted_gibberish?= without quotes as an ordinary word
("ordinary" from the POV of a MIME-agnostic MTA).

 Frank
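The quoted-string rule can be checked mechanically.  The following is
only a rough sketch: the function name and both regular expressions
are assumptions made for illustration, and it ignores comments,
folding, and everything else a real structured-field parser has to
handle.  It merely reports things that look like encoded-words inside
double quotes:

```python
import re

# Loose approximation of an RFC 2047 encoded-word: =?charset?B-or-Q?text?=
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")
# A double-quoted string, allowing backslash-escaped characters.
QUOTED_STRING = re.compile(r'"(?:[^"\\]|\\.)*"')


def encoded_words_in_quotes(field_body: str) -> list:
    """Return encoded-word look-alikes found inside quoted-strings."""
    hits = []
    for quoted in QUOTED_STRING.findall(field_body):
        hits.extend(ENCODED_WORD.findall(quoted))
    return hits


# The Content-Type value from the start of this thread: flagged.
bad = encoded_words_in_quotes('text/plain;charset="=?utf-8?q?utf-8?="')

# The same kind of word used without quotes: nothing to flag.
ok = encoded_words_in_quotes("=?utf-8?Q?umlauted_gibberish?=")

print(bad)
print(ok)
```

A non-empty result means the field body breaks the "MUST NOT appear
within a 'quoted-string'" rule quoted from RFC 2047.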
Received on Friday, 14 March 2008 13:23:57 UTC