Given what's below, I wonder whether specifying a single encoding for new headers is a practical thing to do; we may already end up referring to 3987, 2822, 2047, and 2231 for existing headers, because there are appropriate encodings from each of those domains (URIs, e-mail addresses, and so on).

If that's the case, maybe we should just recommend that new headers use the encoding scheme most appropriate to their domain, list a few examples (see above), and fall back to recommending (say) \u'nnnnnn' from BCP137 if nothing more specific applies.

On 17/03/2008, at 11:48 PM, Julian Reschke wrote:

>> If so, the next step would be to craft recommendations /
>> requirements about what that mechanism will be. Possibilities
>> discussed;
>> a) RFC2047
>
> I haven't seen any evidence this being implemented.
>
>> b) UTF-8
>
> Unfortunately, RFC2616, Section 4.2 currently states:
>
>    message-header = field-name ":" [ field-value ]
>    field-name     = token
>    field-value    = *( field-content | LWS )
>    field-content  = <the OCTETs making up the field-value
>                     and consisting of either *TEXT or combinations
>                     of token, separators, and quoted-string>
>
> Thus, if we take that as final word, we can't use anything but
> Latin1, thus need to encode non-Latin-1 characters.
>
>> c) Something from BCP137 section 5
>
> ...which would be \u'nnnnnn' or &#xnnnnnn;...
>
>> d) IRI->URI
>
>> Separately, we'd need to open new issues for specifying these
>> encodings for the field-values of:
>> - From
>
> ...this one is currently defined in terms of RFC2822, Section 3.4...
>
>> - Warning
>
> Currently explicitly refers to RFC2047.
>
>> - Content-Location
>> - Location
>> - Referer
>
> These are URI references. No non-ASCII characters anyway.
>
>> - Content-Disposition (?)
>
> Content-Disposition uses I18N *inside* the parameters, for which
> there already is RFC2231.

--
Mark Nottingham     http://www.mnot.net/

Received on Tuesday, 18 March 2008 03:19:01 GMT
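
For illustration only, here is a minimal sketch of what the \u'nnnnnn' fallback from BCP137 section 5 might look like for a header field-value. It is not part of the original message. The function names encode_bcp137 and decode_bcp137 are hypothetical, and the choice to escape every character outside printable ASCII (plus the backslash itself, so that decoding stays unambiguous) is an assumption; the thread only establishes that non-Latin-1 characters would need encoding.

```python
import re


def encode_bcp137(text: str) -> str:
    """Escape characters as \\u'nnnn' (4 to 6 hex digits).

    Assumption: everything outside printable ASCII is escaped, and so is
    the backslash, so a literal "\\u'" in the input cannot be mistaken
    for an escape when decoding.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x20 or cp > 0x7E or ch == "\\":
            # %04X pads to at least four digits; code points above
            # U+FFFF naturally produce five or six digits.
            out.append("\\u'%04X'" % cp)
        else:
            out.append(ch)
    return "".join(out)


# Matches \u' followed by 4 to 6 hex digits and a closing quote.
_ESCAPE = re.compile(r"\\u'([0-9A-Fa-f]{4,6})'")


def decode_bcp137(value: str) -> str:
    """Reverse encode_bcp137 by expanding each escape to its character.

    Note: a six-digit value above 10FFFF is not a valid code point and
    makes chr() raise ValueError; this sketch does not handle that case.
    """
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value)


if __name__ == "__main__":
    encoded = encode_bcp137("café")
    print(encoded)
    print(decode_bcp137(encoded))
```

The result is plain ASCII, so it fits within the TEXT rule quoted from RFC2616 Section 4.2, at the cost of being unreadable to recipients that do not know the convention.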