- From: 신정식, 申政湜 <jungshik@google.com>
- Date: Mon, 20 Dec 2010 16:27:28 -0800
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: Maciej Stachowiak <mjs@apple.com>, Mark Nottingham <mnot@mnot.net>, Adam Barth <ietf@adambarth.com>, HTTP Working Group <ietf-http-wg@w3.org>
- Message-ID: <AANLkTim2at_7UNU2esD2L3gshjKi7Gj0uCRtD4pO_Ede@mail.gmail.com>
Hi,

As I wrote below, I really appreciate Julian and Adam working on this long-standing issue at long last. I should have joined the effort much sooner. Anyway, whether it's a single RFC or two RFCs (one on the standards track and the other informational), I'm with Adam that it had better reflect the 'reality' of what web servers do in the wild.

2010/12/16 Julian Reschke <julian.reschke@gmx.de>

> On 16.12.2010 19:29, Jungshik Shin (신정식, 申政湜) wrote:
>
>> On Wed, Dec 15, 2010 at 10:58 PM, Maciej Stachowiak <mjs@apple.com
>> <mailto:mjs@apple.com>> wrote:
>>
>>     On Dec 15, 2010, at 10:46 PM, Mark Nottingham wrote:
>>
>>     > Because (if I read the original message correctly -- please
>>     correct me if I'm wrong) they're sniffing the UA to do it, and if
>>     they do that, they'll presumably adapt their sniffing based upon
>>     changes in the browser market (as anyone who sniffs and believes
>>     that they don't have to monitor the market tends to get bitten, hard).
>>
>> Yes, gmail sniffs the UA and emits RFC 2047 for Firefox and Chrome and
>> RFC 5987 (RFC 2231) for Opera. The change to emit RFC 5987 for Opera was
>> made rather recently (before that, non-ASCII characters were just turned
>> into question marks for Opera). Anyway, in the case of gmail, it's relatively
>> easy to make (at least I used to know where the code is and I hope it's
>> still there for me to make a quick change). Some other google products
>> just turn non-ASCII characters into question marks for all the UAs (e.g.
>> Google Docs) :-) Obviously, it's a rather embarrassing bug to fix.
>
> That would be great.
>
> Actually, I'd change to emit RFC5987 for all UAs *except* those which do
> not support it yet (IE, Chrome, Safari).
>
>>     Typically in cases like this you want to get sites to change before
>>     breaking them. Often it takes surprisingly long for changes like
>>     this to get implemented and pushed in a large-scale site, even for a
>>     seemingly simple change.
>>
>> Yes, sites have to change before RFC 2047 support is dropped in Chrome
>> and Firefox.
>
> Do we happen to know *which* other sites?

I found that a few other Google products emit RFC 2047 for Firefox and Chrome. As for Opera, I must have been hallucinating! I could almost swear that I saw a change go in to make gmail emit RFC 2231 (this was before RFC 5987) for Opera, but it turns out that this is not the case yet. I'm sorry for the incorrect statement in the previous email. On the other hand, I found that Google Sites emits 'RFC 2231' for Firefox, but not for Opera (let alone other browsers).

BTW, a Google search turned up a 7-year-old mail thread at www-international ( http://blog.gmane.org/gmane.org.w3c.internationalization.general/month=20031101 ) where I did advocate for RFC 2231 :-) but others (out of 'practical' needs/concerns) recommended using RFC 2047. It's old and I have little idea how widespread the practice is now, but it's an indication that using RFC 2047 has (or used to have) some following. That may have been partly due to HTML4's and RFC 2388's references to RFC 2047 (2045), although they're about uploading a file as a part of multipart form data.

Anyway, this issue should have been escalated long ago and I very much appreciate Julian's and Adam's efforts to nail this down.

As for what's actually being emitted by web servers, my guess is that raw byte sequences in various encodings (including UTF-8) are the most common, followed by simple %-encoding (again in various encodings), which is in turn followed by RFC 2047 (depending on the UA the web server is talking to), with RFC 5987 emitted by only a small number of servers.

I wish I had some cycles to collect stats for C-D headers (over the 'attachments' Google has crawled). I'll ask around to see if anybody has done that (when I asked a year or so ago, I couldn't find anyone). However, I'm afraid just tallying C-D headers of 'attachments' that were already crawled (e.g. Google corpora) wouldn't work.
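For readers less familiar with the four formats named above, here is a rough Python sketch of what each one looks like on the wire for the same non-ASCII filename. It is only an illustration and is not taken from gmail or any other server: the function names are made up, UTF-8 is assumed throughout (real servers use various encodings), and the RFC 2047 form uses the "B" (base64) encoding placed inside a quoted `filename` parameter.

```python
import base64
from urllib.parse import quote


def raw_utf8(name: str) -> bytes:
    # Raw bytes: the encoded filename is put into the header as-is.
    return b'attachment; filename="' + name.encode("utf-8") + b'"'


def percent_encoded(name: str) -> str:
    # Simple %-encoding of the UTF-8 bytes inside an ordinary filename parameter.
    return 'attachment; filename="%s"' % quote(name, safe="")


def rfc2047(name: str) -> str:
    # RFC 2047 encoded-word (B encoding) used as the filename value.
    b64 = base64.b64encode(name.encode("utf-8")).decode("ascii")
    return 'attachment; filename="=?UTF-8?B?%s?="' % b64


def rfc5987(name: str) -> str:
    # RFC 5987 ext-value: charset, empty language tag, then %-encoded bytes.
    return "attachment; filename*=UTF-8''%s" % quote(name, safe="")


if __name__ == "__main__":
    for build in (raw_utf8, percent_encoded, rfc2047, rfc5987):
        print(build.__name__, "->", build("naïve.txt"))
```

Only the last form tells the receiver unambiguously which charset was used and that the value is encoded; the first two leave it to the browser to guess.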
I guess we need to crawl a good/representative sample of web sites again "masquerading as Firefox (or other browsers)" because most web sites (that emit RFC 2047) are likely to do that only for Firefox (and in some cases, Chrome).

As an alternative to the above, I wonder if browsers can collect some stats on the format of the C-D header.

Jungshik
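The kind of tally discussed in this message could be sketched roughly as follows, assuming the raw Content-Disposition header values have already been collected as bytes (for example by a crawl sent with a Firefox User-Agent string). The category names, the regular expressions, and the order of the checks are illustrative assumptions, not an agreed classification, and the heuristics will misfile some edge cases (a plain ASCII filename that happens to contain `%41`, say).

```python
import re
from collections import Counter

# filename*=charset'lang'... (RFC 5987 ext-value)
EXT_VALUE = re.compile(
    rb"filename\*\s*=\s*[A-Za-z0-9!#$%&+^_`{}~-]+'[A-Za-z0-9-]*'", re.I
)
# =?charset?B-or-Q?payload?= (RFC 2047 encoded-word)
ENCODED_WORD = re.compile(rb"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")
# At least one %XX escape
PERCENT = re.compile(rb"%[0-9A-Fa-f]{2}")


def classify(header: bytes) -> str:
    """Guess which filename format a Content-Disposition value uses."""
    if EXT_VALUE.search(header):
        return "rfc5987"
    if ENCODED_WORD.search(header):
        return "rfc2047"
    if any(b >= 0x80 for b in header):
        return "raw-bytes"
    if PERCENT.search(header):
        return "percent-encoded"
    return "plain-ascii"


def tally(headers) -> Counter:
    """Count how many headers fall into each format."""
    return Counter(classify(h) for h in headers)
```

Calling `tally()` on the collected header values returns a `Counter` keyed by format name, which is enough to rank the formats by how often they occur.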
Received on Tuesday, 21 December 2010 00:27:58 UTC