Re: Content-Disposition next steps

Hi,

As I wrote below, I really appreciate Julian and Adam finally working on this
long-standing issue. I should have joined the effort much sooner. Anyway,
whether it ends up as a single RFC or two (one on the standards track and the
other informational), I agree with Adam that it had better reflect the
'reality' of what web servers do in the wild.

2010/12/16 Julian Reschke <julian.reschke@gmx.de>

> On 16.12.2010 19:29, Jungshik Shin (신정식, 申政湜) wrote:
>
>>
>>
>> On Wed, Dec 15, 2010 at 10:58 PM, Maciej Stachowiak <mjs@apple.com
>> <mailto:mjs@apple.com>> wrote:
>>
>>
>>    On Dec 15, 2010, at 10:46 PM, Mark Nottingham wrote:
>>
>>     > Because (if I read the original message correctly -- please
>>    correct me if I'm wrong) they're sniffing the UA to do it, and if
>>    they do that, they'll presumably adapt their sniffing based upon
>>    changes in the browser market (as anyone who sniffs and believes
>>    that they don't have to monitor the market tends to get bitten, hard).
>>
>>
>> Yes, gmail sniffs the UA and emits RFC 2047 for Firefox and Chrome and
>> RFC 5987 (RFC 2231) for Opera. The change to emit RFC 5987 for Opera was
>> made rather recently (before that, non-ASCII characters were just turned
>> to question marks for Opera). Anyway, in case of gmail, it's relatively
>> easy to make (at least I used to know where the code is and I hope it's
>> still there for me to make a quick change).  Some other google products
>> just turn non-ASCII characters to question marks for all the UAs (e.g.
>> Google Docs) :-)   Obviously, it's a rather embarrassing bug to fix.
>>
>
> That would be great.
>
> Actually, I'd change to emit RFC5987 for all UAs *except* those which do
> not support it yet (IE, Chrome, Safari).
>
>
>>    Typically in cases like this you want to get sites to change before
>>    breaking them. Often it takes surprisingly long for changes like
>>    this to get implemented and pushed in a large-scale site, even for a
>>    seemingly simple change.
>>
>>
>> Yes, sites have to change before RFC 2047 support is dropped in Chrome
>> and Firefox.
>>
>
> Do we happen to know *which* other sites?
>

I found that a few other Google products emit RFC 2047 for Firefox and Chrome.
As for Opera, I must have been hallucinating! I could almost swear that I saw
a change go in to make gmail emit RFC 2231 (this was before RFC 5987) for
Opera, but it turns out that this has not happened yet. I'm sorry for the
incorrect statement in my previous email. On the other hand, I found that
Google Sites emits 'RFC 2231' for Firefox, but not for Opera (let alone
other browsers).

BTW, a Google search turned up a 7-year-old mail thread at www-international
(http://blog.gmane.org/gmane.org.w3c.internationalization.general/month=20031101)
where I did advocate for RFC 2231 :-) but others (out of 'practical'
needs/concerns) recommended using RFC 2047. It's old and I have little idea
how widespread the practice is now, but it's an indication that RFC 2047
has (or used to have) some following. That may have been partly due to the
references to RFC 2047 (2045) in HTML4 and RFC 2388, although those are about
uploading a file as part of multipart/form-data.

Anyway, this issue should have been escalated long ago, and I very much
appreciate Julian's and Adam's efforts to nail this down.

As for what's actually being emitted by web servers, my guess is that raw
byte sequences in various encodings (including UTF-8) are the most common,
followed by simple %-encoding (again in various encodings), then RFC 2047
(depending on which UA the server is talking to), with RFC 5987 emitted by
only a small number of servers.
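
To make the four forms concrete, here is a rough Python sketch of what each
one looks like on the wire for the same filename. The function name
cd_variants and the sample filename are made up for illustration and are not
taken from any server's code. The sketch also assumes UTF-8 throughout,
whereas real servers use various legacy encodings for the raw and %-encoded
forms.

```python
import base64
from urllib.parse import quote


def cd_variants(filename):
    """Return the four Content-Disposition forms discussed above, as bytes.

    Hypothetical helper for illustration only; UTF-8 is assumed everywhere.
    """
    utf8 = filename.encode("utf-8")
    # Percent-encode every byte outside the unreserved ASCII set.
    pct = quote(filename, safe="")
    b64 = base64.b64encode(utf8).decode("ascii")
    return {
        # 1. Raw bytes placed directly in the quoted filename parameter.
        "raw": b'attachment; filename="' + utf8 + b'"',
        # 2. Simple %-encoding inside the plain filename parameter.
        "percent": ('attachment; filename="%s"' % pct).encode("ascii"),
        # 3. RFC 2047 encoded-word (B encoding) inside the filename parameter.
        "rfc2047": ('attachment; filename="=?UTF-8?B?%s?="' % b64).encode("ascii"),
        # 4. RFC 5987 (RFC 2231 style) extended parameter: filename*=charset''value
        "rfc5987": ("attachment; filename*=UTF-8''%s" % pct).encode("ascii"),
    }


if __name__ == "__main__":
    for kind, header in cd_variants("naïve.txt").items():
        print(kind, header)
```

Only the last form is what RFC 5987 defines; the other three are what I'm
guessing servers emit in practice.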

I wish I had some cycles to collect stats on C-D headers (over the
'attachments' Google has crawled). I'll ask around whether anybody has done
that (when I did so a year or so ago, I couldn't find any). However, I'm
afraid that just tallying the C-D headers of 'attachments' that were already
crawled (e.g. Google corpora) wouldn't work. I guess we'd need to crawl a
good/representative sample of web sites again while "masquerading as Firefox
(or other browsers)", because most web sites that emit RFC 2047 are likely to
do so only for Firefox (and in some cases, Chrome).
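
If somebody does get to such a crawl, the tallying itself could be as simple
as the sketch below. The function classify_cd and its category names are
hypothetical, and the regular expressions are rough heuristics rather than a
real header parser (for example, a header that carries both filename and
filename* is simply counted as RFC 5987).

```python
import re
from collections import Counter


def classify_cd(header):
    """Heuristically label the filename format of a raw C-D header (bytes)."""
    # An extended parameter (filename*=...) means RFC 5987 / RFC 2231.
    if re.search(rb"filename\*\s*=", header, re.I):
        return "rfc5987"
    m = re.search(rb'filename\s*=\s*"?([^";]*)', header, re.I)
    if not m:
        return "no-filename"
    value = m.group(1)
    # RFC 2047 encoded-word: =?charset?B-or-Q?payload?=
    if re.search(rb"=\?[^?]+\?[bq]\?[^?]*\?=", value, re.I):
        return "rfc2047"
    # Any byte above 0x7F means raw bytes in some unknown encoding.
    if any(b > 0x7F for b in value):
        return "raw-bytes"
    if re.search(rb"%[0-9a-f]{2}", value, re.I):
        return "percent"
    return "ascii"


if __name__ == "__main__":
    sample = [
        b'attachment; filename="na\xc3\xafve.txt"',
        b'attachment; filename="na%C3%AFve.txt"',
        b'attachment; filename="=?UTF-8?B?bmHDr3ZlLnR4dA==?="',
        b"attachment; filename*=UTF-8''na%C3%AFve.txt",
        b'attachment; filename="report.pdf"',
    ]
    print(Counter(classify_cd(h) for h in sample))
```

The crawl would have to be repeated with each User-Agent string of interest,
since the same server may land in a different bucket per UA.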

As an alternative to the above, I wonder if browsers could collect some stats
on the formats of the C-D headers they receive.

Jungshik

Received on Tuesday, 21 December 2010 00:27:58 UTC