Re: Ticket 262: Discuss whether percent-decoding should also be done by receivers.

On 02/11/2010, at 2:07 PM, Adam Barth wrote:

> On Mon, Nov 1, 2010 at 5:43 PM, Mark Nottingham <mnot@mnot.net> wrote:
>> 
>> Could you expand upon this a bit more? E.g., are you saying that after the 5987 encoding is removed, the resulting string should be percent-decoded? Or that the filename (no *) parameter should be percent-decoded? Both?
> 
> Assume, for the moment, we have an algorithm for extracting a sequence
> of bytes called the filename-parameter-value from the
> Content-Disposition header field value.  Here's my proposal:
> 
> 1) Let the users-default-encoding be the user's preferred encoding
> (e.g., as configured as the default encoding used by the user's
> operating system).
> 2) Let unescaped-filename-parameter-value be the result of applying
> the percent-unescaping algorithm to filename-parameter-value.
> 3) Let the requested-filename be the result of decoding the
> unescaped-filename-parameter-value with the users-default-encoding.
> 
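Here is a rough Python sketch of the three steps above, read literally. It assumes filename-parameter-value has already been extracted as bytes, and it reduces step 1 to a function argument. The function name and the errors="replace" policy are illustrative choices, not part of the proposal:

```python
from urllib.parse import unquote_to_bytes


def requested_filename(filename_parameter_value: bytes,
                       users_default_encoding: str) -> str:
    """Illustrative reading of the proposed steps; not a spec."""
    # Step 2: percent-unescape the raw parameter value into raw bytes.
    unescaped = unquote_to_bytes(filename_parameter_value)
    # Step 3: decode those bytes with the user's default encoding.
    # errors="replace" is an assumption: the proposal doesn't say what
    # happens to bytes that are invalid in that encoding.
    return unescaped.decode(users_default_encoding, errors="replace")
```

Given the same header bytes, requested_filename(b"foo-%C3%A4.html", "utf-8") gives "foo-ä.html". A user whose default is windows-1252 ("cp1252") gets "foo-Ã¤.html". The result depends on the receiver's configuration, not on anything the server sent.
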
>> Raising as a placeholder:
>>  http://trac.tools.ietf.org/wg/httpbis/trac/ticket/262
>> 
>> I suspect that this issue is going to be similar to #259; i.e., the answer may be different for different implementers. As such, we should try to figure out if that's the case (and why) first.
> 
> The use case is that this behavior is required to compete in some
> segments of the browser market.  Therefore, some number of popular
> user agents will not remove support for these semantics.

Well, *some* form of internationalisation is necessary to compete in segments of the browser market; it's not (yet) clear that *your* form of internationalisation is.

AFAICT this argument boils down to:
 1) FF, Safari and Konqueror support the RFC5987-style encoding, with some caveats re: ordering and precedence.
 2) IE and Chrome support %-encoding the filename parameter, with some caveats re: charset.
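Under #1, the charset is carried in the value itself. The following rough Python sketch shows the receiver side. It assumes the filename* value has already been extracted from the header field. The function name is hypothetical, the language tag is simply ignored, and restricting charsets to the two RFC 5987 requires recipients to support is a choice made for the sketch:

```python
from urllib.parse import unquote


def decode_ext_value(ext_value: str) -> str:
    """Decode an RFC 5987 ext-value such as "UTF-8''foo-%c3%a4.html".

    Sketch only: does not validate that the value uses only attr-chars.
    """
    # ext-value = charset "'" [ language ] "'" value-chars
    charset, _language, encoded = ext_value.split("'", 2)
    # UTF-8 and ISO-8859-1 are the charsets RFC 5987 requires recipients
    # to support; rejecting anything else is this sketch's choice.
    if charset.lower() not in ("utf-8", "iso-8859-1"):
        raise ValueError(f"unsupported charset: {charset!r}")
    # Percent-decode using the charset named in the value itself.
    return unquote(encoded, encoding=charset, errors="strict")
```

With this approach, decode_ext_value("UTF-8''foo-%c3%a4.html") returns "foo-ä.html" whatever the user's default encoding is.
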

You (representing Chrome?) are arguing for #2. 

IE (unfortunately, although true to form) hasn't said much about this, but based on Eric Lawrence's blog <http://blogs.msdn.com/b/ieinternals/archive/2010/06/07/content-disposition-attachment-and-international-unicode-characters.aspx>, it looks like they don't have a problem with using the RFC5987-style encoding, but do have a strong preference for using a new parameter name, rather than an existing one.

To me, the strongest argument against #2 is that specifying percent-encoding for filename immediately creates interoperability problems with existing, deployed browsers that don't decode the filename parameter; they will present the literal %-escapes to the user.

OTOH #1 has a fallback strategy: by placing the plain filename parameter before filename*, a server can serve an internationalised filename to a UA that can use it, whilst falling back to a "vanilla" filename for those that can't. It's true that support for this isn't widespread now (according to <http://greenbytes.de/tech/tc2231/#attfnboth>), but at least there's a strategy in place for getting from here to there without breaking anything. To make it complete, I agree with the notion that we can warn servers away from generating filename parameters with literal % in them, because there's no interop there.
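The following rough Python sketch shows the server side of that strategy: plain filename first, filename* after it, and no literal % in the fallback. The function name and the `fallback` argument (an ASCII substitute the server picks itself) are hypothetical:

```python
from urllib.parse import quote


def content_disposition(filename: str, fallback: str) -> str:
    """Build an attachment header value with an ASCII fallback first.

    `fallback` is a hypothetical server-chosen ASCII name, e.g.
    "foo-ae.html" standing in for "foo-ä.html".
    """
    # Keep the legacy parameter printable ASCII with no literal "%"
    # (UAs disagree on decoding it) and nothing needing quoted-string escapes.
    if (not fallback.isascii() or not fallback.isprintable()
            or any(c in fallback for c in '%"\\')):
        raise ValueError("fallback must be printable ASCII without %, \" or \\")
    # RFC 5987 ext-value: charset, empty language, percent-encoded UTF-8.
    # With safe="", quote() leaves only letters, digits and "_.-~" unescaped;
    # all of these are attr-chars, so the output is a valid value-chars.
    encoded = quote(filename, safe="", encoding="utf-8")
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"
```

For example, content_disposition("foo-ä.html", "foo-ae.html") produces the same shape as the attfnboth test case: attachment; filename="foo-ae.html"; filename*=UTF-8''foo-%C3%A4.html
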

Overall, then, I don't see how we can get to interop cleanly on percent-encoding the filename parameter, or why we should do so. Yes, some implementations do it today, but that doesn't lead to the conclusion that we can or should specify their behaviour for all other implementations -- especially when that solution makes things more complex (re: charset).

What am I missing?


--
Mark Nottingham   http://www.mnot.net/

Received on Wednesday, 3 November 2010 01:37:36 UTC