[whatwg] multipart/form-data filename encoding: unicode and special characters

From: Evan Jones <evanj@csail.mit.edu>
Date: Tue, 1 May 2012 21:12:36 -0400
Message-ID: <EE632721-429A-43A2-A22F-8BEA2AAB5D34@csail.mit.edu>

I am not an experienced web standards wonk, so please forgive me if I'm making a mistake here.

When uploading files whose names contain special characters, it appears to me that the way those file names should be escaped is unspecified. As a result, WebKit (Safari and Chrome) handles these filenames one way, while Firefox handles them another. I'm implementing the server side of this exchange, and it is unclear to me what I should be doing. Am I missing something? WebKit even has a bug on this issue that states "I suggest working with WHATWG or HTML WG to get something specified in HTML5, and getting browsers converge on that." Is anyone working on this?


EXAMPLE

Create a file named: b?z'\"hi%22.txt (e.g. using the Unix command: touch b?z\'\\\"hi%22.txt)


Firefox (13.0 beta on Mac) sends the following header, escaping the double quote with a backslash but not escaping the backslash itself:

Content-Disposition: form-data; name="somefile"; filename="b?z'\\"hi%22.txt"
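
To see why this is a problem on the server side, here is a minimal Python sketch, assuming the server reads the filename parameter as an RFC 2616 quoted-string (where a backslash escapes the next character). firefox_style_escape only models the behavior described above, and parse_quoted_string is a hypothetical helper, not a library API. Because the original backslash is not doubled, the parser treats it as escaping the next backslash, so the following quote ends the value early:

```python
# Sketch: parse Firefox-style output as an RFC 2616 quoted-string.
# firefox_style_escape and parse_quoted_string are hypothetical helpers
# written for this illustration.


def firefox_style_escape(name):
    # Models the observed behavior: '"' becomes '\"', backslashes are untouched.
    return name.replace('"', '\\"')


def parse_quoted_string(s):
    # Parse a quoted-string that starts at s[0]; a backslash escapes the
    # next character (RFC 2616 quoted-pair). Returns (value, rest), where
    # rest is the text after the closing quote.
    if not s.startswith('"'):
        raise ValueError("expected opening quote")
    out = []
    i = 1
    while i < len(s):
        c = s[i]
        if c == "\\" and i + 1 < len(s):
            out.append(s[i + 1])  # quoted-pair: keep the next char literally
            i += 2
        elif c == '"':
            return "".join(out), s[i + 1:]
        else:
            out.append(c)
            i += 1
    raise ValueError("unterminated quoted-string")


name = "b?z'\\\"hi%22.txt"               # the file name b?z'\"hi%22.txt
param = '"' + firefox_style_escape(name) + '"'
print(param)                              # "b?z'\\"hi%22.txt"

value, rest = parse_quoted_string(param)
print(value)                              # b?z'\   (name truncated)
print(rest)                               # hi%22.txt"  (left over after the quote)
```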


WebKit (latest nightly r115711 on Mac) %-escapes the double quote, but does nothing to the literal %:

Content-Disposition: form-data; name="somefile"; filename="b?z'\%22hi%22.txt"
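
The WebKit behavior loses information in a different way. A minimal Python sketch, where webkit_style_escape is a hypothetical helper that models only what is described above (it is not WebKit's actual code): two different file names map to the same header value, so percent-decoding on the server cannot recover the original name:

```python
from urllib.parse import unquote


def webkit_style_escape(name):
    # Models only the observed behavior described above: '"' becomes '%22'
    # and a literal '%' is left as-is. (Real WebKit may treat other
    # characters in ways this sketch ignores.)
    return name.replace('"', "%22")


# Two different file names produce the same header value.
print(webkit_style_escape('a"b.txt'))    # a%22b.txt
print(webkit_style_escape("a%22b.txt"))  # a%22b.txt

# The example file from above.
sent = webkit_style_escape("b?z'\\\"hi%22.txt")
print(sent)                               # b?z'\%22hi%22.txt

# Percent-decoding on the server turns the literal "%22" into a quote too.
print(unquote(sent))                      # b?z'\"hi".txt
```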


THE SPECS: HTML5 states:

http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#multipart-form-data

Encode the (now mutated) form data set using the rules described by RFC 2388. [...] File names [...] must use the character encoding selected above, though the precise name may be approximated if necessary (e.g. [...]). User agents must not use the RFC 2231 encoding suggested by RFC 2388.


This seems contradictory: encode using RFC 2388, but do not use the encoding suggested by that RFC. Worse, no browser actually follows the RFC (e.g. they all use UTF-8 encoded parameter values), so that doesn't seem like the right answer. Is there a way out of this mess?
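
For comparison, a minimal Python sketch of the RFC 2231 encoding that HTML5 tells user agents not to use, built with the standard library's email.utils.encode_rfc2231. This is not what any of the browsers above send; it only illustrates why a percent-encoded filename* parameter, unlike the two browser behaviors, round-trips without ambiguity:

```python
from email.utils import encode_rfc2231
from urllib.parse import unquote

name = "b?z'\\\"hi%22.txt"                # the file name b?z'\"hi%22.txt

# RFC 2231 extended parameter value: charset'language'percent-encoded-bytes.
encoded = encode_rfc2231(name, "utf-8")
print(f'Content-Disposition: form-data; name="somefile"; filename*={encoded}')
# Content-Disposition: form-data; name="somefile"; filename*=utf-8''b%3Fz%27%5C%22hi%2522.txt

# Decoding is unambiguous: split off charset and language, then unquote.
charset, language, value = encoded.split("'", 2)
print(unquote(value, encoding=charset) == name)  # True

# Non-ASCII names are carried as percent-encoded UTF-8 bytes.
print(encode_rfc2231("résumé.pdf", "utf-8"))     # utf-8''r%C3%A9sum%C3%A9.pdf
```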

Evan

--
http://evanjones.ca/
Received on Tuesday, 1 May 2012 18:12:36 GMT