[whatwg] multipart/form-data filename encoding: unicode and special characters

On May 1, 2012, at 22:38 , Ashley Sheridan wrote:
> The Webkit method looks the better of the two with regards to how
> server-side languages might interpret it, but it would need work to
> ensure everything that should be escaped is, and that everything that is
> unescaped on the server should be and is done so correctly.

The problem is that currently I am unable to correctly "round trip" an uploaded file name. I would like users to upload a file and later be able to download it with the *exact same* file name. If you follow the specifications, this is not possible. Firefox is closer to the MIME RFCs (which specify backslash quoting in quoted-strings), but apparently that breaks IE 6, 7, and 8:
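For comparison, the backslash-quoting approach can be sketched as below. This is a minimal illustration of MIME quoted-string quoting (a backslash is placed before every `"` and `\`), using hypothetical helper names `quote_filename` and `unquote_filename`. It is not Firefox's code, and it ignores the IE 6-8 compatibility problem mentioned above. It does show that names containing `"`, `\` or `%` all survive the round trip.

```python
def quote_filename(name: str) -> str:
    """Hypothetical helper: wrap a name in a quoted-string, backslash-escaping '\\' and '"'."""
    # Escape backslashes first so the backslashes added for '"' are not doubled.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def unquote_filename(quoted: str) -> str:
    """Hypothetical helper: strip the surrounding quotes and undo backslash escapes."""
    if len(quoted) < 2 or quoted[0] != '"' or quoted[-1] != '"':
        raise ValueError("not a quoted-string")
    out = []
    i, end = 1, len(quoted) - 1
    while i < end:
        ch = quoted[i]
        if ch == "\\" and i + 1 < end:
            # A quoted-pair: keep the next character literally.
            out.append(quoted[i + 1])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(quote_filename('my"file.txt'))  # "my\"file.txt"
```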

https://bugs.webkit.org/show_bug.cgi?id=62107
http://java.net/jira/browse/JERSEY-759

Webkit's %-escaping behaviour is *not* part of the referenced MIME RFCs (which specify either backslash quoting in quoted-strings, base64 encoding, or %-escaping in special "filename*=" parameters). Thus, if this is the "right answer," it should be specified somewhere. I'm assuming that this needs to be in the HTML5 spec, since HTTP calls this the "body" of the POST and declares that it is outside the HTTP specification.
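The "filename*=" form is defined by RFC 2231: a charset, an optional language, and a %-encoded value, separated by single quotes. A minimal Python sketch of building and parsing such a parameter might look like this; it assumes UTF-8, and `encode_filename_star`/`decode_filename_star` are hypothetical names. Because `%` is itself encoded as `%25`, the two names from the bug report stay distinct. Whether browsers send, or server libraries accept, this form inside multipart/form-data is a separate question that this sketch does not answer.

```python
from urllib.parse import quote, unquote


def encode_filename_star(name: str) -> str:
    """Hypothetical helper: build filename*=UTF-8''<percent-encoded name>."""
    # safe="" means only ASCII letters, digits and "_.-~" stay unescaped;
    # '%', '"', "'" and '/' are all percent-encoded.
    return "filename*=UTF-8''" + quote(name, safe="", encoding="utf-8")


def decode_filename_star(param: str) -> str:
    """Hypothetical helper: parse charset'language'value and percent-decode the value."""
    prefix = "filename*="
    if not param.startswith(prefix):
        raise ValueError("not a filename* parameter")
    charset, _language, value = param[len(prefix):].split("'", 2)
    return unquote(value, encoding=charset)


print(encode_filename_star('my"file.txt'))    # filename*=UTF-8''my%22file.txt
print(encode_filename_star("my%22file.txt"))  # filename*=UTF-8''my%2522file.txt
```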

Webkit's escaping is also flawed (see bug 62107 above). File names that contain %-escapes (e.g. my%22file.txt, admittedly very rare) get mangled: because a literal "%" is not itself escaped, my%22file.txt and my"file.txt are sent as the same value, so the server cannot tell them apart.
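The ambiguity can be reproduced with a small model. The sketch below assumes the browser replaces `"` with `%22` and leaves a literal `%` alone; it models only that one case from the example above and is not Webkit's actual code (the full set of characters Webkit escapes is not stated here). The lossless variant escapes `%` as `%25` *before* escaping `"`, so every `%XX` on the wire is unambiguous and a plain percent-decode recovers the original.

```python
from urllib.parse import unquote


def escape_quote_only(name: str) -> str:
    """Model of the lossy scheme: '"' -> %22, but a literal '%' is sent unchanged."""
    return name.replace('"', "%22")


def escape_percent_too(name: str) -> str:
    """Lossless variant: escape '%' as %25 first, then '"' as %22."""
    return name.replace("%", "%25").replace('"', "%22")


def unescape(escaped: str) -> str:
    """Percent-decode; only safe for output of escape_percent_too, not the lossy scheme."""
    return unquote(escaped)


a, b = 'my"file.txt', "my%22file.txt"
# Lossy scheme: two distinct names produce the same wire value.
print(escape_quote_only(a), escape_quote_only(b))    # my%22file.txt my%22file.txt
# Lossless scheme: the wire values differ and both decode back exactly.
print(escape_percent_too(a), escape_percent_too(b))  # my%22file.txt my%2522file.txt
```

The order of the two `replace` calls in `escape_percent_too` matters: escaping `"` first and `%` second would turn the freshly inserted `%22` into `%2522`.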

Currently, I need to detect the browser in order to figure out what kind of unescaping to apply to the file name, and even then, in some cases, I can't figure out what the right file name is. Webkit claims this is a specification bug, so I'm hoping someone here can tell me whether that is the case and, if so, where I can file bugs, create test cases, etc.

Evan

--
http://evanjones.ca/

Received on Wednesday, 2 May 2012 04:05:13 UTC