[Bug 16909] multipart/form-data: field name encoding is not specified; browsers do incompatible things

https://www.w3.org/Bugs/Public/show_bug.cgi?id=16909

--- Comment #2 from Evan Jones <evanj@csail.mit.edu> 2012-05-02 20:41:21 UTC ---
Argh; whoops. Sorry for the bugzilla spam. I didn't realize that the "comment"
thingy just filed a bugzilla bug.

HTML5 states: "Encode the (now mutated) form data set using the rules described
by RFC 2388". However, it then modifies the rules:

"The parts of the generated multipart/form-data resource that correspond to
non-file fields must not have a Content-Type header specified. Their names and
values must be encoded using the character encoding selected above (field names
in particular do not get converted to a 7-bit safe encoding as suggested in RFC
2388)."

http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#multipart-form-data
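Here is a small illustration of that rule. It is a sketch, not browser code. Names are written in the form's character encoding and are not converted to a 7-bit form, so the raw header bytes for the same name differ between a UTF-8 page and a windows-1252 page. `disposition_line` is a hypothetical helper. It wraps the encoded name in quotes and escapes nothing, because escaping is exactly what the spec leaves open.

```python
def disposition_line(name: str, charset: str) -> bytes:
    """Build a Content-Disposition line for a non-file field (hypothetical helper).

    The name is encoded in the form's charset, as HTML5 requires, and wrapped
    in double quotes WITHOUT escaping quotes, backslashes or CR/LF.
    """
    return b'Content-Disposition: form-data; name="' + name.encode(charset) + b'"'


# The same field name yields different bytes depending on the form charset.
utf8_line = disposition_line("bàz", "utf-8")
cp1252_line = disposition_line("bàz", "cp1252")
print(utf8_line)    # b'Content-Disposition: form-data; name="b\xc3\xa0z"'
print(cp1252_line)  # b'Content-Disposition: form-data; name="b\xe0z"'
```

A server therefore has to know, or guess, which encoding the form used before it can turn these bytes back into a name.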

So the problem is: what are we supposed to do with field names? In particular,
what if they contain "special" MIME characters (e.g. \r\n newlines,
backslashes, double quotes, or semicolons)? Browsers currently escape these
characters in different, incompatible ways, so server code has to detect the
browser to decode the name correctly.


Example: <input name='bàz%22\"\' value="foo">

Firefox 13b: Content-Disposition: form-data; name="bàz%22\\"\"
Webkit nightly: Content-Disposition: form-data; name="bàz%22\%22\"

Firefox backslash-escapes double quotes but does not escape backslashes. As a
result, its header does not parse correctly under the MIME quoted-string rules
(it decodes roughly as bàz%22\ with an extra trailing \").
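A short Python sketch reproduces the failure. Both functions are hypothetical and are not taken from a browser or a server library. `firefox_style_quote` models the behaviour described above: it escapes `"` as `\"` and leaves `\` alone. `parse_quoted_string` is a simplified MIME-style quoted-string reader in which a backslash escapes the next character.

```python
from __future__ import annotations


def firefox_style_quote(name: str) -> str:
    # Hypothetical model of Firefox 13b: escape '"' as '\"', leave '\' as-is.
    return '"' + name.replace('"', '\\"') + '"'


def parse_quoted_string(text: str) -> tuple[str, str]:
    """Read one MIME-style quoted-string from the start of `text`.

    A backslash escapes the following character. Returns the decoded
    value and whatever text is left after the closing quote.
    """
    if not text.startswith('"'):
        raise ValueError("expected an opening double quote")
    out = []
    i = 1
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            out.append(text[i + 1])  # quoted-pair: keep the next char literally
            i += 2
        elif ch == '"':
            return "".join(out), text[i + 1:]  # closing quote found
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated quoted-string")


name = 'bàz%22\\"\\'                 # the field name from the example
header_value = firefox_style_quote(name)
print(header_value)                 # "bàz%22\\"\"
value, rest = parse_quoted_string(header_value)
print(value)                        # bàz%22\
print(rest)                         # \"
```

The strict reader stops at the unescaped quote. It drops the final `"\` of the name and leaves `\"` unparsed after the closing quote.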

Webkit %-escapes double quotes but does not %-escape the percent sign itself.
Thus the header above could have come either from name='bàz"\"\' or from the
intended name, and the server cannot tell which. Webkit has a bug open on this
issue, asking for specification guidance:
https://bugs.webkit.org/show_bug.cgi?id=62107
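The ambiguity is easy to demonstrate. `webkit_style_quote` is a hypothetical model of the Webkit behaviour described above: it replaces `"` with `%22` and leaves `%` untouched. Two different field names produce identical header values, so no server-side decoder can recover the original.

```python
def webkit_style_quote(name: str) -> str:
    # Hypothetical model of the Webkit nightly behaviour: '"' becomes %22,
    # but a literal '%' is left alone, so the mapping cannot be inverted.
    return '"' + name.replace('"', "%22") + '"'


intended = 'bàz%22\\"\\'   # the field name from the example
other = 'bàz"\\"\\'        # a different field name: bàz"\"\

print(webkit_style_quote(intended))  # "bàz%22\%22\"
print(webkit_style_quote(other))     # "bàz%22\%22\"
```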


HTML5 should specify exactly how field names are encoded. Some potential
solutions:

1) Bless Firefox's backslash quoting rules (they are very weird but I think
they are unambiguous?). This means Webkit POSTs will be decoded to the wrong
field names, and POSTs to older servers may parse incorrectly if the name
includes a \ (but that must already happen for Firefox?).

2) Bless Webkit's percent escaping rules (ideally also escaping %). Servers
that strictly parse this format will fail to parse Firefox POSTs if the name
includes a \, and will 

3) Adopt RFC 6266's approach (its filename and filename* parameters) of having
two name parameters when there are special characters: one with the existing
escaping, and one with an unambiguously escaped version. Ideally, existing
servers will parse the first name and not break unless the field name contains
a special character. As servers are upgraded, they will be able to parse the
new parameter unambiguously.
See: http://tools.ietf.org/html/rfc6266
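Options 2 and 3 both depend on a reversible escaping. The following is a minimal sketch under stated assumptions. It applies the RFC 5987 ext-value syntax, which RFC 6266 uses for filename*, to a hypothetical name* parameter. The name is UTF-8 encoded, and every byte outside RFC 5987's attr-char set is percent-encoded, including % itself, so decoding is unambiguous. All function names are illustrative. name* is not defined for multipart/form-data by HTML5 or by browsers. `legacy_value` is a stand-in for whatever escaping a browser already uses; here it copies the Webkit behaviour.

```python
from urllib.parse import quote, unquote

# RFC 5987 attr-char punctuation. quote() always leaves ALPHA, DIGIT and
# "-._~" unescaped, so only the remaining attr-char symbols are listed.
ATTR_CHAR_PUNCT = "!#$&+^`|"


def needs_ext(name: str) -> bool:
    # Non-ASCII, control characters (e.g. CR/LF) or MIME-special characters.
    return any(not (" " <= c <= "~") or c in '"\\;%' for c in name)


def ext_value(name: str) -> str:
    # RFC 5987 ext-value: charset, empty language tag, percent-encoded UTF-8.
    # '%' is not safe here, so it always becomes %25 and decoding is unambiguous.
    return "UTF-8''" + quote(name, safe=ATTR_CHAR_PUNCT, encoding="utf-8")


def decode_ext_value(value: str) -> str:
    charset, _language, encoded = value.split("'", 2)
    return unquote(encoded, encoding=charset)


def legacy_value(name: str) -> str:
    # Hypothetical stand-in for a browser's existing escaping (Webkit-style here).
    return name.replace('"', "%22")


def content_disposition(name: str) -> str:
    # Hypothetical header layout: legacy name first, name* only when needed.
    header = f'Content-Disposition: form-data; name="{legacy_value(name)}"'
    if needs_ext(name):
        header += "; name*=" + ext_value(name)
    return header


for field in ('bàz%22\\"\\', 'bàz"\\"\\', "plain"):
    print(content_disposition(field))
```

Old servers that read only name still see the same ambiguous value for both special-character fields. Upgraded servers that read name* can tell the two fields apart. Encoding % as %25 is what option 2's "ideally also escaping %" buys.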


Aside: The *same* issue happens for uploaded file names. I started a mailing
list thread to attempt to collect more information about this:
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2012-May/035610.html


Received on Wednesday, 2 May 2012 20:41:25 UTC