Re: [whatwg/fetch] Clarification of Body package data algorithm with bytes, FormData and multipart/form-data MIME type (#392)

> See also https://www.w3.org/Bugs/Public/show_bug.cgi?id=16909. It's a known issue and thus far nobody has taken the time to solve it once and for all. Perhaps I can convince you to solve it for the web?

That bug seems to be a different, although related, issue. It is about encoding and decoding form field names, while my issue is about parsing multipart part octets into form field values. I am especially interested in clarifying which parsing steps should be done before any character decoding.

> Should a multipart part with a name "\_charset_" have no effect on decoding encoding?

I think that probably yes, because RFC 2388 does not even describe that name (although RFC 7578 does). In addition, the body package data algorithm with bytes, FormData and the application/x-www-form-urlencoded MIME type runs the application/x-www-form-urlencoded parser steps with the use_charset_flag implicitly unset, so it does not change the decoding encoding based on the "\_charset_" component.

In addition to my previous reasoning, https://encoding.spec.whatwg.org/ requires the UTF-8 encoding for existing formats deployed in new contexts (and the Fetch API probably is such a context). So a method for specifying an encoding other than UTF-8 (such as a multipart part with the name "\_charset_") must not be supported.
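
A minimal Python sketch of that reading, purely illustrative and not spec text: every part is decoded as UTF-8, and a part named `_charset_` is kept as an ordinary entry with no influence on decoding. The helper name `decode_entries`, the sample bytes, and the use of replacement characters for invalid bytes are assumptions made for the illustration.

```python
def decode_entries(parts):
    """Decode each (name, value) byte pair as UTF-8.

    Assumption: invalid bytes become U+FFFD instead of raising an error,
    and a part named `_charset_` gets no special treatment.
    """
    return [
        (name.decode("utf-8", errors="replace"),
         value.decode("utf-8", errors="replace"))
        for name, value in parts
    ]


# Hypothetical parts: a `_charset_` hint followed by ISO-8859-1 bytes.
parts = [(b"_charset_", b"iso-8859-1"), (b"field1", b"caf\xe9")]
entries = decode_entries(parts)
# The hint is ignored, so the 0xE9 byte is not decoded as "é".
```

Under these assumptions the `_charset_` entry simply appears in the result next to `field1`, and the non-UTF-8 byte in `field1` is replaced rather than reinterpreted.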

My main question concerns the following body:
> --boundary
> Content-Disposition: form-data; name="field1"
> Content-Type: text/html; charset=iso-8859-1
>
> ... ISO-8859-1 encoded HTML content ...
> --boundary--

Should the Body package data algorithm, run with the bytes quoted above, FormData and the multipart/form-data; boundary="boundary" MIME type:

1. return a FormData object with one entry whose value is a **Blob** object with the MIME type "text/html; charset=iso-8859-1" containing the raw content bytes from the multipart part body. In this case the content bytes are NOT decoded, as a **Blob** contains bytes and not characters.
2. return a FormData object with one entry whose value is a **USVString** object containing UTF-8 decoded content bytes from the multipart part body (if that happens not to cause a decoding error). In this case the content type is completely ignored. This is what Firefox does.
3. return a FormData object with one entry whose value is a **USVString** object containing ISO-8859-1 decoded content bytes from the multipart part body. In this case the content type is ignored but its charset parameter is not ignored. This violates the Encoding Standard as a non-UTF-8 decoding is used in a new context.
4. throw a **TypeError** because the multipart part is missing a Content-Disposition filename parameter although it seems to represent a **Blob** because it has a Content-Type header field.
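
To make the four options concrete, here is a minimal Python sketch that models each candidate outcome for the example body. It is an illustration under stated assumptions, not the Fetch algorithm: the payload `<p>caf\xe9</p>` stands in for the elided HTML content, the hand-rolled `split_single_part` helper only handles a body with exactly one part, and Python's `iso-8859-1` codec is used where a browser would map that label to windows-1252 (the two agree for the byte used here).

```python
# Example body from above with a hypothetical concrete payload.
BODY = (
    b"--boundary\r\n"
    b'Content-Disposition: form-data; name="field1"\r\n'
    b"Content-Type: text/html; charset=iso-8859-1\r\n"
    b"\r\n"
    b"<p>caf\xe9</p>\r\n"
    b"--boundary--\r\n"
)


def split_single_part(body: bytes, boundary: bytes):
    """Return (headers, payload) for a body holding exactly one part."""
    start = b"--" + boundary + b"\r\n"
    end = b"\r\n--" + boundary + b"--"
    part = body[len(start):body.index(end)]
    raw_headers, payload = part.split(b"\r\n\r\n", 1)
    headers = {}
    for line in raw_headers.split(b"\r\n"):
        name, _, value = line.partition(b":")
        key = name.strip().lower().decode("ascii")
        headers[key] = value.strip().decode("ascii")
    return headers, payload


def candidate_outcomes(body: bytes, boundary: bytes):
    """Map each option number to the value it would produce."""
    headers, payload = split_single_part(body, boundary)
    content_type = headers.get("content-type")
    has_filename = "filename=" in headers["content-disposition"]
    return {
        # 1: raw bytes kept, content type preserved on the Blob.
        1: ("Blob", content_type, payload),
        # 2: content type ignored, bytes decoded as UTF-8 (lossy here).
        2: ("USVString", payload.decode("utf-8", errors="replace")),
        # 3: charset parameter honoured.
        3: ("USVString", payload.decode("iso-8859-1")),
        # 4: reject a part that has a Content-Type but no filename.
        4: "TypeError" if content_type and not has_filename else None,
    }


outcomes = candidate_outcomes(BODY, b"boundary")
```

With this payload, options 2 and 3 yield different strings, because the byte 0xE9 is not valid UTF-8 on its own, which is exactly why the choice between them matters.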



Received on Monday, 3 October 2016 10:01:28 UTC