- From: Eero Häkkinen <notifications@github.com>
- Date: Mon, 03 Oct 2016 03:00:57 -0700
- To: whatwg/fetch <fetch@noreply.github.com>
- Message-ID: <whatwg/fetch/issues/392/251069490@github.com>
> See also https://www.w3.org/Bugs/Public/show_bug.cgi?id=16909. It's a known issue and thus far nobody has taken the time to solve it once and for all. Perhaps I can convince you to solve it for the web?

That bug seems to be a quite different, although related, issue. The bug is about encoding and decoding of form field names, while my issue is about parsing multipart part octets into form field values. I am especially interested in clarifying which parsing steps should be done before any character decoding.

> Should a multipart part with a name "\_charset_" have no effect on decoding encoding?

I think probably yes, because RFC2388 does not even describe that name (but RFC7578 does), and because the body package data algorithm with bytes, FormData and the application/x-www-form-urlencoded MIME type runs the application/x-www-form-urlencoded parser steps with use_charset_flag implicitly unset, so it does not change the decoding encoding based on the "\_charset_" component.

In addition to my previous reasoning, https://encoding.spec.whatwg.org/ requires the UTF-8 encoding for existing formats deployed in new contexts (and the Fetch API probably is such a context). So a method (such as a multipart part with a name "\_charset_") to specify an encoding other than UTF-8 must not be supported.

My main question concerns the following body:

> --boundary
> Content-Disposition: form-data; name="field1"
> Content-Type: text/html; charset=iso-8859-1
>
> ... ISO-8859-1 encoded HTML content ...
> --boundary--

Should the Body package data algorithm, with the bytes quoted above, FormData and a multipart/form-data; boundary="boundary" MIME type:

1. return a FormData object with one entry whose value is a **Blob** object with a MIME type "text/html; charset=iso-8859-1" containing the raw content bytes from the multipart part body. In this case the content bytes are NOT decoded, as a **Blob** contains bytes and not characters.
2. return a FormData object with one entry whose value is a **USVString** object containing the UTF-8 decoded content bytes from the multipart part body (if that happens not to cause a decoding error). In this case the content type is completely ignored. This is what Firefox does.
3. return a FormData object with one entry whose value is a **USVString** object containing the ISO-8859-1 decoded content bytes from the multipart part body. In this case the content type is ignored but its charset parameter is not. This violates the Encoding Standard, as a non-UTF-8 decoding is used in a new context.
4. throw a **TypeError** because the multipart part is missing a Content-Disposition filename parameter although it seems to represent a **Blob**, since it has a Content-Type header field.

--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/fetch/issues/392#issuecomment-251069490
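To make the four candidate outcomes listed above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the Fetch algorithm: the part content `<p>café</p>`, the helper `first_part` and the `option*` functions are all hypothetical names introduced here, the splitting is deliberately naive, and Python's `TypeError` merely stands in for the JavaScript one. Invalid UTF-8 is replaced with U+FFFD, which is also an assumption.

```python
# Illustrative sketch only: a naive split of the example body, not the Fetch
# "package data" algorithm and not a complete multipart/form-data parser.

# Hypothetical part content, encoded as ISO-8859-1 (0xE9 is "é").
PAYLOAD = "<p>caf\u00e9</p>".encode("iso-8859-1")

BODY = b"\r\n".join([
    b"--boundary",
    b'Content-Disposition: form-data; name="field1"',
    b"Content-Type: text/html; charset=iso-8859-1",
    b"",
    PAYLOAD,
    b"--boundary--",
    b"",
])


def first_part(body: bytes, boundary: bytes):
    """Return (headers, payload) for the first part of a multipart body."""
    delimiter = b"--" + boundary
    _, _, rest = body.partition(delimiter + b"\r\n")
    part, _, _ = rest.partition(b"\r\n" + delimiter)
    raw_headers, _, payload = part.partition(b"\r\n\r\n")
    headers = {}
    for line in raw_headers.split(b"\r\n"):
        name, _, value = line.decode("ascii").partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers, payload


def option1_blob(headers, payload):
    # Option 1: keep the raw bytes and the MIME type, decode nothing.
    return headers["content-type"], payload


def option2_utf8(payload):
    # Option 2: ignore Content-Type and decode as UTF-8.
    # Assumption: invalid sequences become U+FFFD instead of raising.
    return payload.decode("utf-8", errors="replace")


def option3_charset(headers, payload):
    # Option 3: ignore the type but honour its charset parameter.
    charset = headers["content-type"].partition("charset=")[2].strip()
    return payload.decode(charset)


def option4_reject(headers):
    # Option 4: a Content-Type without a filename parameter is an error.
    disposition = headers.get("content-disposition", "")
    if "content-type" in headers and "filename=" not in disposition:
        raise TypeError("part has Content-Type but no filename parameter")


headers, payload = first_part(BODY, b"boundary")
print(option1_blob(headers, payload))
print(option2_utf8(payload))
print(option3_charset(headers, payload))
```

With this content, options 2 and 3 yield different strings for the same bytes, which is why the order of parsing and decoding steps matters.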
Received on Monday, 3 October 2016 10:01:28 UTC