Re: [whatwg/fetch] Clarification of Body package data algorithm with bytes, FormData and multipart/form-data MIME type (#392) from Eero Häkkinen on 2016-11-25 (public-webapps-github@w3.org from November 2016)

From: Eero Häkkinen <notifications@github.com>
Date: Fri, 25 Nov 2016 01:33:26 -0800
To: whatwg/fetch <fetch@noreply.github.com>
Message-ID: <whatwg/fetch/issues/392/262915793@github.com>

I studied related specifications and existing implementations. Based on that:

* I created a pull request for the Fetch Standard: #424
* I also created a pull request for WPT: w3c/web-platform-tests#4248

Here are my findings and recommendations.

## Definitions

For _multipart/form-data_, according to RFC-7578:

* Each part MUST have a _Content-Disposition_ header field which MUST contain a _name_ parameter and MAY contain a _filename_ parameter.
* Each part MAY have a _Content-Type_ header field which MAY contain a _charset_ parameter.
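
To make these terms concrete, here is a minimal Python sketch that pulls the parameters out of the header block of a single part. It uses the standard library's `email` package as a stand-in header parser. The helper name `part_params` and the sample header text are illustrative assumptions; nothing here is taken from RFC-7578 or the Fetch Standard.

```python
from email.parser import HeaderParser
from typing import Dict, Optional


def part_params(raw_headers: str) -> Dict[str, Optional[str]]:
    """Extract the parameters discussed above from one part's header block.

    Hypothetical helper for illustration only. Missing headers or
    parameters are reported as None.
    """
    msg = HeaderParser().parsestr(raw_headers)
    return {
        # RFC-7578: MUST be present.
        "name": msg.get_param("name", header="content-disposition"),
        # RFC-7578: MAY be present.
        "filename": msg.get_param("filename", header="content-disposition"),
        # Whole Content-Type header value, or None if the header is absent.
        "content_type": msg.get("content-type"),
        # Optional charset parameter of the Content-Type header.
        "charset": msg.get_param("charset", header="content-type"),
    }


if __name__ == "__main__":
    sample = (
        'Content-Disposition: form-data; name="field"; filename="a.txt"\n'
        "Content-Type: text/plain; charset=iso-8859-1\n"
        "\n"
    )
    print(part_params(sample))
```

The remaining sections are, in effect, a case analysis over which of these four values are present.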

## Existing implementations

### Server side implementations

I checked the following ones:

* Node.js frameworks:
Always decode non-file parts using a predefined encoding.
* PHP:
Does not decode non-file parts but stores raw bytes.
Does not store _charset_ parameters.
* Python 2 cgi:
Does not decode non-file parts but stores raw bytes.
Does not store _charset_ parameters.
* Python 3 cgi and web frameworks:
Always decode non-file parts using a predefined encoding.
* Ruby cgi:
Always decodes non-file parts using a predefined encoding.

### Web browser implementations

* Firefox
* Chrome patched with https://codereview.chromium.org/2292763002/

## Different kinds of _multipart/form-data_ parts

### A part with a _filename_ parameter and with a _Content-Type_ header field

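This is a clear case. There is one sensible option:

* The part must be parsed into an entry whose name is the value of the _name_ parameter and whose value is a **Blob** whose content is the content of the part and whose _type_ attribute is the value of the _Content-Type_ header field.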

### A part with a _filename_ parameter but without a _Content-Type_ header field

This is not a clear case. There are basically two sensible options:

* The part must be parsed into an entry whose name is the value of the _name_ parameter and whose value is a **Blob** whose content is the content of the part and whose _type_ attribute is _text/plain_ which is the default _Content-Type_ for a part according to RFC-7578.
This is what Firefox does and what is specified in RFC-7578.
* The part must be parsed into an entry whose name is the value of the _name_ parameter and whose value is a **Blob** whose content is the content of the part and whose _type_ attribute is unset.
This is similar to what all existing server side implementations do but it is against what is specified in RFC-7578.

I am in favor of the first option as it is actually specified and there is an existing browser implementation.
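
The difference between the two options comes down to a single fallback value. The following Python sketch shows both side by side. The function names are hypothetical, and the empty string stands in for an unset _type_ attribute, which is how the File API represents an unknown type.

```python
from typing import Optional

# Default Content-Type of a part according to RFC-7578.
RFC7578_DEFAULT_TYPE = "text/plain"


def blob_type_option_1(content_type: Optional[str]) -> str:
    """Option 1: fall back to the RFC-7578 default (what Firefox does)."""
    return content_type if content_type is not None else RFC7578_DEFAULT_TYPE


def blob_type_option_2(content_type: Optional[str]) -> str:
    """Option 2: leave the type unset, modelled here as an empty string."""
    return content_type if content_type is not None else ""


if __name__ == "__main__":
    # A file part that carries no Content-Type header field.
    print(repr(blob_type_option_1(None)))
    print(repr(blob_type_option_2(None)))
```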

### A part without a _filename_ parameter and without a _Content-Type_ header field

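This is quite a clear case. There is one sensible option:

* The part must be parsed into an entry whose name is the value of the _name_ parameter and whose value is a **USVString** whose content is the decoded content of the part. The decoding is done using the encoding supplied by the Fetch Standard, which is _utf-8_.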

RFC-7578 calls the encoding supplied by an application of the specification (here, the Fetch Standard) the _form-charset_.

### A part without a _filename_ parameter with a _Content-Type_ header field without a _charset_ parameter

The _Content-Type_ header field should probably just be ignored. Therefore, this case should be handled in the same way as the previous one.

### A part without a _filename_ parameter with a _Content-Type_ header field with a _charset_ parameter which specifies a _utf-8_ encoding

Regardless of whether the _charset_ parameter is ignored or not, the decoding encoding is _utf-8_. Therefore, this case should be handled in the same way as the two previous ones.

### A part without a _filename_ parameter with a _Content-Type_ header field with a _charset_ parameter which does not specify a _utf-8_ encoding

This is not a clear case. There are basically two sensible options:

* The part must be parsed into an entry whose name is the value of the _name_ parameter and whose value is a **USVString** whose content is the _utf-8_ decoded content of the part because the Fetch Standard specifies that _multipart/form-data_ parsing is done using _utf-8_ as encoding.
This is basically the same or similar to what all existing implementations do if they expect _utf-8_ input as the Fetch Standard does.
Some server side implementations (such as PHP and Python 2 cgi) do not decode at all but store raw bytes. However, they do not store the _charset_ parameters either, so web backends using them can only assume that the stored raw bytes use the expected encoding.
* The part must be parsed into an entry whose name is the value of the _name_ parameter and whose value is a **USVString** whose content is the decoded content of the part. The decoding is done using an encoding specified by the _charset_ parameter.
There are no implementations which do this.

I do not think that the Fetch Standard is the right place to introduce a completely new way to handle _multipart/form-data_ parts. Therefore, I am in favor of the first option.
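
The practical effect of the two options can be seen on a single non-ASCII byte. This is a rough Python sketch, with hypothetical function names. Python's `errors="replace"` is used to approximate a decoder that emits U+FFFD for malformed input, and Python codec names are assumed to be close enough to the labels a real implementation would accept for this example.

```python
from typing import Optional


def decode_option_1(content: bytes, charset: Optional[str] = None) -> str:
    """Option 1: ignore the charset parameter and always decode as utf-8."""
    return content.decode("utf-8", errors="replace")


def decode_option_2(content: bytes, charset: Optional[str] = None) -> str:
    """Option 2: honor the charset parameter, falling back to utf-8."""
    return content.decode(charset or "utf-8", errors="replace")


if __name__ == "__main__":
    # "ä" encoded as iso-8859-1 is the single byte 0xE4,
    # which is not a valid utf-8 sequence on its own.
    raw = "ä".encode("iso-8859-1")
    print(repr(decode_option_1(raw, "iso-8859-1")))
    print(repr(decode_option_2(raw, "iso-8859-1")))
```

Under the first option a sender that really does use a non-_utf-8_ encoding gets replacement characters, which matches what a server expecting _utf-8_ would see.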

### A part with a _name_ parameter equal to _\_charset\__

This is not a clear case. There are basically two sensible options:

* The part is parsed like any other part.
This is basically the same or similar to what all existing implementations do.
* The part is parsed and the parsed content of the part is used as a new encoding for subsequent parts.
There are basically no implementations which do this.
Some server side implementations (such as PHP and Python 2 cgi) do not decode at all but store raw bytes, so web backends using them could use the parsed value for deciding decoding encoding. However, in the case of the Fetch Standard, that would mean that non-file parts should be parsed into **ArrayBuffer**s instead of being parsed into **USVString**s so that web applications could do the decoding.

I do not think that the Fetch Standard is the right place to introduce a completely new way to handle _multipart/form-data_ parts, one which has not been implemented by any server side or browser implementation. Therefore, I am in favor of the first option.

## My recommendations

A part with a _filename_ parameter must be parsed into an entry whose name is the value of the _name_ parameter and whose value is a **Blob** whose content is the content of the part and whose _type_ attribute is the value of the part's _Content-Type_ header field if the part has one, or _text/plain_ otherwise.

A part without a _filename_ parameter must be parsed into an entry whose name is the value of the _name_ parameter and whose value is a **USVString** whose content is the _utf-8_ decoded content of the part.

A part whose _Content-Disposition_ header contains a _name_ parameter whose value is _\_charset\__ is parsed like any other part. It does not change the encoding.
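
Taken together, the three recommendations amount to a small decision procedure. Here is a minimal Python sketch of it, assuming the headers of each part have already been parsed. `BlobLike` and `parse_part` are hypothetical names; `BlobLike` is only a stand-in for a **Blob** (it does not keep the filename), and Python's utf-8 decoding with `errors="replace"` only approximates the decoding the Fetch Standard would specify.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass
class BlobLike:
    """Illustrative stand-in for a Blob: raw bytes plus a type string."""
    content: bytes
    type: str


def parse_part(
    name: str,
    filename: Optional[str],
    content_type: Optional[str],
    content: bytes,
) -> Tuple[str, Union[BlobLike, str]]:
    """Turn one multipart/form-data part into a (name, value) entry."""
    if filename is not None:
        # File part: keep the bytes, default the type to text/plain.
        blob_type = content_type if content_type is not None else "text/plain"
        return name, BlobLike(content, blob_type)
    # Non-file part: always decode as utf-8. Any Content-Type header,
    # including its charset parameter, is ignored.
    return name, content.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(parse_part("upload", "a.bin", None, b"\x00\x01"))
    print(parse_part("comment", None, "text/plain; charset=iso-8859-1", b"hi"))
    # A part named _charset_ is an ordinary entry; it changes nothing.
    print(parse_part("_charset_", None, None, b"iso-8859-1"))
```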

--
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/fetch/issues/392#issuecomment-262915793

Received on Friday, 25 November 2016 09:34:02 UTC