Re: File API: Blob.type

The current File API spec seems to have a mismatch between type in BlobPropertyBag and type as a Blob attribute. The latter declaratively states that the type is an ASCII lowercase string. As Glenn mentioned before, WebKit interpreted this by raising an exception in the constructor for non-ASCII input, and by lowercasing the string. I think that this is a reasonable reading of the spec. I'd be fine with raising exceptions for invalid types more eagerly.
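
To make that reading concrete, here is a minimal sketch of the behavior described (reject non-ASCII input, then lowercase). The function name and the exception type are assumptions made for illustration; this is not WebKit's actual code.

```typescript
// Hypothetical helper illustrating one reading of the spec text:
// non-ASCII input is rejected, everything else is ASCII-lowercased.
// The choice of SyntaxError is an assumption, not taken from WebKit.
function normalizeBlobType(type: string): string {
  for (let i = 0; i < type.length; i++) {
    if (type.charCodeAt(i) > 0x7f) {
      throw new SyntaxError("Blob type must contain only ASCII characters");
    }
  }
  // Lowercase only A-Z so the result is independent of locale rules.
  return type.replace(/[A-Z]/g, (c) => c.toLowerCase());
}
```

With this sketch, `normalizeBlobType("Text/HTML; Charset=UTF-8")` gives `"text/html; charset=utf-8"`, and a string containing a non-ASCII character throws. Note that it does not check whether the result is a valid media-type.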

This is the text in question:

(1)
> type, a DOMString which corresponds to the Blob object's type attribute. If not the empty string, user agents must treat it as an RFC2616 media-type [RFC2616], and as an opaque string that can be ignored if it is an invalid media-type. This value must be used as the Content-Type header when dereferencing a Blob URI.

(2)
> type
> The ASCII-encoded string in lower case representing the media type of the Blob, expressed as an RFC2046 MIME type [RFC2046]. On getting, conforming user agents must return the MIME type of the Blob, if it is known. If conforming user agents cannot determine the media type of the Blob, they must return the empty string. A string is a valid MIME type if it matches the media-type token defined in section 3.7 "Media Types" of RFC 2616 [RFC2616]. If not the empty string, user agents must treat it as an RFC2616 media-type [RFC2616], and as an opaque string that can be ignored if it is an invalid media-type. This value must be used as the Content-Type header when dereferencing a Blob URI.


It would be helpful to have the terminology corrected, and to have this generally clarified - for example, validity is mentioned here, but seems to be unused.
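
For reference, a validity check against the media-type production in section 3.7 of RFC 2616 could look roughly like the sketch below. The function name is hypothetical, and the quoted-string handling is deliberately simplified (it accepts any character other than a double quote or backslash, plus backslash escapes), so treat it as an approximation of the grammar rather than a conforming parser.

```typescript
// token: one or more ASCII characters that are neither controls nor
// RFC 2616 separators.
const TOKEN = "[!#$%&'*+\\-.^_`|~0-9A-Za-z]+";
// Simplified quoted-string (assumption: looser than the RFC's TEXT rule).
const QUOTED = '"(?:[^"\\\\]|\\\\.)*"';
// media-type = type "/" subtype *( ";" parameter )
// parameter  = attribute "=" value, where value is a token or quoted-string.
// Optional spaces or tabs are accepted around ";" only.
const MEDIA_TYPE = new RegExp(
  `^${TOKEN}/${TOKEN}(?:[ \\t]*;[ \\t]*${TOKEN}=(?:${TOKEN}|${QUOTED}))*$`
);

// Hypothetical helper: true if the string looks like an RFC 2616 media-type.
function isValidMediaType(type: string): boolean {
  return MEDIA_TYPE.test(type);
}
```

Under this sketch, `"text/html; charset=utf-8"` is accepted, while `"not a type"` and the empty string are rejected.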

It seems pretty clear from the normative text that the charset parameter is supposed to work. A non-normative example supports that too. I agree with Arun that it seems best to keep this as is.

However, <https://bugs.webkit.org/show_bug.cgi?id=111380> is about a different case - it's about posting multipart form data that has Blob elements with invalid media-types. I'm not even sure which spec is in charge of this behavior - I don't think that anything anywhere says that Blob.type affects the media-type of posted multipart data, even though that's obviously the intention. The XMLHttpRequest spec defers to HTML, which defers to RFC2388, which mentions files "returned via filling out a form", but not Blobs (which is no surprise given its age).

Making Blobs only hold valid media-types would solve practical issues, but it would be helpful to know what formally defines multipart data serialization with blobs.

We also previously had <https://bugs.webkit.org/attachment.cgi?id=177736&action=review> for sending non-multipart data. Back then, we determined that "Content-Type: " should be sent when the value is invalid. I'm no longer sure if that's right. For this case, XMLHttpRequest authoritatively defines the behavior, although it leans heavily on the File API to decide when the type attribute is empty:

> If the object's type attribute is not the empty string let mime type be its value.


Note that "mime type" is then directly used as the default media-type for the Content-Type header, but it's not parsed to set the encoding variable. The encoding could be needed to update a charset in an author-provided Content-Type header field in later steps of the algorithm. This is probably not right, as a Blob should know its encoding better than code that sets header fields on an XMLHttpRequest object.
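
As an illustration of the missing step, the charset could be pulled out of the Blob's type along the lines of the sketch below. The function name is hypothetical and the parsing is simplified: it splits on ";" and so does not handle a ";" inside a quoted parameter value. It is not what the XMLHttpRequest spec currently specifies.

```typescript
// Hypothetical helper: returns the charset parameter of a media-type
// string, or null if there is none. Parameter names are matched
// case-insensitively; surrounding double quotes on the value are dropped.
function charsetOf(mimeType: string): string | null {
  // Everything after the first ";" is treated as a list of parameters.
  const params = mimeType.split(";").slice(1);
  for (const p of params) {
    const eq = p.indexOf("=");
    if (eq === -1) continue;
    const name = p.slice(0, eq).trim().toLowerCase();
    if (name !== "charset") continue;
    let value = p.slice(eq + 1).trim();
    if (value.length >= 2 && value.charAt(0) === '"' && value.charAt(value.length - 1) === '"') {
      value = value.slice(1, -1);
    }
    return value === "" ? null : value;
  }
  return null;
}
```

For a Blob whose type is `"text/plain; charset=utf-8"` this yields `"utf-8"`, which is the value that would be needed for the encoding variable; for `"image/png"` it yields `null`.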

- WBR, Alexey Proskuryakov

Received on Thursday, 7 March 2013 20:03:13 UTC