Re: File API: Blob.type

Alexey,


On Mar 7, 2013, at 3:02 PM, Alexey Proskuryakov wrote:

> 
> The current File API spec seems to have a mismatch between type in BlobPropertyBag, and type as Blob attribute. The latter declaratively states that the type is an ASCII lower case string. As mentioned by Glenn before, WebKit interpreted this by raising an exception in constructor for non-ASCII input, and lowercasing the string. I think that this is a reasonable reading of the spec. I'd be fine with raising exceptions for invalid types more eagerly.
> 
> This is the text in question:
> 
> (1)
>> type, a DOMString which corresponds to the Blob object's type attribute. If not the empty string, user agents must treat it as an RFC2616 media-type [RFC2616], and as an opaque string that can be ignored if it is an invalid media-type. This value must be used as the Content-Type header when dereferencing a Blob URI.
>> 
> 
> 
> (2)
>> type
>> The ASCII-encoded string in lower case representing the media type of the Blob, expressed as an RFC2046 MIME type [RFC2046]. On getting, conforming user agents must return the MIME type of the Blob, if it is known. If conforming user agents cannot determine the media type of the Blob, they must return the empty string. A string is a valid MIME type if it matches the media-type token defined in section 3.7 "Media Types" of RFC 2616 [RFC2616]. If not the empty string, user agents must treat it as an RFC2616 media-type [RFC2616], and as an opaque string that can be ignored if it is an invalid media-type. This value must be used as the Content-Type header when dereferencing a Blob URI.


This is now clarified; the mismatch was a spec bug. Thanks for pointing it out.
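One way to reconcile "ASCII lower case" on the attribute with "ignore rather than throw" in the property bag is a single normalization step when the type is set. The sketch below is a minimal illustration, not spec text. It assumes that any character outside printable ASCII (U+0020 to U+007E) makes the type unusable, so the type becomes the empty string, and that any other input is ASCII-lowercased. `normalizeBlobType` is a hypothetical helper name.

```typescript
// Hypothetical helper: normalize a type string supplied to a Blob constructor.
// Assumption: a non-printable or non-ASCII character yields "" (no type)
// instead of an exception; otherwise the string is lowercased.
function normalizeBlobType(type: string): string {
  // Reject anything containing a character outside U+0020..U+007E.
  if (!/^[\x20-\x7E]*$/.test(type)) {
    return "";
  }
  // Every remaining character is ASCII, so toLowerCase() only changes A-Z.
  return type.toLowerCase();
}
```

With this rule, `normalizeBlobType("Text/HTML;charset=UTF-8")` gives `"text/html;charset=utf-8"`, and a type containing "ä" gives the empty string.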


> It would be helpful to have the terminology corrected, and to have this generally clarified - for example, validity is mentioned here, but seems to be unused.
> 


Conditions for validity have been clarified. An invalid type doesn't warrant throwing a SyntaxError, but the spec now says when implementations should ignore a poorly formed MIME type string; see, for example, the additional clarification in the slice() method algorithm:

http://dev.w3.org/2006/webapi/FileAPI/#slide-method-algo
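If validity means matching the media-type production of RFC 2616 section 3.7, then "ignore rather than throw" reduces to a check like the one below. This is a simplified sketch: it accepts only ASCII, allows optional spaces or tabs around ";", and treats a backslash inside a quoted-string only as the start of a quoted-pair. `ignoreInvalidType` is a hypothetical name, and the same check could plausibly apply to the contentType argument of slice().

```typescript
// Simplified RFC 2616 section 3.7 grammar:
//   media-type = type "/" subtype *( ";" parameter )
//   parameter  = attribute "=" value ; value = token | quoted-string
// token: one or more CHARs that are neither CTLs nor separators.
const TOKEN = /[!#$%&'*+\-.^_`|~0-9A-Za-z]+/.source;
// quoted-string, restricted to ASCII; a backslash always starts a quoted-pair.
const QUOTED = /"(?:[\t\x20\x21\x23-\x5B\x5D-\x7E]|\\[\x00-\x7F])*"/.source;
const PARAMETER = `${TOKEN}=(?:${TOKEN}|${QUOTED})`;
// No whitespace around "/" or "=", optional spaces/tabs around ";".
const MEDIA_TYPE = new RegExp(`^${TOKEN}/${TOKEN}(?:[ \\t]*;[ \\t]*${PARAMETER})*$`);

// Hypothetical helper: keep a valid media type as-is, otherwise ignore it
// by returning the empty string, instead of throwing a SyntaxError.
function ignoreInvalidType(type: string): string {
  return MEDIA_TYPE.test(type) ? type : "";
}
```

Under this check, "text/plain; charset=utf-8" is kept, while "text plain" or "text/html;charset" become the empty string.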


> It seems pretty clear from normative text that charset parameter is supposed to work. A non-normative example supports that too. I agree with Arun that this seems best to keep as is.

+1.
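Keeping the charset parameter working means a consumer must be able to read it back out of Blob.type. A minimal sketch, assuming a naive split on ";" (so it does not handle a quoted value that itself contains ";") and no unescaping of quoted-pairs. `charsetOf` is a hypothetical helper, not part of any spec.

```typescript
// Hypothetical helper: extract the charset parameter from a type string such
// as "text/plain;charset=utf-8". Returns null when there is no charset.
function charsetOf(type: string): string | null {
  // Skip the "type/subtype" part and inspect each ";"-separated parameter.
  for (const param of type.split(";").slice(1)) {
    const eq = param.indexOf("=");
    if (eq === -1) continue;
    const name = param.slice(0, eq).trim().toLowerCase();
    let value = param.slice(eq + 1).trim();
    // Strip surrounding quotes from a quoted-string value (no unescaping).
    if (value.length >= 2 && value[0] === '"' && value[value.length - 1] === '"') {
      value = value.slice(1, -1);
    }
    if (name === "charset" && value !== "") return value;
  }
  return null;
}
```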


> However, <https://bugs.webkit.org/show_bug.cgi?id=111380> is about a different case - it's about posting multipart form data that has Blob elements with invalid media-types. I'm not even sure which spec is in charge of this behavior - I don't think that anything anywhere says that Blob.type affects media-type of posted multipart data, even though that's obviously the intention. XMLHttpRequest spec defers to HTML, which defers to RFC2388, which mentions files "returned via filling out a form", but not Blobs (which is no surprise given its age).


In fact, I'm not sure Blob.type should influence the type of multipart form data. Consider concatenating several Blobs into a new Blob, as the Blob constructor allows: what should the type of the newly constructed Blob be if it consists of several differently typed Blobs? The spec suggests disregarding the type of each constituent Blob, but encourages correct use of the type member when calling the Blob constructor.
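In a browser, `new Blob([a, b]).type` is the empty string regardless of the types of `a` and `b`; only the property bag sets a type. The model below illustrates that rule in isolation. `BlobLike` and `concatBlobs` are hypothetical stand-ins for the real Blob, and the normalization of the bag's type (non-printable-ASCII becomes "", otherwise lowercase) is an assumption.

```typescript
// Hypothetical, simplified model of a Blob: its bytes plus a type string.
interface BlobLike {
  bytes: Uint8Array;
  type: string;
}

// Sketch of constructor-style concatenation: the parts' bytes are joined in
// order, the parts' own types are disregarded, and the new type comes only
// from the options bag (the BlobPropertyBag "type" member).
function concatBlobs(parts: BlobLike[], options: { type?: string } = {}): BlobLike {
  const total = parts.reduce((n, p) => n + p.bytes.length, 0);
  const bytes = new Uint8Array(total);
  let offset = 0;
  for (const p of parts) {
    bytes.set(p.bytes, offset);
    offset += p.bytes.length;
  }
  // Assumed normalization: non-printable-ASCII yields "", otherwise lowercase.
  const t = options.type !== undefined ? options.type : "";
  const type = /^[\x20-\x7E]*$/.test(t) ? t.toLowerCase() : "";
  return { bytes, type };
}
```

The design choice this mirrors is that a mixed-type concatenation has no meaningful inherent type, so the author must state one explicitly.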

I'm also not sure multipart form data falls under the aegis of the File API, but at least a Blob with an invalid type is now treated the same as one with no type (the empty string).
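Because an invalid type now collapses to the empty string, a multipart serializer only has to handle one case: an empty type. The sketch below builds the headers for one file part. The fallback to "application/octet-stream" for an empty type is an assumption about the serializer, not something the File API defines. `multipartPartHeaders` is a hypothetical helper, and it does not escape quotes in the name or filename.

```typescript
// Hypothetical helper: header lines for one file part of a
// multipart/form-data body, ending with the blank line before the content.
// Assumption: an empty Blob type (including a formerly invalid one) falls
// back to "application/octet-stream".
function multipartPartHeaders(name: string, filename: string, blobType: string): string {
  const contentType = blobType !== "" ? blobType : "application/octet-stream";
  return (
    `Content-Disposition: form-data; name="${name}"; filename="${filename}"\r\n` +
    `Content-Type: ${contentType}\r\n\r\n`
  );
}
```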


> Making Blobs only hold valid media-types would solve practical issues, but it would be helpful to know what formally defines multipart data serialization with blobs.
> 
> We also previously had <https://bugs.webkit.org/attachment.cgi?id=177736&action=review> for sending non-multipart data. Back then, we determined that "Content-Type: " should be sent when the value is invalid. I'm no longer sure if that's right. For this case, XMLHttpRequest authoritatively defines the behavior, although heavily leaning on File API to decide when the type attribute is empty:
> 
>> If the object's type attribute is not the empty string let mime type be its value.
> 
> 
> Note that "mime type" is then directly used as default media-type for Content-Type header, but it's not parsed to set encoding variable. The encoding could be needed to update a charset in author provided Content-Type header field in later steps of the algorithm. This is probably not right, as Blob should know its encoding better than code that sets header fields on an XMLHttpRequest object.
> 


Yes, but implementations can no longer determine a Blob's type heuristically: the type has to be specified correctly, or it is ignored. What a Blob "should know" is now only as good as the type it was constructed with, though at read time, thanks to the Encoding spec, we can determine a fallback encoding.
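For the non-multipart XMLHttpRequest case quoted above, treating an invalid type as empty suggests sending no Content-Type header at all, rather than an empty "Content-Type: " line. A minimal sketch of that choice; `requestContentType` is a hypothetical helper, `authorContentType` stands for a value set with setRequestHeader() (null if none), and omitting the header is an assumption, not settled behavior.

```typescript
// Hypothetical sketch: choose the request Content-Type for a Blob body.
// Returns null when no Content-Type header should be sent.
function requestContentType(blobType: string, authorContentType: string | null): string | null {
  // An author-supplied header takes precedence.
  if (authorContentType !== null) return authorContentType;
  // "If the object's type attribute is not the empty string let mime type be its value."
  if (blobType !== "") return blobType;
  // Empty type (including one that was invalid): omit the header entirely.
  return null;
}
```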

-- A*

Received on Tuesday, 19 March 2013 18:55:28 UTC