Re: File API: Blob.type from Arun Ranganathan on 2013-03-19 (public-webapps@w3.org from January to March 2013)

From: Arun Ranganathan <arun@mozilla.com>
Date: Tue, 19 Mar 2013 14:41:58 -0400
To: Glenn Maynard <glenn@zewt.org>
Cc: Arun Ranganathan <aranganathan@mozilla.com>, WebApps WG <public-webapps@w3.org>, Alexey Proskuryakov <ap@webkit.org>, Jonas Sicking <jonas@sicking.cc>, Anne van Kesteren <annevk@annevk.nl>
Message-Id: <F2CA8000-E3FF-4CD0-B124-CFA5C5A8A226@mozilla.com>

On Mar 7, 2013, at 7:19 PM, Glenn Maynard wrote:

> Chrome, at least, throws on new Blob([], {type: "漢字"}), as well as lowercasing the string.
> 

Stricter rules are in place for "type" both while constructing Blob and for slice calls:

http://dev.w3.org/2006/webapi/FileAPI/#constructorBlob

and 

http://dev.w3.org/2006/webapi/FileAPI/#slice-method-algo

I agree with previous comments you've made about ByteString not solving any of the problems that Anne vK brings up; instead, I think using DOMString is probably ok, with tighter rules on what is valid and what should be ignored.  Throwing a SyntaxError might be overkill for developers and a bit too punitive; instead, I advocate sticking with the original spirit of the opaque string idea and ignoring bad use of "type."
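
A minimal sketch of the "ignore rather than throw" idea, for illustration only. It assumes the rule is that every character must fall in the printable ASCII range U+0020 to U+007E and that acceptable values are lowercased; normalizeBlobType is a hypothetical helper name, not something the File API defines:

```javascript
// Hypothetical helper illustrating "ignore bad input" for the Blob "type" option.
// Assumption: a value is acceptable only if every character lies in the
// printable ASCII range U+0020..U+007E; anything else is treated as "".
function normalizeBlobType(type) {
  const value = String(type);
  for (const ch of value) {
    const code = ch.codePointAt(0);
    if (code < 0x20 || code > 0x7e) {
      return ""; // ignore the bad value instead of throwing a SyntaxError
    }
  }
  // Only ASCII characters remain here, so this amounts to ASCII lowercasing.
  return value.toLowerCase();
}
```

Under these assumptions, a value such as "漢字" would simply yield the empty string, while "Text/Plain" would come back as "text/plain".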

> A couple points:
> 
> - I disagree that we should discourage comparing against Blob.type, but ultimately it's such an obvious use of the property, people will do it whether it's encouraged or not.  I'd never give it a second thought, since that appears to be its very purpose.  Web APIs should be designed defensively around how people will actually use the API, not how we wish they would.  Unless lots of Blob.type values actually include parameters, code will break unexpectedly when it ends up encountering one.
> - The RFC defines a protocol ("Content-Type"), not a JavaScript API, and good protocols are rarely good APIs.  Having Blob.type be the literal value of a Content-Type header isn't an elegant API.  You shouldn't need to do parsing of a string value to extract "text/plain", and you shouldn't have to do serialization to get "text/plain; charset=UTF-8".
> 

So the "type" attribute of a Blob object isn't the *literal* value of the header; it's the type of the Blob, expressed as a MIME type.  When dereferencing Blob URLs, you get this type back with the Content-Type header, as you do normally in HTTP scenarios.  This is a well-understood behavior, and I agree with points you've made about not being beholden to the RFC when designing an API.  

I think the question here is whether or not to include *separate attributes* on the Blob interface for the rarely used Charset Parameter, namely anything after the semicolon in MIME types of the sort: "text/plain;charset=UTF-8".  I've considered all your arguments by way of developer advocacy, and actually think we'll do developers a disservice by adding to the Blob interface:

1. The Charset Parameter consideration applies only to text/plain.  There are numerous other MIME types that don't use it: application/*, audio/*, image/*, video/*, etc.  Complicating the interface on the off-chance that a stray use of the Charset parameter breaks a direct equality comparison is "too much API for too little."

2. The Charset Parameter even in the context of text/plain isn't common enough to warrant a special case for text/plain within the API.

3. In general, it's a pretty safe assumption that developers will expect "type" to be surfaced later along with "Content-Type" when dereferencing a Blob URI.  I don't think we've made an assumption that's terribly galling.
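
To make the trade-off concrete, here is a rough sketch of how a developer could compare types without any extra attributes on Blob: strip anything after the first semicolon before comparing. typeWithoutParameters and hasType are hypothetical names, and the parsing is deliberately naive (no quoted-string handling), so treat it as an illustration rather than a recommended parser:

```javascript
// Hypothetical helper: returns the part of a MIME type string before any
// parameters, trimmed and lowercased. Naive on purpose; not a full parser.
function typeWithoutParameters(type) {
  const semicolon = type.indexOf(";");
  const bare = semicolon === -1 ? type : type.slice(0, semicolon);
  return bare.trim().toLowerCase();
}

// Hypothetical helper: compares a Blob-like object's "type" against an
// expected MIME type, ignoring parameters such as charset.
function hasType(blobLike, expected) {
  return typeWithoutParameters(blobLike.type) === typeWithoutParameters(expected);
}

// A plain object stands in for a Blob here; only its "type" property is read.
const sample = { type: "text/plain;charset=UTF-8" };
console.log(sample.type === "text/plain"); // false: direct comparison misses
console.log(hasType(sample, "text/plain")); // true: parameters are ignored
```

The direct equality check only breaks in the stray case where a parameter is present, which is the case the three points above argue is too rare to justify more API surface.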

-- A*

Received on Tuesday, 19 March 2013 18:42:28 UTC