Re: File API: Blob.type

On Apr 5, 2013, at 3:17 PM, Alexey Proskuryakov wrote:

> 
> On Apr 3, 2013, at 13:11, Arun Ranganathan <arun@mozilla.com> wrote:
> 
>>> My only concern is that blob.type should never contain parameters.  Comparing it to "text/plain" or "image/jpeg" should work, and not mysteriously fail a year later when somebody eventually throws a MIME type parameter into the mix.  Today, all browsers expose text files as text/plain.  If a browser a year from now decides to call text files with a UTF-8 BOM "text/plain; charset=UTF-8", it'll break interop.
> 
> What specifies how a File gets its type? The only requirement I can find is that "User agents must not attempt heuristic determination of type", which I think implies that something like inputElement.files[0].type is always "" for a file chosen by a user via <input type=file>.


The spec. now overreaches a bit :-( 

Not allowing heuristic mechanisms was only meant to restrict encoding determination, based on at least one implementation's experience with it being substandard: https://bugzilla.mozilla.org/show_bug.cgi?id=848842

But now maybe we're going a bit far.  Should we standardize how UAs auto-detect file type, including something about extensions and some BOM methods?  This seems complicated and may be unnecessary -- most UAs do this just about right in the absence of a standard.


> Guessing MIME type from file name or metadata is always a heuristic, as not all platforms will know that "archive.sit" means "application/x-stuffit".
> 
> At the same time, browsers do autodetect types for many files. We'll need to autodetect when serializing a form for submission anyway, so exposing this information a little earlier only makes sense.
> 
> I think that these concerns can be resolved by specifying what File.type is more explicitly. The spec can just say that parameters are not allowed in the browser-chosen type.


That seems sensible!  By *not* allowing charset parameters in types determined by UAs, these are now set by web applications only, which may mitigate Glenn's concerns.

Maybe the way forward is to leave this to UAs, and:

1. Say UAs should return the file type, if known.
2. UAs must not use heuristics or statistical methods to determine encoding, and
3. UAs must not set the charset parameter in the returned type for text/plain.  This will then defer to the encoding spec. and attempt fallback decoding.  Where a web application sets a charset parameter, this will do the right thing for readAsText with fallback decoding.
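
To make the rules above concrete, here is a rough sketch of what a UA might do when reporting the type of a user-selected file. The function name uaReportedType is hypothetical; it is not taken from the File API spec or from any browser's source. It also assumes the broader reading from Alexey's suggestion (no parameters at all in the browser-chosen type), which goes slightly beyond rule 3's charset-only wording.

```javascript
// Hypothetical helper, for illustration only: not from the File API
// spec or any browser implementation.
// Models rule 1 (return the type if known) and rule 3 (no charset
// parameter in the UA-reported type).
function uaReportedType(detectedType) {
  // Rule 1: an unknown type is reported as the empty string.
  if (!detectedType) {
    return "";
  }
  // Rule 3 (read broadly, an assumption): drop everything from the
  // first ";" so that no parameter, charset included, is exposed.
  const semicolon = detectedType.indexOf(";");
  const bare = semicolon === -1
    ? detectedType
    : detectedType.slice(0, semicolon);
  return bare.trim().toLowerCase();
}
```

With something like this in place, a plain comparison such as file.type == "text/plain" keeps working for UA-detected files, because the UA itself never appends a parameter.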

> 
>>> Additionally, determining a blob's file type seems like the most obvious use of this property, and making people say "if(blob.type.split(";")[0] == 'text/plain')" is simply not a good interface.
>> 
>> 
>> OK -- you're strongly opinionated on the matter of NOT allowing a charset parameter.  I'd like to see if implementers who had an opinion on its usefulness can weigh in -- Darin?  Alexey?
> 
> 
> I do not have a very strong opinion. I like the simpler API of passing parameters through the type attribute, as it's specified currently. This also matches XMLHttpRequest API better. And of course, keeping existing behavior means that we won't break the web.

I like it too.  We keep charset, but don't let user agents set it for auto-detected files; it can only be set with a Blob constructor or a slice call.  Blob.type is a string that can be set by developers and has normative requirements that are not strict tokenization requirements, so I think we're fine here.
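
As a rough illustration of the web application side, the sketch below sets a charset through the Blob constructor and through slice, then separates the bare type from the charset when reading it back. The helper parseBlobType and its return shape are assumptions made for this example; they are not part of the File API. The parsing is deliberately simple and is not a full MIME type parser.

```javascript
// Hypothetical helper, for illustration only: the name and the
// { essence, charset } return shape are assumptions.
function parseBlobType(type) {
  // Split "text/plain; charset=UTF-8" into the bare type and parameters.
  const [essence, ...params] = type.split(";").map((part) => part.trim());
  let charset = null;
  for (const param of params) {
    const eq = param.indexOf("=");
    if (eq !== -1 && param.slice(0, eq).trim().toLowerCase() === "charset") {
      // Strip optional surrounding double quotes from the value.
      charset = param.slice(eq + 1).trim().replace(/^"(.*)"$/, "$1");
    }
  }
  return { essence: essence.toLowerCase(), charset };
}

// Only runs where a global Blob exists (browsers, recent Node.js).
if (typeof Blob !== "undefined") {
  // The web application, not the UA, supplies the charset parameter.
  const fromConstructor = new Blob(["hello"], {
    type: "text/plain; charset=UTF-8",
  });
  const fromSlice = fromConstructor.slice(
    0,
    fromConstructor.size,
    "text/plain; charset=UTF-8"
  );
  console.log(parseBlobType(fromConstructor.type));
  console.log(parseBlobType(fromSlice.type));
}
```

Code that only cares about the kind of file can compare the essence field against "text/plain", while code that calls readAsText can pass along the charset when one was set.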

-- A*

Received on Friday, 5 April 2013 20:55:48 UTC