Re: File API: Blob.type from Glenn Maynard on 2013-03-20 (public-webapps@w3.org from January to March 2013)

From: Glenn Maynard <glenn@zewt.org>
Date: Tue, 19 Mar 2013 19:52:15 -0500
To: Arun Ranganathan <arun@mozilla.com>
Cc: Arun Ranganathan <aranganathan@mozilla.com>, WebApps WG <public-webapps@w3.org>, Alexey Proskuryakov <ap@webkit.org>, Jonas Sicking <jonas@sicking.cc>, Anne van Kesteren <annevk@annevk.nl>
Message-ID: <CABirCh9pGbu3YC7ONKUavVoeQMgQvew0Pk8SoL2sD+XWfN3Rkw@mail.gmail.com>
On Tue, Mar 19, 2013 at 1:41 PM, Arun Ranganathan <arun@mozilla.com> wrote:

> Stricter rules are in place for "type" both while constructing Blob and
> for slice calls:
>
> http://dev.w3.org/2006/webapi/FileAPI/#constructorBlob
>
> and
>
> http://dev.w3.org/2006/webapi/FileAPI/#slide-method-algo
>


> 2.    Convert every character in relativeContentType to lower case.

I recommend referencing "Converting a string to ASCII lowercase" in HTML.
http://www.whatwg.org/specs/web-apps/current-work/#converted-to-ascii-lowercase
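
To illustrate why the distinction matters, here is a minimal sketch in JavaScript. The helper name asciiLowercase is hypothetical (it is not taken from either spec); it only approximates the HTML definition by mapping A-Z to a-z and leaving every other character alone.

```javascript
// Hypothetical helper: lowercase only U+0041..U+005A (A-Z) and leave
// every other character untouched.
function asciiLowercase(s) {
  return s.replace(/[A-Z]/g, (c) => String.fromCharCode(c.charCodeAt(0) + 0x20));
}

console.log(asciiLowercase("Text/PLAIN")); // "text/plain"

// U+212A KELVIN SIGN is non-ASCII. Unicode-aware lowercasing maps it to
// "k", while ASCII lowercasing leaves it as-is.
console.log("\u212A".toLowerCase() === "k");        // true
console.log(asciiLowercase("\u212A") === "\u212A"); // true
```

A generic "convert to lower case" step could be read as the Unicode-aware behavior, which is why a precise reference helps.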

> 1.    If relativeContentType contains any non-ASCII characters, then set
> relativeContentType to the empty string and return from these substeps.
> 3.    If relativeContentType contains any line break characters like "CR"
> or "LF" or any CTLs or separators, then set relativeContentType to the
> empty string and return from these substeps.

#3 is too vague.  I recommend combining #1 and #3, saying: "If any
character in relativeContentType is outside of the range U+0020 to U+007E,
then set relativeContentType to the empty string and return from these
substeps."  That's the printable ASCII range, and excludes all control
characters.

> 4.    Parse relativeContentType as an RFC2616 media-type, tokenizing it
> according to the ABNF for media-type [RFC2616] with the ASCII "/" character
> separating tokens representing the type and subtype productions. If
> relativeContentType cannot be tokenized according to the ABNF for
> media-type [RFC2616], then set relativeContentType to the empty string and
> return from these substeps.

I'm not sure we should be this strict.  I'd lean towards keeping it simple,
allowing any string at all as long as it contains only lowercase, printable
ASCII.
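
As a rough sketch of the simpler rule, assuming JavaScript: the function name normalizeType is hypothetical and this is not spec text. It rejects any string containing a character outside U+0020 to U+007E, then ASCII-lowercases what remains. It deliberately does no RFC2616 tokenizing, so strings with spaces or no "/" still pass.

```javascript
// Hypothetical sketch of the "keep it simple" rule: printable ASCII only,
// then ASCII lowercase. No media-type parsing is attempted.
function normalizeType(type) {
  // Any character outside U+0020..U+007E (control characters, CR, LF,
  // DEL, non-ASCII) makes the whole value the empty string.
  if (/[^\u0020-\u007E]/.test(type)) {
    return "";
  }
  // Only A-Z can match here, so toLowerCase() is safe on each match.
  return type.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

console.log(normalizeType("Image/JPEG"));        // "image/jpeg"
console.log(normalizeType("text/plain\r\n"));    // ""
console.log(normalizeType("text/pla\u00EDn"));   // ""
```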

If we really want to be that strict, I recommend specifying what to do
directly instead of by reference to RFC2616.  It's not a very clear
specification, at least for the purposes it's being used for here.  I
recommend not using it as a normative reference at all.

You don't need to say "The following requirements are normative for this
parameter".  That's what the normative language that follows ("must") means.


> So the "type" attribute of a Blob object isn't the *literal* value of the
> header; it's the type of the Blob, expressed as a MIME type.  When
> dereferencing Blob URLs, you get this type back with the Content-Type
> header, as you do normally in HTTP scenarios.  This is a well-understood
> behavior, and I agree with points you've made about not being beholden to
> the RFC when designing an API.
>

For what it's worth, while I'm familiar with how the Content-Type header
works, this wasn't at all clear to me.  To me, a MIME type is
"type/subtype", parameters like charset are metadata included next to a
MIME type (not part of the MIME type itself), and I wouldn't hesitate at
all to say "if(blob.type == 'text/plain')".

(I think the RFC is simply vague on this point, and I'm sure other people
have different interpretations--the point is just that this is a
reasonable, intuitive view of MIME types.)

> I think the question here is whether or not to include *separate
> attributes* on the Blob interface for the rarely used Charset Parameter,
> namely anything after the semicolon in MIME types of the sort:
> "text/plain;charset=UTF-8".  I've considered all your arguments by way of
> developer advocacy, and actually think we'll do developers a disservice by
> adding to the Blob interface:
>
> 1. The Charset Parameter consideration applies only to text/plain.  There
> are numerous other MIME types that don't use it: application/*, audio/*,
> image/*, video/*, etc.  Complicating the interface on the off-chance that a
> stray use of the Charset parameter breaks a direct equality comparison is
> "too much API for too little."
>
> 2. The Charset Parameter even in the context of text/plain isn't common
> enough to warrant a special case for text/plain within the API.
>
> 3. In general, it's a pretty stable assumption to conclude that developers
> will expect "type" to be surfaced later along with "Content-Type" when
> dereferencing a Blob URI.  I don't think we've made an assumption that's
> terribly galling.
>

I'm not concerned with exposing parameters; I don't think it's important,
or even necessarily useful.  I only suggested it as an alternative, if the
functionality of being able to manipulate MIME type parameters is wanted.
You're arguing that it's a rarely used special case, which is an argument
for not exposing it at all (not for leaking the special case into .type).

My only concern is that blob.type should never contain parameters.
Comparing it to "text/plain" or "image/jpeg" should work, and not
mysteriously fail a year later when somebody eventually throws a MIME type
parameter into the mix.  Today, all browsers expose text files as
text/plain.  If a browser a year from now decides to call text files with a
UTF-8 BOM "text/plain; charset=UTF-8", it'll break interop.

Additionally, determining a blob's file type seems like the most obvious
use of this property, and making people say "if(blob.type.split(";")[0] ==
'text/plain')" is simply not a good interface.
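
To make the failure mode concrete, here is a small sketch in JavaScript. It uses plain objects with a "type" property as stand-ins for Blobs, and the helper name baseType is hypothetical; the workaround shown is the split-based one above, plus a trim for any whitespace before the semicolon.

```javascript
// Hypothetical workaround: strip anything from the first ";" onward.
function baseType(blobLike) {
  return blobLike.type.split(";")[0].trim();
}

// Stand-ins for Blob objects; only the "type" property matters here.
const plain = { type: "text/plain" };
const withCharset = { type: "text/plain; charset=UTF-8" };

// The obvious comparison works today...
console.log(plain.type == "text/plain");           // true
// ...and silently stops matching once a parameter appears.
console.log(withCharset.type == "text/plain");     // false
// Every caller would need the workaround to stay correct.
console.log(baseType(withCharset) == "text/plain"); // true
```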

(I don't know what #3 means.  I'm not saying .type should be removed.)

-- 
Glenn Maynard
Received on Wednesday, 20 March 2013 00:52:46 UTC