Re: Discussion of Blob URI Scheme for Binary Data Access | IETF from Arun Ranganathan on 2011-05-15 (uri@w3.org from May 2011)

From: Arun Ranganathan <arun@mozilla.com>
Date: Sun, 15 May 2011 17:33:07 -0400
To: Joseph Anthony Pasquale Holsten <joseph@josephholsten.com>
CC: uri@w3.org
Message-ID: <4DD04693.4040800@mozilla.com>
Greetings Joseph, and thanks for your response :)


On 5/13/11 7:03 PM, Joseph Anthony Pasquale Holsten wrote:
> On May 13, 2011, at 1:05 PM, Arun Ranganathan wrote:
>
>> [T]he File API introduces a URI scheme for Blob access [4].  The URI scheme uses a subset of the HTTP status codes, and is designed to be used wherever http URIs can be used within HTML markup and within APIs in JavaScript (e.g. for "img src =", alongside XMLHttpRequest, etc.).  The nascent URL API [5] which coins and revokes blob: URIs is also used with the Stream API [6] for video-conferencing use cases, and thus this scheme is becoming integral to emerging technologies under the broad aegis of HTML.
> Hello! Before you begin on your journey, please accept my apologies. The road to a URI scheme is hard and paved with pedantry. That you use the acronym URL probably means this will be many interesting conversations. That you want the URI to be opaque means you will certainly enjoy the attention of URN fans.
>

I recognize this, and in fact, *did* actually start with a URN.  And as 
far as URI/URL goes, I prefer the term URI, but the Stream API [1] 
(which uses this scheme) actually prefers Blob URL over Blob URI.  I quote:

"A Blob URL is the same as what the File API specification calls a Blob 
URI, except that anything in the definition of that feature that refers 
to File 
<http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#file> 
and Blob 
<http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#blob> 
objects is hereby extended to also apply to Stream 
<http://www.whatwg.org/specs/web-apps/current-work/multipage/dnd.html#stream> 
and GeneratedStream 
<http://www.whatwg.org/specs/web-apps/current-work/multipage/dnd.html#generatedstream> 
objects."

> You're going to have to nail down what can and can't be an opaqueString. Why? Because an opaqueString must not contain a '#' or else the fragment is ambiguous. And that's just the beginning.
>

This is an excellent point; not defining opaque string better is a 
shortcoming that I'm happy to fix.
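
To make the '#' concern concrete, here is a minimal sketch in Python of 
how a consumer might split a blob: URI into its opaque string and an 
optional fragment.  The function names (split_blob_uri, make_blob_uri) 
are hypothetical and appear in no specification, and the only rule 
encoded is the one raised above: the opaque string is non-empty and 
contains no '#'.  Any further restrictions on the opaque string are 
still to be decided.

```python
from typing import Optional, Tuple

SCHEME = "blob:"


def split_blob_uri(uri: str) -> Tuple[str, Optional[str]]:
    """Split a blob: URI into (opaque_string, fragment_or_None)."""
    # URI scheme names are case-insensitive, so compare in lower case.
    if not uri.lower().startswith(SCHEME):
        raise ValueError("not a blob: URI")
    rest = uri[len(SCHEME):]
    # The first '#' ends the opaque string; the remainder is the fragment.
    opaque, sep, fragment = rest.partition("#")
    if not opaque:
        raise ValueError("empty opaque string")
    return opaque, (fragment if sep else None)


def make_blob_uri(opaque: str, fragment: Optional[str] = None) -> str:
    """Build a blob: URI, rejecting opaque strings that would be ambiguous."""
    if not opaque or "#" in opaque:
        raise ValueError("opaque string must be non-empty and contain no '#'")
    uri = SCHEME + opaque
    return uri if fragment is None else uri + "#" + fragment
```

If '#' were allowed inside the opaque string, split_blob_uri could not 
tell where the identifier ends and the fragment begins, which is the 
ambiguity in question.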

> You'll need to consider why you might allow or forbid '@', ':', and '/' as well. It's not that computers have any issues with them, but they're meaningful to humans.

Hmmm.... OK.  Can you be a bit more specific with any use case you have 
in mind where confusion might arise?

> You'll need to justify why this should or shouldn't be a URN. It's true, no one uses urn: URIs. But no one uses blob: URIs today either. One way is shorter, one is already designed for the purpose you're looking for. If you go with a urn: some people will look at you funny. But if you can actually get a urn to be well used, you might pull up a bunch of other opaque identifiers out of obscurity as well.

I honestly did consider URN, but after discussions on the public-webapps 
listserv, concluded that URNs were not tenable.

Firstly, web developers wrote in saying that the syntax using urn:uuid 
was simply weird.  Something about "URN" on the open web (of 
applications) galled the spirit.  Naming is, after all, the second 
hardest problem in software.  They preferred something that clearly 
indicated what was going on.  "file:///" is unambiguous with respect to 
the name of the scheme and what it is trying to do, so we felt that we 
should have a cleanly named scheme for Blob and Stream access (where a 
"Stream" may be seen as a dynamic Blob).

Secondly, implementer opinion ranged from apathy to strong opposition 
to urn:uuid.  I ignored apathy (though I paid attention when major 
implementers said they had a "slight preference").  For instance, read 
[2][3][4] on the public-webapps listserv.  I'd hate to call out 
individual user agents, but I was keen to hear from major browser 
vendors.  You can follow the relevant thread to see where they stood on 
urn:uuid vs. a dedicated scheme (such as blob: ).

> You'll need to think about internationalization. You will not enjoy the process. You might be the one author that finally writes a decent guide to writing the internationalization section of a URI scheme spec. You'll probably just repurpose something from another draft and mould it so it looks plausibly accurate without really understanding what it means.

Phew!  Point well taken.  You've assigned me a veritable Twelve Labors 
of Hercules :-)

More seriously, the specification leaves "opaque string" as an 
implementation detail (only suggesting UUID informatively), much as I 
used to leave things as an "exercise to the reader" in mathematics ~p.  
Do you think the specification itself should provide guidance, or is an 
informative note sufficient?  I'm inclined to the latter.  We put in a 
LOT of thought towards i18n in the Firefox project.  I'm sure other 
projects do as well.

> You're going to get hung up on the validity vs reservedness vs authoritativeness of any string that starts with blob:. Is it a valid blob: URI? Does the blob: URI actually identify a real thing? Can my blob: URI share the same string as yours, but identify different things?

They are actually unique per session, which is why I suggest UUIDs as a 
good choice.  Also, scope rules and origin rules in HTML may satisfy 
these clauses.  But in a nutshell, two blob: URIs are not likely to 
point to the same resource.  After all, a blob: URI is most likely to be 
used to identify a distinct file on a user's hard drive, which in all 
likelihood the user themselves picked.  A blob: URI can also point to a 
web-app generated Blob object built using BlobBuilder, or to a 
video-conferencing Stream.  All these resources are pretty much unique 
in space and time.

> What if I use a blob: URI for a thing that doesn't exist, and then you create a thing that should be identified by that blob: URI? Do I then get to access it? Can I share access to a resource by sharing its blob: URI? What does it all mean?

The short answer is no :)

But the point is well taken.  I'm optimistic that the scope rules, 
origin rules, and lifetime stipulations cover this.  Do you feel they 
fall short?
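
To illustrate why guessing or pre-claiming a blob: URI gains nothing, 
here is a rough sketch in Python of a per-session registry.  Everything 
in it is an assumption made for illustration: the BlobRegistry class, 
its method names, and the choice to answer with 404 for unknown, 
revoked, and cross-origin lookups are all hypothetical (the File API 
only says that a subset of HTTP status codes is used).  It is not a 
description of any implementation.

```python
import uuid
from typing import Dict, Optional, Tuple


class BlobRegistry:
    """Hypothetical per-session table of coined blob: URIs."""

    def __init__(self) -> None:
        # Maps blob: URI -> (origin that coined it, bytes it identifies).
        self._entries: Dict[str, Tuple[str, bytes]] = {}

    def create_url(self, origin: str, data: bytes) -> str:
        # The user agent, not the caller, picks the opaque string, so a
        # string chosen in advance by someone else is never handed out.
        uri = "blob:" + str(uuid.uuid4())
        self._entries[uri] = (origin, data)
        return uri

    def revoke_url(self, uri: str) -> None:
        # Revoking ends the URI's lifetime; unknown URIs are ignored.
        self._entries.pop(uri, None)

    def fetch(self, origin: str, uri: str) -> Tuple[int, Optional[bytes]]:
        entry = self._entries.get(uri)
        # Assumption: unknown, revoked, and cross-origin lookups all
        # look the same to the caller.
        if entry is None or entry[0] != origin:
            return 404, None
        return 200, entry[1]
```

In this sketch a URI that was never coined stays unresolvable, sharing 
the string with another origin does not share access, and a revoked URI 
stops resolving.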

> But before you do any of that crap, please write out examples for every important way that blob: URIs can be used.
> - how to parse a blob: URI (ABNF is great for matching, sometimes okay for tokenizing, and useless for parsing. What does each token mean?)
> - how to construct a blob: URI from tokens
> - what operations you can perform using those tokens
> - every kind of error, and what error recovery looks like
> - how a blob: URI could be used outside of a web browser. Yes, things other than browsers still exist.
>

This point is well taken, and I intend to follow this feedback in the 
specification.  Thanks again for the helpful suggestions.

> Finally, you really should talk to the folks at http://magnet-uri.sourceforge.net. You have similar aims and I think you both could benefit from using the same names for things. magnet:?xt=urn:uuid:550e8400-e29b-41d4-a716-446655440000#aboutABBA looks horrible, but that's not enough reason to reinvent the opaque string wheel, right?
>

I'm not sure the magnet project's goals really overlap with identifying 
distinct resources that are represented by the Blob API (and the File 
API) or the Stream API.  But I did dutifully study the pointer you provided.

-- A*

[1] For Blob URL, see: 
http://www.whatwg.org/specs/web-apps/current-work/multipage/dnd.html#blob-url
[2] http://lists.w3.org/Archives/Public/public-webapps/2009OctDec/0642.html
[3] http://lists.w3.org/Archives/Public/public-webapps/2009OctDec/0656.html
[4] http://lists.w3.org/Archives/Public/public-webapps/2009OctDec/0736.html
Received on Sunday, 15 May 2011 21:33:37 UTC