
Re: Discussion of Blob URI Scheme for Binary Data Access | IETF

From: Joseph Anthony Pasquale Holsten <joseph@josephholsten.com>
Date: Tue, 17 May 2011 09:51:25 -0500
Cc: uri@w3.org
Message-Id: <DB4683D4-8714-40B9-A030-88F4BC0AF141@josephholsten.com>
To: arun@mozilla.com

TL;DR: lock down the opaqueString to be a UUID and all's well.
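A minimal Python sketch of what "lock down the opaqueString to be a UUID" could look like in practice. The helper names (mint_blob_uri, is_valid_blob_uri) and the choice of the lowercase canonical UUID form are assumptions for illustration; the draft itself only suggests UUIDs informatively.

```python
import re
import uuid

# Assumption: the opaque part is a lowercase UUID in canonical 8-4-4-4-12 hex form.
_UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def mint_blob_uri() -> str:
    """Hypothetical helper: a fresh blob: URI whose opaque part is a random (v4) UUID."""
    return "blob:" + str(uuid.uuid4())


def is_valid_blob_uri(uri: str) -> bool:
    """Hypothetical helper: syntactic check only, 'blob:' + UUID plus an optional #fragment."""
    if not uri.startswith("blob:"):
        return False
    opaque, _, _fragment = uri[len("blob:"):].partition("#")
    return _UUID_RE.fullmatch(opaque) is not None
```

With this restriction a string such as blob:u@h:p/s?q is simply invalid, and every accepted opaque part consists of ASCII hex digits and hyphens, so neither the '@', ':', '/' question nor internationalization arises for it.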

On May 15, 2011, at 4:33 PM, Arun Ranganathan wrote:

> Greetings Joseph, and thanks for your response :)
> 
> 
> On 5/13/11 7:03 PM, Joseph Anthony Pasquale Holsten wrote:
>> On May 13, 2011, at 1:05 PM, Arun Ranganathan wrote:
[...]
>> You'll need to consider why you might allow or forbid '@', ':', and '/' as well. It's not that computers have any issues with them, but they're meaningful to humans.
>> 
> 
> Hmmm.... OK.  Can you be a bit more specific with any use case you have in mind where confusion might arise?

Just that URIs with an authority tend to look like scheme://userinfo@host:port/segment/segment?query. There's no technical reason you couldn't allow blob:u@h:p/s?q as a valid blob: URI. But people will look at it and assume meaning where there is none.
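To illustrate the gap between what a parser sees and what a reader sees, here is a short sketch using Python's standard urllib.parse.urlsplit. A generic parser only recognizes an authority when "//" follows the scheme, so blob:u@h:p/s?q has no host at all, even though it looks as if it does.

```python
from urllib.parse import urlsplit

http = urlsplit("http://u@h:8080/s?q")
blob = urlsplit("blob:u@h:p/s?q")

# The http: URI has a real authority component: userinfo "u", host "h", port 8080.
print(http.netloc, http.hostname, http.port)  # u@h:8080 h 8080

# The blob: URI has no "//", so "u@h:p" is just the start of an opaque path.
print(repr(blob.netloc), blob.path, blob.query)  # '' u@h:p/s q
```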

>> You'll need to justify why this should or shouldn't be a URN. It's true, no one uses urn: URIs. But no one uses blob: URIs today either. One way is shorter, one is already designed for the purpose you're looking for. If you go with a urn: some people will look at you funny. But if you can actually get a urn to be well used, you might pull up a bunch of other opaque identifiers out of obscurity as well.
>> 
> 
> I honestly did consider URN, but after discussions on the public-webapps listserv, concluded that URNs were not tenable.
> 
> Firstly, web developers wrote in saying that the syntax using urn:uuid was simply weird.  Something about "URN" on the open web (of applications) galled the spirit.  Naming is, after all, the second hardest problem in software.  They preferred something that clearly indicated what was going on.  "file:///" is unambiguous with respect to the name of the scheme and what it is trying to do, so we felt that we should have a cleanly named scheme for Blob and Stream access (where a "Stream" may be seen as a dynamic Blob).  

I'm not sure something like blob:550e8400-e29b-41d4-a716-446655440000#aboutABBA could ever clearly indicate what is going on. That said...

> Secondly, implementer opinion was divided between apathy and strong opinions against urn:uuid.  I ignored apathy (though I paid attention when major implementers said they had a "slight preference").  For instance, read [2][3][4] on the public-webapps listserv.  I'd hate to call out individual user agents, but I was keen to hear from major browser vendors.  You can follow the relevant thread to see how they stood on urn:uuid vs. a dedicated scheme (such as blob: ).

+1 to implementors' opinions

>> You'll need to think about internationalization. You will not enjoy the process. You might be the one author that finally writes a decent guide to writing the internationalization section of a URI scheme spec. You'll probably just repurpose something from another draft and mould it so it looks plausibly accurate without really understanding what it means. 
>> 
> 
> Phew!  Point well taken.  You've assigned me a veritable Hercules 12 Tasks :-)
> 
> More seriously, the specification leaves "opaque string" as an implementation detail (only suggesting UUID informatively), much as I used to leave things as an "exercise to the reader" in mathematics ~p.  Do you think the specification itself should provide guidance, or is an informative note sufficient?  I'm inclined to the latter.  We put in a LOT of thought towards i18n in the Firefox project.  I'm sure other projects do as well.

The simplest thing would be to forbid characters that require encoding, especially percent encoding. If you do this, every valid IRI in UTF-8 is the binary equivalent of its semantically equivalent URI.
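A small sketch of that rule, assuming "characters that require encoding" means anything outside the RFC 3986 unreserved set (ALPHA / DIGIT / "-" / "." / "_" / "~"). The function name needs_no_encoding is hypothetical. An opaque string that passes the check is left unchanged by percent-encoding, and its UTF-8 bytes are identical to its ASCII bytes, so the IRI and the URI are the same byte sequence.

```python
from urllib.parse import quote

# RFC 3986 section 2.3 "unreserved" characters.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def needs_no_encoding(opaque: str) -> bool:
    """Hypothetical check: True if every character is unreserved ASCII."""
    return all(c in UNRESERVED for c in opaque)


s = "550e8400-e29b-41d4-a716-446655440000"
assert needs_no_encoding(s)
assert quote(s, safe="") == s                 # percent-encoding is a no-op
assert s.encode("utf-8") == s.encode("ascii")  # IRI bytes == URI bytes

# A non-ASCII character would have to be percent-encoded as UTF-8 octets.
assert not needs_no_encoding("caf\u00e9")
assert quote("caf\u00e9") == "caf%C3%A9"
```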

If you have IRIs be equivalent to their URIs, an informative note is plenty. If an IRI will require some effort to convert down to a URI, or if ambiguity allows a single URI to be encoded multiple ways, please provide guidance.
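As one concrete example of a single IRI having several URI spellings: RFC 3986 treats the hex digits in percent-encodings as case-insensitive, so "%C3%A9" and "%c3%a9" both encode the same character. The sketch below uses only Python's standard quote and unquote; the variable names are illustrative.

```python
from urllib.parse import quote, unquote

iri_segment = "caf\u00e9"    # "café" as it would appear in an IRI
upper = quote(iri_segment)   # 'caf%C3%A9' (Python emits uppercase hex)
lower = upper.lower()        # 'caf%c3%a9' is an equally legal spelling

# Two distinct URI strings...
assert upper != lower
# ...that decode to the same IRI, so plain string comparison is not enough
# unless the spec says how (or whether) such encodings may appear.
assert unquote(upper) == unquote(lower) == iri_segment
```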

>> You're going to get hung up on the validity vs reservedness vs authoritativeness of any string that starts with blob:. Is it a valid blob: URI? Does the blob: URI actually identify a real thing? Can my blob: URI share the same string as yours, but identify different things? 
> 
> They are actually unique per session, which is why I suggest UUID as being good choices.  Also, scope rules and origin rules in HTML may satisfy these clauses.  But in a nutshell, two blob:URI are not likely to point to the same resource.  After all, a blob:URI is most likely to be used to identify a distinct file on a user's hard drive, which in all likelihood, the user themselves picked.  Also, a blob:URI can point to a web-app generated Blob object using BlobBuilder.  Or, a video-conferencing Stream.  All these resources are pretty much unique in space and time.
> 
>> What if I use a blob: URI for a thing that doesn't exist, and then you create a thing that should be identified by that blob: URI? Do I then get to access it? Can I share access to a resource by sharing its blob: URI? What does it all mean?
>> 
> 
> The short answer is no :)
> 
> But the point is well taken.  I'm optimistic that the scope rules, origin rules, and lifetime stipulations cover this.  Do you feel they fall short?

On further study, I think you're fine.
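For readers following the "short answer is no" exchange above, here is a minimal sketch of how scope, origin, and lifetime rules could close off those cases. It assumes a hypothetical in-memory, per-session registry keyed by origin strings; the class and method names are invented for illustration and are not the File API's actual mechanism. Because the registry mints the URI itself, nobody can use a blob: URI before the resource exists, a URI resolves only for the origin that created it, and revocation ends its lifetime.

```python
import uuid
from typing import Dict, Optional, Tuple


class BlobURIRegistry:
    """Hypothetical per-session registry of blob: URIs (illustration only)."""

    def __init__(self) -> None:
        # uri -> (origin that minted it, the data it identifies)
        self._entries: Dict[str, Tuple[str, bytes]] = {}

    def register(self, origin: str, data: bytes) -> str:
        # The registry, not the caller, chooses the URI, so a string
        # cannot be "pre-claimed" before the resource exists.
        uri = "blob:" + str(uuid.uuid4())
        self._entries[uri] = (origin, data)
        return uri

    def resolve(self, origin: str, uri: str) -> Optional[bytes]:
        entry = self._entries.get(uri)
        if entry is None or entry[0] != origin:
            return None  # never minted, revoked, or minted by another origin
        return entry[1]

    def revoke(self, uri: str) -> None:
        # Ends the URI's lifetime; later lookups fail.
        self._entries.pop(uri, None)


reg = BlobURIRegistry()
u = reg.register("https://a.example", b"abba")
assert reg.resolve("https://a.example", u) == b"abba"
assert reg.resolve("https://b.example", u) is None  # sharing across origins fails
reg.revoke(u)
assert reg.resolve("https://a.example", u) is None
```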

--
http://josephholsten.com
Received on Tuesday, 17 May 2011 14:51:53 GMT
