Re: Lifetime of Blob URL from Jonas Sicking on 2010-07-26 (public-webapps@w3.org from July to September 2010)

From: Jonas Sicking <jonas@sicking.cc>
Date: Mon, 26 Jul 2010 14:12:33 -0700
To: David Levin <levin@google.com>
Cc: Adrian Bateman <adrianba@microsoft.com>, Darin Fisher <darin@chromium.org>, "arun@mozilla.com" <arun@mozilla.com>, Web Applications Working Group WG <public-webapps@w3.org>
Message-ID: <AANLkTimfVmm6mhMNzEbRR27-_i9_uDotjOkP-ebRt2sz@mail.gmail.com>
On Tue, Jul 13, 2010 at 7:37 AM, David Levin <levin@google.com> wrote:
> On Tue, Jul 13, 2010 at 6:50 AM, Adrian Bateman <adrianba@microsoft.com>
> wrote:
>>
>> On Monday, July 12, 2010 2:31 PM, Darin Fisher wrote:
>> > On Mon, Jul 12, 2010 at 9:59 AM, David Levin <levin@google.com> wrote:
>> > On Mon, Jul 12, 2010 at 9:54 AM, Adrian Bateman <adrianba@microsoft.com>
>> > wrote:
>> > I read point #5 to be only about surviving the start of a navigation. As a
>> > web developer, how can I tell when a load has started for an <img>? Isn't
>> > this similarly indeterminate?
>> >
>> > As soon as img.src is set.
>> >
>> > "the spec could mention that the resource pointed to by a blob URL should be
>> > loaded successfully as long as the blob URL is valid at the time when the
>> > resource is starting to load."
>> >
>> > Should apply to xhr (after send is called), img, and navigation.
>> >
>> > Right, it seems reasonable to say that ownership of the resource referenced
>> > by a Blob can be shared by an XHR, Image, or navigation once it is told to
>> > start loading the resource.
>> >
>> > -Darin
>>
>> It sounds like you are saying the following is guaranteed to work:
>>
>> img.src = blob.url;
>> window.revokeBlobUrl(blob);
>> return;
>>
>> If that is the case, then the user agent is already making the guarantees
>> I was talking about, and so I still think having the lifetime mapped to
>> the blob, not the document, is better. This means that in the general case
>> I don't have to worry about lifetime management.
>
> Mapping lifetime to the blob exposes when the blob gets garbage collected,
> which is a very indeterminate point in time (and is very browser version
> dependent -- it will set you up for compatibility issues when you update
> your javascript engine -- and there are also the cross browser issues of
> course).
> Specifically, a blob could go "out of scope" (to use your earlier phrase)
> and then one could do img.src = blobUrl (the url that was exposed from the
> blob but not using the blob object). This will work sometimes but not others
> (depending on whether garbage collection collected the blob).
> This is much more indeterminate than the current spec which maps the
> blob.url lifetime to the lifetime of the document where the blob was
> created.
> When thinking about blob.url lifetime, there are several problems to solve:
> 1. "An AJAX style web application may never navigate the document and this
> means that every blob for which a URL is created must be kept around in some
> form for the lifetime of the application."
> 2. A blob passed between documents would have its blob.url stop working
> as soon as the original document got closed.
> 3. Having a model that gives the url a determinate lifetime and doesn't
> expose the web developer to indeterminate behaviors like the ones we
> have discussed above.
> The current spec has issues #1 and #2.
> Binding the lifetime of blob.url to blob has issue #3.

Indeed.

I agree with others who have said that exposing GC behavior is a big
problem, especially here, where a very natural usage pattern is to
grab a File object, extract its url, and then drop the reference to
the File object on the floor.
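To make the GC concern concrete, here is a minimal sketch of a hypothetical model in which a blob URL is valid only while its Blob has not been collected. `GcBoundUrlTable`, `loadLater`, and the `blob:sketch-N` URL format are invented for illustration, and "collection" is an explicit `collect()` call, since a real collector runs whenever it likes. The same page code succeeds or fails depending only on whether a collection happened in between.

```javascript
// Hypothetical model for illustration: a blob URL stays valid only while the
// Blob it came from has not been garbage collected. Real collectors run at
// unpredictable times; here collection is an explicit call so that both
// possible outcomes can be shown deterministically.
class GcBoundUrlTable {
  constructor() {
    this.live = new Map(); // url -> blob, kept only while the blob is "alive"
    this.counter = 0;
  }

  urlFor(blob) {
    const url = `blob:sketch-${++this.counter}`;
    this.live.set(url, blob);
    return url;
  }

  // Simulates the collector reclaiming every blob the page no longer references.
  collect(stillReferenced) {
    for (const [url, blob] of this.live) {
      if (!stillReferenced.has(blob)) this.live.delete(url);
    }
  }

  resolve(url) {
    return this.live.has(url) ? this.live.get(url) : null;
  }
}

// The "natural usage pattern": grab a file, keep only its url, drop the object.
function loadLater(table, gcRunsInBetween) {
  let file = { name: "photo.jpg" }; // plain object standing in for a File
  const url = table.urlFor(file);
  file = null; // reference dropped "on the floor"
  if (gcRunsInBetween) table.collect(new Set()); // the page holds no blobs now
  return table.resolve(url) !== null; // would img.src = url have worked?
}

console.log(loadLater(new GcBoundUrlTable(), false)); // true
console.log(loadLater(new GcBoundUrlTable(), true)); // false
```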

And I don't think specifying how GC is supposed to work is a workable
solution. I doubt that any browser vendor will be willing to lock down
their GC to that degree. GC implementation has been a very active area
of experimentation for many, many years, and I see no reason to think
that we'd be able to come up with a GC algorithm that wouldn't become
obsolete very soon.

However, I also don't think #1 above is a huge problem. You can always
flush a blob to disk, meaning that all that is leaked is an entry in a
url->filename hash table; no actual data needs to be kept in memory.
It's definitely still a problem, but I thought it was worth pointing
out.
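A rough sketch of the cost argument, assuming blob data has already been spooled to disk: the per-URL state reduces to a map from URL string to filename. `FlushedBlobUrlTable`, the spool directory, and the `.bin` naming scheme are hypothetical and not taken from any browser.

```javascript
// Hypothetical sketch: after a blob's bytes have been written to a spool file,
// the user agent only needs one string -> string entry per url. The spool
// directory and ".bin" naming are illustrative, not from any implementation.
class FlushedBlobUrlTable {
  constructor(spoolDir) {
    this.spoolDir = spoolDir;
    this.urlToFile = new Map(); // url -> path of the file holding the blob data
    this.next = 0;
  }

  register(blobId) {
    const url = `blob:sketch-${this.next++}`;
    this.urlToFile.set(url, `${this.spoolDir}/${blobId}.bin`);
    return url;
  }

  lookup(url) {
    return this.urlToFile.get(url); // undefined for unknown urls
  }

  // A url that is never destroyed leaks only its map entry, not the blob data.
  leakedEntries() {
    return this.urlToFile.size;
  }
}

const table = new FlushedBlobUrlTable("/tmp/blob-spool");
const u = table.register("a1");
table.register("b2");
console.log(table.lookup(u)); // /tmp/blob-spool/a1.bin
console.log(table.leakedEntries()); // 2
```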

Given that, I don't see a significantly different solution from what
is in the spec right now, though there are definitely some problems
that we should fix:

1. Adding a function for "destroying" a url reference seems like a good idea.
2. #2 above can be specced away. You simply need to specify that any
context that calls blob.url extends the lifetime such that the url
isn't automatically destroyed until all contexts that requested it are
destroyed.
3. We should define that worker scopes can also extract blob urls.
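The three fixes above could be modeled roughly as follows. This is a hypothetical sketch, not spec text: `BlobUrlStore`, the string context ids, and the choice that reading `blob.url` after destruction mints a fresh URL are all assumptions. Documents and workers are treated identically as "contexts" (point 3), every context that reads `blob.url` is recorded (point 2), and `destroyUrl()` is the explicit destructor (point 1).

```javascript
// Hypothetical sketch, not spec text. A "context" is a document or a worker,
// identified here by an arbitrary string id.
class BlobUrlStore {
  constructor() {
    this.urlForBlob = new Map(); // blob -> url, so blob.url is stable per blob
    this.entries = new Map(); // url -> { blob, contexts: Set of context ids }
    this.next = 0;
  }

  // Points 2 and 3: called whenever script in any context reads blob.url.
  getUrl(blob, contextId) {
    let url = this.urlForBlob.get(blob);
    if (url === undefined) {
      url = `blob:sketch-${this.next++}`;
      this.urlForBlob.set(blob, url);
      this.entries.set(url, { blob, contexts: new Set() });
    }
    this.entries.get(url).contexts.add(contextId);
    return url;
  }

  // Point 1: explicit destruction, regardless of which contexts still exist.
  destroyUrl(url) {
    const entry = this.entries.get(url);
    if (entry !== undefined) {
      this.entries.delete(url);
      this.urlForBlob.delete(entry.blob);
    }
  }

  // Point 2: a closing context releases its claim on every url it requested;
  // a url with no remaining requesters is destroyed automatically.
  contextDestroyed(contextId) {
    for (const [url, entry] of this.entries) {
      entry.contexts.delete(contextId);
      if (entry.contexts.size === 0) this.destroyUrl(url);
    }
  }

  resolve(url) {
    const entry = this.entries.get(url);
    return entry !== undefined ? entry.blob : null;
  }
}

const store = new BlobUrlStore();
const blob = { size: 1024 }; // plain object standing in for a Blob
const url = store.getUrl(blob, "document-A");
store.getUrl(blob, "worker-1"); // same url, now also requested by a worker
store.contextDestroyed("document-A");
console.log(store.resolve(url) === blob); // true: the worker still holds it
store.contextDestroyed("worker-1");
console.log(store.resolve(url)); // null
```

Keeping a set of requesting contexts per URL, rather than a bare counter, makes closing a context idempotent: a document that read `blob.url` ten times still releases its claim exactly once.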

However, this still leaves the question of what syntax to use for
creating and destroying URLs. The current method of obtaining a url is:

x = myfile.url;

We could simply add

myfile.killUrl();

which kills the url that was previously returned from the file.
However, this requires that people hold on to the Blob object, so it
seems like a suboptimal solution. We could also do

x = myfile.url;
and
window.destroyBlobUrl(x);

However, this keeps the creator and destructor functions far from each
other, which IMHO isn't very nice.

It has also been suggested that we change the syntax for obtaining urls to:

x = window.createBlobUrl(myfile);
and
window.destroyBlobUrl(x);

However, the myfile.url syntax feels really nice, and it would be
unfortunate to lose it. Instead, I propose the following syntax:

x = myfile.url;
and
Blob.destroyUrl(x);
File.destroyUrl(x);

ECMAScript already puts functions on constructor objects, so we'd not
be inventing anything new here. For example array1.concat(array2) is
equivalent to Array.concat(array1, array2).
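A sketch of how the proposed syntax could look if `Blob` and `File` were ordinary JavaScript classes. These are hypothetical stand-ins, not the real interfaces, and the `registry` map is a placeholder for user-agent state; run it in isolation (for example as a Node module), since it shadows the built-in names. One convenient property falls out of class inheritance: static methods are inherited, so `File.destroyUrl` is the very same function as `Blob.destroyUrl`. Standard examples of functions on constructor objects include `String.fromCharCode` and `Date.now`; the static `Array.concat` form was a SpiderMonkey "array generics" extension rather than part of standard ECMAScript.

```javascript
// Hypothetical stand-ins for the proposed syntax: `myfile.url` as an instance
// getter and destroyUrl as a static function on the constructor.
const registry = new Map(); // url -> blob; placeholder for user-agent state
let nextId = 0;

class Blob {
  get url() {
    if (this._url === undefined) {
      this._url = `blob:sketch-${nextId++}`;
      registry.set(this._url, this);
    }
    return this._url;
  }

  // Static: callable with only the url string, no Blob reference required.
  static destroyUrl(url) {
    const blob = registry.get(url);
    if (blob !== undefined) {
      registry.delete(url);
      blob._url = undefined; // a later read of .url mints a fresh url
    }
  }
}

// Static methods are inherited by subclasses, so File.destroyUrl resolves to
// Blob.destroyUrl through the constructor's prototype chain.
class File extends Blob {
  constructor(name) {
    super();
    this.name = name;
  }
}

const myfile = new File("photo.jpg");
const x = myfile.url;
File.destroyUrl(x); // same effect as Blob.destroyUrl(x)
console.log(registry.has(x)); // false
console.log(File.destroyUrl === Blob.destroyUrl); // true
```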

This is what I propose we use. I'm definitely interested to hear what
other people think though.

/ Jonas
Received on Monday, 26 July 2010 21:13:31 UTC