Lazy Blob

Hi all,

with the likes of postMessage and Web Intents, which we are now getting access to, it is increasingly common for data to flow from a server to an in-browser page, which may then pass that data on to another in-browser page (typically running at a different origin). In many cases, such data will be captured as Blobs. There can be several reasons why one may wish to toss a Blob over postMessage rather than a URL:

    • Origin restrictions and lack of CORS support (not necessarily fixable by the hacker working on the site);
    • CSP on the receiving end;
    • Cookies or authorisation issues;
    • Hotlinking prevention;
    • Secrecy of the actual URL to the data (possibly for privacy reasons);
    • Probably a bunch of others I haven't thought of.

The problem is that, the way things currently work, the sending end needs to load the data in full before transmitting it. That's fine when the data is small and there are only a few resources, or if you know that the receiving end will read it all right away anyway. But there are cases in which there is a lot of data (e.g. an image gallery), or in which the receiving end might filter the data, load it only item by item, etc.

We noticed this while implementing some services based on Web Intents, but it applies more broadly. The most acute example was Jungkee Song's Pick Image Intent, which could easily find itself loading several hundred megabytes that the receiving end might then simply throw away.

The proposals below were hashed out with Jungkee Song and Richard Tibbett, based on the above use case as well as on reports from several people in the Intents and DAP groups who have run into situations where they would have needed this (feel free to send additional use cases, folks).


POTENTIAL SOLUTIONS
===================

The overall idea is simple irrespective of which approach is chosen: create a Blob that can lazily fetch the content of a URL. Blobs are designed to be transparently lazy if they need to, so nothing needs to change at the Blob level, or for that matter at the level of anything that may end up reading from one. In fact, a good implementation that reads file-based Blobs is probably already lazy. A Lazy Blob is just a Blob.
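
To make the idea concrete, here is a minimal JavaScript sketch of the behaviour, not a proposed API. LazyBlobSketch and its load callback are hypothetical names, and the object only imitates two of a Blob's asynchronous read methods (text() and arrayBuffer()) rather than being a Blob; size and slice() are omitted for brevity. It assumes a runtime with a global Blob (a browser, or Node 18+). The example at the end mirrors the gallery case: items the receiver never reads are never loaded.

```javascript
// Hypothetical sketch: a Blob-like object whose bytes are only produced on
// first read. `load` is any function returning a Blob (or a Promise of one).
class LazyBlobSketch {
  #load;
  #pending = null;

  constructor(load) {
    this.#load = load;
  }

  // Start the load at most once; later reads share the same promise.
  #materialize() {
    if (this.#pending === null) {
      this.#pending = Promise.resolve().then(() => this.#load());
    }
    return this.#pending;
  }

  async text() {
    return (await this.#materialize()).text();
  }

  async arrayBuffer() {
    return (await this.#materialize()).arrayBuffer();
  }
}

// Example: a gallery of three items, of which the receiver reads only one.
const loaded = [];
const gallery = ["a.png", "b.png", "c.png"].map(
  (name) =>
    new LazyBlobSketch(() => {
      loaded.push(name); // record which items were actually fetched
      return new Blob(["bytes of " + name]);
    })
);
```

Reading gallery[1] twice triggers exactly one load, and loaded ends up containing only "b.png"; the code doing the reading never needs to know the object was lazy.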

Things start to get messier (unless we've missed an option) when you plug that into URLs (don't they always?). That's where the multiple approaches outlined below kick in.

In all approaches, it is assumed that if any information is inferred from context (e.g. access to cookies), the relevant context is the one in which the Blob is created. Reading the Blob from another context does not change that.

User agents are always allowed to fetch the actual data at any moment between the Blob's creation and the point at which code reads it. Quality of implementation will of course favour a strategy that is not just as bad as loading everything immediately.

Also, none of the options below is detailed anywhere near as much as it should be, but if we agree on the need and on a general direction I'll be happy to take care of that, and I volunteer as editor (Jungkee and perhaps Richard may be interested as well).


=== The Simple Approach

partial interface BlobBuilder {
    Blob getBlobFromURL (DOMString url);
};

Usage:
var bb = new BlobBuilder()
,   blob = bb.getBlobFromURL("http://specifiction.com/kitten.png");

This is the simplest possible approach. When called, essentially the equivalent of this happens:

var xhr = new XMLHttpRequest();
xhr.open("GET", url, true);
xhr.responseType = "blob";

Then, when the Blob is first read, send() is triggered and everything else happens as if the Blob in xhr.response were being accessed directly.
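
The behaviour described above could be approximated in script roughly as follows. This is a hedged, polyfill-style sketch rather than the proposed interface: getBlobFromURL is a plain function here instead of a BlobBuilder method, it returns a Blob-like object rather than a real Blob, and the optional XHR parameter is an assumption added only so that a stand-in request class can be substituted outside a browser.

```javascript
// Hypothetical sketch of the Simple Approach. The request is opened and
// configured when the object is created (i.e. in the creating context), but
// send() only runs on the first read.
function getBlobFromURL(url, XHR = globalThis.XMLHttpRequest) {
  const xhr = new XHR();
  xhr.open("GET", url, true);
  xhr.responseType = "blob";

  let pending = null;
  const load = () => {
    if (pending === null) {
      pending = new Promise((resolve, reject) => {
        xhr.onload = () => resolve(xhr.response); // the Blob from the XHR
        xhr.onerror = () => reject(new Error("request failed: " + url));
        xhr.send();
      });
    }
    return pending; // every read shares the single request
  };

  return {
    text: async () => (await load()).text(),
    arrayBuffer: async () => (await load()).arrayBuffer(),
  };
}
```

As in the text, nothing goes over the network until something reads the result, and repeated reads reuse the one response.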

Pro: Extremely simple.
Con: Does not allow much control over the request that gets made, notably some uses will likely require setting headers.



=== The Somewhat Less Simple Approach

partial interface BlobBuilder {
    Blob getBlobFromURL (DOMString url, optional DOMString method, optional Object headers);
};

Usage:
var bb = new BlobBuilder()
,   blob = bb.getBlobFromURL("http://specifiction.com/kitten.png", "GET", { Authorization: "Basic DEADBEEF" });

Everything is the same as in the previous version, except that the method can be set, and headers can be set by enumerating the properties of the Object. I *think* those are all that would ever be needed.

This is currently my preferred approach. If you like it too, you may consider the following ones as merely curiosities.
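
One plausible reading of "enumerating the Object" is sketched below. The helper configureRequest is hypothetical (not part of any spec): it opens the request with the given method, defaulting to "GET", and turns each own enumerable property of headers into a setRequestHeader() call, after which the request would be deferred exactly as in the simple approach.

```javascript
// Hypothetical helper: configure an XHR-like object from the arguments of
// getBlobFromURL(url, method, headers). Only own enumerable properties of
// `headers` become request headers; values are converted to strings.
function configureRequest(xhr, url, method = "GET", headers = {}) {
  xhr.open(method, url, true);
  xhr.responseType = "blob";
  for (const [name, value] of Object.entries(headers)) {
    xhr.setRequestHeader(name, String(value));
  }
  return xhr;
}
```

Using Object.entries means inherited properties are ignored, which seems the safer choice for an options bag; that is a choice made by this sketch, not something the proposal specifies.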


=== Using XHR For Options

partial interface BlobBuilder {
    Blob getBlobFromURL (XMLHttpRequest xhr);
};

Usage:
var bb = new BlobBuilder()
,   xhr = new XMLHttpRequest();
xhr.open("GET", "/kitten.png", true);
xhr.setRequestHeader("Authorization", "Basic DEADBEEF");
var blob = bb.getBlobFromURL(xhr);

This gives the developer the full flexibility and power of XHR, and uses the configured XHR object to make the request (with responseType forced to "blob" behind the scenes). It is more powerful and might be more future-proof, but it is more verbose and carries some extra complexity. Once the XHR object has been given to the BlobBuilder, it must become impossible to modify it, events should still fire on it, etc.
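
The requirement that the XHR become unmodifiable once handed over could be approximated in script as follows, purely to illustrate the intended semantics (a real implementation would enforce this natively). The hypothetical lockForLazyBlob function forces responseType to "blob", shadows the configuration methods on that one instance so that later calls throw, and returns a private handle on the original send() so that only the lazy Blob can start the request. Event handlers are left alone so that events can still fire.

```javascript
// Hypothetical sketch: freeze the configuration of an XHR-like object once it
// has been handed to getBlobFromURL(xhr). Event handlers are untouched.
function lockForLazyBlob(xhr) {
  xhr.responseType = "blob"; // forced behind the scenes
  const originalSend = xhr.send; // captured before it is shadowed below
  const refuse = (what) => () => {
    throw new Error(what + " is not allowed once the XHR belongs to a lazy Blob");
  };
  // Own, non-writable properties shadow the prototype's methods on this instance.
  for (const name of ["open", "setRequestHeader", "overrideMimeType", "send"]) {
    Object.defineProperty(xhr, name, { value: refuse(name + "()") });
  }
  Object.defineProperty(xhr, "responseType", {
    get: () => "blob",
    set: refuse("changing responseType"),
  });
  // Only the holder of this function (the lazy Blob) can trigger the request.
  return () => originalSend.call(xhr);
}
```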


=== Another XHR Approach

partial interface XMLHttpRequest {
    Blob makeLazyBlob ();
};

Usage:
var xhr = new XMLHttpRequest();
xhr.open("GET", "/kitten.png", true);
xhr.setRequestHeader("Authorization", "Basic DEADBEEF");
var blob = xhr.makeLazyBlob();

This is very similar to the previous one in its power and complexity, but the difference is that everything happens on XHR.


=== The Inheritance Approach

interface LazyBlob : XMLHttpRequest {
    void send (); // noop
};

Usage:
var lb = new LazyBlob();
lb.open("GET", "/kitten.png", true);
lb.setRequestHeader("Authorization", "Basic DEADBEEF");

More of the same, except that send() becomes a no-op.


=== Cloneable XHR

Just toss the XHR itself over postMessage().


Speaking personally, I prefer the second option (The Somewhat Less Simple Approach), but I'm certainly open to others. My primary concern at this time is to get a sense of whether there is agreement on the issue to begin with.

-- 
Robin Berjon - http://berjon.com/ - @robinberjon

Received on Wednesday, 1 August 2012 14:59:28 UTC