Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...) from Aaron Boodman on 2008-05-12 (public-webapi@w3.org from May 2008)

From: Aaron Boodman <aa@google.com>
Date: Sun, 11 May 2008 21:22:32 -0700
To: Maciej Stachowiak <mjs@apple.com>
Cc: Chris Prince <cprince@google.com>, "Web API WG (public)" <public-webapi@w3.org>, Ian Hickson <ian@hixie.ch>
Message-ID: <278fd46c0805112122m75a716d1o2056d3c8257add59@mail.gmail.com>

On Sun, May 11, 2008 at 6:46 PM, Maciej Stachowiak <mjs@apple.com> wrote:
>> It seems better if the read API is just async for this case to prevent
>> the problem.
>
> It can't entirely prevent the problem. If you read a big enough chunk, it
> will cause swapping which hits the disk just as much as file reads. Possibly
> more, because real file access will trigger OS prefetch heuristics for
> linear access.

Right, I think the UA has to have ultimate control over the chunk size
to prevent this. The length parameters on the read APIs I suggested
would express what the caller desires, but the implementation doesn't
necessarily have to honor them. I've changed the parameter names on
our wiki page to 'desiredLength' to reflect this.
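
A rough sketch of the 'desiredLength' idea, not the actual proposed API:
the caller asks for some number of bytes, and the implementation may
return fewer. The names readChunk, readAll and UA_MAX_CHUNK are made up
for illustration, the cap value is arbitrary, and the read is modeled
synchronously over an in-memory array purely to show the length
negotiation (the real read would be async).

```typescript
// Hypothetical cap the UA might impose on any single read (arbitrary value).
const UA_MAX_CHUNK = 4;

// Returns at most `desiredLength` bytes starting at `offset`, possibly fewer:
// the result is limited by the UA's cap and by the bytes that remain.
function readChunk(
  data: Uint8Array,
  offset: number,
  desiredLength: number,
  maxChunk: number = UA_MAX_CHUNK
): Uint8Array {
  const remaining = Math.max(0, data.length - offset);
  const length = Math.min(Math.max(0, desiredLength), maxChunk, remaining);
  return data.subarray(offset, offset + length);
}

// Caller-side loop that tolerates short reads by advancing the offset
// by however many bytes actually came back.
function readAll(data: Uint8Array, desiredLength: number): Uint8Array[] {
  const chunks: Uint8Array[] = [];
  let offset = 0;
  while (offset < data.length) {
    const chunk = readChunk(data, offset, desiredLength);
    if (chunk.length === 0) break; // nothing more can be read
    chunks.push(chunk);
    offset += chunk.length;
  }
  return chunks;
}
```

The point for callers is that they cannot assume one read returns
everything they asked for; they have to loop on the returned length.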

>> Whether XHR spools to disk is an implementation detail, right? Right
>> now XHR is not practical to use for downloading large files because
>> the only way to access the result is as a string. Also because of
>> this, XHR implementations don't bother spooling to disk. But if this
>> API were added, then XHR implementations could be modified to start
>> spooling to disk if the response got large. If the caller requests
>> responseText, then the implementation just does the best it can to
>> read the whole thing into a string and reply. But if the caller uses
>> responseBlob (or whatever we call it) then it becomes practical to,
>> for example, download movie files, modify them, then re-upload them.
>
> That sounds reasonable for very large files like movies. However, audio and
> image files are similar in size to the kinds of text or XML resources that
> are currently processed synchronously. In such cases they are likely to
> remain in memory.
>
> In general it is sounding like it might be desirable to have at least two
> kinds of objects for representing binary data:
>
> 1) An in-memory, mutable representation with synchronous access. There
> should also be a copying API which is possibly copy-on-write for the backing
> store.
>
> 2) A possibly disk-backed representation that offers only asynchronous read
> (possibly in the form of representation #1).

I agree with this, but I think using Blob/File (or whatever we call it)
as the default representation is convenient because you don't need to
add multiple getter APIs to things such as XHR (responseBytes and
responseBlob). And you probably remove some potential confusion over
which getter is correct to use for a given situation.
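
To make the single-getter idea concrete, here is a minimal sketch,
assuming a response object that exposes only a blob and lets the caller
obtain an in-memory byte array (representation #1) from it on request.
SketchBlob, SketchResponse and readAsByteArray are hypothetical names,
not part of XHR or any proposal, and the callback is invoked immediately
only to keep the sketch short; a real implementation would call it later.

```typescript
// Hypothetical stand-in for representation #2 (possibly disk-backed).
class SketchBlob {
  readonly length: number;
  private readonly bytes: Uint8Array;

  constructor(bytes: Uint8Array) {
    this.bytes = bytes;
    this.length = bytes.length;
  }

  // Hands the caller a mutable in-memory copy (representation #1).
  // Called immediately here for brevity; a real read would complete later.
  readAsByteArray(callback: (bytes: Uint8Array) => void): void {
    callback(new Uint8Array(this.bytes));
  }
}

// Hypothetical response object with a single binary getter.
class SketchResponse {
  readonly responseBlob: SketchBlob;

  constructor(body: Uint8Array) {
    this.responseBlob = new SketchBlob(body);
  }
}
```

With this shape there is no separate responseBytes getter to choose
between: callers who want bytes in memory ask the blob for a copy, and
changes to that copy do not touch the blob's backing data.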

> Both representations could be used with APIs that can accept binary data. In
> most cases such APIs only take strings currently. The name of representation
> #2 may wish to tie it to being a file, since for anything already in memory
> you'd want representation #1. Perhaps they could be called ByteArray and
> File respectively.

Calling it File seems a little weird to me, particularly in the case
of XMLHttpRequest.

> Open question: can a File be stored in a SQL database? If
> so, does the database store the data or a reference (such as a path or Mac
> OS X Alias)?

There definitely needs to be a way to store Files locally. I don't
have a strong opinion as to whether this should be in the database, or
in DOMStorage, or in something new just for files.

- a

Received on Monday, 12 May 2008 04:24:02 UTC