Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...) from Maciej Stachowiak on 2008-05-12 (public-webapi@w3.org from May 2008)

From: Maciej Stachowiak <mjs@apple.com>
Date: Sun, 11 May 2008 18:46:31 -0700
To: Aaron Boodman <aa@google.com>
Cc: Chris Prince <cprince@google.com>, "Web API WG (public)" <public-webapi@w3.org>, Ian Hickson <ian@hixie.ch>
Message-Id: <ABCDA560-C376-4FDF-AB51-59F55655C458@apple.com>

On May 11, 2008, at 6:01 PM, Aaron Boodman wrote:

> On Sun, May 11, 2008 at 5:46 PM, Maciej Stachowiak <mjs@apple.com> wrote:
>> Well, that depends on how good the OS buffer cache is at prefetching. But in
>> general, there would be some disk access.
>
> It seems better if the read API is just async for this case to prevent
> the problem.

It can't entirely prevent the problem. If you read a big enough chunk
into memory, it will cause swapping, which hits the disk just as much
as file reads do. Possibly more, because real file access will trigger
the OS prefetch heuristics for linear access.

>>> I see what you mean for canvas, but not so much for XHR. It seems like
>>> a valid use case to want to be able to use XHR to download very large
>>> files. In that case, the thing you get back seems like it should have
>>> an async API for reading.
>>
>> Hmm? If you get the data over the network it goes into RAM. Why would you
>> want an async API to in-memory data? Or are you suggesting XHR should be
>> changed to spool its data to disk? I do not think that is practical to do
>> for all requests, so this would have to be a special API mode for responses
>> that are expected to be too big to fit in memory.
>
> Whether XHR spools to disk is an implementation detail, right? Right
> now XHR is not practical to use for downloading large files because
> the only way to access the result is as a string. Also because of
> this, XHR implementations don't bother spooling to disk. But if this
> API were added, then XHR implementations could be modified to start
> spooling to disk if the response got large. If the caller requests
> responseText, then the implementation just does the best it can to
> read the whole thing into a string and reply. But if the caller uses
> responseBlob (or whatever we call it) then it becomes practical to,
> for example, download movie files, modify them, then re-upload them.

That sounds reasonable for very large files like movies. However,  
audio and image files are similar in size to the kinds of text or XML  
resources that are currently processed synchronously. In such cases  
they are likely to remain in memory.

In general it is sounding like it might be desirable to have at least  
two kinds of objects for representing binary data:

1) An in-memory, mutable representation with synchronous access. There  
should also be a copying API which is possibly copy-on-write for the  
backing store.

2) A possibly disk-backed representation that offers only asynchronous  
read (possibly in the form of representation #1).

Both representations could be used with APIs that can accept binary
data. In most cases such APIs only take strings currently. The name of
representation #2 should probably tie it to being a file, since for
anything already in memory you'd want representation #1. Perhaps they
could be called ByteArray and File respectively. Open question: can a
File be stored in a SQL database? If so, does the database store the
data itself or a reference (such as a path or Mac OS X Alias)?
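
As a rough illustration only, the two representations might look
something like the TypeScript sketch below. Every name and signature
in it is an assumption made for the example, not a proposed API: the
copy() method, the reference-counted backing store, and the
Promise-returning read() are all hypothetical, and the class is called
DiskFile rather than File only so the sketch does not clash with any
existing global of that name. An in-memory buffer stands in for the
bytes on disk.

```typescript
// Hypothetical sketch; all names and signatures are assumptions.

// Backing store that copies of a ByteArray may share until one writes.
type Backing = { bytes: Uint8Array; refs: number };

// Representation #1: in-memory, mutable, synchronous access.
class ByteArray {
  private backing: Backing;

  private constructor(backing: Backing) {
    this.backing = backing;
  }

  static from(bytes: ArrayLike<number>): ByteArray {
    // Uint8Array.from copies, so the caller's data is never aliased.
    return new ByteArray({ bytes: Uint8Array.from(bytes), refs: 1 });
  }

  get length(): number {
    return this.backing.bytes.length;
  }

  get(index: number): number {
    const value = this.backing.bytes[index];
    if (value === undefined) throw new RangeError("index out of range");
    return value;
  }

  set(index: number, value: number): void {
    if (index < 0 || index >= this.length) {
      throw new RangeError("index out of range");
    }
    // Copy-on-write: detach from a shared store before mutating it.
    if (this.backing.refs > 1) {
      this.backing.refs--;
      this.backing = { bytes: this.backing.bytes.slice(), refs: 1 };
    }
    this.backing.bytes[index] = value;
  }

  // Cheap copy: shares the store until either side writes.
  copy(): ByteArray {
    this.backing.refs++;
    return new ByteArray(this.backing);
  }
}

// Representation #2: possibly disk-backed, asynchronous read only.
class DiskFile {
  private contents: Uint8Array; // stand-in for data on disk

  constructor(contents: ArrayLike<number>) {
    this.contents = Uint8Array.from(contents);
  }

  get size(): number {
    return this.contents.length;
  }

  // Reads a range and hands it back as representation #1.
  async read(offset: number, length: number): Promise<ByteArray> {
    return ByteArray.from(this.contents.slice(offset, offset + length));
  }
}
```

The point of the split is that a caller holding a DiskFile can never
block on I/O: the only way to get at the bytes is to wait for read()
to deliver a ByteArray, which is then safe to access synchronously.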

Regards,
Maciej

Received on Monday, 12 May 2008 01:47:11 UTC