Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...) from Maciej Stachowiak on 2008-05-12 (public-webapi@w3.org from May 2008)

From: Maciej Stachowiak <mjs@apple.com>
Date: Sun, 11 May 2008 17:46:02 -0700
To: Aaron Boodman <aa@google.com>
Cc: Chris Prince <cprince@google.com>, "Web API WG (public)" <public-webapi@w3.org>, Ian Hickson <ian@hixie.ch>
Message-Id: <E19071D1-E4E4-4BAC-98C6-05B38EF7229D@apple.com>

On May 11, 2008, at 4:40 PM, Aaron Boodman wrote:

> On Sun, May 11, 2008 at 4:22 PM, Maciej Stachowiak <mjs@apple.com> wrote:
>>> Here's one additional question on how this would work with ByteArray.
>>> The read API for ByteArray is currently synchronous. Doesn't this mean
>>> that with large files, accessing bytearray[n] could block?
>>
>> If the ByteArray were in fact backed by a file, then accessing
>> bytearray[n] could lead to part of the file being paged in. However,
>> the same is true if it is backed by RAM that is swapped out. Even
>> accessing uninitialized zero-fill memory could trap to the kernel,
>> though that's in general not as bad as hitting disk (whether for swap
>> or file bytes).
>
> But expressing the API as an array makes it seem like access is always
> cheap, encouraging people to just burn through the file in a tight
> loop. Such loops would actually hit the disk many times, right?

Well, that depends on how good the OS buffer cache is at prefetching.  
But in general, there would be some disk access.
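To make the "depends on the buffer cache" point concrete, here is a minimal simulation, assuming a hypothetical `FileBackedByteArray` (not a real or proposed API) that pages its contents in fixed 4 KiB blocks on a cache miss and keeps a small LRU cache of pages. The page size and cache capacity are illustrative assumptions. It counts simulated disk reads for a tight sequential loop and for a page-strided loop.

```typescript
// Hypothetical model for illustration only; not a real or proposed API.
// A byte array whose contents live "on disk" and are paged in on demand.
class FileBackedByteArray {
  diskReads = 0; // simulated disk accesses (cache misses)
  private readonly disk: Uint8Array; // stands in for the file contents
  private readonly pageSize: number;
  private readonly maxCachedPages: number;
  // page index -> page bytes; Map insertion order doubles as LRU order
  private readonly cache = new Map<number, Uint8Array>();

  constructor(disk: Uint8Array, pageSize = 4096, maxCachedPages = 1) {
    this.disk = disk;
    this.pageSize = pageSize;
    this.maxCachedPages = maxCachedPages;
  }

  get length(): number {
    return this.disk.length;
  }

  // Synchronous indexed read: looks as cheap as an array access, but a
  // cache miss is where a real implementation would block on I/O.
  at(n: number): number {
    if (n < 0 || n >= this.disk.length) {
      throw new RangeError(`index ${n} out of range`);
    }
    const pageIndex = Math.floor(n / this.pageSize);
    let page = this.cache.get(pageIndex);
    if (page === undefined) {
      this.diskReads++;
      const start = pageIndex * this.pageSize;
      page = this.disk.subarray(start, start + this.pageSize);
      if (this.cache.size >= this.maxCachedPages) {
        // Evict the least recently used page (first key in insertion order).
        const oldest = this.cache.keys().next().value as number;
        this.cache.delete(oldest);
      }
    } else {
      this.cache.delete(pageIndex); // re-inserted below to mark it most recent
    }
    this.cache.set(pageIndex, page);
    return page[n - pageIndex * this.pageSize];
  }
}

const file = new Uint8Array(1 << 20); // 1 MiB of zeros standing in for a file

// Tight sequential loop: one miss per 4 KiB page, not one per byte.
const seq = new FileBackedByteArray(file);
for (let i = 0; i < seq.length; i++) seq.at(i);
console.log(seq.diskReads); // 256

// One byte per page, two passes, one-page cache: every access misses.
const strided = new FileBackedByteArray(file);
for (let pass = 0; pass < 2; pass++) {
  for (let i = 0; i < strided.length; i += 4096) strided.at(i);
}
console.log(strided.diskReads); // 512
```

A real OS buffer cache is much larger and typically reads ahead for a sequential reader, which this model deliberately leaves out; even so, the loop still touches the disk, just far less often than once per byte.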

>> I can see how you may want to have an object to represent a file that
>> can be handed to APIs directly, but that has only an async read
>> interface for JS. However, I am pretty sure you would not want to use
>> such an object to represent binary data returned from an XHR, or the
>> pixel contents of a <canvas>. After all, the data is already in
>> memory. So perhaps files need a distinct object from other forms of
>> binary data, if we wanted to enforce such a restriction.
>
> I see what you mean for canvas, but not so much for XHR. It seems like
> a valid use case to want to be able to use XHR to download very large
> files. In that case, the thing you get back seems like it should have
> an async API for reading.

Hmm? If you get the data over the network it goes into RAM. Why would  
you want an async API to in-memory data? Or are you suggesting XHR  
should be changed to spool its data to disk? I do not think that is  
practical to do for all requests, so this would have to be a special  
API mode for responses that are expected to be too big to fit in memory.
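The split discussed here, with synchronous indexing for data already in RAM and an async-only read interface for data that may sit on disk (a file, or an XHR response spooled to disk in a special mode), could look roughly like the sketch below. `AsyncByteSource`, `SpooledResponse` and `sumBytes` are hypothetical names for illustration, not existing or proposed APIs, and the 64 KiB chunk size is an arbitrary assumption.

```typescript
// Hypothetical interface for illustration only; not an existing or proposed
// API. Data that may live on disk exposes only asynchronous, ranged reads.
interface AsyncByteSource {
  readonly size: number;
  read(offset: number, length: number): Promise<Uint8Array>;
}

// Stand-in for an XHR response spooled to disk. Here it is backed by memory
// and an async method replaces real file I/O.
class SpooledResponse implements AsyncByteSource {
  readonly size: number;
  private readonly data: Uint8Array;

  constructor(data: Uint8Array) {
    this.data = data;
    this.size = data.length;
  }

  async read(offset: number, length: number): Promise<Uint8Array> {
    const end = Math.min(offset + length, this.size);
    return this.data.slice(offset, end); // copy of the requested range
  }
}

// Consumer: awaits each chunk instead of blocking on every indexed access.
// Once a chunk is in memory, plain synchronous indexing is fine.
async function sumBytes(src: AsyncByteSource, chunkSize = 64 * 1024): Promise<number> {
  let total = 0;
  for (let offset = 0; offset < src.size; offset += chunkSize) {
    const chunk = await src.read(offset, chunkSize);
    for (let i = 0; i < chunk.length; i++) total += chunk[i];
  }
  return total;
}

const response = new SpooledResponse(new Uint8Array(200000).fill(1));
sumBytes(response).then((total) => console.log(total)); // 200000
```

In this arrangement, in-memory binary data (an ordinary XHR response, or <canvas> pixels) would keep a plain synchronous byte array, and only the spooled mode would pay for the async interface.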

Regards,
Maciej

Received on Monday, 12 May 2008 00:46:41 UTC