ArrayBuffer v ArrayBufferView (was JS examples) from Ryan Sleevi on 2012-09-06 (public-webcrypto@w3.org from September 2012)

From: Ryan Sleevi <sleevi@google.com>
Date: Thu, 6 Sep 2012 11:05:57 -0700
To: Arun Ranganathan <arun@mozilla.com>
Cc: David Dahl <ddahl@mozilla.com>, "public-webcrypto@w3.org Working Group" <public-webcrypto@w3.org>
Message-ID: <CACvaWvYhc6kvBd7+hJzr+HG8+rnUfdwzQBd7cumAJ7aTxBFdbg@mail.gmail.com>

On Thu, Sep 6, 2012 at 10:36 AM, Arun Ranganathan <arun@mozilla.com> wrote:
> Ryan,
>
>
> On Sep 6, 2012, at 1:11 PM, Ryan Sleevi wrote:
>
>>
>> Can you explain how ArrayBuffer is any easier than ArrayBufferView?
>>
>
> Not *easier* per se, but (depending on what you're trying to do), you can obtain an ArrayBuffer from a file, without predetermining what sort of view you want to use.  Additionally, if you have a string, you can coin a Blob, and read it back as an ArrayBuffer.  Those kinds of API conveniences mean you don't have to focus on what view object to use.

Sure, but if you had an ArrayBuffer from a file, and you wanted to
incrementally process data, only accepting ArrayBuffer would mean the
caller would have to .slice() the data continuously, which performs a
copy operation on the underlying bytes (necessary due to the whole
Transferable semantics, AFAICT)?

Using ArrayBufferView, you can easily .subarray() another ABV without
requiring a fundamental copy of the underlying ArrayBuffer, IIUC. This
means that if you have an encrypted file with a signature, you can
.subarray() to extract the signature, start up a Verifier, and you can
.subarray() to get the encrypted data, and start up a Decrypter, both
without requiring that you copy the underlying ArrayBuffer.
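
A minimal sketch of the copy versus no-copy distinction, using only
standard typed array calls. ArrayBuffer.prototype.slice allocates a new
buffer and copies into it, while TypedArray.prototype.subarray returns a
new view onto the same buffer (slice on a typed array also copies). The
16-byte buffer and the values written are arbitrary stand-ins.

```javascript
// Sketch: copy vs. no-copy when carving up an ArrayBuffer.
const buffer = new ArrayBuffer(16);
const whole = new Uint8Array(buffer);
whole.fill(7);

// ArrayBuffer.prototype.slice allocates a new buffer and copies bytes into it.
const copied = new Uint8Array(buffer.slice(12, 16));

// TypedArray.prototype.subarray returns a new view onto the same buffer.
const shared = whole.subarray(12, 16);

// A write through the original view shows up in the shared view only.
whole[12] = 42;

console.log(copied[0]);                // 7  (the copy is unaffected)
console.log(shared[0]);                // 42 (same underlying bytes)
console.log(shared.buffer === buffer); // true
console.log(shared.byteOffset);        // 12
```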

For the use case of a File/Blob, can't you just wrap in a DataView,
whose sole purpose seems to be to allow applications to distance
themselves from the particular type of view? I don't think the problem
is /reading/ the data; the problem is determining what data is suitable
to pass into the API. Logically, application developers will want to use
DOMString - whether as a literal string or to represent 'a series of
bytes', but because DOMStrings are UTF-16 in JavaScript, that creates
a mismatch with almost every other API, where strings are char*/1 byte
and you can represent both ASCII and 'arbitrary data' using the same
datatype.

So for input, I think we want to make sure the caller is explicitly
making a choice how to represent the data, so it's clear how many
bytes and what bytes will be transformed.
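
To illustrate why the choice has to be explicit, here is a rough sketch
showing that one short string yields a different number of bytes
depending on the representation picked. TextEncoder (from the Encoding
Standard) is not mentioned in this thread and is only assumed here as
one possible way to obtain UTF-8 bytes; the sample string is arbitrary.

```javascript
// Sketch: the same string yields different bytes depending on the encoding chosen.
const text = "h\u00e9"; // "h" followed by e-acute: two UTF-16 code units

// Choice 1: UTF-8 bytes via TextEncoder (an assumption, not part of the proposal).
const utf8 = new TextEncoder().encode(text);

// Choice 2: raw UTF-16 code units, viewed as bytes in platform byte order.
const codeUnits = Uint16Array.from(text, (ch) => ch.charCodeAt(0));
const utf16 = new Uint8Array(codeUnits.buffer);

console.log(text.length);      // 2 code units
console.log(utf8.byteLength);  // 3 bytes: 0x68, 0xC3, 0xA9
console.log(utf16.byteLength); // 4 bytes: two bytes per code unit
```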

For output, if we're going to talk in terms of needing to provide
concrete views, then I think all the views should be more explicit -
everything is a Uint8ClampedArray (logically), but could also be seen
as a DataView.
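
As a small sketch of "one result, more than one view": the byte array
below is a made-up stand-in for whatever an operation would return, and
a plain Uint8Array is used because reading it is the same as reading a
Uint8ClampedArray (clamping only affects writes). Wrapping it in a
DataView does not copy the bytes.

```javascript
// Sketch: one result buffer, two ways of looking at it.
const result = new Uint8Array([0xde, 0xad, 0xbe, 0xef]); // hypothetical API output

// Wrap the same bytes in a DataView; no bytes are copied.
const dv = new DataView(result.buffer, result.byteOffset, result.byteLength);

console.log(dv.getUint8(0).toString(16));         // "de"
console.log(dv.getUint32(0, false).toString(16)); // "deadbeef" (big-endian read)
console.log(dv.buffer === result.buffer);         // true
```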

To restate:
The ease of use question seems to be the general problem of DOMString
being UTF-16, and thus "problematic" for representing data that is
expected to be a "sequence of octets", and "problematic" for
developers who expect ASCII DOMStrings to be encoded as UTF-8. I don't
think that's a problem solved by ArrayBuffer v ArrayBufferView; it is
in the general problem domain of what TypedArrays is trying to solve.

>
>
>> Perhaps you meant to say Blob (from the File API), since Blob has a
>> constructor that can take an ArrayBufferView or DOMString.
>>
>> My concern with Blob (and with ArrayBuffer w/o the View) is that they
>> both seem to require that the underlying data be copied in order to
>> construct/invoke, whereas ArrayBufferView is a slice of an already
>> existing ArrayBuffer (or arbitrary data source, in the case of
>> DataView), and thus is copy-free.
>
>
> Can you flesh your concern out a bit more?  What's a "worst case scenario" that you envision?

Read a 100 MB file.
It is encrypted.
It has a signature at the end (say 20 bytes)
To get the data, you must verify the signature, and then decrypt.

Using ArrayBuffer, rather than ArrayBufferView, as I understand it, you must:
Copy 20 bytes into a new ArrayBuffer. Start a Verifier.
If it verifies, Copy 100 MB **MINUS** 20 bytes into a **new**
ArrayBuffer. Start a Decrypter.
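
The two copies in the steps above can be sketched next to a view-based
split. This is only an illustration under stated assumptions: a 1 KiB
array stands in for the 100 MB file, the 20-byte signature length comes
from the example, and no Verifier or Decrypter is actually invoked.

```javascript
// Sketch: splitting "encrypted data || signature" by copy and by view.
const SIGNATURE_BYTES = 20;        // trailing signature length from the example
const file = new Uint8Array(1024); // small stand-in for the 100 MB file
const boundary = file.byteLength - SIGNATURE_BYTES;

// ArrayBuffer-only API: each piece is a fresh allocation plus a copy.
const signatureCopy = file.buffer.slice(boundary);
const payloadCopy = file.buffer.slice(0, boundary);
const extraBytesCopied = signatureCopy.byteLength + payloadCopy.byteLength;

// ArrayBufferView API: each piece is a window onto the original buffer.
const signatureView = file.subarray(boundary);
const payloadView = file.subarray(0, boundary);

console.log(extraBytesCopied);                    // 1024 (a second full-size copy)
console.log(payloadView.byteLength);              // 1004
console.log(signatureView.byteLength);            // 20
console.log(payloadView.buffer === file.buffer);  // true (no new buffer)
```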

At this point, regardless of whether you immediately discard the
'original' array buffer, there is a point in time where two
"effectively 100 MB" ArrayBuffers exist, using twice the memory.

Now, there's still the possibility of a 200MB allocation - since
without streaming input, you'll have both the original, 100MB-20B
Encrypted ArrayBuffer, and the ~100MB Decrypted ArrayBuffer. That also
saddens me, hence ISSUE-18.

In my "ideal" world, your maximum allocation during the entire
transformation phase would be 100MB + whatever implementation-specific
block size is needed by implementations that don't do in-place
transformations. Whether this is 64K or 1MB or whatever, that's up to
the user agent, *not* the script, to manage effectively. I think the
Streaming APIs are one way to get there, but in the absence of those,
the ArrayBufferView strawman was trying to find a type that didn't
explicitly require copying.
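
As a rough illustration of the bounded-memory idea, block-at-a-time
processing over a view could look like the sketch below. The chunks
helper and the CHUNK_BYTES value are hypothetical; in the argument above
the block size is the user agent's concern, so the script-level constant
is only a stand-in to make the example runnable.

```javascript
// Sketch: walk a large view in fixed-size blocks without copying the input.
const CHUNK_BYTES = 64 * 1024; // arbitrary illustrative block size

// Hypothetical helper: yields successive views onto the same buffer.
function* chunks(bytes, chunkBytes = CHUNK_BYTES) {
  for (let offset = 0; offset < bytes.byteLength; offset += chunkBytes) {
    // subarray clamps the end index, so the last block may be shorter.
    yield bytes.subarray(offset, offset + chunkBytes);
  }
}

const input = new Uint8Array(150 * 1024);
const sizes = [];
for (const block of chunks(input)) {
  sizes.push(block.byteLength); // a real consumer would process the block here
}
console.log(sizes.join(", ")); // 65536, 65536, 22528
```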
Received on Thursday, 6 September 2012 18:06:26 UTC