Re: Streaming - [Re: CryptoOperation and its life cycle] from Ryan Sleevi on 2012-12-13 (public-webcrypto-comments@w3.org from December 2012)

From: Ryan Sleevi <sleevi@google.com>
Date: Thu, 13 Dec 2012 14:42:49 -0800
To: Aymeric Vitte <vitteaymeric@gmail.com>
Cc: public-webcrypto-comments@w3.org
Message-ID: <CACvaWvbMnd20i_wVR_ROtEVUpq2gZnNv6rBJWm0azusZwKavfw@mail.gmail.com>
On Thu, Dec 13, 2012 at 2:27 PM, Aymeric Vitte <vitteaymeric@gmail.com> wrote:
> What I would like to do seems not easy (of course one shot operations are
> easier), but it's not a marginal case; it has been requested many times
> on some projects (node.js) and a living example is
> https://github.com/Ayms/node-Tor (both for hash and encryption)
>
> The proposed solution is :
>
>
> var h1 = window.crypto.digest("SHA1");
> var h2 = window.crypto.digest("SHA1");
> h1.process(stream1);
> h2.process(stream1);
> h2.process(stream2);
> h1.finish();
> h2.finish();
>
> This is not a streaming solution, and I would not promote clones at all
> either.
>
> But the promises style (which I believe just complicates the proposal a lot,
> although almost everybody seems to agree that's the right way to spec things)
> does not make clear what happens to the list of pending data while
> different process methods are invoked successively. At first I wrote:
>
> H.process(stream1);
> H.process(stream2);
> H.onprogress=function() { console.log(this.result)};
>
> Then I thought that it was completely wrong (is stream2 processed by
> H.process(stream1) ?)
>
> Unlike other APIs, there is no update method, but why not simply do the
> following when the process method ends:
>
> - return the digest from the process method
> - put the remaining blocks (i.e. the ones that would not have been processed
> by an update method) in the pending data as the oldest ones

The API is always asynchronous. As such, .process() always returns
void, because it must NEVER return a response within the same turn of
the event loop (
http://www.whatwg.org/specs/web-apps/current-work/#event-loops ).

Instead, .process() causes a task to be enqueued to the event loop to
run asynchronously.
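
As a rough illustration of that rule (a sketch only, not the spec's
algorithm): the ToyOperation class below is hypothetical, and
setTimeout(..., 0) is an assumption standing in for "enqueue a task". The
point is just that the caller gets undefined back and sees no result until
a later turn, so a handler attached after .process() is still in time.

```javascript
// Hypothetical stand-in for a CryptoOperation; not the WebCrypto API.
class ToyOperation {
  constructor() {
    this.result = null;      // stays null until a later turn of the event loop
    this.onprogress = null;
  }

  process(data) {
    // Enqueue a task instead of doing the work now. setTimeout is an
    // assumption standing in for the user agent's task queue.
    setTimeout(() => {
      this.result = data.length;   // placeholder "work": just record the size
      if (this.onprogress) this.onprogress();
    }, 0);
    // No return statement: the caller gets undefined ("void").
  }
}

const op = new ToyOperation();
const returned = op.process(new Uint8Array(64));

// Attached after process(), but the task has not run yet in this turn,
// so the handler is still invoked.
op.onprogress = function () {
  console.log('progress, result =', this.result);
};

console.log('returned:', returned, 'result so far:', op.result);
```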

Again, there is no requirement that the underlying implementation
support multi-part operations. It MAY decide to perform no
cryptographic operations until the task posted by .finish() has been
posted - indeed, if the underlying implementation does not support
multi-part, that is the only thing it can do (that, or not support the
algorithm).
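
To make that concrete, here is a minimal sketch of an implementation
without multi-part support, under some loud assumptions: BufferingDigest,
taskQueue and runEventLoop are names I made up, the array-based queue is
only a toy stand-in for the event loop, and Node.js's built-in crypto
module plays the role of the underlying single-shot library. Nothing is
hashed until the task posted by .finish() runs.

```javascript
const { createHash } = require('crypto'); // Node.js built-in, single-shot use only

const taskQueue = [];                      // toy stand-in for the event loop
function runEventLoop() {
  while (taskQueue.length > 0) taskQueue.shift()();
}

// Hypothetical operation for an implementation with no multi-part support.
class BufferingDigest {
  constructor(algorithm) {
    this.algorithm = algorithm;
    this.pending = [];        // list of pending data
    this.result = null;
    this.oncomplete = null;
  }

  process(view) {
    // Simplification: only remember a copy of the bytes, no crypto here.
    const bytes = new Uint8Array(view.buffer, view.byteOffset, view.byteLength);
    this.pending.push(bytes.slice());
  }

  finish() {
    // All of the cryptographic work happens in the task posted here.
    taskQueue.push(() => {
      const all = Buffer.concat(this.pending);
      this.pending = [];
      this.result = createHash(this.algorithm).update(all).digest('hex');
      if (this.oncomplete) this.oncomplete();
    });
  }
}

const enc = new TextEncoder();
const h = new BufferingDigest('sha1');
h.process(enc.encode('stream1'));
h.process(enc.encode('stream2'));
h.finish();
const beforeLoop = h.result;   // still null: the task has not run yet
runEventLoop();
console.log('before:', beforeLoop, 'after:', h.result);
```

From the caller's side this is indistinguishable from a true multi-part
implementation, except for when the work happens.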

Further, if you were to call, say, .process(stream1), where stream1 was an
ArrayBufferView that contained 20 "blocks" of data for the underlying
algorithm, it's entirely valid for the UA to process the first 10
blocks, fire a progress event, **run the event loop**, and then
process another 10 blocks, and fire another progress event. That is,
two progress events - for a single ArrayBufferView.

(If you're wondering why a UA would do that, consider a cooperative
multi-threading model where it does not want to allocate more than N
timeslices to processing data, hence only 10 blocks at a time.)

Alternatively, it could fire a single progress event with all 20 blocks.

Or it could fire no progress events until .finish() was called, fire a
single progress event with all of the accumulated data, and then fire
oncomplete.

To put it differently, there is absolutely no 1:1 mapping between
.process() and onprogress. .process() adds data, onprogress informs
you when data is available - and it's up to the user agent to decide
how to best optimize that for its underlying implementation. The only
guarantee is that as data becomes available, progress events should be
fired, and eventually, oncomplete will be.
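
A toy model of the "10 blocks at a time" user agent may help. Everything
here is hypothetical: ChunkedOperation, the 64-byte block size, the
10-block budget and the array-based task queue are all assumptions for
illustration, and the "result" is just a count of blocks processed. It
shows one .process() call leading to two progress events.

```javascript
const BLOCK_SIZE = 64;            // assumed block size in bytes
const MAX_BLOCKS_PER_TASK = 10;   // assumed per-task budget of this toy UA

const taskQueue = [];             // toy stand-in for the HTML event loop
function runEventLoop() {
  while (taskQueue.length > 0) taskQueue.shift()();
}

class ChunkedOperation {
  constructor() {
    this.pendingBlocks = 0;   // blocks supplied but not yet processed
    this.result = 0;          // toy "result": blocks processed so far
    this.finishing = false;
    this.scheduled = false;
    this.onprogress = null;
    this.oncomplete = null;
  }

  process(view) {
    this.pendingBlocks += Math.ceil(view.byteLength / BLOCK_SIZE);
    this.schedule();
  }

  finish() {
    this.finishing = true;
    this.schedule();
  }

  schedule() {
    if (this.scheduled) return;          // at most one task in flight
    this.scheduled = true;
    taskQueue.push(() => this.runTask());
  }

  runTask() {
    this.scheduled = false;
    const n = Math.min(this.pendingBlocks, MAX_BLOCKS_PER_TASK);
    this.pendingBlocks -= n;
    this.result += n;
    if (n > 0 && this.onprogress) this.onprogress();
    if (this.pendingBlocks > 0) this.schedule();   // yield, continue later
    else if (this.finishing && this.oncomplete) this.oncomplete();
  }
}

const events = [];
const op = new ChunkedOperation();
op.onprogress = () => events.push('progress:' + op.result);
op.oncomplete = () => events.push('complete:' + op.result);
op.process(new Uint8Array(20 * BLOCK_SIZE));   // a single call, 20 blocks
op.finish();
runEventLoop();
console.log(events);   // [ 'progress:10', 'progress:20', 'complete:20' ]
```

Raising MAX_BLOCKS_PER_TASK to 20 or more gives the single-progress-event
behaviour instead, without any change to the calling code.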

>
> But still I don't know what would fire onprogress exactly...
>
> Again, I might be misreading, the intention here is to move forward, not to
> complicate things
>
>
>
> On 13/12/2012 19:49, Ryan Sleevi wrote:
>
>> On Thu, Dec 13, 2012 at 2:16 AM, Aymeric Vitte <vitteaymeric@gmail.com>
>> wrote:
>>>>
>>>> onprogress follows the Progress Events model, in which the client is
>>>> informed of progress. There is always at least one onprogress event
>>>> (which may be due to the final completion of the data), and there is
>>>> always zero or one oncomplete events. At the oncomplete event firing,
>>>> all of the data is available in result.
>>>>
>>>> This was raised as a point of concern by Wan-Teh back on June 20th,
>>>> but it arguably follows the model of what existing APIs (such as File
>>>> API or Streams API) do through their readAsArrayBuffer methods, and
>>>> with how XMLHttpRequest makes data available through progress events.
>>>>
>>>> To be clear: .result contains the data available, and may grow to add
>>>> more data, up and until oncomplete is fired.
>>>>
>>> The specs say : "an interface to support streaming/progressive output has
>>> also been requested. How such an interface would be implemented, if at all,
>>> remains TBD."
>>
>>
>> This is talking in particular about accepting Blob objects from the
>> File API ( http://dev.w3.org/2006/webapi/FileAPI/ ) **and returning
>> Blob**, or accepting Stream objects from the Streams API (
>> http://dvcs.w3.org/hg/streams-api/raw-file/tip/Overview.htm ) **and
>> returning Stream objects**
>>
>>>
>>> I assume this is related to the case below, current implementations of
>>> streaming can do something like :
>>>
>>> var H=new Hash('sha1');
>>> H.update(stream1);
>>> var res1=H.digest(stream1);
>>> H.update(stream2);//hash stream1+stream2
>>> var res2=H.digest(stream2);
>>> etc...
>>>
>>> Which can become with Webcrypto something like :
>>>
>>> var H=(new Hash('sha1')).digest();
>>> H.process(stream);//stream1
>>> H.onprogress=function() {
>>>      console.log(this.result);
>>>      this.process(stream);//stream2
>>> };
>>>
>>>
>>> Apparently this will return stream1 hash and stream2 hash (not stream1
>>> hash followed by stream1 + stream2 hash), probably I am misreading something
>>> because it therefore looks useless to call "process" several times for the
>>> same CryptoOperation object, and the list of pending data is only fed by
>>> "process", which empties it, so for now it's not really a list (shouldn't
>>> the list of pending data be updated when process ends?).
>>
>> There is presently **no** requirement that the underlying
>> implementation support multi-part operations. The language was worded
>> such that a browser implementation MAY synthesize multi-part
>> operations under the hood into a single operation.
>>
>> Syntactically, the call sequence under the current Editor's Draft (
>>
>> https://dvcs.w3.org/hg/webcrypto-api/raw-file/f5e8d9a3e18f/spec/Overview.html
>> ) is
>>
>> var h = window.crypto.digest("sha1");
>> h.process(stream1);  // MUST be an ArrayBufferView
>> h.process(stream2);  // MUST be an ArrayBufferView
>> h.finish();
>>
>> Given that the API ONLY defines SHA-family hashes at present, and that
>> there is NO incremental hashing supported by these constructs (since
>> the final block contains both padding and the finalized length), it
>> makes no sense to return the intermediate parts.
>>
>> Really, what I think you're asking about is ISSUE-22, which asks whether
>> CryptoOperations should be clonable. IF they were (and I presently
>> don't think so yet), THEN you would write something like
>>
>> var h1 = window.crypto.digest("SHA1");
>> h1.process(stream1);  // MUST be an ArrayBufferView
>> var h2 = h1.clone();
>> h1.finish();
>> h2.process(stream2);  // MUST be an ArrayBufferView
>> h2.finish();
>>
>> Upon invoking their oncomplete callbacks for h1 and h2, h1.result ==
>> H(stream1) and h2.result == H(stream1+stream2);
>>
>> However, like I said, I am generally opposed to clone methods (not
>> structured clone, but explicit clone), in particular when the
>> object-being-cloned is an EventTarget, as I think it creates confusion
>> about what to do with pending tasks in the HTML Event Loop when the
>> object is cloned during a task. Do they get cloned as well? If not,
>> it's possible for h2.result == H(stream2), which would not be
>> expected.
>>
>> In short, the only way to do what you're asking today is
>>
>> var h1 = window.crypto.digest("SHA1");
>> var h2 = window.crypto.digest("SHA1");
>> h1.process(stream1);
>> h2.process(stream1);
>> h2.process(stream2);
>> h1.finish();
>> h2.finish();
>>
>> As you can see, multi-part operations, streaming, and cloning are
>> rather complex issues, whereas 'single shot' operations are much
>> clearer:
>>
>> var h1 = window.crypto.digest("SHA1", [ stream1 ]);
>> // no .process() method, no .finish() method. Only digests the data
>> // supplied in the .digest call
>> var h2 = window.crypto.digest("SHA1", [ stream1, stream2 ]);
>> // no .process(), no .finish(). Only digests the data supplied in the
>> // .digest call
>>
>>>
>>> PS : typos in the spec
>>> 12.1.2.3bis "Remove data from the list of pending data." --> "Remove item
>>> from the list of pending data."
>>> 19.3.4 "Upon invoking init: " --> what init method ?
>>>
>>> --
>>> jCore
>>> Email :  avitte@jcore.fr
>>> GitHub : https://www.github.com/Ayms
>>> Web :    www.jcore.fr
>>> Webble : www.webble.it
>>> Extract Widget Mobile : www.extractwidget.com
>>> BlimpMe! : www.blimpme.com
>>>
Received on Thursday, 13 December 2012 22:43:19 UTC