Re: Iterative crypto.subtle.digest

There have been alternate proposals for addressing this with Streams (
https://github.com/w3c/webcrypto/issues/73), which were tabled as a possible
feature for a next version.

Note that past revisions of the Web Crypto spec used an API similar to the
one you are proposing. See:
https://www.w3.org/TR/2012/WD-WebCryptoAPI-20120913/#Crypto-method-createDigester.
So this approach was at least considered; however, I believe it was
abandoned in favor of the simpler one-shot + Promise approach (looking
towards Streams for possibly addressing the multi-part use case in the
future).
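
From memory, the shape in that draft was roughly as follows (reconstructed
for illustration only; details may be off, so consult the linked draft for
the authoritative IDL):

// Rough reconstruction of the 2012 draft's event-based multi-part API.
var op = window.crypto.createDigester('SHA-1'); // returned a CryptoOperation
op.processData(new TextEncoder().encode('Hello'));
op.processData(new TextEncoder().encode(' world!'));
op.oncomplete = function () {
  console.log(op.result); // the computed digest
};
op.complete();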

I do agree with you that for certain applications the asynchronous (and
one-shot) interface, for SHA digests in particular, is
inconvenient/impractical.
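
In the meantime, the usual workaround for large files is to slice the
File/Blob and feed the chunks to an incremental hash implemented in
JavaScript, so only one chunk is in memory at a time. A rough sketch,
where IncrementalSha1 is a placeholder for whatever library implementation
you use (it is not a platform API):

// Hash a large File in fixed-size slices. Only the current slice is
// held in memory; the hashing itself runs in script, not natively.
function hashFile(file, hasher, chunkSize) {
  return new Promise(function (resolve, reject) {
    var offset = 0;
    var reader = new FileReader();
    reader.onerror = reject;
    reader.onload = function () {
      hasher.update(new Uint8Array(reader.result));
      offset += chunkSize;
      if (offset < file.size) {
        reader.readAsArrayBuffer(file.slice(offset, offset + chunkSize));
      } else {
        resolve(hasher.digest());
      }
    };
    reader.readAsArrayBuffer(file.slice(0, chunkSize));
  });
}

// e.g. hashFile(file, new IncrementalSha1(), 4 * 1024 * 1024)

The obvious downside is that the hashing runs in JavaScript rather than
using the platform's native implementation, which is exactly why a
streaming or multi-part digest() would be valuable.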

On Sat, Nov 5, 2016 at 6:45 AM, Artem Skoretskiy <tonn81@gmail.com> wrote:

> Dear W3C group,
>
> I have feedback regarding your "digest" method:
> https://w3c.github.io/webcrypto/Overview.html#SubtleCrypto-method-digest
>
> It is great that we can have native hash calculation in the browser.
> However, some parts are missing to make it truly usable.
>
> At the moment, you must pass the complete buffer into the digest method, e.g.:
>
> window.crypto.subtle.digest('SHA-1', new TextEncoder("utf-8").encode('Hello world!')).then(function (digest) {
>     console.log(digest);
> });
>
> That is completely fine as long as your content is small. Once you start
> dealing with content gigabytes or terabytes in size, you are stuck.
>
> With the current interface you need to read the complete content into RAM,
> which makes heavy use of memory and also limits the size of the content
> you can handle.
>
> Yes, usually all the content is in RAM already, but there are several
> cases when it is not:
>
> - File (selected by a user)
> - Content that is generated on the fly, e.g. a PDF or ZIP
>
> In my scenario I'm hashing user files before uploading them to the cloud (so
> that we don't re-upload files that are already there). With the current
> standard I cannot handle big files, e.g. 3 GB in size.
>
> I would propose making digest iterative, so you could generate the hash by
> chunks and keep RAM usage low.
>
> For example:
>
> var hash = new window.crypto.subtle.digest('SHA-1');
> hash.update(new TextEncoder("utf-8").encode('Hello'));
> hash.update(new TextEncoder("utf-8").encode(' world!'));
>
> hash.digest().then(function (digest) {
>     console.log(digest);
> });
>
> That is pretty common practice for hashing in modern languages, e.g. in
> Python:
>
> import hashlib
>
> digest = hashlib.sha1()
> digest.update(b'Hello')
> digest.update(b' world!')
> print(digest.hexdigest())
>
> d3486ae9136e7856bc42212385ea797094475802
>
>
> That would solve my use case (I would generate the hash by chunks) and
> reduce the memory footprint in other scenarios with big content.
>
> Alternatively, you could allow providing a File / Blob as input as well as a
> buffer. Browsers would then need to implement efficient reading and hashing
> by chunks. For me as a developer that would be easier, but less
> flexible.
>
> var file = new File([""], "filename");
> window.crypto.subtle.digest('SHA-1', file).then(function(digest){
>     console.log(digest);
> });
>
> I hope this change will take place in a future revision, to make
> cryptographic hashing a first-class citizen in browsers.
>
> --
> Truly yours,
> Artem Skoretskiy
>
