Iterative crypto.subtle.digest from Artem Skoretskiy on 2016-11-05 (public-webcrypto@w3.org from November 2016)

From: Artem Skoretskiy <tonn81@gmail.com>
Date: Sat, 5 Nov 2016 14:45:46 +0100
To: public-webcrypto@w3.org
Message-ID: <CAGLVQBYYOrkSbJ8wjvAcATOLDi_E7nX2esD-Fe3gYPzqDPYcJw@mail.gmail.com>

Dear W3C group,

I have some feedback regarding your "digest" method:
https://w3c.github.io/webcrypto/Overview.html#SubtleCrypto-method-digest

It is great that we can compute hashes natively in the browser.
However, some parts are missing to make it usable for large inputs.

At the moment, you must pass the complete buffer to the digest method, e.g.:

var data = new TextEncoder().encode('Hello world!');
window.crypto.subtle.digest('SHA-1', data).then(function(digest){
    console.log(digest);
});

That is completely fine as long as your content is small. Once you start to
deal with content that is gigabytes or terabytes in size, you are stuck.

With the current API you need to read the complete content into RAM, which
makes heavy use of memory and also limits the size of the content you can
handle.

Yes, usually all the content is already in RAM, but there are several cases
when it is not:

- A File (selected by the user)
- Content that is generated on the fly, e.g. a PDF or ZIP

In my scenario I'm hashing user files before uploading them to a cloud (so
that we don't upload files that were already uploaded). With the current
standard I cannot handle big files, e.g. 3 GB in size.

I would propose making digest iterative, so that you could generate the hash
in chunks and keep RAM usage low.

For example:

var hash = new window.crypto.subtle.digest('SHA-1');
hash.update(new TextEncoder().encode('Hello'));
hash.update(new TextEncoder().encode(' world!'));

hash.digest().then(function(digest){
    console.log(digest);
});

That is a pretty common practice for hashing in modern languages. E.g. in
Python:

import hashlib

digest = hashlib.sha1()
digest.update(b'Hello')
digest.update(b' world!')
print(digest.hexdigest())

d3486ae9136e7856bc42212385ea797094475802


That would solve my use case (I would generate the hash chunk by chunk) and
reduce the memory footprint in other scenarios with big content.
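
As a rough sketch of what hashing by chunks means in practice, here is how
it could look in Python with hashlib. The helper name sha1_in_chunks and the
1 MiB default chunk size are illustrative assumptions, not part of any
existing or proposed API; an in-memory stream stands in for a large file.

```python
import hashlib
import io


def sha1_in_chunks(stream, chunk_size=1024 * 1024):
    """Return the hex SHA-1 of a binary stream, reading chunk_size bytes at a time."""
    digest = hashlib.sha1()
    # iter() with a sentinel keeps calling read() until it returns b"" (end of stream),
    # so only one chunk is held in memory at any moment.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Small in-memory stand-in for a big file; a real caller would pass open(path, "rb").
sample = io.BytesIO(b"Hello world!")
chunked = sha1_in_chunks(sample, chunk_size=5)

# The chunked result should match hashing the whole content at once.
one_shot = hashlib.sha1(b"Hello world!").hexdigest()
print(chunked == one_shot)
```

Memory use is bounded by the chunk size rather than by the size of the
content, which is the property I am asking for in the browser API.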

Alternatively, you could allow a File or Blob as input in addition to a
buffer. Browsers would then need to implement efficient chunked reading and
hashing themselves. For me as a developer that would be easier, but less
flexible.

var file = new File([""], "filename");
window.crypto.subtle.digest('SHA-1', file).then(function(digest){
    console.log(digest);
});

I hope this change will make it into a future revision, to make
cryptographic hashing a first-class citizen in browsers.

-- 
Truly yours,
Artem Skoretskiy

Received on Saturday, 5 November 2016 13:46:19 UTC