Re: New JSON-LD digital signature library for Javascript (browsers and node.js) from David I. Lehn on 2014-12-09 (public-credentials@w3.org from December 2014)

From: David I. Lehn <dil@lehn.org>
Date: Mon, 8 Dec 2014 23:43:46 -0500
To: Melvin Carvalho <melvincarvalho@gmail.com>
Cc: Dave Longley <dlongley@digitalbazaar.com>, W3C Credentials Community Group <public-credentials@w3.org>
Message-ID: <CADcbRRPja1LCLHDOPEUW9RGPeO9jdneRUwhBBT_evLH8wjipZQ@mail.gmail.com>

On Mon, Dec 8, 2014 at 9:24 PM, Melvin Carvalho
<melvincarvalho@gmail.com> wrote:
> On 8 December 2014 at 16:26, Dave Longley <dlongley@digitalbazaar.com>
> wrote:
>> On 12/08/2014 10:14 AM, Melvin Carvalho wrote:
>> On 8 December 2014 at 16:06, Dave Longley <dlongley@digitalbazaar.com>
>> wrote:
>>> On 12/08/2014 04:40 AM, Melvin Carvalho wrote:
>>> On 8 December 2014 at 04:31, Manu Sporny <msporny@digitalbazaar.com>
>>> wrote:
>>>> Digital Bazaar has just released a convenience library for creating and
>>>> verifying JSON-LD Signatures in Javascript in the browser and in
>>>> node.js:
>>>>
>>>> https://github.com/digitalbazaar/jsonld-signatures/
>>>> ...
>>> ...
>>> Three things I'd love to see as convenience functions:
>>>
>>> 1. Normalize -- Done

You are referring to the normalize() call in jsonld.js, right?

>>> 2. Signing -- Done

I'm pretty sure you are referring to the jsonld-signatures API, right?

>>> 3. Hash content into ID, so that blank nodes can easily be replaced with
>>> a URI (I'd suggest ni:///sha256;<base64urlhash>)
>>>
>>> (3) would facilitate (2) more easily, imho, as part of a common 3 step
>>> process
>>>

Did you want this hashing utility in the jsonld-signatures lib?  It
doesn't seem like it has anything to do with signatures.  Seems like
it should be in a new library but it might need to know jsonld.js
internals.

>>> What would the details of (3) be? What is the "content" that would be
>>> hashed?
>>
>>
>> What I usually do is hash the normalized form, which I think is possibly
>> the most logical thing to do?
>>
>> So:
>>
>> if @id
>>   return
>> else
>>   @id = sha256(normalized(json-ld))
>>
>>
>> The blank node IDs would then be a part of the "content" you're hashing.
>> The two step process would be:
>>
>> 1. Normalize the document (dataset) and hash (eg: sha256).
>> 2. Replace the canonical blank node IDs from the document using the "ni"
>> URI scheme and hash.
>
>
> Yes!
>
>> After step #2, normalizing and hashing the document now produces a
>> different hash,
>
> Correct
>
>> which seems to make the use of such a hash in the document IDs not all
>> that useful,
>
> It's a hash of the content, not of ID+content. Hashing the ID before you
> know it is kind of hard.
>
>>
>> confusing even. Thoughts?
>
>
> Yes, I can see that.  It is slightly confusing but as a utility function
> very helpful?
>
> Not just for credentials or payments, for any data structure (JSON or
> otherwise) without an ID. This might have billions or trillions of use
> cases.  Naming is hard and everyone will have a strategy for it, but a neat
> convenience function that will give you a UUID that's also a hash of the
> content I think is useful.  This can allow developers to get up and running
> more quickly when working with unnamed data.
>

Pardon the excessive quoting... I'm confused by much of this
discussion and not sure where to start!

Dave suggested running the current algorithms to get canonical blank
node ids and then replacing them, yet it was then mentioned that only
the content is hashed?
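
To make the mismatch concrete, here is a rough sketch of the problem
discussed above: once a content hash is written into the document as
its @id, hashing the document again gives a different value.  This is
not jsonld.js.  The canonicalize() helper is a hypothetical stand-in
that only sorts object keys, whereas real JSON-LD normalization works
on RDF datasets.  The "ni:///sha-256;" prefix follows the RFC6920
algorithm name, and the hex digest is used only to keep the sketch
short.

```javascript
const crypto = require('crypto');

// Hypothetical stand-in for JSON-LD normalization: serialize with sorted keys.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return '[' + value.map(canonicalize).join(',') + ']';
  }
  if (value !== null && typeof value === 'object') {
    const members = Object.keys(value).sort().map(
      (key) => JSON.stringify(key) + ':' + canonicalize(value[key]));
    return '{' + members.join(',') + '}';
  }
  return JSON.stringify(value);
}

// SHA-256 over the canonical form, hex encoded.
function contentHash(doc) {
  return crypto.createHash('sha256')
    .update(canonicalize(doc), 'utf8')
    .digest('hex');
}

const doc = {foo: 'bar'};
const before = contentHash(doc);

// Embed the hash as the @id, then hash the result again.
const withId = Object.assign({'@id': 'ni:///sha-256;' + before}, doc);
const after = contentHash(withId);

console.log(before === after); // the two hashes differ
```

So the id names the content as it was before the id existed, and
anyone checking it has to strip the id first.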

What is the intended purpose of these hash ids?  Globally unique blank
node ids?  Globally unique content ids?  Content hashes will only be
globally unique if they are hashing content that already includes some
other globally unique id.  As an example, JSON-LD like {"foo":"bar"}
that appears one or more times in one or more datasets would generate
the same id hash each time, right?  If the intent is to generate UUIDs,
then it seems using RFC4122 UUID algorithms to generate blank node ids
is the way to go.  If you want to use RFC6920 URIs, then maybe hash
the RFC4122 data?
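
A small sketch of the difference, using only the node.js crypto
module.  The function names niUri() and uuidUrn() are made up for
illustration and are not part of jsonld.js or jsonld-signatures.
Hashing a plain string here stands in for hashing normalized output.

```javascript
const crypto = require('crypto');

// RFC6920 "ni" URI over the SHA-256 of a string (base64url, no padding).
function niUri(content) {
  const b64 = crypto.createHash('sha256')
    .update(content, 'utf8')
    .digest('base64');
  const b64url = b64.replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
  return 'ni:///sha-256;' + b64url;
}

// RFC4122 version 4 (random) UUID as a urn:uuid URI.
function uuidUrn() {
  const bytes = crypto.randomBytes(16);
  bytes[6] = (bytes[6] & 0x0f) | 0x40; // set version 4
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // set RFC4122 variant
  const hex = bytes.toString('hex');
  return 'urn:uuid:' + [
    hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
    hex.slice(16, 20), hex.slice(20)
  ].join('-');
}

// Same content always gives the same ni URI, wherever it appears.
const first = niUri('{"foo":"bar"}');
const second = niUri('{"foo":"bar"}');

// Two random UUIDs are different even for identical content.
const u1 = uuidUrn();
const u2 = uuidUrn();

// One reading of "hash the RFC4122 data": an ni URI derived from a UUID.
const niFromUuid = niUri(u1);

console.log(first === second, u1 === u2);
```

The content hash is an identifier for the content, not for a
particular occurrence of it, while the UUID is the reverse.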

In general, I thought the theory was that if you wanted data to have
an id, you give it an id.  Content hashes are useful for many
reasons, but you need to make sure the use cases and algorithms handle
hash uniqueness or non-uniqueness as appropriate.  I'd like to hear a
bit more about what the real use cases are for this idea.

Also, I'm not sure what this has to do with credentials. :-)

-dave

Received on Tuesday, 9 December 2014 04:44:13 UTC