Adopting the Hashlink spec as a work item? from Manu Sporny on 2019-01-01 (public-json-ld-wg@w3.org from January 2019)

From: Manu Sporny <msporny@digitalbazaar.com>
Date: Tue, 1 Jan 2019 16:36:40 -0500
To: W3C Credentials CG <public-credentials@w3.org>
Message-ID: <212f8037-ae29-f993-49d7-2a2a0ababda5@digitalbazaar.com>
Hi all (bcc: W3C JSON-LD WG, VCWG, and Protocol Labs Research Division),

I promise that this is the last one for today. :)

This email is about the Hashlink specification, which is intended to be
a way to enforce that the content at a given hyperlink has not changed
since it was published.

The specification is intended to be a joint work product of the W3C
Digital Verification Community Group, JSON-LD WG (unofficial), VCWG
(unofficial), and the W3C Credentials Community Group. The CCG needs to
decide if it's adopting this specification as a Work Item.

Ok, so what's the problem and why do we care?

We do the following things in this community:

 * Link to JSON-LD Contexts
 * Link to ZKP Schemas
 * Link to evidence from Verifiable Credentials
 * Link to external data from DID Documents

... and we have no way to tell if the content at the end of that link
has changed since the Verifiable Credential was issued, or the DID
Document was created.


In the best case, verification of the cryptographic signature fails
(because the content at the end of the link changed). In the average
case, the signature doesn't fail, and we use data that may have been
modified. In the worst case, the data was modified by an attacker and
really bad things happen (like a developer being sloppy with how they
use JSON-LD Contexts in a financial system and the source/destination
fields being flipped in a financial transaction). There are mitigations
for all of these issues, but they require the developer to be aware of
the problem in the first place.

What would be great is if, when we put a hyperlink into our data (like
a Verifiable Credential or a DID Document) and digitally sign it, we
could trust that we're also signing the content at the hyperlink.
That's what the Hashlink specification enables us to do.

Here's a simple example for the JSON-LD Context for the current DID
Spec. Here's the current link:

https://w3id.org/did/v0.11

You have no idea if the version you retrieved and the version I
retrieved are the same. Now, here's what it looks like when we secure it
with a "Legacy URL" Hashlink (blake2b 8 byte multihash + base58btc
multibase):

https://w3id.org/did/v0.11?hl=z3aq31uzgnZBuWNzUB

Now we both know that the version we download is the same (because of
the little bit at the end starting with "hl=", which stands for
Hashlink). Here's what it looks like as a (non-legacy) Hashlink URL:

hl:z3aq31uzgnZBuWNzUB:zpr1Xd34f3NYqfr1yMzb6TBCrWWrvJeGVRJGsUMMyVWXS8

Yes, that looks awful, but the second form has a number of redeeming
characteristics. The first is that the second value (everything after
the second colon) is optional -- you can discard it and it's still a
valid Hashlink URL. All you really need is "hl:z3aq31uzgnZBuWNzUB" and
you can get the data from anywhere else (like different URLs, a local
cache, etc.) and still thwart the attacks listed above.
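
To make the two forms concrete, here is a minimal Python sketch of how a
Hashlink value could be computed and checked. It is an illustration
under assumptions, not the algorithm text from the spec: the function
names (hashlink_value, legacy_url, verify) are hypothetical, the
multihash header bytes are my reading of the multicodec table for an
8-byte blake2b digest, and it is not expected to reproduce the exact
example value shown above.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

# Assumption: multihash header for an 8-byte blake2b digest
# (varint of multicodec code 0xb208, followed by the digest length 0x08).
MULTIHASH_HEADER = bytes([0x88, 0xE4, 0x02, 0x08])


def base58btc(data: bytes) -> str:
    """Encode bytes with the base58btc alphabet, preserving leading zero bytes."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def hashlink_value(content: bytes) -> str:
    """Hash the content and return a multibase string ('z' marks base58btc)."""
    digest = hashlib.blake2b(content, digest_size=8).digest()
    return "z" + base58btc(MULTIHASH_HEADER + digest)


def legacy_url(url: str, content: bytes) -> str:
    """Append the hash to an ordinary URL as an 'hl' query parameter."""
    sep = "&" if "?" in url else "?"
    return f"{url}{sep}hl={hashlink_value(content)}"


def verify(hashlink: str, content: bytes) -> bool:
    """Check content against 'hl:<hash>[:<metadata>]'; the metadata part is ignored."""
    scheme, _, rest = hashlink.partition(":")
    if scheme != "hl":
        return False
    expected = rest.split(":", 1)[0]
    return expected == hashlink_value(content)
```

Because verify() only looks at the part between the first and second
colon, the same check works whether or not the optional trailing value
is present, and no matter where the bytes were fetched from.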

The other nice thing about Hashlink URLs, and this is the really
exciting bit, is that you can create multi-sourced hashlinks that span
completely different network architectures. Let's say that you had
information that you really wanted to make sure didn't disappear (like
the DID JSON-LD Context above), and you publish it at the following
locations:

https://w3id.org/did/v0.11

magnet:?xt=urn:btih:73C59D931B7E0C089C031D6CFE0D16AE

ipfs://ipfs/QmR7W4GQUFWDPMVrQfmNE8xJC6LoVAyaWeRnDp4gS9/did-v0.11

onion://pq6kufupl4mc43g2/didv0.11

The Hashlink URL would be really long, something like this:

hl:z3aq31uzgnZBuWNzUB:zeGVRWXS8TakFeJueF2bim3PaaDqbtqjkpxUc8ETS
WXe6dQLWXQWvqiUdw8TJrncx3uKhwfc88MtM5xZbR27FhVRUKv9ogekamVtdE3U
bXnXpMRT1AseCtoBUt1NE8x2SsnJxGfiZN45VVSCp6jh4dgcufL16tWrHREiSYE
SEGP1J75yXCvAdvKPr7nb5aY

... but the Hashlink URL above 1) ensures the integrity of the thing
you're linking to, and 2) still works if 3 out of the 4 networks listed
above failed. To put it another way, the link above could survive (for
example) the failure of the Web, BitTorrent, and Tor.
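
The "survives 3 out of 4 failures" behaviour can be sketched as a
simple fallback loop: try each location in turn and accept the first
copy whose hash matches. This is a hedged sketch, not code from the
spec. fetch_verified and blake2b8_matcher are hypothetical names, the
fetch callable stands in for whatever per-network client (HTTP,
BitTorrent, IPFS, Tor) an implementation would actually use, and the
raw 8-byte blake2b comparison is a simplification of the multihash
check.

```python
import hashlib
from typing import Callable, Iterable, Optional


def blake2b8_matcher(expected: bytes) -> Callable[[bytes], bool]:
    """Build a check that compares content's 8-byte blake2b digest to `expected`."""
    def matches(content: bytes) -> bool:
        return hashlib.blake2b(content, digest_size=8).digest() == expected
    return matches


def fetch_verified(
    sources: Iterable[str],
    fetch: Callable[[str], bytes],
    matches: Callable[[bytes], bool],
) -> Optional[bytes]:
    """Return the first fetched content that passes the integrity check, else None."""
    for url in sources:
        try:
            content = fetch(url)
        except Exception:
            continue  # this network is down or unreachable; try the next one
        if matches(content):
            return content
        # The copy at this location was altered; keep looking elsewhere.
    return None
```

A caller would pass the list of locations decoded from the Hashlink URL
as sources. A location that is unreachable and a location that serves
modified content are treated the same way: both are skipped.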

This approach also solves the following problems that other communities
in our orbit are having:

* JSON-LD WG: Enables backwards-compatible content integrity for
        JSON-LD Contexts.
* JSON-LD WG: Enables non-Web-based, multi-sourced, compact content
        integrity for JSON-LD Contexts.
* Sovrin: Enables content-integrity-protected, blockchain-based ZKP
        Schemas to be hosted on the Sovrin ledger (and Web-based
        locations) and referenced from Verifiable Credentials.
* IPFS: Enables organizations to dip their toes into IPFS as a "backup
        mechanism" w/o having to fully commit to jumping in with both
        feet.
* Verifiable Credentials WG: Provides all the JSON-LD benefits above
        and the ability to digitally sign content-integrity-protected
        hyperlinks to evidence, terms of use, and any other information
        linked to from a Verifiable Credential.
* Credentials CG: Allows DID Documents to include links to data where
        the content at the links is secured by the cryptography of the
        ledger.

The specification is 9 pages long and can be found here:

https://tools.ietf.org/html/draft-sporny-hashlink-02

This email is a request to the CCG Chairs to add this to the next
meeting agenda, and for the CCG to adopt it as a Work Item.

-- manu

PS: For those wondering "Why not Magnet URIs, ni:/// URIs, or Resource
    Integrity Proofs?", there is a very short explanation here
    https://github.com/w3c-dvcg/hashlink#readme that I'm happy to
    elaborate upon if needed.

-- 
Manu Sporny (skype: msporny, twitter: manusporny, G+: +Manu Sporny)
Founder/CEO - Digital Bazaar, Inc.
blog: Veres One Decentralized Identifier Blockchain Launches
https://tinyurl.com/veres-one-launches
Received on Tuesday, 1 January 2019 21:37:08 UTC