Re: hashlinks vs trusty URIs from Ivan Herman on 2020-06-08 (public-credentials@w3.org from June 2020)

From: Ivan Herman <ivan@w3.org>
Date: Mon, 8 Jun 2020 09:16:57 +0200
To: Kim Hamilton <kimdhamilton@gmail.com>
Cc: Manu Sporny <msporny@digitalbazaar.com>, "W3C Credentials CG (Public List)" <public-credentials@w3.org>
Message-Id: <50909C8C-5732-4EB8-AA1E-81064C202568@w3.org>
Kim,

I am sure that Manu will also chime in :-) but maybe I can contribute a bit.

> On 8 Jun 2020, at 01:03, Kim Hamilton <kimdhamilton@gmail.com> wrote:
> 
> OK, I missed that trusty URIs require an actual transformation in some cases, e.g.:
> 
> >  To support self-references, i.e. resources that contain their own trusty URI, the generation process involves not just to compute the hash from a given artifact but to actually transform the artifact into a new version that contains the newly generated trusty URI.
> 
> That's definitely not desirable. Ivan and Manu -- the approach you describe makes sense and is consistent with the current approach for LD proofs (the canonicalization step). I hadn't seen discussion of it yet, and I'm interested in joining wherever those conversations are happening. I'm also interested in learning more about what Ivan mentioned about XML signatures. 
> 

For reference: https://www.w3.org/TR/xmldsig-core/

But it is "simply" a vocabulary (in "pure" XML, not in RDF) to express, well, the data for signatures. Whether that vocabulary is still appropriate is not for me to say; what I see (without being an expert, far from it) is that signature methods change very frequently, so newer approaches may not be covered. The latest revision of that standard is from 2013, which may be considered ancient history in this area...
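To make the XML Signature side concrete, the core step behind a DigestValue can be sketched in Python: canonicalize the XML, hash it, and base64-encode the digest. This is a sketch under stated assumptions: the standard library's `xml.etree.ElementTree.canonicalize` implements Canonical XML 2.0, whereas XML Signature specifies other canonicalization methods (Canonical XML 1.0/1.1, Exclusive XML Canonicalization), and the `credential` element and `digest_value` helper are made-up examples.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET

# Two documents that differ only in attribute order and whitespace inside tags.
# The <credential> element is a hypothetical example, not a real VC schema.
doc_a = '<credential id="urn:example:1" type="Transcript"><name>Alice</name></credential>'
doc_b = '<credential  type="Transcript" id="urn:example:1" ><name>Alice</name></credential>'


def digest_value(xml_text: str) -> str:
    """Canonicalize with C14N 2.0 (stdlib), then base64-encode a SHA-256
    digest, loosely in the spirit of an XML Signature DigestValue."""
    canonical = ET.canonicalize(xml_text)
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


print(doc_a == doc_b)                                    # False: different bytes
print(ET.canonicalize(doc_a) == ET.canonicalize(doc_b))  # True: same canonical form
print(digest_value(doc_a) == digest_value(doc_b))        # True: same digest
```

Note that XML canonicalization operates on the XML tree, so two RDF/XML documents that encode the same RDF graph with a different XML structure would still yield different digests; that is the gap an RDF-level canonicalization has to close.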

> To back up, I don't have one specific question, rather a category of unknowns. Here's the context:
> 
> In VC-EDU we have a number of data standards that have historically been written in XML. Since RDF can be serialized as XML, we've not worried too much about the emphasis on JSON-LD (compared to XML) in VCs. But the time for hand-waving around this issue is past, so there are a variety of issues (ranging in depth):
> The VC data model lists certain syntaxes (JSON, JSON-LD), and while it's clear that's not meant to be exhaustive, it doesn't have recommendations for adding new syntaxes. Can we just do it? Or do we need to add some sort of extension? We'd like to have certain things hosted as official W3C artifacts (e.g. XML schemas), so maybe we just need to worry about the latter category of artifacts.
I believe VC has been defined as an abstract model with JSON-LD as one serialization. As far as I am concerned, JSON-LD is "simply" a serialization of RDF. Well… not exactly: it is a serialization of RDF Datasets, although I am not sure VC uses the concept of datasets. 

(If this distinction is indeed important for the use cases at hand, then RDF/XML may not be OK: whilst Turtle has an official extension for datasets (TriG), and JSON-LD covers RDF Datasets out of the box, RDF/XML has never been extended in this direction. Neither has RDFa. There is a document looking into this for RDF/XML, https://www.w3.org/Submission/rdfsource/, but I do not think it had any follow-up.)

> Are there other groups focused on XML/RDF signatures and tooling (using similar approaches to our JSON-LD proofs)? Basically we want to understand if we should join existing efforts or build something new?
Not that I know of, although back in the day many were saying that XML Signatures ought to cover RDF/XML as well. The fact of the matter, however, is that RDF/XML, as a syntax for RDF, has fallen into disfavor over the past decade, giving way to Turtle, RDFa, and JSON-LD. I do not see this trend being reversed.

There is a major roadblock, though, which is syntax-independent and inherently RDF-related: all signatures rely on a canonical version of an RDF graph or dataset. Defining a canonical version of a graph without restricting the use of blank nodes (BNodes) is a major mathematical challenge. This has been known to be a problem for a long time and remained unsolved for many years. Today, we know of an algorithm put forward by D. Longley and friends, which also has a public implementation, and of a very similar algorithm published (although not deployed as code) by another expert, Aidan Hogan. That is it… I do not know of any other such algorithms. Alas, neither of the two is currently a standard in the W3C sense; they still have to undergo rigorous mathematical vetting and the W3C process before being stamped as a standard. (There are currently discussions to go down that road; Manu and I are busy trying to set that up, but it is going more slowly than we would like.)
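Why blank nodes are the hard part can be illustrated with a deliberately naive approach: sort the N-Quads statements and hash the result. The Python sketch below is NOT the Longley or Hogan algorithm; `naive_hash` is a hypothetical helper and the example.org/schema.org statements are made up. It only shows that sorting handles statement order but not arbitrary blank node labels.

```python
import hashlib


def naive_hash(nquads: list[str]) -> str:
    """Hash a graph by sorting its statements and hashing the joined text.

    This is only a canonical form for graphs without blank nodes: blank
    node labels such as _:a or _:x are local and arbitrary.
    """
    canonical = "\n".join(sorted(nquads)) + "\n"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The same ground graph (no blank nodes), statements in different order.
g1 = [
    '<http://example.org/alice> <http://schema.org/name> "Alice" .',
    '<http://example.org/alice> <http://schema.org/knows> <http://example.org/bob> .',
]
g2 = list(reversed(g1))

# Two isomorphic graphs that differ only in their blank node labels.
b1 = [
    '_:a <http://schema.org/name> "Alice" .',
    '_:a <http://schema.org/knows> _:b .',
    '_:b <http://schema.org/name> "Bob" .',
]
b2 = [
    '_:x <http://schema.org/name> "Alice" .',
    '_:x <http://schema.org/knows> _:y .',
    '_:y <http://schema.org/name> "Bob" .',
]

print(naive_hash(g1) == naive_hash(g2))  # True: statement order is irrelevant
print(naive_hash(b1) == naive_hash(b2))  # False: same graph, different hash
```

A real canonicalization algorithm has to assign blank node labels deterministically, so that every isomorphic graph gets the same labels; doing that correctly for every possible graph is the mathematical challenge referred to above.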

If that problem is finally settled, we can also have a full standard for signing (and possibly encrypting?) RDF Datasets. I would expect the two issues (i.e., canonicalization and signature vocabulary) to be handled by the same Working Group. (I am committed to leading this through the W3C process if I am still around…)

> In theory, it seems like if the signature is computed on the RDF graph, it should be preserved across XML/RDF and JSON-LD/RDF, but this is an example of something we've been hand-waving about and need to ensure.
The works cited above operate on abstract RDF Datasets, i.e., they are syntax-independent and must be usable with any serialization: Turtle/TriG, JSON-LD, etc.
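The syntax independence can be sketched as follows: parse each serialization into the same abstract set of triples, then hash that set rather than the bytes of any document. Assumptions, loudly: this Python sketch covers ground graphs only (no blank nodes), `parse_simple_ntriples` handles only a tiny N-Triples subset, and `parse_json_triples` reads a hypothetical JSON list-of-triples format that is NOT JSON-LD; all helper names are made up for illustration.

```python
import hashlib
import json

Triple = tuple[str, str, str]


def parse_simple_ntriples(text: str) -> set[Triple]:
    """Parse a tiny N-Triples subset: one ground triple per line, terms
    separated by whitespace, no spaces or escapes inside literals."""
    triples: set[Triple] = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if len(parts) != 4 or parts[3] != ".":
            raise ValueError(f"unsupported line: {raw!r}")
        triples.add((parts[0], parts[1], parts[2]))
    return triples


def parse_json_triples(text: str) -> set[Triple]:
    """Parse a hypothetical JSON syntax: a list of [subject, predicate, object]
    arrays whose items are already written as N-Triples terms (NOT JSON-LD)."""
    return {(s, p, o) for s, p, o in json.loads(text)}


def graph_hash(triples: set[Triple]) -> str:
    """Hash the abstract triple set, not the bytes of any serialization."""
    lines = sorted(f"{s} {p} {o} ." for s, p, o in triples)
    return hashlib.sha256(("\n".join(lines) + "\n").encode("utf-8")).hexdigest()


nt_doc = """
# statements in arbitrary order, with extra whitespace
<http://example.org/alice>   <http://schema.org/name>  "Alice" .
<http://example.org/alice> <http://schema.org/knows> <http://example.org/bob> .
"""

json_doc = json.dumps([
    ["<http://example.org/alice>", "<http://schema.org/knows>", "<http://example.org/bob>"],
    ["<http://example.org/alice>", "<http://schema.org/name>", '"Alice"'],
])

same_bytes = nt_doc.encode("utf-8") == json_doc.encode("utf-8")
same_graph = graph_hash(parse_simple_ntriples(nt_doc)) == graph_hash(parse_json_triples(json_doc))
print(same_bytes, same_graph)  # False True
```

Because the hash is computed after parsing, any syntax that can carry the same dataset (Turtle/TriG, JSON-LD, and so on) could in principle be plugged in front of `graph_hash`; with blank nodes present, a proper canonicalization step would be needed before sorting.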

I hope this helps…

Ivan

> The question about hashlinks vs trusty URIs was really a rathole on this fork of investigation, so ignore that for now.
> Some examples of where these issues are arising:
> - The EDCI effort is using XML VCs to comply with eIDAS legal signature requirements, but they don't have anything official from W3C to base that on and would like guidance.
> - PESC/XML transcripts are very widely used in North America. They had been focused on mapping to JSON-LD, but would be interested in whether we're providing proper XML support.
> Maybe a topic for a future CCG call? A lot to unpack here...
> 
> Thanks,
> Kim
> 
> On Sat, Jun 6, 2020 at 7:19 AM Manu Sporny <msporny@digitalbazaar.com> wrote:
> On 6/6/20 3:28 AM, Ivan Herman wrote:
> > I would think having a separate vocabulary to make statements like
> > 
> > <graph URI> <:hasHash> "hash value" .
> 
> My read on the paper is the same as Ivan's read.
> 
> The cleaner solution is to annotate the RDF graph, like the above, and
> is effectively what the Linked Data Proof stuff does (as a part of graph
> canonicalization).
> 
> Modifying the RDF graph or transforming it is what was being done for a
> decade+ before Dave Longley invented the generalized solution.
> Modification of an RDF graph to hash it has terrible complexity
> consequences on software that needs to use the modified graph and
> determine if that modified graph is the same one that is sitting on a
> local system. In short, you create a very complex transformation and
> comparison issue when you modify RDF graphs in order to hash them (or
> refer to them using hashes).
> 
> To provide an alternative, the RDF Dataset Canonicalization Algorithm
> canonicalizes the RDF graph in a way that a hash can be generated for it
> without having to modify the original information. That hash could be
> paired with hashlinks, but I'm struggling to understand the specific use
> case (and don't have the spare cycles to put further thought into it
> this moment).
> 
> I only had about 15 minutes to read through the Trusty URIs paper (first
> time I had heard of it, nice arXiv archeology work!). My takeaway is
> that it does things that are unnecessary with the solutions we have
> available to us today. The basic generalized RDF hashing building blocks
> are there via the RDF Dataset Canonicalization Algorithm. That hash can
> then be used by any technology that can express a hash and metadata
> about that hash (Linked Data Proofs/Signatures, Hashlinks, Magnet URIs,
> Named Information, etc.)
> 
> Understanding the specific use case you're going after might help... or
> if you think Trusty URIs can do something that can't be done with the
> current generalized tooling we have?
> 
> -- manu
> 
> -- 
> Manu Sporny - https://www.linkedin.com/in/manusporny/
> Founder/CEO - Digital Bazaar, Inc.
> blog: Veres One Decentralized Identifier Blockchain Launches
> https://tinyurl.com/veres-one-launches
> 
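Manu's point in the quoted message above, that a canonical hash can be expressed by any technology that carries a hash and metadata about it, and Ivan's suggestion to annotate the graph rather than modify it, can be sketched together. Assumptions, loudly: the canonical N-Quads string is taken as already produced by a canonicalization step; `http://example.org/vocab#hasHash` is a hypothetical stand-in for `<:hasHash>`; `ni_uri` follows the Named Information format of RFC 6920; and `hashlink_style` assumes the hashlink draft's "hl:" + multibase base58btc multihash encoding. All helper names are hypothetical.

```python
import base64
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc(data: bytes) -> str:
    """Bitcoin-style base58 encoding; each leading zero byte becomes '1'."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = BASE58_ALPHABET[rem] + out
    leading_zeros = len(data) - len(data.lstrip(b"\0"))
    return "1" * leading_zeros + out


def ni_uri(digest: bytes) -> str:
    """Named Information URI (RFC 6920): base64url without padding."""
    return "ni:///sha-256;" + base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def hashlink_style(digest: bytes) -> str:
    """Hashlink-style value (assumed format): 'hl:' + multibase prefix 'z'
    (base58btc) + sha2-256 multihash (code 0x12, length 0x20, digest)."""
    return "hl:z" + base58btc(bytes([0x12, 0x20]) + digest)


# Assumed output of a canonicalization step (a single ground statement here).
canonical_nquads = '<http://example.org/alice> <http://schema.org/name> "Alice" .\n'
digest = hashlib.sha256(canonical_nquads.encode("utf-8")).digest()

# Annotate the graph with its hash instead of transforming the graph itself.
print(f'<http://example.org/graph> <http://example.org/vocab#hasHash> "{digest.hex()}" .')
print(ni_uri(digest))
print(hashlink_style(digest))
```

The original statements are never rewritten: the hash travels alongside them, which avoids the transformation and comparison problems described in the quoted message.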


----
Ivan Herman, W3C 
Home: http://www.w3.org/People/Ivan/
mobile: +33 6 52 46 00 43
ORCID ID: https://orcid.org/0000-0003-0782-2704
Received on Monday, 8 June 2020 07:17:04 UTC