- From: David Booth <david@dbooth.org>
- Date: Thu, 10 Jun 2021 01:27:36 -0400
- To: semantic-web@w3.org
Hi Phil,

On 6/9/21 6:48 AM, Phil Archer wrote:
> . . .
> 1. Why is there any need to sign a graph and
> not just the bytes? See the explainer document at
> https://w3c.github.io/lds-wg-charter/explainer.html#noProblem
> for the answer to this.

Sorry to belabor this, but I read the explainer document, and I still do not see an answer to this question. The section you referenced refers to the "Constrained data transfer" use case and the "Space-efficient verification of the contents of Datasets" use case. It concludes: "In these scenarios, a signature on the original file, such as a JSON signature on a JSON-LD file, is not appropriate, as the conversion will make it invalid."

I assume "the conversion" means the conversion of the original JSON-LD file to a different RDF serialization, and that sentence is pointing out that the hash of the original JSON-LD file will not match a hash of the different serialization. But clearly the hash should be taken of a *canonicalized* original, and when it is converted to a different serialization, the recipient must re-canonicalize it before checking the hash. This is, in essence, exactly what "RDF Dataset Hash (RDH)" in the charter does anyway.

To my mind, this is analogous to computing the hash of an arbitrary file, compressing the file for transmission (which puts it into a different serialization that is informationally equivalent), and then having the recipient decompress the file before verifying the hash. Serializing RDF to a non-canonical form is analogous to compression: you have to put it back into the canonical form (analogy: decompress it) before checking the hash.

I agree that to make this work, a canonical RDF *serialization* is needed. But I do not see the need to canonicalize the *abstract* RDF Dataset (though it is nice to have the canonicalization algorithm defined in a way that allows it to be easily applied to several RDF serializations).
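The compression analogy above can be sketched in a few lines of Python. This is only an illustration of the argument, not anything from the charter: zlib compression stands in for "re-serializing to a non-canonical form", and the single N-Quads-style line is a made-up canonical form. The point it shows is that the hash is taken over the canonical bytes, so the recipient must restore that form before verifying:

```python
import hashlib
import zlib

def sha256(data: bytes) -> str:
    """Hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()

# Stand-in for a canonical serialization of some content
# (e.g. a canonical N-Quads line; purely illustrative).
canonical = b'<http://example.org/s> <http://example.org/p> "o" .\n'

# Sender: hash the canonical bytes, then transmit a re-serialized
# (here: compressed) copy that is informationally equivalent.
digest = sha256(canonical)
wire_bytes = zlib.compress(canonical)

# Hashing the wire form directly would NOT match the published digest.
assert sha256(wire_bytes) != digest

# Receiver: restore the canonical form (analogy: decompress),
# then verify the hash against it.
restored = zlib.decompress(wire_bytes)
assert sha256(restored) == digest
```

Under this reading, "canonicalize before hashing, re-canonicalize before verifying" plays exactly the role that "hash before compressing, decompress before verifying" plays here.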
In fact, the proposed "RDF Dataset Hash (RDH)" is actually computed on the canonicalized *serialization* anyway. It is NOT computed directly on the abstract RDF dataset. And if the hash is computed on those serialized bytes anyway, then really it is the serialized bytes that are being signed.

It is a leap of faith to believe that the signing of those RDF bytes indicates that the signer agrees with the semantic content that those bytes represent. But that is the same leap of faith that we take when the bytes represent a PDF document that was digitally signed. The leap of faith is that we can interpret those bytes as they were semantically intended -- either as RDF, PDF or whatever.

So unfortunately I seem to be missing a fairly fundamental point here, because I am still not understanding what benefit is to be gained by restricting the source documents to RDF. Why not allow them to be other kinds of documents also, such as PDF?

Or, to recast my question in terms of Manu's summary:

On 6/6/21 4:52 PM, Manu Sporny wrote:
> 1. Define a generalized canonicalization mechanism for
> abstract RDF Datasets.
>
> 2. Define a way of serializing and hashing the
> canonicalized form from #1.
>
> 3. Define a way of expressing digital signatures (proofs)
> using the hashed form of the RDF Dataset from #2.

Why do the digital signatures need to be restricted to using the hash of canonicalized RDF, as opposed to using the hash of, say, a PDF document? Wouldn't people want to digitally sign PDF documents too? Why shouldn't they use the same RDF digital signature vocabulary to talk about PDF documents instead of RDF Datasets? I feel like I'm missing some fundamental assumption in your intended use case.

Thanks,
David Booth
Received on Thursday, 10 June 2021 05:28:13 UTC