- From: David Booth <david@dbooth.org>
- Date: Thu, 10 Jun 2021 10:13:12 -0400
- To: semantic-web@w3.org
On 6/10/21 3:40 AM, Ivan Herman wrote:
> . . .
> If I "just" start by, say, a Turtle representation of a Graph, I
> can of course convert that into canonical n-quads and hash the n-quads.
> But if the same Turtle representation is converted by RDFLib (or any
> other tool) into, God forbid, RDF/XML, the BNode identifiers will be
> different, ie, the conversion of the RDF/XML to n-quads will be
> different and, consequently, the hash will be different. *Unless the RDF
> canonicalization assigns the canonical identifiers to the BNodes in the
> process.*

Yes, of course the hash will be different if you have not first canonicalized back to the canonical N-Quads format before checking the hash. But that's like saying that if you send a compressed file then the hash of the compressed file won't match the hash of the original file. Of course it won't: you need to decompress it before checking the hash.

> So I am not really sure I actually understand your problem: you cannot
> avoid a canonical relabeling of the BNodes in the general case. That is
> what the abstract RDF canonicalization does: define canonical BNode
> labels in a serialization independent manner. In my view, that is
> absolutely necessary in general.

I don't think that conclusion logically follows. Don't get me wrong, I see the value in defining a canonicalization algorithm that can be used on a whole family of RDF serializations, which is what the proposed algorithm does. But I do not see it as *necessary* to solve the problem.

AFAICT only *one* canonical serialization -- such as canonical N-Quads -- is actually needed to enable any isomorphic RDF serialization to be transmitted, given that we can already convert between various RDF serializations and obtain isomorphic datasets. (And to whatever extent our current serializations/libraries do not produce isomorphic results then that is a bug that needs to be fixed.)
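As a rough illustration of this point, here is a minimal Python sketch of hashing over a single canonical serialization. It is not the proposed canonicalization algorithm: the blank node relabeling is a brute-force search that is only practical for tiny datasets, and the function names (canonical_nquads, dataset_hash), the tuple representation of quads, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from itertools import permutations


def canonical_nquads(quads):
    """Return a canonical N-Quads-style string for a small dataset.

    Each quad is a tuple (subject, predicate, object, graph) of strings in
    N-Quads term syntax; graph is None for the default graph.

    Illustration only: tries every assignment of the labels _:c0.._:cN to
    the blank nodes and keeps the lexicographically smallest result, so
    isomorphic datasets always yield the same text. Exponential in the
    number of blank nodes.
    """
    quads = set(quads)
    bnodes = sorted({t for q in quads for t in q if t and t.startswith("_:")})
    labels = [f"_:c{i}" for i in range(len(bnodes))]
    best = None
    for perm in permutations(labels):
        mapping = dict(zip(bnodes, perm))
        lines = sorted(
            " ".join(mapping.get(t, t) for t in q if t is not None) + " ."
            for q in quads
        )
        text = "".join(line + "\n" for line in lines)
        if best is None or text < best:
            best = text
    return best


def dataset_hash(quads):
    """Hash the canonical serialization, not the bytes that were transmitted."""
    return hashlib.sha256(canonical_nquads(quads).encode("utf-8")).hexdigest()


# What the sender parsed (say, from Turtle).
sent = [
    ("_:b0", "<http://example.org/knows>", "_:b1", None),
    ("_:b0", "<http://example.org/name>", '"Alice"', None),
    ("_:b1", "<http://example.org/name>", '"Bob"', None),
]

# What the receiver parsed (say, from RDF/XML): same dataset up to
# isomorphism, but with different blank node labels and statement order.
received = [
    ("_:genid7", "<http://example.org/name>", '"Bob"', None),
    ("_:genid9", "<http://example.org/knows>", "_:genid7", None),
    ("_:genid9", "<http://example.org/name>", '"Alice"', None),
]

print(dataset_hash(sent) == dataset_hash(received))  # True
```

The serialization used on the wire never enters the hash in this sketch. Both sides convert whatever they hold to the one agreed canonical form first, in the same way a receiver decompresses a file before checking its hash.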
All the sender and receiver need to do is agree to compute the hash on a canonical N-Quads serialization of the RDF dataset that is transmitted, even if that RDF dataset is transmitted in a completely different serialization. In fact, if I've understood correctly, that's exactly what the proposed "RDF Dataset Hash (RDH)" algorithm does.

Moreover, in the proposed charter, I don't recall seeing the result of the abstract RDF Dataset canonicalization being used for *anything* other than to produce a canonical N-Quads serialization. That seems to me like pretty compelling evidence that the *abstract* canonicalization is not actually needed: only the canonical N-Quads serialization is really needed.

So I don't understand your view that the *abstract* canonicalization is "absolutely necessary". I still feel like I am somehow missing a fundamental assumption that others are making, one that I have not yet been able to identify.

Thanks,
David Booth
Received on Thursday, 10 June 2021 14:15:27 UTC