Re: Thoughts on the LDS WG chartering discussion from Ivan Herman on 2021-06-10 (semantic-web@w3.org from June 2021)

From: Ivan Herman <ivan@w3.org>
Date: Thu, 10 Jun 2021 12:28:44 +0200
To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
Cc: semantic-web@w3.org
Message-Id: <40304CC9-A41E-419E-86D3-6B0A53E686E9@w3.org>
Graphs/Datasets are more often than not stored in datastores, triple stores, knowledge graphs, you name it. The serialization format used to feed the triple store is irrelevant, and clients of such triple stores may request the data in whatever serialization format suits their needs. If the consistency of such graphs (i.e., the set of triples or quads in the triple store) has to be checked via, say, a hash, then the approach you are describing does not work, due to the problem of bnode labels: triple stores are free to relabel the bnodes of incoming graphs and to produce new labels when they export them.

Also: isomorphic graphs do not necessarily have the same hash value, because two graphs may be isomorphic only via a suitable relabeling of bnodes.
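
To make the relabeling problem concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the RDF canonicalization algorithm under discussion: triples are plain tuples of strings already written in N-Triples syntax, the example.org IRIs and the function names (`naive_hash`, `canonical_hash`) are hypothetical, and the canonical labeling is found by brute force over all bnode permutations, which is only feasible for tiny graphs.

```python
import hashlib
from itertools import permutations


def serialize(triples):
    """Return the triples as sorted N-Triples-style lines in one string."""
    return "".join(sorted(f"{s} {p} {o} .\n" for s, p, o in triples))


def naive_hash(triples):
    """Hash the serialization as-is, bnode labels included."""
    return hashlib.sha256(serialize(triples).encode("utf-8")).hexdigest()


def canonical_hash(triples):
    """Hash the lexicographically smallest serialization over all relabelings.

    Brute force (factorial in the number of bnodes); illustration only.
    """
    bnodes = sorted({t for triple in triples for t in triple if t.startswith("_:")})
    labels = [f"_:c14n{i}" for i in range(len(bnodes))]
    best = None
    for perm in permutations(labels):
        mapping = dict(zip(bnodes, perm))
        doc = serialize(
            [tuple(mapping.get(t, t) for t in triple) for triple in triples]
        )
        if best is None or doc < best:
            best = doc
    return hashlib.sha256(best.encode("utf-8")).hexdigest()


# The same graph as exported by two hypothetical stores with different bnode labels.
graph_a = [
    ("_:b0", "<http://example.org/name>", '"Alice"'),
    ("_:b0", "<http://example.org/knows>", "_:b1"),
    ("_:b1", "<http://example.org/name>", '"Bob"'),
]
graph_b = [
    ("_:genid42", "<http://example.org/name>", '"Bob"'),
    ("_:x9", "<http://example.org/knows>", "_:genid42"),
    ("_:x9", "<http://example.org/name>", '"Alice"'),
]

print(naive_hash(graph_a) == naive_hash(graph_b))          # hashes of the raw documents
print(canonical_hash(graph_a) == canonical_hash(graph_b))  # hashes after relabeling
```

The two graphs are isomorphic, yet the hashes of their serializations differ because the labels differ; only after the bnodes are assigned labels that depend on the graph structure alone do the hashes agree.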

Ivan

> On 10 Jun 2021, at 12:05, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
> 
> On 6/10/21 3:40 AM, Ivan Herman wrote:
> 
> [...]
> 
>> 
>> But. If I "just" start by, say, a Turtle representation of a Graph, I can of course convert that into canonical n-quads and hash the n-quads. But if the same Turtle representation is converted by RDFLib (or any other tool) into, God forbid, RDF/XML, the BNode identifiers will be different, ie, the conversion of the RDF/XML to n-quads will be different and, consequently, the hash will be different. *Unless the RDF canonicalization assigns the canonical identifiers to the BNodes in the process.*
> 
> I really don't understand this point.  If I start with a Turtle document, just send the Turtle.  Well, except for the problem that deserializing Turtle documents doesn't always produce isomorphic graphs.  But the solution to this is easy, just use a format that always produces isomorphic graphs.  Send that.  No canonicalization necessary as each deserialization will produce an isomorphic graph.  And the hash is done on the document itself so standard methods for verifiable transmission of documents can be used without modification.
> 
> If the starting point is a document in some other format, have the sender convert it to the appropriate format using the environment that the sender considers appropriate and send the resulting document.  If the starting point is an actual RDF graph, serialize the graph in the appropriate format and send the resulting document.  In each case, because deserialization in the document format produces isomorphic graphs, the recipient will end up with a graph isomorphic to the graph that the sender wanted to send.
> 
> Which document format to use?  As far as I can tell, N-Triples (N-Quads) is the only document format where deserialization produces isomorphic RDF graphs (datasets).  Well, except for case normalization of language tags.
> 
>> So I am not really sure I actually understand your problem: you cannot avoid a canonical relabeling of the BNodes in the general case. That is what the abstract RDF canonicalization does: define canonical BNode labels in a serialization independent manner. In my view, that is absolutely necessary in general.
> 
> But there is no need for this if all you are trying to do is verifiable transmission of isomorphic RDF graphs.
> 
>> 
>> Ivan
>> 
> [...]
> 
> peter
> 
> 
> 


----
Ivan Herman, W3C 
Home: http://www.w3.org/People/Ivan/
mobile: +33 6 52 46 00 43
ORCID ID: https://orcid.org/0000-0003-0782-2704
Received on Thursday, 10 June 2021 10:30:06 UTC