Re: Thoughts on the LDS WG chartering discussion

> On 10 Jun 2021, at 14:49, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
> 
> On 6/10/21 8:04 AM, Ivan Herman wrote:
> 
>> 
>>> On 10 Jun 2021, at 13:55, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
>>> 
>>> 
>>> How does "consistency" fit into this?  Every RDF graph (or datastore) is consistent.
>> 
>> I am sorry, wrong choice of words. If you want to check that the graph you retrieve from the data store has not been tampered with; e.g., by checking its hash.
>> 
>> An analogy is a number of open source sites where one can download an application and check the hash value of the downloaded package against the hash of the application announced somewhere.
>> 
>> Ivan
>> 
> It appears that the task here is for a sender to package up an RDF graph (or dataset) and send it to one or more receivers with the guarantee that the graph (or dataset) that the receivers produce is isomorphic to the original graph (or dataset).
> 
> The sender (open source site in your analogy) prepares a document that serializes this graph (or dataset).  This document is in some format where deserialization results in isomorphic graphs  (where execution of the code does the same thing on all computers).  The sender also provides a hash of the document.   Receivers download the document and the hash and check that the downloaded document matches the hash.   At some later date, receivers deserialize the document (run the code).

I think the situation I am referring to is different (and the analogy with source code breaks down), insofar as 'sender' and 'receiver' are not really the right pair of terms.

The creator ("sender") of the data stores the graph in a triple store, and also provides, through some means, the hash of that data (Eric can probably provide a specific example from the health domain, e.g., storing some protein-related RDF data). The consumer ("receiver") extracts the data from the triple store at some point in time and has to calculate the hash to check the data's integrity (i.e., that nobody has tampered with the data in the database or during extraction).

Unless the triple store stores all the triples with the bnode identifiers as provided by the creator (which, afaik, is rarely the case), the data handed to the consumer will carry different bnode identifiers, and I do not see how the consumer could calculate a matching hash unless there is a canonical labeling of the bnodes.
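To make the point concrete, here is a minimal Python sketch (hypothetical data, and a toy first-occurrence relabeling that only works for trivially simple graphs, not a real canonicalization algorithm): two N-Triples documents for isomorphic graphs hash differently once the store has assigned fresh bnode labels, and only a deterministic relabeling makes the hashes comparable.

```python
import hashlib
import re

# Two serializations of the same (isomorphic) graph, differing only in
# the blank node identifiers assigned by the triple store.
creator_doc = "_:a <http://example.org/knows> _:b .\n"
consumer_doc = "_:x9 <http://example.org/knows> _:y7 .\n"

def sha256(doc: str) -> str:
    return hashlib.sha256(doc.encode("utf-8")).hexdigest()

# The naive hashes differ even though the graphs are isomorphic.
print(sha256(creator_doc) == sha256(consumer_doc))  # False

def relabel(doc: str) -> str:
    """Toy relabeling: rename bnodes in order of first occurrence.

    This is enough for this one-triple example only; a real canonical
    labeling must handle symmetric, graph-isomorphism-hard cases.
    """
    mapping = {}
    def repl(m):
        label = m.group(0)
        if label not in mapping:
            mapping[label] = f"_:c{len(mapping)}"
        return mapping[label]
    return re.sub(r"_:[A-Za-z0-9]+", repl, doc)

# After canonical relabeling, the hashes agree again.
print(sha256(relabel(creator_doc)) == sha256(relabel(consumer_doc)))  # True
```

The hash the creator publishes is only checkable by the consumer if both sides compute it over the same canonically labeled serialization.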


Ivan


> 
> So no need for canonicalization.  No need for special processes to ensure that nothing bad has happened.  The only need is for a document format where deserialization of a document results in isomorphic graphs (or datasets).  That rules out most document formats for RDF graphs (or datasets), leaving N-Triples (or N-Quads).
> 
> 
> peter
> 
> 


----
Ivan Herman, W3C 
Home: http://www.w3.org/People/Ivan/
mobile: +33 6 52 46 00 43
ORCID ID: https://orcid.org/0000-0003-0782-2704

Received on Thursday, 10 June 2021 13:31:46 UTC