Re: Thoughts on the LDS WG chartering discussion from Peter F. Patel-Schneider on 2021-06-10 (semantic-web@w3.org from June 2021)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Thu, 10 Jun 2021 11:34:51 -0400
To: Ivan Herman <ivan@w3.org>
Cc: semantic-web@w3.org
Message-ID: <8b870322-39ec-7f7a-adfc-62b96f9bc7f9@gmail.com>

On 6/10/21 9:29 AM, Ivan Herman wrote:

>
>
>> On 10 Jun 2021, at 14:49, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
>>
>> On 6/10/21 8:04 AM, Ivan Herman wrote:
>>
>>>
>>>> On 10 Jun 2021, at 13:55, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
>>>>
>>>>
>>>> How does "consistency" fit into this?  Every RDF graph (or datastore) is 
>>>> consistent.
>>>
>>> I am sorry, wrong choice of words. What I meant is checking that the graph
>>> you retrieve from the data store has not been tampered with, e.g., by
>>> checking its hash.
>>>
>>> An analogy is a number of open source sites where one can download an 
>>> application and check the hash value of the downloaded package against the 
>>> hash of the application announced somewhere.
>>>
>>> Ivan
>>>
>> It appears that the task here is for a sender to package up an RDF graph 
>> (or dataset) and send it to one or more receivers with the guarantee that 
>> the graph (or dataset) that the receivers produce is isomorphic to the 
>> original graph (or dataset).
>>
>> The sender (open source site in your analogy) prepares a document that 
>> serializes this graph (or dataset).  This document is in some format where 
>> deserialization results in isomorphic graphs  (where execution of the code 
>> does the same thing on all computers).  The sender also provides a hash of 
>> the document.   Receivers download the document and the hash and check that 
>> the downloaded document matches the hash.   At some later date, receivers 
>> deserialize the document (run the code).
>
> I think the situation I am referring to is different (and the analogy with
> the source code breaks down) insofar as 'sender' and 'receiver' are not
> really the right pair of terms.
>
> The creator ("sender") of the data stores the graph in a triple store and
> also provides, through some means, the hash of that data (Eric can probably
> provide a specific example in the health domain, i.e., storing some
> protein-related RDF data). The consumer ("receiver") extracts the data from
> the triple store at some point in time, and he/she has to calculate the hash
> to check the data integrity (i.e., that nobody has tampered with the data in
> the database or during the extraction).
>
>
> Unless the triple store stores all the triples with the bnode identifiers as
> provided by the creator (which, AFAIK, is rarely the case), the data provided
> to the consumer will have different identifiers, and I do not see how the
> hash could be calculated by the consumer unless there is a canonical
> labeling of the bnodes.
>
>
> Ivan
>
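A minimal Python sketch of the problem described in the quoted message, assuming the data is exchanged as N-Triples text: hashing the serialized bytes works like the package-checksum analogy only as long as the exact same bytes come back. If the triple store returns the same graph with freshly minted blank node labels, the document hash no longer matches. The documents, labels, and function names below are illustrative assumptions.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hex SHA-256 digest of a UTF-8 encoded document."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def verify(document: str, announced_hash: str) -> bool:
    """Checksum-style check: does the document hash to the announced value?"""
    return sha256_hex(document) == announced_hash


# What the creator serialized and hashed (illustrative N-Triples).
original = '_:b0 <http://example.org/name> "P12345" .\n'
announced = sha256_hex(original)

# The same graph as extracted later; the store minted a new blank node label.
extracted = '_:genid17 <http://example.org/name> "P12345" .\n'

print(verify(original, announced))   # True: byte-identical document
print(verify(extracted, announced))  # False: isomorphic graph, different bytes
```

The second check fails even though nothing was tampered with, which is why a byte-level hash alone cannot serve this use case.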
Why not just ask the triple store system whether there have been updates?

OK, if you have a triple store that doesn't provide this information or the 
triple store doesn't expose unchanging internal bnode identifiers or you don't 
trust the triple store *and* you want to know whether the data in the store is 
isomorphic to some known RDF graph *and* you don't want to keep around this 
known RDF graph, then you need to compute canonical hashes.
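Computing a canonical hash means assigning blank node labels deterministically before hashing. The sketch below does this by brute force for tiny graphs: it tries every assignment of the labels `_:c0`, `_:c1`, and so on, and keeps the lexicographically smallest sorted N-Triples text. It is exponential in the number of blank nodes and is not the W3C RDF Dataset Canonicalization algorithm; the tuple-of-strings graph representation and the function names are hypothetical, chosen for illustration.

```python
import hashlib
from itertools import permutations

# A graph is a set of (subject, predicate, object) tuples holding N-Triples
# terms as strings; blank nodes are the terms starting with "_:".
Triple = tuple[str, str, str]


def blank_nodes(graph: set[Triple]) -> list[str]:
    return sorted({t for triple in graph for t in triple if t.startswith("_:")})


def canonical_ntriples(graph: set[Triple]) -> str:
    """Toy canonical form: the smallest sorted serialization over all relabelings.

    Isomorphic graphs produce the same set of candidate serializations, so they
    share the same minimum. Cost grows factorially with the number of blank nodes.
    """
    bnodes = blank_nodes(graph)
    best = None
    for order in permutations(range(len(bnodes))):
        mapping = {b: f"_:c{i}" for b, i in zip(bnodes, order)}
        lines = sorted(
            " ".join(mapping.get(t, t) for t in triple) + " ." for triple in graph
        )
        candidate = "".join(line + "\n" for line in lines)
        if best is None or candidate < best:
            best = candidate
    return best


def canonical_hash(graph: set[Triple]) -> str:
    return hashlib.sha256(canonical_ntriples(graph).encode("utf-8")).hexdigest()


# Same graph, different blank node labels (as a store might return it).
g1 = {("_:x", "<http://example.org/knows>", "_:y"),
      ("_:y", "<http://example.org/name>", '"Bob"')}
g2 = {("_:n7", "<http://example.org/knows>", "_:n3"),
      ("_:n3", "<http://example.org/name>", '"Bob"')}
g3 = {("_:x", "<http://example.org/knows>", "_:y"),
      ("_:y", "<http://example.org/name>", '"Alice"')}

print(canonical_hash(g1) == canonical_hash(g2))  # True
print(canonical_hash(g1) == canonical_hash(g3))  # False
```

Real canonicalization algorithms reach the same guarantee without enumerating every permutation, but the contract is the same: isomorphic inputs, identical output.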

But this is nothing like verifiable transmission of RDF graphs. And there is 
still no need to transmit or even compute some lossy serialization of either 
graph.
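When the known graph is kept around, no serialization or hash is needed: the retrieved graph can be compared with it directly by searching for a bijection between their blank nodes. A brute-force Python sketch, assuming the same hypothetical tuple-of-strings representation; practical isomorphism checkers prune this search heavily.

```python
from itertools import permutations

# Graphs are sets of (subject, predicate, object) string tuples; "_:" marks blank nodes.


def blank_nodes(graph):
    return sorted({t for triple in graph for t in triple if t.startswith("_:")})


def isomorphic(g1, g2) -> bool:
    """True if some bijection between blank nodes maps g1 exactly onto g2.

    IRIs and literals must match verbatim; only blank nodes may be renamed.
    """
    if len(g1) != len(g2):
        return False
    b1, b2 = blank_nodes(g1), blank_nodes(g2)
    if len(b1) != len(b2):
        return False
    target = set(g2)
    for image in permutations(b2):
        mapping = dict(zip(b1, image))
        renamed = {tuple(mapping.get(t, t) for t in triple) for triple in g1}
        if renamed == target:
            return True
    return False


known = {("_:a", "<http://example.org/p>", "_:b"),
         ("_:b", "<http://example.org/p>", "_:a")}
retrieved = {("_:s1", "<http://example.org/p>", "_:s2"),
             ("_:s2", "<http://example.org/p>", "_:s1")}
tampered = {("_:s1", "<http://example.org/p>", "_:s2"),
            ("_:s2", "<http://example.org/p>", "_:s2")}

print(isomorphic(known, retrieved))  # True
print(isomorphic(known, tampered))   # False
```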


peter

Received on Thursday, 10 June 2021 15:40:34 UTC