Re: Reification and Provenance modelling from Bob Ferris on 2011-10-30 (public-rdf-comments@w3.org from October 2011)

From: Bob Ferris <zazi@smiy.org>
Date: Sun, 30 Oct 2011 14:38:37 +0100
To: public-rdf-comments@w3.org
Message-ID: <4EAD535D.8080209@smiy.org>

Hi,

I have been following the ongoing dataset context discussion on the 
public-rdf-wg mailing list and noticed a crucial statement 
made by Pat Hayes [1]:


On 13 October 2011 14:29, Pat Hayes <phayes@ihmc.us> wrote:

 >> That's not really much of a problem, because it just means that you
 >> have to keep triples of different context apart in separate graphs.
 >
 > No. You *should* compose your data so that data can be merged. That
 > is the entire purpose of the Semantic Web design. Without that, all
 > of linked data is just a bunch of isolated DB table fragments in a
 > poor notation.

which directly reminds me of the (optional!) applicability of 
statement identifiers to resolve this use case. In particular, the 
discussed "foaf:age" issue seems to be a perfect use case for statement 
identifiers to describe the external context of time and provenance 
(from my POV).

An example:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex: <http://example.com/> .
@prefix is: <http://purl.org/ontology/is/core#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix prov: <http://www.w3.org/ns/prov-o/> .


# (merged) assertions about a person in the "default" graph
ex:APerson a foaf:Person .
# optional utilisation of a statement identifier
ex:APerson foaf:age 14 ex:StId1 .
ex:APerson foaf:age 15 ex:StId2 .

# external context about the statement that can be identified via the
# statement identifier ex:StId1
ex:StId1 is:info_service ex:Facebook ;
	dcterms:modified "2010-06-23"^^xsd:date ;
	prov:wasDerivedFrom ex:StId3 .

# external context about the statement that can be identified via the
# statement identifier ex:StId2
ex:StId2 is:info_service ex:MySpace ;
	dcterms:modified "2011-01-14"^^xsd:date ;
	prov:wasDerivedFrom ex:StId4 .

# the graph of the information service Facebook
<#facebookGraph>  { ...
	ex:APerson foaf:age 14 ex:StId3 .
... }

# the graph of the information service MySpace
<#mySpaceGraph>  { ...
	ex:APerson foaf:age 15 ex:StId4 .
... }


The statement identifier assertions could perhaps be separated into a 
provenance/external context graph.
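
To make the idea a bit more concrete, here is a minimal Python sketch (not 
any existing RDF syntax or library). It treats a statement as a triple plus 
an optional statement identifier and keeps the external context in a 
separate lookup. The names Statement, context and latest_value are made up 
for this illustration, and the dates are compared as plain ISO strings.

```python
from collections import namedtuple

# A statement is a triple plus an optional statement identifier (4th element).
Statement = namedtuple("Statement", "s p o sid")

# (merged) assertions about a person in the "default" graph
default_graph = [
    Statement("ex:APerson", "rdf:type", "foaf:Person", None),
    Statement("ex:APerson", "foaf:age", 14, "ex:StId1"),
    Statement("ex:APerson", "foaf:age", 15, "ex:StId2"),
]

# External context, keyed by statement identifier. This could live in a
# separate provenance/external context graph.
context = {
    "ex:StId1": {"is:info_service": "ex:Facebook", "dcterms:modified": "2010-06-23"},
    "ex:StId2": {"is:info_service": "ex:MySpace", "dcterms:modified": "2011-01-14"},
}


def latest_value(graph, context, subject, predicate):
    """Return (object, info service) of the most recently modified match."""
    candidates = [
        st for st in graph
        if st.s == subject and st.p == predicate and st.sid in context
    ]
    if not candidates:
        return None
    # ISO 8601 dates sort correctly as plain strings.
    newest = max(candidates, key=lambda st: context[st.sid]["dcterms:modified"])
    return newest.o, context[newest.sid]["is:info_service"]


result = latest_value(default_graph, context, "ex:APerson", "foaf:age")
```

Both foaf:age statements stay in one merged graph, and the identifier is 
only needed when someone asks about time or provenance.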

Cheers,


Bo


[1] http://lists.w3.org/Archives/Public/public-rdf-wg/2011Oct/0228.html


On 9/22/2011 7:49 PM, Richard Cyganiak wrote:
> Hi Bob,
>
> On 21 Sep 2011, at 12:11, Bob Ferris wrote:
>> I think that the important use cases are already covered in [1]. My specific one draws on multiple information providers and requires an access control mechanism. It is especially important for that use case to be able to push changes back to their origins, i.e., if I have a resource description that is aggregated from information supplied by multiple information providers, I need to know which statement comes from which information provider. Furthermore, if single statements are spread over multiple graphs (views), I need to be able to handle changes to these statements as well.
>
> Load the data from each information provider into a separate graph. Then create a single-triple graph for each triple, and assert {?g ex:isOriginalSourceOfTriple ?t} between the original graph and the single-triple graph. Whenever you merge or aggregate multiple graphs into a new graph, assert a new triple {?new_graph ex:containsTriple ?t}. This allows tracking of any triple back to its original source in order to update it.
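
The scheme described above could be sketched roughly as follows in Python. 
This is only an illustration under simple assumptions: graphs are plain 
sets of tuples, the single-triple graph names (ex:t1, ex:t2, ...) and the 
helper functions are invented for this sketch, and only the two property 
names come from the description.

```python
# Named graphs: graph IRI -> set of (s, p, o) triples, one graph per provider.
graphs = {
    "ex:facebook": {("ex:APerson", "foaf:age", 14)},
    "ex:myspace": {("ex:APerson", "foaf:age", 15)},
}
metadata = set()  # triples about graphs


def register_sources(graphs, metadata):
    """Create a single-triple graph per source triple and link it to its source."""
    single = {}
    counter = 0
    for g in sorted(graphs):
        for triple in sorted(graphs[g], key=repr):
            counter += 1
            t = f"ex:t{counter}"  # hypothetical naming scheme
            single[t] = {triple}
            metadata.add((g, "ex:isOriginalSourceOfTriple", t))
    return single


def merge(new_graph, source_names, graphs, single, metadata):
    """Merge source graphs and record which single-triple graphs are contained."""
    merged = set()
    for g in source_names:
        merged |= graphs[g]
    graphs[new_graph] = merged
    for t, triples in single.items():
        if triples <= merged:
            metadata.add((new_graph, "ex:containsTriple", t))
    return merged


def original_source(triple, single, metadata):
    """Track a triple back to the graph(s) it originally came from."""
    return sorted(
        g for (g, p, t) in metadata
        if p == "ex:isOriginalSourceOfTriple" and triple in single[t]
    )


single = register_sources(graphs, metadata)
merge("ex:merged", ["ex:facebook", "ex:myspace"], graphs, single, metadata)
source = original_source(("ex:APerson", "foaf:age", 15), single, metadata)
```
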
>
>>>>> SELECT * WHERE {
>>>>>     TRIPLE ?t { ?s ?p ?o }
>>>>>     ...
>>>>> }
> …
>> SELECT ?t WHERE {
>> 	?s ?p ?o ?t }
>
> Both of these examples are equivalent except for order and an extra keyword and punctuation. There is no difference in complexity.
>
>>>> The use case for my proposal is reification, and how to relate single statements (a.k.a. shortcut relations) to their reification class instances.
>>>
>>> Now we're getting somewhere. Can you explain why this use case of property reification isn't well-addressed by named graphs? An example might help.
>>
>> I don't want to scramble this information into separate graphs, i.e., shortcut relations and reification class instances should be able to co-exist in one and the same graph.
>
> You can leave everything in the original graph, and in addition create a new single-triple graph that contains only the reified triple. Use the graph IRI of the single-triple graph in place of a statement identifier.
>
>>>>>> To make statements about them somewhere else we usually need an identifier to refer to them, don't we?
>>>>>
>>>>> No, because graphs are literals, so one can repeat the literal to make statements about it.
>>>>
>>>> Well, then I have the same disadvantage as in the existing Named Graph proposal, i.e., statements of one named graph do not have any semantic relation to identical statements of another named graph.
>>>
>>> That's not true. The semantic relation between the statements is that they're identical. It's like using the literal number 1 in two different graphs, or the string "Bob". We don't need to assign an identifier to these literals in order to know that they're the same. Literals are self-denoting in RDF.
>>
>> Okay, you are right. However, graphs can be more complex than a simple number- or string-typed literal. Furthermore, we would use these graphs for further processing of our model. Usually a literal can be seen as a kind of leaf in a graph representation, can't it?
>
> This has nothing to do with the original question asked above. You still don't need an identifier to refer to a graph literal, because it's a literal, and they are self-denoting. The complexity of the literal doesn't matter for this as long as equality is well-defined (and it is for RDF graphs).
>
>> Quoted from [2]:
>>
>> "one can also decouple a reused statement by changing its statement
>> identifier; i.e., the triple of the statement are still the same
>> but the relation to the original statement might now be another e.g.,
>> reflected by a provenance statement e.g.,<#s20>  :original<#s19>"
>>
>> i.e., if I intend that a statement used in multiple graphs belongs together semantically, so that I really refer to the same statement, then I'll use the same statement identifier; otherwise, I'll use a different statement identifier (and, if necessary, I can still relate these statements to each other).
>
> You can do the same with single-triple graphs.
>
>> Let's imagine the following use case: you are trying to implement an algorithm that ranks information from multiple information providers. Before the aggregation and federation task, you would usually store the information fetched from different information providers separately. For that, you could use Named Graphs and statement identifiers. Different information providers can provide the same information, i.e., the same statements. However, to keep track of their origin, you might initially address them by different statement identifiers.
>
> The scheme I described in the beginning of this message could be used to handle this situation with named graphs.
>
>>>> Real-world knowledge descriptions are then, with the existing SPARQL specification, not really queryable if we have many isolated single-triple named graphs.
>>>
>>> I don't understand what this means. Can you give me an example of such a knowledge description, and an example query that you cannot express in SPARQL if the data is organized in single-triple named graphs?
>>
>> Let's take the multiple information providers scenario. If I still stored the federated information in separate graphs to keep track of the provenance, an information resource would not really be queryable, because single statements are isolated in separate graphs. (Please keep the statement duplication proposal aside here.)
>
> Why would I keep that aside? It's how you solve that problem in SPARQL + single-triple graphs. My question was for an example that cannot be solved in SPARQL + single-triple graphs.
>
>> However, by using statement identifiers I can still track the provenance, single statements are not scattered across separate graphs, and I can easily query this information by specifying the graph that contains all these statements.
>
> You can do all that too by creating single-triple graphs and ex:isOriginalSourceOfTriple/ex:containsTriple, as described in the beginning of this mail.
>
>>> How would you represent these two options using statement identifiers?
>>
>> Here is an example (following the syntax as introduced in [2]):
>
>>
>> <#alice> :friend <#bob> <#s1> . # a statement that can be identified by statement identifier #s1
>> <#alice> :friend <#bob> <#s2> . # a statement that can be identified by statement identifier #s2
>>
>> <#g1> rdf:type rdfg:Graph <#s3> .
>> <#g1> :contains <#s1> <#s4> . # a graph that contains the statement #s1
>>
>> <#g2> rdf:type rdfg:Graph <#s5> .
>> <#g2> :contains <#s1> <#s6> . # another graph that contains the statement #s1
>>
>> <#g3> rdf:type rdfg:Graph <#s7> .
>> <#g3> :contains <#s2> <#s8> . # a graph that contains the statement #s2
>>
>> #g1 and #g2 contain the same statement (#s1)
>> #g3 contains another statement (#s2)
>
> Same with single-triple graphs:
>
>     <#s1> { <#alice> :friend <#bob> }
>     <#s2> { <#alice> :friend <#bob> }
>
>     <#g1> { <#alice> :friend <#bob> }
>     <#g2> { <#alice> :friend <#bob> }
>     <#g3> { <#alice> :friend <#bob> }
>
>     <#metadata> {
>       <#g1> ex:containsTriple <#s1> .
>       <#g2> ex:containsTriple <#s1> .
>       <#g3> ex:containsTriple <#s2> .
>     }
>
>>>>>> However, I believe that there is a strong antipathy for single-triple graphs.
>>>>>
>>>>> This is not a technical argument.
>>>>
>>>> The technical argument is the poor query handling with single-triple graphs (see above).
>>>
>>> You mean stores that don't support mirroring the named graphs into the default graph?
>>
>> I intended to address the queryability issue, i.e., scrambled information (caused by "unnecessary" graph isolation) vs. composed information (produced by using statement identifiers and statements that are decoupled from their graph enclosure).
>
> I maintain my claim that everything you can do with statement identifiers, you can do easily with single-triple named graphs too. I still have not seen anything that convinces me that your proposed scheme works any better than what we already can do with SPARQL today.
>
> Best,
> Richard
>
>
>
>>
>>> That's not a complaint about the proposal, but a complaint about the state of implementations; and that's something we can't fix by writing something else into the spec.
>>>
>>> Best,
>>> Richard
>>
>> Cheers,
>>
>>
>> Bo
>>
>>
>> [1] http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs-UC
>> [2] http://lists.w3.org/Archives/Public/public-rdf-comments/2011Jan/0001.html
Received on Sunday, 30 October 2011 13:39:32 UTC