Re: Reification and Provenance modelling from Richard Cyganiak on 2011-09-20 (public-rdf-comments@w3.org from September 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Tue, 20 Sep 2011 22:30:56 +0100
To: Bob Ferris <zazi@smiy.org>
Cc: public-rdf-comments@w3.org
Message-Id: <639D26B7-E57F-49D7-9D5E-F9741335FB22@cyganiak.de>

On 20 Sep 2011, at 19:28, Bob Ferris wrote:
> (albeit, I get the impression that I cannot really convince you from my proposal ;) )

I'm not so much interested in proposals at this time, but in use cases and requirements. That's because this group needs to understand what people are trying to achieve. Otherwise we can't effectively compare different proposals.

> On 9/20/2011 5:16 PM, Richard Cyganiak wrote:
>> I would assume that the default graph contains all triples regardless of their named graph.
> 
> So far I do not have seen a triple store, which duplicates all statements in its default graph

Most of them do this AFAIK. I'm pretty sure it's the default in Virtuoso, and we're running TDB in that configuration. I'm pretty sure that I've seen it for 4store as well. This scenario is explicitly pointed out as a “useful arrangement” in the SPARQL 1.1 spec:

http://www.w3.org/TR/sparql11-query/#exampleDatasets

> i.e., this would break a bit the concept of name graphs, e.g., imagine if I have a named graph with all my personal data, I wouldn't be happy if this data is also query-able via the default graph.

Access control is an orthogonal issue. If you have a way of specifying access control to named graphs, then I would expect the store to exclude them from the default graph if the client is not authorized to see them. In standard SPARQL, if your default graph is public, then so are all your named graphs.
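
To make that arrangement concrete, here is a minimal sketch in plain Python, not the API of any particular store. The dataset, the graph names and the `authorized` parameter are all made up for illustration; real stores configure the union default graph and access control in their own ways.

```python
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def union_default_graph(
    named_graphs: Dict[str, Set[Triple]],
    authorized: Optional[Set[str]] = None,
) -> Set[Triple]:
    """Build a default graph as the union of the visible named graphs.

    `authorized` is a hypothetical access-control hook: if given, only
    graphs whose names are in it contribute to the default graph.
    """
    merged: Set[Triple] = set()
    for name, triples in named_graphs.items():
        if authorized is None or name in authorized:
            merged |= triples
    return merged


# Hypothetical dataset with one public and one private named graph.
dataset = {
    "http://example.org/public": {
        ("ex:bob", "rdf:type", "foaf:Person"),
    },
    "http://example.org/private": {
        ("ex:bob", "foaf:mbox", "mailto:bob@example.org"),
    },
}

# Without access control, everything is visible via the default graph.
everything = union_default_graph(dataset)

# With access control, the private graph is left out of the union.
public_only = union_default_graph(
    dataset, authorized={"http://example.org/public"}
)
```

The point is only that the union is computed per client, so keeping a named graph private and mirroring named graphs into the default graph don't conflict.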

>> Then a statement identifier approach could be queried like this:
>> 
>> SELECT * WHERE {
>>    TRIPLE ?t { ?s ?p ?o }
>>    ...
>> }
> 
> I do not think that we would need such a TRIPLE keyword.

How else would you bind a variable to a statement identifier? For example, “give me the statement identifier for the triple {<bob> a foaf:Person}”?
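
For comparison, with single-triple named graphs the graph name already plays the role of the statement identifier, and standard SPARQL can bind it with a pattern such as `GRAPH ?g { <bob> a foaf:Person }`. A rough Python sketch of that lookup follows; the graph names are invented and a plain dict stands in for the store.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def graphs_containing(
    named_graphs: Dict[str, Set[Triple]], triple: Triple
) -> List[str]:
    """Return the names of all named graphs that contain `triple`.

    This mirrors what binding ?g in a GRAPH pattern does for a fully
    specified triple pattern.
    """
    return sorted(
        name for name, triples in named_graphs.items() if triple in triples
    )


# Hypothetical single-triple named graphs; the names act as statement ids.
single_triple_graphs = {
    "ex:stmt1": {("ex:bob", "rdf:type", "foaf:Person")},
    "ex:stmt2": {("ex:bob", "foaf:knows", "ex:alice")},
    "ex:stmt3": {("ex:bob", "rdf:type", "foaf:Person")},
}

ids = graphs_containing(
    single_triple_graphs, ("ex:bob", "rdf:type", "foaf:Person")
)
```

Here the same triple occurs in two graphs, so two "statement identifiers" come back.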

> My use case of my proposal is reification and how to relate single statements a.k.a. shortcut relations to its reification class instances.

Now we're getting somewhere. Can you explain why this use case of property reification isn't well-addressed by named graphs? An example might help.

>>> To make statements about them somewhere else we usually need an identifier to refer to them, or?
>> 
>> No, because graphs are literals, so one can repeat the literal to make statements about it.
> 
> Well, then I have the same disadvantage as in the existing Named Graph proposal, i.e., statements of one named graph do not have any semantically relation to identical statements of another named graphs.

That's not true. The semantic relation between the statements is that they're identical. It's like using the literal number 1 in two different graphs, or the string "Bob". We don't need to assign an identifier to these literals in order to know that they're the same. Literals are self-denoting in RDF.

Just to repeat this: If RDF had graph literals or “triple literals”, and the same literal occurred in two different graphs, then the design of RDF literals requires that they'd have to match if you asked a query.
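
A small sketch of what I mean, assuming a hypothetical encoding where a graph literal is just an immutable set of triples. RDF has no graph literals today, so the representation and the property names below are made up for illustration.

```python
from typing import FrozenSet, Set, Tuple, Union

Triple = Tuple[str, str, str]
GraphLiteral = FrozenSet[Triple]
# A statement whose object is either an ordinary term or a graph literal.
Statement = Tuple[str, str, Union[str, GraphLiteral]]


def graph_literal(*triples: Triple) -> GraphLiteral:
    """Hypothetical graph literal: its value is exactly its set of triples."""
    return frozenset(triples)


def subjects_mentioning(
    graph: Set[Statement], literal: GraphLiteral
) -> Set[str]:
    """Find subjects that say something about the given graph literal."""
    return {s for (s, _p, o) in graph if o == literal}


# Two separate graphs repeat the same literal; no shared identifier exists.
graph_a: Set[Statement] = {
    ("ex:alice", "ex:asserts",
     graph_literal(("ex:bob", "rdf:type", "foaf:Person"))),
}
graph_b: Set[Statement] = {
    ("ex:carol", "ex:doubts",
     graph_literal(("ex:bob", "rdf:type", "foaf:Person"))),
}

# A fresh occurrence of the literal matches in both graphs, just as the
# number 1 or the string "Bob" would.
probe = graph_literal(("ex:bob", "rdf:type", "foaf:Person"))
in_a = subjects_mentioning(graph_a, probe)
in_b = subjects_mentioning(graph_b, probe)
```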

>> Occurrences of the same literal in different graphs are semantically equivalent (unlike, say, blank node identifiers).
> 
> Do really intend this always?

It's definitely how literals are defined in RDF. I haven't perceived any problem with that so far.

> I don't think so, see my example above. Hence, we have to cover both cases.

Not sure what you mean here. I don't understand the case where you sometimes would want 1 and 1 to be identical and sometimes not.

>> RDF graphs and named graphs are abstract data models, and implementers are free to store them any way they want internally,
> 
> Yes, I know. However, why do we talk nowadays about quad stores instead of triple stores.

We talk about “graph stores” and “SPARQL stores” too. That's what they are storing in an abstract sense, considering the interface they present to the world. This doesn't mean that they are internally organized in any particular way. (Some “quad stores” are actually column stores, some are quint stores, etc.)

>> I'm still trying to understand what the perceived problem with single-triple named graphs is.
> 
> Real world knowledge description are then, at the moment with the existing SPARQL specification, not really query-able, if we have many isolated single-triple named graphs.

I don't understand what this means. Can you give me an example of such a knowledge description, and an example query that you cannot express in SPARQL if the data is organized in single-triple named graphs?

(There should be a law that forbids invoking the “real world” in an argument unless you give a real-world example ;-)

>> Regarding #2, it's probably false because the RDF abstract syntax does not constrain implementations, and I'm unconvinced that an optimized implementation of your scheme would actually be more space-efficient than an optimized implementation of named graphs.
> 
> Well, the current Named Graphs semantics (as defined by Bizer et al.) say (more or less) that equal statements in separate graphs do not have any relation to each other. As you said above the literal-graph proposal treat equal literals as equal (without any identifier). Both proposals do not really reflect real world need, where would need to be able to represent both options as needed.

How would you represent these two options using statement identifiers? AFAICT two statements with different statement identifiers would have the same relationship that two single-triple named graphs have, or is there some difference that I'm missing?

> Albeit, maybe graphs that contain the same statements are an edge case, however, we have to be able to represent edge cases as well. The power of its expressiveness is so far the success of RDF.

I tend to agree, and my argument is that single-triple named graphs actually work reasonably well as an edge case (while having the advantage that they also work very well for the common case).

>>> However, I believe that there is a strong antipathy for single-triple graphs.
>> 
>> This is not a technical argument.
> 
> The technical argument is that one of the bad query handling with single-triple graphs (see above).

You mean stores that don't support mirroring the named graphs into the default graph? That's not a complaint about the proposal, but a complaint about the state of implementations; and that's something we can't fix by writing something else into the spec.

Best,
Richard
Received on Tuesday, 20 September 2011 21:31:26 UTC