Re: Reification and Provenance modelling

Hi Richard,

(although I get the impression that I cannot really convince you of my 
proposal ;) )

On 9/20/2011 5:16 PM, Richard Cyganiak wrote:
> On 20 Sep 2011, at 11:08, Bob Ferris wrote:
>> Just imagine a triple store full of single-triple graphs. Querying this triple store might really getting complex, or?
>
> You're talking about querying with SPARQL? This is a bit out of scope here as we can't change SPARQL anyways,

Yes, but in the end everything has to work together smoothly. SPARQL 
relies on RDF.

> but I don't see how there's a difference in query complexity between single-triple graphs and statement identifiers.

Well, I guess we wouldn't really query in the way the query proposals 
below suggest. We might use the statement identifier in the subject 
position to retrieve provenance information. We might also use the 
statement identifier in the object position to retrieve all graphs 
(via their identifiers) that contain that statement. Furthermore, 
I can still use the GRAPH keyword as usual.
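
As a rough illustration of these two lookups, here is a toy in-memory 
model in Python. It is only a sketch: the identifiers (stmt1, ex:graphA, 
ex:containsStatement, ...) and the helper functions are made up for the 
example and are not part of any proposal or specification.

```python
# Toy model only; every identifier and property name here is hypothetical.

# Each statement gets its own identifier.
statements = {
    "stmt1": ("ex:alice", "foaf:knows", "ex:bob"),
    "stmt2": ("ex:alice", "foaf:name", "Alice"),
}

# Statement identifier in the SUBJECT position: provenance about a statement.
provenance = [
    ("stmt1", "dc:creator", "ex:carol"),
    ("stmt1", "dc:date", "2011-09-20"),
]

# Statement identifier in the OBJECT position: which graph contains it.
membership = [
    ("ex:graphA", "ex:containsStatement", "stmt1"),
    ("ex:graphB", "ex:containsStatement", "stmt1"),
    ("ex:graphA", "ex:containsStatement", "stmt2"),
]


def provenance_of(stmt_id):
    """Collect provenance properties whose subject is the statement identifier."""
    return {p: o for s, p, o in provenance if s == stmt_id}


def graphs_containing(stmt_id):
    """Collect identifiers of graphs that reference the statement identifier."""
    return sorted(g for g, _, o in membership if o == stmt_id)
```

Both lookups are plain pattern matches on the statement identifier, so 
no single-triple graph has to be created just to attach metadata.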

> I would assume that the default graph contains all triples regardless of their named graph.

So far I have not seen a triple store that duplicates all 
statements in its default graph. This would somewhat break the 
concept of named graphs: e.g., if I have a named graph with all 
my personal data, I wouldn't be happy if this data were also queryable 
via the default graph.

> Then a statement identifier approach could be queried like this:
>
> SELECT * WHERE {
>     TRIPLE ?t { ?s ?p ?o }
>     ...
> }

I do not think that we would need such a TRIPLE keyword.

>
> (You could perhaps tweak the syntax to shave off a few characters.)
>
> And the single-triple graphs can be addressed like this:
>
> SELECT * WHERE {
>     GRAPH ?g { ?s ?p ?o }
>     ...
> }
>
> Verbosity aside, I don't see a difference in complexity.
>
>> I guess, nobody really wants to isolate single triples in separate graphs, or?
>
> Well, apparently some people want triple-level metadata, and named graphs support triple-level metadata.

Yes, but in a bad way. The use case for my proposal is reification, and 
how to relate single statements (a.k.a. shortcut relations) to their 
reification class instances.

>
>>>> The simple graph literals proposal [4] looks a bit more elegant, however, these graphs have still no identifier (from my POV).
>>>
>>> Why is this a problem? Note that you can make statements about them.
>>
>> To make statements about them somewhere else we usually need an identifier to refer to them, or?
>
> No, because graphs are literals, so one can repeat the literal to make statements about it.

Well, then I have the same disadvantage as in the existing Named Graph 
proposal, i.e., statements of one named graph do not have any 
semantic relation to identical statements of another named graph. 
This is in general a good design decision, since different people can 
share the same interest etc. However, our knowledge representation 
language must also be able to reuse a statement if someone intends to 
do so, i.e., if someone wants to preserve the provenance of that 
statement and its handling in the information flow lifecycle (changes, 
deletions, ...).

> Occurrences of the same literal in different graphs are semantically equivalent (unlike, say, blank node identifiers).

Do we really always intend this? I don't think so; see my example above. 
Hence, we have to cover both cases.

>
>>>> All these proposals cannot deal with the "Slicing datasets according to multiple dimensions" [5].
>>>
>>> I don't think that's true. The same triple can exist in multiple graphs. Nothing stops a triple store from providing different views on the same set of triples.
>>
>> Yes, of course. However, in the existing proposals we would simply duplicate the data
>
> You keep saying that but I don't think it's true. RDF graphs and named graphs are abstract data models, and implementers are free to store them any way they want internally,

Yes, I know. However, why do we talk nowadays about quad stores instead 
of triple stores? Usually people assume that the fourth position is 
intended for the named graph identifier (and if we go further, the 
fifth is for the internal statement identifier, ...).

> including space-efficient storage. Quoting:
>
> [[
> This abstract syntax is the syntax over which the formal semantics are defined. Implementations are free to represent RDF graphs in any other equivalent form.
> ]] – http://www.w3.org/TR/2011/WD-rdf11-concepts-20110830/#section-Graph-syntax
>
> So whether data is duplicated internally, or whether a storage scheme is used that internally uses triple identifiers and represents graphs as list of those, is entirely up to the implementation.
>
> (Think of SQL views. In the abstract relational model, they contain data – but that data is merely computed on demand from underlying base tables. Implementations *may* materialize the view or an index over the view to speed up queries, but that doesn't mean that the view model forces the duplication of data. Named graphs could be views on other graphs in the same dataset.)
>
>> Well, I guess that I outlined already the disadvantages of these proposals (at least from my POV),
>
> You mentioned two things, as far as I can see:
>
> 1. named graphs don't deal well with single-triple graphs
> 2. having the same triple in multiple graphs is not space-efficient
>
> Regarding #1, you haven't shown anything to back that up.

My proposal for this issue is to make use of statement identifiers.

> I'm still trying to understand what the perceived problem with single-triple named graphs is.

Real-world knowledge descriptions are then, at the moment with the 
existing SPARQL specification, not really queryable if we have many 
isolated single-triple named graphs.

> You have not explained the problem besides saying that you don't like the approach, which – with all due respect – I find not compelling as an argument.

Okay, I thought that the problem was clear to everybody reading this 
list. Sorry that I made this assumption. I have simply never seen 
"single-triple named graphs" on the "advantages" side of a pros/cons list.

>
> Regarding #2, it's probably false because the RDF abstract syntax does not constrain implementations, and I'm unconvinced that an optimized implementation of your scheme would actually be more space-efficient than an optimized implementation of named graphs.

Well, the current Named Graphs semantics (as defined by Bizer et al.) 
say (more or less) that equal statements in separate graphs do not have 
any relation to each other. As you said above, the literal-graph proposal 
treats equal literals as equal (without any identifier). Neither proposal 
really reflects the real-world need, where we would need to be able to 
represent both options as needed.
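
To make the "both options" point concrete, here is a small Python 
sketch. It assumes, purely for illustration, that graphs are sets of 
statement identifiers; the names (stmt1, ex:graphA, ...) and both helper 
functions are hypothetical.

```python
# Toy model only; identifiers and helper names are invented for this example.

# Two distinct statement identifiers may carry the very same triple.
statements = {
    "stmt1": ("ex:alice", "foaf:interest", "ex:SemanticWeb"),
    "stmt2": ("ex:alice", "foaf:interest", "ex:SemanticWeb"),
}

# graphA and graphB deliberately reuse stmt1; graphC states the same
# triple independently under its own identifier.
graphs = {
    "ex:graphA": {"stmt1"},
    "ex:graphB": {"stmt1"},
    "ex:graphC": {"stmt2"},
}


def shared_statements(g1, g2):
    """Statement identifiers that both graphs reference (intentional reuse)."""
    return graphs[g1] & graphs[g2]


def equal_triples(g1, g2):
    """Triples that occur in both graphs, whether or not the identifier is shared."""
    triples1 = {statements[s] for s in graphs[g1]}
    triples2 = {statements[s] for s in graphs[g2]}
    return triples1 & triples2
```

Here graphA and graphB share one and the same statement, while graphA 
and graphC merely contain equal triples that have no relation to each 
other. A model with statement identifiers can express either case.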

>
>>>> - statements can be utilised in multiple graphs
>>>
>>> This is possible in [1].
>>
>> Via importing single-triple graphs into other graphs? (This looks somehow artificial to me, sorry).
>
> No, by having two graphs that contain the same statement.

Graphs that contain the same statements may be an edge case; 
however, we have to be able to represent edge cases as well. Its 
expressive power is what has made RDF successful so far.

>
>> However, I believe that there is a strong antipathy for single-triple graphs.
>
> This is not a technical argument.

The technical argument is the poor query handling of 
single-triple graphs (see above).


Cheers,


Bo

Received on Tuesday, 20 September 2011 18:29:02 UTC