Re: TriG document examples from Andy Seaborne on 2011-12-14 (public-rdf-wg@w3.org from December 2011)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Wed, 14 Dec 2011 08:59:50 +0000
To: Pat Hayes <phayes@ihmc.us>
CC: RDF-WG <public-rdf-wg@w3.org>
Message-ID: <4EE86586.8020003@epimorphics.com>

On 14/12/11 02:18, Pat Hayes wrote:
> Some comments in-line below.
>
> On Dec 12, 2011, at 8:06 AM, Andy Seaborne wrote:
>
>> """
>> Create a short example for a TriG document and a clear notion of what is entailed by it
>> """
>>
>> Possibly failing on the "short", because we haven't had much discussion in the last few weeks ...
>>
>> 	Andy
>>
>>
>> == Case 1: This is not a TriG document
>> Case 1 is here because I want to talk about graph literals further on.
>>
>> :doc1 :asserts { :y1 :name "foo" } .
>> :doc2 :asserts { :y1 :name "foo" } .
>
> You don't list any entailments, but there are some, e.g.
>
> :doc2 :asserts _:zz .
> _:zz a rdf:LiteralGraph .
>
> and similar. In general, not having literals in subject position does not really block any entailments, since they can all be 'mirrored' by triples with a bnode as the subject. (See section 7.2 in the 2004 Semantics document.)
>
>
>>
>> Pro:
>> Purest form of graphs-as-values
>>
>> Con:
>> Referring to the same value from two places requires the literal to be written twice.  For integers, that's no big deal; for a graph literal it's a bit of a nuisance, even for a graph of say 10 triples.
>>
>> Danger of requiring widespread re-engineering (parsers, storages, other specs).
>>
>> Does not mention the web (it does not talk about g-boxes or accessing them).  A consequence of "no web" is that additional layers of modelling are needed for talking about the content of a web page etc, yet this is the most significant use of multi-graph datasets.
>>
>>
>> == Case 2: Named values
>>
>> The case uses names for graph literals so that the same literal can be used in two places (by reference).
>> -----------------------------------
>> { # Default graph
>>   :doc1 :asserts :graphLiteral1 .
>>   :doc2 :asserts :graphLiteral1 .
>> }
>>
>> :graphLiteral1 { :y1 :name "foo" } .
>> -----------------------------------
>>
>> entails
>>
>> :graphLiteral1 :denotesLiteral [] .
>
> ?? Can you explain this? I cannot understand what it is supposed to mean. What exactly is the intended semantics of 'denotesLiteral'?
>
>> :graphLiteral1 a rdf:Literal .
>> :graphLiteral1 a rdf:LiteralGraph .
>> -----------------------------------
>>
>> The relationship between :graphLiteral1 and { :y1 :name "foo" } is much like owl:sameAs except between an individual and a value.
>
> It is *exactly* like owl:sameAs. So in OWL applications, we would not really be able to prevent owl:sameAs assertions triggering these kinds of entailments in RDF.

That's what I thought until I read
http://www.w3.org/TR/owl-ref/#sameAs-def

"""
The built-in OWL property owl:sameAs links an individual to an 
individual. Such an owl:sameAs statement indicates that two URI 
references actually refer to the same thing: the individuals have the 
same "identity".
"""

Are literals individuals?

And then what about the "two URI references"?

at which point I wimp'ed out and said "much like owl:sameAs except" :-)

>
>>   It need not be symmetric in IRI/value.
>>
>> Pros:
>> We can now refer to large graph literals and not need to repeat them.
>>
>> Cons:
>> Still requires additional machinery to be used for common cases. Such machinery is additional complexity to the data publisher and data user.
>>
>>
>> == Case 3a: Explicit event modelling
>> See also "Time-varying g-boxes : a dataset pattern"
>> http://lists.w3.org/Archives/Public/public-rdf-wg/2011Oct/0148.html
>> and also the idea of "event modelling"
>>
>> One way to "add the web" is to identify the fundamental concepts of the web.  We already have "naming" in case 2, the other is dereference.
>>
>> REST does not give a name to the act of dereference but we can do that.  The relationship of name and graph value is one of "snapshot" or "seen".
>
> Again, this does not make sense (to me). Are you meaning to give a name to the dereference "relationship"? But surely this has to be a three-way relationship involving time (or something like a time)?

I'm naming the instance of the action of dereference.  The one-shot 
event of the GET that occurs at a point in time.

Is this clearer:

:event :occurredAt "2011-12-14T08:43:19.148+00:00"^^xsd:dateTime .
:event :accessed :doc1 .
:event :observed  { :y1 :name "foo" } .
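
A minimal sketch of the same idea in Python, in case it helps to see the event as a record rather than as triples. The class name DerefEvent and its field names are assumptions made for illustration, not vocabulary proposed in this thread; a graph value is modelled simply as a frozenset of triples.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DerefEvent:
    """One GET of a g-box at a point in time (hypothetical model)."""
    occurred_at: datetime  # corresponds to :event :occurredAt
    accessed: str          # corresponds to :event :accessed
    observed: frozenset    # corresponds to :event :observed { ... }


# The graph value { :y1 :name "foo" } as an immutable set of triples.
foo = frozenset({(":y1", ":name", '"foo"')})

e1 = DerefEvent(datetime.fromisoformat("2011-12-11T20:30:07+00:00"), ":doc1", foo)
e2 = DerefEvent(datetime.fromisoformat("2011-12-12T13:45:57+00:00"), ":doc2", foo)

# Two distinct events, but the same graph value was observed in both.
same_value = e1.observed == e2.observed
distinct_events = e1 != e2
```

The point of the sketch is only that the event, not the document, carries the time, so the three-way relationship is held by one named thing.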

>
>>
>> -----------------------------------
>> { # Default graph
>>   # Adds details to the observation.
>>   # Does not need to be in the default graph, or even in the dataset
>>   # but using the default graph as the manifest is natural.
>>
>>   :doc1 :asserts :graph1 .
>>   :doc2 :asserts :graph1 .
>>   :graph1 rdf:value uuid:187 .
>>   :doc1 :accessedAt "2011-12-11T20:30:07+00:00"^^xsd:dateTime .
>>   :doc2 :accessedAt "2011-12-12T13:45:57+00:00"^^xsd:dateTime .
>> }
>>
>> uuid:187 { :y1 :name "foo" } .
>
> This does not even begin to make sense. rdf:value has no semantics, for a start, so there are no (nontrivial) entailments involving uuid:187

Would you accept:

:graph1 :observedToBe uuid:187 .

I want to show the same URI for

:doc1 :asserts :graph1 .
:doc2 :asserts :graph1 .

so they are making the same claims without resorting to repeating the 
literal.

Cases 2 and 3a are very similar - the difference being that case 3a is 
grounded in the acts of reading the g-box to see g-snaps, and case 2 is 
about naming values to avoid having to write them down repeatedly.
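
Roughly, and only as a sketch: the lookup that both cases rely on can be written out in Python. The dict/set layout, the helper names objects and asserted_value, and the use of plain strings for URIs are all assumptions for illustration; the triples are the ones from the example, with :observedToBe in place of the value property.

```python
# Hypothetical in-memory dataset: a default graph plus named graphs.
default_graph = {
    (":doc1", ":asserts", ":graph1"),
    (":doc2", ":asserts", ":graph1"),
    (":graph1", ":observedToBe", "uuid:187"),
}
named_graphs = {
    "uuid:187": frozenset({(":y1", ":name", '"foo"')}),
}


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def asserted_value(doc):
    """Follow :asserts, then :observedToBe, to the named graph values."""
    values = set()
    for g in objects(default_graph, doc, ":asserts"):
        for name in objects(default_graph, g, ":observedToBe"):
            if name in named_graphs:
                values.add(named_graphs[name])
    return values


# Both documents reach the same graph value, which is written down once.
shared = asserted_value(":doc1") == asserted_value(":doc2")
```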

>> -----------------------------------
>>
>> so entailed
>>
>> # If RDF had syntax for graph literals.
>> uuid:187 rdf:graphSnapshot  { :y1 :name "foo" }
>
> How could that *possibly* be entailed by the above? Those triples don't even mention that graph literal.

The idea of a dataset pattern is that it gives the association of URI and 
graph value in:

uuid:187 { :y1 :name "foo" } .


We know there are different patterns already out there being used for 
real.  "Dataset patterns" are an attempt to recognize different 
approaches and not start by saying "you're wrong - do it this way" where 
'this way' requires a lot of undefined machinery that is simply cost to 
the users with no benefit (at least, no perceived benefit) because their 
UC doesn't need it.

UC: Want to query a dataset of graphs, where the graphs come (one time, 
i.e. now) from several g-boxes.  Why have all this graph value indirection?
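
To make that use case concrete, here is a small Python sketch under stated assumptions: WEB is a made-up stand-in for dereferencing g-boxes, and build_dataset and graphs_containing are hypothetical helpers, not any existing API. Each g-box is read once, the location is used directly as the graph name, and the query runs over the resulting snapshot.

```python
# Hypothetical stand-in for the web: g-box location -> current graph value.
WEB = {
    ":doc1": {(":y1", ":name", '"foo"')},
    ":doc2": {(":y1", ":name", '"foo"')},
}


def build_dataset(locations):
    """Read each g-box once; the location doubles as the graph name."""
    return {loc: frozenset(WEB[loc]) for loc in locations}


def graphs_containing(dataset, triple):
    """Names of graphs holding the triple, roughly GRAPH ?g { s p o }."""
    return sorted(name for name, graph in dataset.items() if triple in graph)


dataset = build_dataset([":doc1", ":doc2"])
hits = graphs_containing(dataset, (":y1", ":name", '"foo"'))

# A later change to a g-box does not affect the snapshot already taken.
WEB[":doc1"].add((":y1", ":name", '"bar"'))
unchanged = (":y1", ":name", '"bar"') not in dataset[":doc1"]
```

No graph-value indirection appears anywhere in the sketch, which is the point of the use case.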

>
>>
>> uuid:187 rdf:graphSnapshot
>>      "{ :y1 :name "foo" }"^^rdf:graphSyntaxTurtle .
>>
>> uuid:187  a  rdf:observation .
>>
>> schema:
>>
>> rdf:graphSnapshot rdfs:range rdfs:Resource .
>> rdf:graphSnapshot rdfs:domain rdf:GraphLiteral .
>>
>> -----------------------------------
>>
>> Pros:
>> We can now talk about what's in a document (at a point in time).
>
> I really don't think this will work, even if you made it make basic sense. What happens if you put all this stuff into another graph literal and then do a similar number on it, but using times earlier than the ones referenced in the literal itself?

The timestamps are auxiliary for the example. The core is the pattern for:

uuid:187 { :y1 :name "foo" } .

means that a non-deref URI names a graph value and that the value was a g-snap.

The default graph triples show how it might be added to, to give a more 
complete usage.  Sorry if that wasn't clear.

	Andy

>
>>
>> Cons:
>> While we have now captured all the information, it does not support the common use case of one value per g-box being relevant so it is machinery but only partial.  I think that it would require the world to change how it is using named graphs currently.
>>
>> In other words, the "value in the g-box" case is so common it deserves making easy to use.
>>
>> == Case 3b: Named versions
>> The "Rolling Snapshots" Pattern and Vocabulary
>> http://lists.w3.org/Archives/Public/public-rdf-wg/2011Oct/0152.html
>>
>> Case 3a puts the modelling onto the dataset.
>> Case 3b is supported by the way the publisher publishes the data.
>> rdf:seenToHaveValue rdfs:domain web:location .
>>
>> Every version of the graph in a g-box is given a distinct URI by the publisher.
>>
>> No TriG document.
>>
>> But you might have a graph that describes the observations:
>>
>> {
>>   :doc1 rdf:versionedValue :doc1-v1 .
>>   :doc1 rdf:versionedValue :doc1-v2 .
>>
>>   :doc1-v1 rdf:validAt "2011-12-11T20:30:07+00:00"^^xsd:dateTime .
>>   :doc1-v2 rdf:validAt "2011-12-12T13:45:57+00:00"^^xsd:dateTime .
>> }
>>
>>
>> == Case 4: "value in the g-box"
>> c.f. Jeremy's web cache
>>
>>
>> A common case is where the g-box is only accessed once - a collection of graph values are assembled for the purposes of a query.  The graph values have an associated URI (not denoting it).
>>
>> As a query is a one time action, the idea that there is one value for each location ("current value") is natural.
>>
>> (This is the restricted form of RDF dataset that is captured by SPARQL FROM NAMED but not the only RDF dataset style possible as allowed by the machinery of SPARQL.  FROM NAMED makes the restriction that location accessed and graph name are the same.)
>>
>> -----------------------------------
>>
>> { # Possible default graph
>>   # Include info of when :doc1 and :doc2 were accessed
>>   # but that puts back the event of dereferencing.
>> }
>>
>> :doc1 { :y1 :name "foo" } .
>> :doc2 { :y1 :name "foo" } .
>>
>> -----------------------------------
>> entails:
>>
>> :doc1 rdf:seenToHaveValue { :y1 :name "foo" }  .
>> :doc1 a rdf:g-box .
>>
>> and schema:
>>
>> rdf:seenToHaveValue rdfs:domain web:location .
>> rdf:seenToHaveValue rdfs:range rdf:GraphLiteral .
>>
>> -----------------------------------
>>
>> == Case 5: Use of foaf:primaryTopic
>>
>> Some systems use the primary topic of a graph as the graph name.
>>
>> I'm not sure if any of these are contenders for our named graph - it would seem to require a lot of new machinery to get the punning right.
>
> Actually I think this would be fairly easy to do, as the punning machinery itself has been worked out in several other places now (OWL2, Common Logic, IKL), but it would need a major rewrite/addendum to the semantics document and maybe also Concepts.
>
> Pat
>
>>
>>
>>
>
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>
>
>
>
>
Received on Wednesday, 14 December 2011 09:00:29 UTC