Re: Three solution designs to the first three Graphs use cases from Andy Seaborne on 2012-01-05 (public-rdf-wg@w3.org from January 2012)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Thu, 05 Jan 2012 14:49:01 +0000
To: public-rdf-wg@w3.org
Message-ID: <4F05B85D.7050007@epimorphics.com>
On 05/01/12 11:09, Andy Seaborne wrote:
> On 04/01/12 19:23, David Wood wrote:
>> Thanks, Sandro. That's very helpful.
>>
>> It might be useful to consider augmenting TriG syntax to support your
>> third solution (explicitly naming relations). I'd be quite happy with
>> that.
>
> What would the data model be?
>
>> We could also consider standardizing the existing TriG syntax to be a
>> syntactic shorthand for TriG REST semantics; that is, a lack of
>> explicitly declared relation infers log:semantics.
>
> I think we should not fix a semantics for undeclared relationships.
>
> Otherwise, it invalidates existing TriG documents which don't exactly
> follow the TriG/ABC definition.
>
> Ditto N-Quads - in a quadstore/database dump or extract you don't
> necessarily know the semantics.
>
> - - - - - - - - - -
>
> I find the name TriG/REST confusing because, for me, identifying the
> dereference action is modelling REST which is the other

... the other, naming equality, style discussed.

<uuid:nnnnn> { ... triples fetched in operation 8671 }

	Andy

>
> It's more like "TriG/WebCache" -- only one instance of the graph
> container's state is possible.
>
> Andy
>
>>
>> Regards,
>> Dave
>>
>>
>> On Jan 4, 2012, at 1:45 PM, Sandro Hawke <sandro@w3.org> wrote:
>>
>>> While it's fresh in my mind, let me write down the view I came to during
>>> today's telecon. (And, carry it a bit farther.) Guus, I don't know if
>>> you still want to write up your understanding of it, or if this obviates
>>> your action.
>>>
>>>
>>> * Use Case 1: (presented by cygri at 21 Dec meeting)
>>>
>>> Several systems want to use the data gathered by one RDF crawler. They
>>> don't need simultaneous access to older versions of the data.
>>>
>>> Solution A: use TriG or N-Quads with the fourth column (graph label)
>>> being the URL the content was fetched from.
>>>
>>> <http://example.org> { ... triples recently fetched from there }
>>>
>>> * Use Case 2: (brought up in questions by sandro at 21 Dec meeting)
>>>
>>> Several systems want to use the data gathered by one RDF crawler. They
>>> need simultaneous access to older versions of the data.
>>>
>>> Solution B: use TriG or N-Quads with the fourth column being some
>>> identifier created at the time the retrieval was done. Then, some other
>>> data connects that identifier with the URL the content was fetched from.
>>>
>>> <http://crawler.example.org/r8571> { ... triples fetched in retrieval
>>> 8671 }
>>> {
>>> <http://crawler.example.org/r8571> eg:source <http://example.org>;
>>> eg:date "2011-01-04T00:03:11"^^xs:dateTime
>>> }
>>>
>>> * Use Case 3: (suggested by sandro at 4 Jan meeting)
>>>
>>> A system wants to convey to another system in RDF that some person
>>> agrees with or disagrees with certain RDF triples.
>>>
>>> Solution C: use TriG or N-Quads with the fourth column being an
>>> identifier for an RDF Graph (g-snap), so that it can be referred to in
>>> the default graph.
>>>
>>> { eg:sandro eg:endorses <g1> }
>>> <g1> { ... the triples I'm endorsing ... }
>>>
>>> ====
>>>
>>> So, here we have two different semantics for TriG clearly motivated and
>>> expressed. The TriG document:
>>>
>>> g { s p o }
>>>
>>> is understood in Solution A to mean (in N3):
>>>
>>> g log:semantics { s p o }. # TriG "REST" semantics
>>>
>>> but in Solution C it means (in N3):
>>>
>>> g owl:sameAs { s p o }. # TriG "Equality" semantics
>>>
>>> ====
>>>
>>> It looks like it's possible to solve all three use cases with either
>>> semantics, although it gets a bit tricky.
>>>
>>> With TriG/REST:
>>>
>>> UC1 -- as reported by Richard; the URL used by the crawler is
>>> the fourth column URL
>>>
>>> UC2 -- as implemented in Sandro's semwalker code; the crawler
>>> makes a new URL in its own web space, mirrors the content there,
>>> and puts that URL in the fourth column
>>>
>>> UC3 -- rather than endorsing an RDF Graph, I endorse a Graph
>>> Container on the condition that it never changes (or something
>>> like that -- needs to be fleshed out more).
>>>
>>> { eg:sandro eg:endorses <g1>.
>>> <g1> a rdf:StaticGraphContainer.
>>> }
>>> <g1> { ... the triples I'm endorsing ... }
>>>
>>>
>>> With TriG/Equality:
>>>
>>> UC1 -- A layer of indirection is needed, as new URIs need to be
>>> created for the different RDF Graphs.
>>>
>>> { <http://example.org> rdf:graphState <uuid:nnnnn> }
>>> <uuid:nnnnn> { ... triples fetched from example.org }
>>>
>>> Maybe there's some clever way to do it without this, involving
>>> URL mangling or something to eliminate the second lookup.
>>>
>>> I used uuid:nnnnn as a URI for the RDF Graph, but I could just
>>> as easily have used a hash of the graph or graph serialization.
>>> I *could* use an http URL, I think, but that's likely to lead to
>>> confusion and breakage, especially when someone gets the bright
>>> idea of changing what triples are served at that address. (I'm
>>> sure it will have seemed like a good idea at the time.)
>>>
>>> UC2 -- Pretty straightforward, since we already have that layer of
>>> indirection in UC1. We can't quite use the r8571 example as is,
>>> because graph equality could smoosh the two retrieval operations
>>> together. So we need something like this:
>>>
>>> <uuid:nnnnn> { ... triples fetched in operation 8671 }
>>> {
>>> [ a eg:Retrieval;
>>> eg:gotGraph <uuid:nnnnn>;
>>> eg:source <http://example.org>;
>>> eg:date "2011-01-04T00:03:11"^^xs:dateTime;
>>> ]
>>> }
>>>
>>> UC3 -- easy:
>>>
>>> { eg:sandro eg:endorses <uuid:nnnnn>. }
>>> <uuid:nnnnn> { ... the triples I'm endorsing ... }
>>>
>>> Here you can see why I want a blank node as the graph
>>> label, rather than making up uuids.
>>>
>>> Between these two, I have a preference for TriG/REST over
>>> TriG/Equality, I think. I think people are too likely to get the
>>> semantics of TriG/Equality wrong in practice. Of course, spelling out
>>> the semantics of TriG/REST will be tricky given it has some
>>> contextual qualities, as we've discussed.
>>>
>>> MEANWHILE, we have a third solution, where we name the relation
>>> explicitly. This is the one I prefer.
>>>
>>> UC1
>>>
>>> <http://example.org> rdf:graphState { ... triples recently fetched
>>> from there }
>>>
>>> UC2 -- either of the styles given above, depending whether the
>>> harvester wants to publish its copies on the web or not.
>>>
>>> UC3
>>>
>>> eg:sandro eg:endorses <uuid:nnnnn>.
>>> <uuid:nnnnn> owl:sameAs { ... the triples I'm endorsing ... }
>>>
>>> or, logically:
>>>
>>> eg:sandro eg:endorses { ... the triples I'm endorsing ... }
>>>
>>> (Then, I would probably get rid of the curly braces around the default
>>> graph, so it becomes Turtle with Nesting.)
>>>
>>> Do those three solution designs make sense? Any strong preferences
>>> among them? Are there more use cases that people think the group will
>>> find compelling and which cannot be solved by all three of these
>>> solutions? (I think the next use case I'd approach would be "Tracing
>>> Inference Results", mostly because it motivates shared blank nodes.
>>> But I'm out of time for today.)
>>>
>>> -- Sandro
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>
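
The UC2 point under TriG/Equality quoted above (graph equality could "smoosh" two retrievals together unless each retrieval is a separate node pointing at the graph) can be sketched in a few lines. This is only a rough illustration, not anything proposed in the thread: the names EqualityStore, Retrieval and graph_label are hypothetical, the "hash:sha256:" label prefix is a made-up scheme, and the hashing only handles ground triples (no blank nodes).

```python
import hashlib
from dataclasses import dataclass

Triple = tuple[str, str, str]


def graph_label(triples: frozenset[Triple]) -> str:
    """Derive a label from the graph's content, so equal graphs get equal labels.

    Assumption: ground triples only; blank nodes would need real canonicalisation.
    """
    canonical = "\n".join(sorted(" ".join(t) for t in triples))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "hash:sha256:" + digest  # made-up URI scheme, for illustration only


@dataclass(frozen=True)
class Retrieval:
    """Plays the role of the [ a eg:Retrieval; ... ] node in the default graph."""
    source: str
    date: str
    got_graph: str


class EqualityStore:
    """Named graphs keyed by a content-derived label (Equality-style naming)."""

    def __init__(self) -> None:
        self.graphs: dict[str, frozenset[Triple]] = {}
        self.retrievals: list[Retrieval] = []

    def record(self, source: str, date: str, triples) -> Retrieval:
        graph = frozenset(triples)
        label = graph_label(graph)
        # Identical content maps to the same label, so the graph is stored once.
        self.graphs[label] = graph
        # The retrieval itself is a separate record, so two fetches stay distinct.
        retrieval = Retrieval(source=source, date=date, got_graph=label)
        self.retrievals.append(retrieval)
        return retrieval
```

Recording the same triples from http://example.org on two different dates leaves one entry in `graphs` and two entries in `retrievals`. Had the source and date been attached directly to the graph label, as in the r8571 example, the two fetches would have ended up described as one.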
Received on Thursday, 5 January 2012 14:49:38 UTC