Re: Three solution designs to the first three Graphs use cases from Guus Schreiber on 2012-01-05 (public-rdf-wg@w3.org from January 2012)

From: Guus Schreiber <guus.schreiber@vu.nl>
Date: Thu, 05 Jan 2012 15:13:32 +0100
To: Sandro Hawke <sandro@w3.org>
CC: public-rdf-wg <public-rdf-wg@w3.org>
Message-ID: <4F05B00C.5010707@vu.nl>

On 04-01-2012 19:45, Sandro Hawke wrote:
> While it's fresh in my mind, let me write down the view I came to during
> today's telecon.   (And, carry it a bit farther.)  Guus, I don't know if
> you still want to write up your understanding of it, or if this obviates
> your action.

Sandro,

Very clear, thanks for this (and I indeed see no more need for my action).

I understand why you like the third solution. However, it means we have to
come up with a name-to-graph relation vocabulary. I don't think we're in
a position to standardize that. Also, it means we have to introduce
new syntax and thus invalidate/deprecate/mark as archaic/... the
current quad stores out there. That would not be in the spirit of our
charter.

The first solution (your TriG/REST) has the advantage of being the most 
conservative extension, i.e. providing a hook for explicating 
name-to-graph semantics without enforcing it.  We can write a 
non-normative section/appendix/note with suggested practices for 
vocabulary to be used there.
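
A minimal sketch of what that hook could look like in a plain quad store, assuming quads are 4-tuples and using the hypothetical eg: terms (eg:source, eg:date) from the examples in this thread; none of these names is proposed for standardization. The fourth column stays an opaque label, and the name-to-graph relation is ordinary data in the default graph that a consumer may consult or ignore.

```python
# Sketch only: quads as (subject, predicate, object, graph_label) tuples.
# A graph_label of None stands for the default graph. The eg: terms are
# hypothetical and only mirror the examples quoted in this thread.
DEFAULT = None

quads = [
    ("ex:a", "ex:p", "ex:b", "http://crawler.example.org/r8571"),
    ("http://crawler.example.org/r8571", "eg:source", "http://example.org", DEFAULT),
    ("http://crawler.example.org/r8571", "eg:date", "2011-01-04T00:03:11", DEFAULT),
]

def triples_in(quads, label):
    """Return the triples stored under one graph label."""
    return {(s, p, o) for (s, p, o, g) in quads if g == label}

def labels_from(quads, source_url):
    """Find graph labels that the default graph says were fetched from source_url."""
    return {s for (s, p, o, g) in quads
            if g is DEFAULT and p == "eg:source" and o == source_url}
```

An existing quad store only needs the first function; the second is the optional, non-normative part.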

If we can get consensus on (some variant of) the first solution, I see 
us moving on quickly. The path forward would be to continue writing down 
further use-case examples.
I have asked Antoine Isaac to do this for the Europeana data model [1].

Guus

[1] http://www.europeana-libraries.eu/web/europeana-project/technicaldocuments/

>
>
> * Use Case 1:   (presented by cygri at 21 Dec meeting)
>
> Several systems want to use the data gathered by one RDF crawler.  They
> don't need simultaneous access to older versions of the data.
>
> Solution A: use TriG or N-Quads with the fourth column (graph label)
> being the URL the content was fetched from.
>
>          <http://example.org>  { ... triples recently fetched from there }
>
> * Use Case 2:   (brought up in questions by sandro at 21 Dec meeting)
>
> Several systems want to use the data gathered by one RDF crawler.  They
> need simultaneous access to older versions of the data.
>
> Solution B: use TriG or N-Quads with the fourth column being some
> identifier created at the time the retrieval was done.  Then, some other
> data connects that identifier with the URL the content was fetched from.
>
>          <http://crawler.example.org/r8571>  { ... triples fetched in retrieval 8571 }
>          {
>             <http://crawler.example.org/r8571>  eg:source <http://example.org>;
>                                                eg:date "2011-01-04T00:03:11"^^xs:dateTime
>          }
>
> * Use Case 3:   (suggested by sandro at 4 Jan meeting)
>
> A system wants to convey to another system in RDF that some person
> agrees with or disagrees with certain RDF triples.
>
> Solution C: use TriG or N-Quads with the fourth column being an
> identifier for an RDF Graph (g-snap), so that it can be referred to in
> the default graph.
>
>          { eg:sandro eg:endorses <g1> }
>          <g1>  { ... the triples I'm endorsing ... }
>
> ====
>
> So, here we have two different semantics for TriG clearly motivated and
> expressed.  The TriG document:
>
>          g { s p o }
>
> is understood in Solution A to mean (in N3):
>
>          g log:semantics { s p o }.       # TriG "REST" semantics
>
> but in Solution C it means (in N3):
>
>          g owl:sameAs { s p o }.          # TriG "Equality" semantics
>
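
To make the contrast concrete, here is a rough Python sketch, assuming a graph is modelled as a frozenset of triples and each observation pairs a label with the triples seen under it. The function names are made up for illustration. Under the REST reading a label may show up with different triples over time; under the Equality reading that is a contradiction.

```python
def consistent_under_rest(observations):
    # REST reading: the label names a container whose state can change,
    # so any sequence of observations is acceptable.
    return True

def consistent_under_equality(observations):
    # Equality reading: the label *is* the graph, so one label can only
    # ever be paired with one set of triples.
    seen = {}
    for label, triples in observations:
        graph = frozenset(triples)
        if seen.setdefault(label, graph) != graph:
            return False
    return True

observations = [
    ("g", [("s", "p", "o")]),
    ("g", [("s", "p", "o2")]),   # same label, different triples later on
]
```

With these two observations the REST check passes and the Equality check fails, which is the "changing what triples are served at that address" problem in miniature.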
> ====
>
> It looks like it's possible to solve all three use cases with either
> semantics, although it gets a bit tricky.
>
> With TriG/REST:
>
>          UC1 -- as reported by Richard; the URL used by the crawler is
>          the fourth column URL
>
>          UC2 -- as implemented in Sandro's semwalker code; the crawler
>          makes a new URL in its own web space, mirrors the content there,
>          and puts that URL in the fourth column
>
>          UC3 -- rather than endorsing an RDF Graph, I endorse a Graph
>          Container on the condition that it never changes (or something
>          like that -- needs to be fleshed out more).
>
>                  { eg:sandro eg:endorses <g1>.
>                    <g1>  a rdf:StaticGraphContainer.
>                  }
>              <g1>  { ... the triples I'm endorsing ... }
>
>
> With TriG/Equality:
>
>          UC1 -- A layer of indirection is needed, as new URIs need to be
>          created for the different RDF Graphs.
>
>                  { <http://example.org> rdf:graphState <uuid:nnnnn> }
>                  <uuid:nnnnn> { ... triples fetched from example.org }
>
>          Maybe there's some clever way to do it without this, involving
>          URL mangling or something to eliminate the second lookup.
>
>          I used uuid:nnnnn as a URI for the RDF Graph, but I could just
>          as easily have used a hash of the graph or graph serialization.
>          I *could* use an http URL, I think, but that's likely to lead to
>          confusion and breakage, especially when someone gets the bright
>          idea of changing what triples are served at that address.  (I'm
>          sure it will have seemed like a good idea at the time.)
>
>          UC2 -- Pretty straightforward, since we already have that layer of
>          indirection in UC1.  We can't quite use the r8571 example as is, because graph
>          equality could smoosh the two retrieval operations together.  So we
>          need something like this:
>
>                  <uuid:nnnnn>  { ... triples fetched in operation 8571 }
>              {
>                [ a eg:Retrieval;
>                  eg:gotGraph <uuid:nnnnn>;
>                  eg:source <http://example.org>;
>                  eg:date "2011-01-04T00:03:11"^^xs:dateTime;
>                ]
>              }
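
A hedged sketch of the hash-based naming mentioned under UC1, and of why UC2 needs a separate retrieval record. It assumes a graph without blank nodes is a set of triples and that sorting their string forms is an adequate stand-in for canonicalization (real RDF canonicalization is harder). The `urn:sha256:` prefix and the dictionary keys are invented for illustration. Two retrievals that return the same triples get the same graph name, so the source and date have to hang off the retrieval, not the graph.

```python
import hashlib

def graph_name(triples):
    # Hash a sorted, line-per-triple rendering so equal graphs get equal names.
    # Assumes no blank nodes; this is not a real RDF canonicalization.
    lines = sorted(" ".join(t) + " ." for t in set(triples))
    digest = hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()
    return "urn:sha256:" + digest

def record_retrieval(log, source, date, triples):
    # Each retrieval is its own record (a blank node in the TriG example),
    # pointing at the graph it got rather than being the graph label itself.
    log.append({"source": source, "date": date, "gotGraph": graph_name(triples)})

log = []
page = [("ex:a", "ex:p", "ex:b")]
record_retrieval(log, "http://example.org", "2011-01-04T00:03:11", page)
record_retrieval(log, "http://example.org", "2011-01-05T00:03:11", list(page))
```

Both log entries end up with the same gotGraph value but keep their own dates, which is the separation the eg:Retrieval node provides.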
>
>          UC3 -- easy:
>
>                  { eg:sandro eg:endorses <uuid:nnnnn>. }
>                  <uuid:nnnnn>  { ... the triples I'm endorsing ... }
>
>                  Here you can see why I want a blank node as the graph
>                  label, rather than making up uuids.
>
> Between these two, I have a preference for TriG/REST over
> TriG/Equality, I think.   I think people are too likely to get the
> semantics of TriG/Equality wrong in practice.  Of course, spelling out
> the semantics of TriG/REST will be tricky, given it has some
> contextual qualities, as we've discussed.
>
> MEANWHILE, we have a third solution, where we name the relation
> explicitly.   This is the one I prefer.
>
>          UC1
>
>                  <http://example.org>  rdf:graphState { ... triples recently fetched from there }
>
>          UC2 -- either of the styles given above, depending whether the
>          harvester wants to publish its copies on the web or not.
>
>          UC3
>
>                  eg:sandro eg:endorses <uuid:nnnnn>.
>          <uuid:nnnnn>  owl:sameAs { ... the triples I'm endorsing ... }
>
>                  or, logically:
>
>                  eg:sandro eg:endorses { ... the triples I'm endorsing ... }
>
>                  (Then, I would probably get rid of the curly braces around the default
>                  graph, so it becomes Turtle with Nesting.)
>
> Do those three solution designs make sense?   Any strong preferences
> among them?  Are there more use cases that people think the group will
> find compelling and which cannot be solved by all three of these
> solutions?  (I think the next use case I'd approach would be "Tracing
> Inference Results", mostly because it motivates shared blank nodes.
> But I'm out of time for today.)
>
>      -- Sandro
>
Received on Thursday, 5 January 2012 14:14:01 UTC