Three solution designs to the first three Graphs use cases from Sandro Hawke on 2012-01-04 (public-rdf-wg@w3.org from January 2012)

From: Sandro Hawke <sandro@w3.org>
Date: Wed, 04 Jan 2012 13:45:08 -0500
To: public-rdf-wg <public-rdf-wg@w3.org>
Message-ID: <1325702708.2589.269.camel@waldron>

While it's fresh in my mind, let me write down the view I came to during
today's telecon.   (And, carry it a bit farther.)  Guus, I don't know if
you still want to write up your understanding of it, or if this obviates
your action.


* Use Case 1:   (presented by cygri at 21 Dec meeting)

Several systems want to use the data gathered by one RDF crawler.  They
don't need simultaneous access to older versions of the data.

Solution A: use TriG or N-Quads with the fourth column (graph label)
being the URL the content was fetched from.

        <http://example.org> { ... triples recently fetched from there }
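
A minimal sketch of Solution A in Python, assuming a purely in-memory
store. The class and method names (LatestOnlyStore, record_fetch,
to_nquads) are hypothetical and not part of any crawler mentioned in
this thread. It only illustrates that, when the graph label is the
source URL, a refetch replaces the earlier triples.

```python
# Hypothetical in-memory quad store for Solution A: the graph label is the
# URL the triples were fetched from, so a refetch replaces the old triples.
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # terms are assumed to be already serialized


class LatestOnlyStore:
    def __init__(self) -> None:
        self.graphs: Dict[str, Set[Triple]] = {}

    def record_fetch(self, url: str, triples: Set[Triple]) -> None:
        # Overwrite: only the most recent state of each URL is kept.
        self.graphs[url] = set(triples)

    def to_nquads(self) -> str:
        # One line per triple, with the source URL as the fourth column.
        lines = []
        for url in sorted(self.graphs):
            for s, p, o in sorted(self.graphs[url]):
                lines.append(f"{s} {p} {o} <{url}> .")
        return "\n".join(lines)


store = LatestOnlyStore()
subject, predicate = "<http://example.org/a>", "<http://example.org/p>"
store.record_fetch("http://example.org", {(subject, predicate, '"old"')})
store.record_fetch("http://example.org", {(subject, predicate, '"new"')})
print(store.to_nquads())
```

After the second fetch, the "old" triple is gone, which is acceptable
for Use Case 1 but not for Use Case 2.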
        
* Use Case 2:   (brought up in questions by sandro at 21 Dec meeting)

Several systems want to use the data gathered by one RDF crawler.  They
need simultaneous access to older versions of the data.

Solution B: use TriG or N-Quads with the fourth column being some
identifier created at the time the retrieval was done.  Then, some other
data connects that identifier with the URL the content was fetched from.

        <http://crawler.example.org/r8571> { ... triples fetched in retrieval 8571 }
        { 
           <http://crawler.example.org/r8571> eg:source <http://example.org>;
                                              eg:date "2011-01-04T00:03:11"^^xs:dateTime
        }
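
A minimal sketch of Solution B in Python, under the same in-memory
assumption. VersionedStore, record_fetch, versions_of, the counter-based
label scheme, and the second fetch date are all hypothetical; only the
eg:source and eg:date properties come from the example above.

```python
# Hypothetical sketch of Solution B: every retrieval gets a fresh graph label,
# and the default graph links that label to the source URL and fetch time.
import itertools
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


class VersionedStore:
    def __init__(self, base: str = "http://crawler.example.org/r") -> None:
        self.base = base                       # assumed URL pattern for labels
        self.counter = itertools.count(1)
        self.graphs: Dict[str, Set[Triple]] = {}
        self.default_graph: List[Triple] = []  # retrieval metadata

    def record_fetch(self, url: str, when: str, triples: Set[Triple]) -> str:
        label = f"{self.base}{next(self.counter)}"
        self.graphs[label] = set(triples)      # older retrievals stay untouched
        self.default_graph.append((f"<{label}>", "eg:source", f"<{url}>"))
        self.default_graph.append(
            (f"<{label}>", "eg:date", f'"{when}"^^xs:dateTime'))
        return label

    def versions_of(self, url: str) -> List[str]:
        # Labels of all retrievals of this URL, oldest first.
        return [s.strip("<>") for s, p, o in self.default_graph
                if p == "eg:source" and o == f"<{url}>"]


store = VersionedStore()
first = store.record_fetch("http://example.org", "2011-01-04T00:03:11",
                           {("eg:a", "eg:p", '"old"')})
second = store.record_fetch("http://example.org", "2011-01-05T00:03:11",
                            {("eg:a", "eg:p", '"new"')})
print(store.versions_of("http://example.org"))
```

Both retrievals remain addressable, and the source URL is recovered
through the metadata rather than from the graph label itself.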
         
* Use Case 3:   (suggested by sandro at 4 Jan meeting)

A system wants to convey to another system in RDF that some person
agrees with or disagrees with certain RDF triples.

Solution C: use TriG or N-Quads with the fourth column being an
identifier for an RDF Graph (g-snap), so that it can be referred to in
the default graph.   

        { eg:sandro eg:endorses <g1> }
        <g1> { ... the triples I'm endorsing ... }
        
====

So, here we have two different semantics for TriG clearly motivated and
expressed.  The TriG document:

        g { s p o }
        
is understood in Solution A to mean (in N3):

        g log:semantics { s p o }.       # TriG "REST" semantics
        
but in Solution C it means (in N3):

        g owl:sameAs { s p o }.          # TriG "Equality" semantics

====

It looks like it's possible to solve all three use cases with either
semantics, although it gets a bit tricky.

With TriG/REST:

        UC1 -- as reported by Richard; the URL used by the crawler is
        the fourth column URL
        
        UC2 -- as implemented in Sandro's semwalker code; the crawler
        makes a new URL in its own web space, mirrors the content there,
        and puts that URL in the fourth column
        
        UC3 -- rather than endorsing an RDF Graph, I endorse a Graph
        Container on the condition that it never changes (or something
        like that -- needs to be fleshed out more).

                { eg:sandro eg:endorses <g1>.
                  <g1> a rdf:StaticGraphContainer.
                }
                <g1> { ... the triples I'm endorsing ... }
                
        
With TriG/Equality:

        UC1 -- A layer of indirection is needed, as new URIs need to be
        created for the different RDF Graphs.
        
                { <http://example.org> rdf:graphState <uuid:nnnnn> }
                <uuid:nnnnn> { ... triples fetched from example.org }
                        
        Maybe there's some clever way to do it without this, involving
        URL mangling or something to eliminate the second lookup.
                
        I used uuid:nnnnn as a URI for the RDF Graph, but I could just
        as easily have used a hash of the graph or graph serialization.
        I *could* use an http URL, I think, but that's likely to lead to
        confusion and breakage, especially when someone gets the bright
        idea of changing what triples are served at that address.  (I'm
        sure it will have seemed like a good idea at the time.)
                
        UC2 -- Pretty straightforward, since we already have that layer of
        indirection in UC1.  We can't quite use the r8571 example as is, because graph
        equality could smoosh the two retrieval operations together.  So we
        need something like this:

                <uuid:nnnnn> { ... triples fetched in operation 8571 }
                {
                  [ a eg:Retrieval;
                    eg:gotGraph <uuid:nnnnn>;
                    eg:source <http://example.org>;
                    eg:date "2011-01-04T00:03:11"^^xs:dateTime;
                  ]
                }
        
        UC3 -- easy:
        
                { eg:sandro eg:endorses <uuid:nnnnn>. }
                <uuid:nnnnn> { ... the triples I'm endorsing ... }
                                        
                Here you can see why I want a blank node as the graph
                label, rather than making up uuids.
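
The "smooshing" concern in UC2 can be illustrated with a small Python
sketch. It assumes an RDF Graph (g-snap) is modeled as a frozenset of
triples, so two graphs with the same triples are the same graph. The
labels r1 and r2 and the function same_as_classes are hypothetical.

```python
# Hypothetical illustration of why Equality semantics can merge retrievals.
# A g-snap is modeled as a frozenset of triples, so identity is by content.
from typing import Dict, FrozenSet, List, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]

content: Graph = frozenset({("eg:a", "eg:p", '"v"')})

# Two separate retrievals that happened to fetch identical triples.
labelled: Dict[str, Graph] = {
    "http://crawler.example.org/r1": content,
    "http://crawler.example.org/r2": frozenset(content),
}


def same_as_classes(labelled: Dict[str, Graph]) -> List[List[str]]:
    """Group labels that denote the same graph if each label equals its graph."""
    classes: Dict[Graph, List[str]] = {}
    for label, graph in labelled.items():
        classes.setdefault(graph, []).append(label)
    return sorted(sorted(group) for group in classes.values())


print(same_as_classes(labelled))
```

Under this reading r1 and r2 fall into one class, so a fetch date
attached directly to r1 would also apply to r2. Attaching the metadata
to a separate eg:Retrieval node, as in the UC2 example, avoids that.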
                
Between these two, I think I prefer TriG/REST over TriG/Equality:
people are too likely to get the semantics of TriG/Equality wrong in
practice.  Of course, spelling out the semantics of TriG/REST will be
tricky, given it has some contextual qualities, as we've discussed.

MEANWHILE, we have a third solution, where we name the relation
explicitly.   This is the one I prefer.   

        UC1 

                <http://example.org> rdf:graphState { ... triples recently fetched from there }
                
        UC2 -- either of the styles given above, depending on whether
        the harvester wants to publish its copies on the web or not.
        
        UC3 

                eg:sandro eg:endorses <uuid:nnnnn>.
                <uuid:nnnnn> owl:sameAs { ... the triples I'm endorsing ... }
                
                or, logically:

                eg:sandro eg:endorses { ... the triples I'm endorsing ... }
                
                (Then, I would probably get rid of the curly braces around the default
                graph, so it becomes Turtle with Nesting.)
                
Do those three solution designs make sense?   Any strong preferences
among them?  Are there more use cases that people think the group will
find compelling and which cannot be solved by all three of these
solutions?  (I think the next use case I'd approach would be "Tracing
Inference Results", mostly because it motivates shared blank nodes. 
But I'm out of time for today.)

    -- Sandro
Received on Wednesday, 4 January 2012 18:47:18 UTC