Re: Graph store protocol editor's draft updated

On Tue, 2012-02-14 at 22:19 +0000, Andy Seaborne wrote:
> 
> On 14/02/12 15:56, Sandro Hawke wrote:
> > Looking more closely, it's not 5.8 that I want back, it's this sentence:
> >
> >          Within a service description document for an implementation of
> >          this protocol, the object of an sd:defaultDataset statement is
> >          understood to be the identifier of the Graph Store
> 
> Where do you expect to read the service description from?
> 
> Could you write out a concrete example, with URIs and actions, so I can 
> understand the process you are envisaging that is behind your comments? 
>   I'm quite confused as to the information flow you are looking for.

Imagine a national government wants data feeds about water quality from
each of its regional governments.  Each region is responsible for
running a SPARQL endpoint serving the data, broken up into a separate
graph for each km^2 block and month.  The default graph is to hold
metadata about each of those other graphs, saying when it's valid and
what area it covers.

Now, the national government collects the endpoint addresses, one per
region, looking something like this:

   http://northeast-region.example.gov/wqdat/sparql
   http://northwest-region.example.gov/~smith/fed/sparql
   http://northcentral.example.gov/water/sparql    

Following normal SPARQL practice, some of the regions pick graph names
that are not actually working URLs that can be used to fetch the
associated data.  Instead, one region uses tag URIs, one uses UUIDs, one
uses the URI of the most prominent geographic feature in the block, and
another uses a homegrown URI scheme which produces URIs like this:

        block:34.2234-34.2547,81.3331,80.9830:2010-01-01
        
This all works.  Given this list of SPARQL endpoints, the national
government can write various clients which query each region's data as
necessary.  It can also publish this list of endpoint addresses and let
the general public query as they will.

But there are some things we'd like to be able to do that we can't:

* Alice wants to download all the graphs concerning a certain area and
time-range, crossing several regions, without knowing SPARQL.  She just
wants a REST interface for GET'ing the default graphs and then the other
data graphs.

* Bob is doing analysis for which he needs to provide provenance.  He
wants a single URI for each of the graphs he's using, so he can put it
into the "source" field for that part of the analysis.

* Charlie is on a data-quality crusade.  He's getting people to
double-check the data against other private data sources and their own
experience.  He's built a system for flagging questionable blocks of
data, and even submitting corrections (patches).  For this system, he
needs some way to refer to each graph which has been flagged for
correction.
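To make Alice's "certain area and time-range" concrete, here is a rough Python sketch of the selection she would do once the per-block metadata from a region's default graph (the geo:lon0/lon1/lat0/lat1 and dc:temporal triples in step 2 of the walkthrough) has been parsed into simple records.  The Block record, the helper names, and treating dc:temporal as a single date are all hypothetical, not part of any spec.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Block:
    # Hypothetical record for one data graph, built from default-graph metadata.
    graph_name: str
    lon0: float
    lon1: float
    lat0: float
    lat1: float
    temporal: date

def overlaps(a0, a1, b0, b1):
    # True if the closed intervals [a0, a1] and [b0, b1] intersect;
    # each pair of bounds may be given in either order.
    a0, a1 = sorted((a0, a1))
    b0, b1 = sorted((b0, b1))
    return a0 <= b1 and b0 <= a1

def select_blocks(blocks, lon_range, lat_range, start, end):
    # Names of graphs whose area overlaps the requested box and whose
    # date falls within [start, end].
    return [b.graph_name for b in blocks
            if overlaps(b.lon0, b.lon1, *lon_range)
            and overlaps(b.lat0, b.lat1, *lat_range)
            and start <= b.temporal <= end]

blocks = [Block("urn:uuid:eee02beb-eca7-4cb7-839c-9fc6206caae0",
                34.2234, 34.2547, 80.9830, 81.3331, date(2010, 1, 1))]
print(select_blocks(blocks, (34.0, 34.3), (80.0, 82.0),
                    date(2010, 1, 1), date(2010, 12, 31)))
```

The graph names this returns are only useful to Alice if she can turn each one into a URL she can GET, which is the gap the rest of this message is about.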

I think the simplest solution would be to just let everyone know they
can always use:

    ${endpoint_addr}?graph=${graph_name}
or
    ${endpoint_addr}?default

as a URI for the indicated graph.  I'd hope most endpoints would
implement at least HTTP GET on those addresses, if not the whole GSP.
Even just having this convention, with no code changes, would address
Bob's and Charlie's problems.  And Alice would know what to try, in case
GSP happens to be implemented.
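For concreteness, a minimal Python sketch of that convention, standard library only.  The helper name graph_uri is hypothetical, and percent-encoding the graph name is my own assumption, so that names like the block: URIs above survive intact inside a query string.  It also assumes the endpoint address carries no query string of its own.

```python
from urllib.parse import parse_qs, quote, urlsplit

def graph_uri(endpoint_addr, graph_name=None):
    """Return ${endpoint_addr}?graph=${graph_name}, or ${endpoint_addr}?default
    when no graph name is given.  Assumes endpoint_addr has no query part."""
    if graph_name is None:
        return endpoint_addr + "?default"
    # safe="" percent-encodes ':', '/', ',' and the like, leaving only
    # letters, digits and '_.-~' as-is.
    return endpoint_addr + "?graph=" + quote(graph_name, safe="")

endpoint = "http://northcentral.example.gov/water/sparql"
name = "block:34.2234-34.2547,81.3331,80.9830:2010-01-01"
uri = graph_uri(endpoint, name)
print(uri)
# A server that decodes the query string recovers the original graph name.
print(parse_qs(urlsplit(uri).query)["graph"][0] == name)
```

Bob and Charlie could store the resulting string as-is; nothing about it requires the endpoint to implement GET, though that would obviously be nice.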


Alternatively, if for some reason the SPARQL WG is not okay with using
the endpoint address this way, we could use Service Description, as GSP
did until the most recent change [1].  With this, to get the URI of
the default graph for the first region, Alice would:

1.  GET http://northeast-region.example.gov/wqdat/sparql

and get back a SPARQL service description that includes triples like
this:

        @prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
        
        <> a sd:Service;
             sd:defaultDataset  <http://northeast-region.example.gov/wqdat/dataset> .
        
2.  Given this, plus the text that used to be in GSP and what's still
there, Alice knows the URL of the default graph for the northeast region is:

        http://northeast-region.example.gov/wqdat/dataset?default
        
She can do a GET on this to get the contents of the default graph, which
has something like this:

        <urn:uuid:eee02beb-eca7-4cb7-839c-9fc6206caae0> geo:lon0 34.2234;
                                                        geo:lon1 34.2547;
                                                        geo:lat1 81.3331;
                                                        geo:lat0 80.9830;
                                                        dc:temporal "2010-01-01"^^xs:date .
        
3.  Now she can construct a URL from which she can fetch the data for
that block and that month, like this:

        http://northeast-region.example.gov/wqdat/dataset?graph=urn:uuid:eee02beb-eca7-4cb7-839c-9fc6206caae0
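Steps 1 and 2 could be sketched in Python roughly as follows.  The Accept header value and the regular-expression extraction are simplifying assumptions: a real client would negotiate content types properly and hand the response to an RDF parser rather than pattern-match Turtle text.  The function names are hypothetical.

```python
import re
import urllib.request

# Naive pattern: assumes the conventional 'sd:' prefix and an absolute IRI.
SD_DEFAULT_DATASET = re.compile(r"sd:defaultDataset\s+<([^>]+)>")

def service_description_request(endpoint_addr):
    # Step 1: a plain GET on the endpoint address, asking for Turtle.
    return urllib.request.Request(endpoint_addr, headers={"Accept": "text/turtle"})

def default_dataset(turtle_text):
    # Return the object of the first sd:defaultDataset triple, or None.
    m = SD_DEFAULT_DATASET.search(turtle_text)
    return m.group(1) if m else None

def default_graph_url(dataset):
    # Step 2: the ?default form, applied to the dataset URI.
    return dataset + "?default"

sd_doc = """@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
<> a sd:Service;
     sd:defaultDataset  <http://northeast-region.example.gov/wqdat/dataset> ."""

print(default_graph_url(default_dataset(sd_doc)))

# Against a live endpoint (not run here):
# req = service_description_request("http://northeast-region.example.gov/wqdat/sparql")
# with urllib.request.urlopen(req) as resp:
#     sd_doc = resp.read().decode("utf-8")
```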
        

And that's about it.  Repeat step 3 for each block in the region, and
steps 1-2 for each region.
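The whole loop over regions and blocks can be sketched like this.  The two callables are hypothetical stand-ins for the HTTP fetches and RDF parsing of steps 1-3; here they are stubbed with dictionaries so the sketch runs offline.  The sketch percent-encodes graph names, whereas step 3 writes the urn:uuid: name unencoded; colons are legal in a query string, so a server that decodes the query sees the same name either way.

```python
from typing import Callable, Iterable, Iterator
from urllib.parse import quote

def crawl(endpoints: Iterable[str],
          fetch_default_dataset: Callable[[str], str],
          list_graph_names: Callable[[str], Iterable[str]]) -> Iterator[str]:
    """Yield a GET-able URL for every data graph in every region."""
    for endpoint in endpoints:                        # steps 1-2, once per region
        dataset = fetch_default_dataset(endpoint)
        default_graph = dataset + "?default"
        for name in list_graph_names(default_graph):  # step 3, once per block
            yield dataset + "?graph=" + quote(name, safe="")

# Stubs standing in for the HTTP fetches and RDF parsing.
datasets = {"http://northeast-region.example.gov/wqdat/sparql":
            "http://northeast-region.example.gov/wqdat/dataset"}
graph_names = {"http://northeast-region.example.gov/wqdat/dataset?default":
               ["urn:uuid:eee02beb-eca7-4cb7-839c-9fc6206caae0"]}

urls = list(crawl(list(datasets), datasets.__getitem__, graph_names.__getitem__))
print(urls)
```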

    -- Sandro

[1]
https://cvs.w3.org/Team/~checkout~/WWW/2009/sparql/docs/http-rdf-update/Overview.html?rev=1.81;content-type=text%2Fhtml#http-post and scroll down toward the end of that POST section.

Received on Wednesday, 15 February 2012 04:05:39 UTC