Re: Indirect Graph Identification from James Leigh on 2012-09-04 (public-rdf-dawg-comments@w3.org from September 2012)

From: James Leigh <james@3roundstones.com>
Date: Tue, 04 Sep 2012 13:38:15 -0400
To: Sandro Hawke <sandro@w3.org>
Cc: public-rdf-dawg-comments@w3.org
Message-ID: <1346780295.1961.102.camel@james-PBL21>
Hi Sandro,

Thanks for resuming this. I have included more particulars on why this
is important at the bottom of this message.

On Tue, 2012-09-04 at 09:43 -0400, Sandro Hawke wrote:
> Sorry I dropped this thread. Resuming...
> 
> On 05/08/2012 01:11 PM, James Leigh wrote:
> > On Tue, 2012-05-08 at 11:08 -0400, Sandro Hawke wrote:
> >> James, I must apologize for failing to get back to you with an official
> >> response to your comment [1].   Since then, a new draft has been
> >> published:
> >>
> >>          http://www.w3.org/TR/2012/WD-sparql11-http-rdf-update-20120501/
> >>          
> >> ... which may not address your comment, but does change somewhat how
> >> indirect graph identification is done.
> >>
> >> Please let us know if you're satisfied with this response.  If not,
> >> please try to suggest some specific change to the document which would
> >> address your concern.   Thank you!
> >>
> > Thanks Sandro, but I am not satisfied. I would like to be assured that
> > the working group has discussed generalizing indirect requests to allow
> > indirect graph requests to co-exist with other types of indirect
> > requests. I believe indirect requests are going to become a significant
> > social issue for our community and need a common approach.
> 
> I don't think the group has talked about it.  I can request it be on the 
> agenda, but I'd like to understand the case better, first.
> 
> > I suggest that the working group consider providing a way for the
> > indirect graph URL template to be discoverable and avoid explicitly hard
> > coding a template pattern in the specification.
> >
> > I suggest that the graph protocol adopt a service description that will
> > include a URL to the default graph and a URL template that can be used
> > for indirect graph identification, if the service supports such
> > indirection.
> >
> > For example a request like the one below would return a service
> > description that would include triples that point to the default graph
> > URL and a URL template for indirect graph identification.
> >
> >
> > Given the HTTP request:
> >
> >          GET /rdf-graph-store HTTP/1.1
> >          Host: example.com
> >          Accept: text/turtle; charset=utf-8
> >
> > the graph service responds with a service description.
> >
> >          HTTP/1.1 200 OK
> >          Date: Fri, 09 Oct 2009 17:31:12 GMT
> >          Server: Apache/1.3.29 (Unix) PHP/4.3.4 DAV/1.0.3
> >          Connection: close
> >          Content-Type: text/turtle
> >          
> >          @prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
> >          @prefix ent: <http://www.w3.org/ns/entailment/> .
> >          @prefix prof: <http://www.w3.org/ns/owl-profile/> .
> >          @prefix scovo: <http://purl.org/NET/scovo#> .
> >          @prefix void: <http://rdfs.org/ns/void#> .
> >          
> >          [] a sd:Service ;
> >              sd:defaultGraphIdentification </rdf-graph-store?default> ;
> >              sd:indirectGraphIdentification "http://www.example.com/rdf-graph-store?graph={sd:graphIri}" ;
> >              sd:endpoint <http://www.example.com/sparql/> ;
> >              sd:supportedLanguage sd:SPARQL11Query ;
> >              sd:resultFormat <http://www.w3.org/ns/formats/RDF_XML>, <http://www.w3.org/ns/formats/Turtle> ;
> >              sd:feature sd:DereferencesURIs ;
> >              sd:defaultEntailmentRegime ent:RDFS  .
> >
> >
> > The predicate sd:defaultGraphIdentification would point to the URL that
> > should be used in operations to indirectly identify the default graph
> > in the Graph Store.
> >
> > The predicate sd:indirectGraphIdentification would have a URL template
> > similar to the URL patterns from the OpenSearch specification[2]. The
> > fully qualified parameter name "{sd:graphIri}" would be replaced with
> > the percent encoded graph IRI that needs to be identified.
> >
> > This would give implementations much more freedom in how they handle
> > indirect requests, if the implementation supports indirect requests at
> > all.
> >
> > Regards,
> > James
> 
> I agree this design would work, technically, but it seems like in the 
> common case this requires more work for the client folks (people writing 
> GSP clients would need to understand SD), more work for the server folks 
> (people running SPARQL servers would have to set up the right SD).   A 
> little more load: GSP clients would need to do an extra round-trip once 
> in a while.   And generally more complexity and possibilities for things 
> to go wrong.
> 
> And what does it buy us?    Clearly, as you say, more flexibility; but I 
> can't think of a plausible situation where that flexibility would be 
> important (and yet the other SPARQL protocol elements would be 
> acceptable.)    Can you give me some stories that might convince the WG 
> and everyone struggling with the additional complexity that this was all 
> worth it?
> 


In a roundabout way I am arguing that the URL of the RDF Graph Store
service is not needed at all. The URL pattern is all the clients need to
make indirect requests.

Perhaps the specification might be a tad more complicated with what I am
proposing (we are talking indirectly here ;), but the client and server
implementations would have the same complexity. What it comes down to is
should the client use string concatenation or string substitution to
build the indirect graph URL?

There is no extra round trip with what I am proposing. The current spec
does not address how the client obtains the RDF Graph Store URL.
However, I propose that the spec make explicit how an indirect
graph URL pattern can be obtained. This does not mean there is an extra
request, as the client might already know the URL pattern, just as a
client might already know the RDF Graph Store URL. If the client does
NOT know the URL pattern and does need to discover it, well then at
least there would be a standard way to discover it (as opposed to the
current spec which appears not to address the issue).

The current version of the spec says the client should concatenate the
URL with "?graph=" and the percent encoded graph URI. I am arguing that
instead, the client should replace "{sd:graphIri}" in the published
pattern with the percent encoded graph URI.
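
As a rough sketch of the difference (in Python; the function names, the
example graph IRI, and the second pattern are hypothetical and appear in
no spec), concatenation and substitution produce the same URL when the
published pattern happens to use "?graph=", but substitution also works
for patterns of a different shape:

```python
from urllib.parse import quote

PLACEHOLDER = "{sd:graphIri}"  # parameter name taken from the proposal above


def by_concatenation(store_url: str, graph_iri: str) -> str:
    """Current draft: append '?graph=' and the percent-encoded graph IRI."""
    return store_url + "?graph=" + quote(graph_iri, safe="")


def by_substitution(pattern: str, graph_iri: str) -> str:
    """Proposal: replace the placeholder with the percent-encoded graph IRI."""
    return pattern.replace(PLACEHOLDER, quote(graph_iri, safe=""))


graph = "http://example.org/graphs/people"  # hypothetical graph IRI

concatenated = by_concatenation("http://www.example.com/rdf-graph-store", graph)
substituted = by_substitution(
    "http://www.example.com/rdf-graph-store?graph={sd:graphIri}", graph
)

# A server is free to publish a differently shaped pattern (hypothetical):
path_style = by_substitution("http://www.example.com/graphs/{sd:graphIri}", graph)
```

The client code is the same size either way; only the server's freedom
to choose the URL layout changes.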

The current specification describes query string parameter names and a
range of values that, when appended to a graph store URL, identify
different resources. This is less explicit, less flexible, and less
discoverable than publishing the URL pattern outright.

Encoding URIs into indirect URL patterns is not uncommon. However, most
cases to date have been about encoding resource URIs into indirect URL
descriptions.

One such case is void:uriLookupEndpoint, where the pattern is published
in a VoID description. In this case the W3C Interest Group Note says the
object of void:uriLookupEndpoint should be prefixed to the URL-encoded
resource URI. This is very different from the current Graph Store spec,
as one does not need to consult the spec to find out the query parameter
names (everything is there in the term itself).
http://www.w3.org/TR/void/#lookup

Virtuoso documents the following patterns (among others) as a way to
retrieve indirect resource descriptions:
"http://linkeddata.uriburner.com/about/html/{URIscheme}/{authority}/{local-path}"
"http://linkeddata.uriburner.com/about/data/{URIscheme}/{authority}/{local-path}"

Some of the WG members may already be familiar with DOI identifiers.
These are URIs that identify resources using the doi scheme. However,
these URIs are not resolvable directly (they do not start with http:).
Instead, to resolve a doi URI, one must take the URL "http://dx.doi.org/"
and append the scheme specific part of the URI to create a URL that will
resolve to a description of the resource.
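
A minimal sketch of that resolution step, assuming the doi URI has the
form "doi:" followed by the scheme specific part (the function name is
hypothetical and the DOI value is only illustrative):

```python
def doi_to_url(doi_uri: str, resolver: str = "http://dx.doi.org/") -> str:
    """Append the scheme specific part of a doi URI to the resolver URL."""
    scheme, separator, specific_part = doi_uri.partition(":")
    if separator != ":" or scheme.lower() != "doi" or not specific_part:
        raise ValueError("not a doi URI: " + doi_uri)
    return resolver + specific_part


resolved = doi_to_url("doi:10.1000/182")
```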

Many PURL (Persistent URL) servers support what are called partial
PURLs, which allow indirect resource resolution.
http://purl.oclc.org/docs/long_intro.html#partial

        As a side note: in a new implementation of the PURL server that
        my team and I are working on, the partial PURLs will use a set
        of patterns, each consisting of an optional regular expression
        (to select and extract URI components) followed by a URL
        pattern with substitutions to construct the resulting URL.
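
The regex-plus-pattern idea can be sketched as follows. The rule syntax
here (named regex groups feeding "{name}" placeholders) and the
example.org / example.net URLs are assumptions made for illustration,
not the syntax of any PURL server:

```python
import re
from typing import List, Optional, Tuple

# Each rule pairs a regular expression (to extract URI components) with
# a URL pattern whose {name} placeholders are filled from the named groups.
RULES: List[Tuple[str, str]] = [
    (
        r"^http://example\.org/docs/(?P<section>[a-z]+)/(?P<id>\d+)$",
        "http://archive.example.net/{section}/item?id={id}",
    ),
]


def resolve(uri: str, rules: List[Tuple[str, str]] = RULES) -> Optional[str]:
    """Return the URL built by the first matching rule, or None."""
    for regex, url_pattern in rules:
        match = re.match(regex, uri)
        if match:
            # Extracted segments are inserted as-is, without re-encoding.
            return url_pattern.format(**match.groupdict())
    return None


target = resolve("http://example.org/docs/specs/42")
```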

In Callimachus (an open source project I lead), to resolve any resource
URI indirectly, one takes the URL of the Callimachus origin and appends
"/diverted;" + percent encoded URI.

The following pattern should be familiar to all the members and in
practice is an indirect way to describe a resource:
"http://dbpedia.org/sparql?query=DESCRIBE%3C{resource-uri}%3E".

I propose SPARQL 1.1 Graph Store protocol should not create a new
protocol (for indirect graphs), but instead provide a way for publishers
to optionally publish how agents can create indirect graph requests
(possibly using the same underlying mechanisms as indirect resource
requests).

The RDF Graph Store URL is used for two things in the current spec: 1)
to provide indirect access to graphs in a store and 2) to provide
indirect access to the default graph in a store.

I did not consider this above, but the indirect access to the default
graph could use the same pattern URL as other indirect graphs (if the
default graph is named in some way).

So, all that is needed to support indirect graphs is 1) a way to
identify the default graph and 2) a URL pattern to do
GET/PUT/DELETE/POST operations on graphs, indirectly.

Perhaps the sd:name (or a new property) in the Service Description could
be used to identify the default dataset.

Assuming the default graph has a name, we only need the URL pattern for
indirect graph identification.

I propose, instead of clients taking a URL and appending '?graph=' +
encoded graph name, that they take the URL pattern and substitute the
encoded graph name into it.

As stated above a new property like sd:indirectGraphIdentification could
be used to identify the indirect graph pattern. Alternatively, a name
like sd:indirectGraphPattern might be more self explanatory.

It is not clear to me how one would discover the current RDF Graph Store
URL from the current spec. Here I assume the Service Description is an
appropriate way to discover the indirect graph URL pattern.

The main advantage of this approach is a clear connection between the
Service Description and the indirect graph URL pattern. So much so, that
the indirect graph URL pattern is almost self descriptive. That is, many
developers looking at the URL pattern will know what to do with it right
away, even if they haven't read the spec. I believe this would make the
specification even easier for clients to implement (vs the current
spec).

By using URL pattern substitution (as opposed to concatenation) the
pattern could be open to supporting other forms of substitution. In the
future it may be desirable to support DOI's and Virtuoso's form of
indirect pattern (scheme, scheme specific part, authority, and path are
substituted separately) or potentially even the partial PURL pattern,
which will include a regex to extract segments of the URI for
substitution.
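
A minimal sketch of component-wise substitution, using the Virtuoso
pattern quoted earlier. It assumes the placeholders map onto the scheme,
authority, and path (without its leading slash) of the URI; that mapping
and the function name are assumptions, not documented behaviour:

```python
from urllib.parse import urlsplit


def expand_components(pattern: str, uri: str) -> str:
    """Fill scheme, authority, and path placeholders from the given URI."""
    parts = urlsplit(uri)
    values = {
        "{URIscheme}": parts.scheme,
        "{authority}": parts.netloc,
        # Assumption: the leading slash is dropped; query and fragment
        # are ignored in this sketch.
        "{local-path}": parts.path.lstrip("/"),
    }
    for placeholder, value in values.items():
        pattern = pattern.replace(placeholder, value)
    return pattern


pattern = (
    "http://linkeddata.uriburner.com/about/html/"
    "{URIscheme}/{authority}/{local-path}"
)
url = expand_components(pattern, "http://dbpedia.org/resource/Berlin")
```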

This approach also allows server implementations to utilize the
same/existing resource description mechanisms, which have been around
since at least 1995, when OCLC created the first redirecting service,
now known as purl.org.

Thank you for considering my suggestion.

Regards,
James


> 
> >> [1]
> >> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2011Nov/0048
> >>
> > [2] http://www.opensearch.org/Specifications/OpenSearch/1.1#OpenSearch_URL_template_syntax
> >
> >
>
Received on Tuesday, 4 September 2012 17:38:44 UTC