Re: Indirect Graph Identification from Sandro Hawke on 2012-09-11 (public-rdf-dawg-comments@w3.org from September 2012)

From: Sandro Hawke <sandro@w3.org>
Date: Tue, 11 Sep 2012 09:51:33 -0400
To: James Leigh <james@3roundstones.com>
CC: public-rdf-dawg-comments@w3.org
Message-ID: <504F41E5.4090608@w3.org>
Thanks for the details.

On 09/05/2012 11:13 AM, James Leigh wrote:
> Hi Sandro,
>
> If the GSP spec was changed to allow publication of URL patterns,
> instead of the RDF Graph Store URL (as it is today), 3 Round Stones
> would implement GSP in Callimachus and it would be included (by default)
> with every Callimachus deployment. However, in the spec's current form
> we would not implement the indirect graph identification portion.

Am I right in understanding that Callimachus is not a Graph Store
(SPARQL database), but sits on top of one?  You're talking about the
possibility of Callimachus providing a GSP interface to that underlying
Graph Store?  What if it already implements one?

> Callimachus is a Linked Data Management System and as such security is a
> big concern. Callimachus' current authorization mechanism is based on
> the target request uri (before the query string). In the event the
> request is an indirect request, the authorization is based on the
> indirect target, which is encoded in the request path (not the query
> string).

Perhaps this is an irrelevant detail, but maybe not.  If the
authorization is based on the indirect target, then why does it matter
how it's encoded?

For example, let's say the underlying engine contains, in a graph named 
"http://example.com/g1", the triple <a> <b> <c>.

Let's say the "Graph Store" URI is http://s1.example.com/data. (Note 
that there is currently no standard way to find this out from anywhere.  
It might be in some Service Description, but not in any standard way.)

So, GSP says I could get that <a> <b> <c> by doing a GET from (ignoring 
URL encoding):

http://s1.example.com/data?graph=http://example.com/g1
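
Since the URL above ignores encoding, here is a minimal Python sketch of
how a client might build the encoded form by concatenation. The function
name is hypothetical and the URIs are just the ones from this example; it
is an illustration, not something taken from the GSP draft.

```python
from urllib.parse import quote


def indirect_graph_url(graph_store_url: str, graph_iri: str) -> str:
    """Concatenate the graph store URL, "?graph=", and the encoded graph IRI."""
    # safe="" makes quote() percent-encode ":" and "/" in the graph IRI as well.
    return graph_store_url + "?graph=" + quote(graph_iri, safe="")


url = indirect_graph_url("http://s1.example.com/data", "http://example.com/g1")
print(url)  # http://s1.example.com/data?graph=http%3A%2F%2Fexample.com%2Fg1
```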

I think you're saying Callimachus will be answering URIs starting 
"http://s1.example.com/data", and will need to make some access control 
decision.

I understand you want the URL to be able to look more like:

http://s1.example.com/data/indirect/http://example.com/g1

So, the idea is that the access control rule can be based on 
"http://s1.example.com/data/indirect/http://example.com/g1", but it 
can't be based on 
"http://s1.example.com/data?graph=http://example.com/g1" because the 
query part is stripped off before the access control logic sees the URL?

That would make sense, but I'm not actually sure that's what you said, 
so please confirm that I have that right.
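
To make the question concrete, here is a rough sketch of the behavior I
am describing, under my own assumptions: an access control layer that
only sees the scheme, host, and path (the hypothetical
access_control_key below), and a path-style pattern using the
"{sd:graphIri}" placeholder from your earlier proposal. This is not
Callimachus code, and the real mechanism may differ.

```python
from urllib.parse import quote, urlsplit


def access_control_key(request_url: str) -> str:
    """Hypothetical: what an authorizer sees if the query string is stripped."""
    parts = urlsplit(request_url)
    return f"{parts.scheme}://{parts.netloc}{parts.path}"


encoded = quote("http://example.com/g1", safe="")

# Current draft: concatenate the graph store URL, "?graph=", and the encoded IRI.
query_form = "http://s1.example.com/data" + "?graph=" + encoded

# Proposed alternative: substitute the encoded IRI into a published pattern.
pattern = "http://s1.example.com/data/indirect/{sd:graphIri}"  # assumed pattern
path_form = pattern.replace("{sd:graphIri}", encoded)

print(access_control_key(query_form))
# http://s1.example.com/data
print(access_control_key(path_form))
# http://s1.example.com/data/indirect/http%3A%2F%2Fexample.com%2Fg1
```

In the first case the graph name never reaches the access control
decision; in the second it does.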

      -- Sandro

>   The current authorization mechanism could not easily be adapted
> if the indirect target is in the query string.
> While a 307 could safely be used with GET requests, to redirect to a URL
> that can more directly be authorized, many clients do not support 307
> responses from PUT/DELETE/POST/PATCH requests. This makes the current
> spec hard to implement securely.
>
> Callimachus already includes a complicated mechanism for authorization
> of indirect requests and the URL pattern outlined in the spec would not
> work with Callimachus' authorization mechanism. At least not without
> by-passing the graph resource authorization mechanism.
>
> Regards,
> James
>
> On Tue, 2012-09-04 at 18:00 -0400, Sandro Hawke wrote:
>> Thanks for the reply.
>>
>> You're right that the mechanism you propose is only about 1% more
>> complicated; I was misremembering where the graph-store URI came from,
>> to be used in constructing the indirect IRIs.    As you say the
>> difference between concatenating three strings and interpolating one
>> string into another is very small.
>>
>> Still, in order to motivate a change, can you help explain to me why the
>> greater flexibility of the template form would really matter to
>> someone?   You point out some examples of people using different
>> indirection formats, but I'm fairly confident anyone running a Graph
>> Store Protocol system *could* do indirection in a "?graph=" way.   Is
>> there someone for whom this would be a hardship, someone who somehow is
>> implementing GSP but can't control the format of their URIs?
>>
>> The easiest example would be if this applies directly to you.  Are you
>> planning to implement GSP or use a GSP implementation?   Will this make
>> your system harder to build or operate?     If not, can you think of
>> someone we could bring into this conversation whose life is made
>> significantly harder due to the design in the current draft?    These
>> discussions tend to go much more smoothly when the affected party is
>> part of the conversation.
>>
>> Also, on the point about naming default graphs: I agree it would be
>> simpler if the default graph had a URI, but by definition in SPARQL it
>> does not.   The WG has not been willing to change this; I don't think
>> they would consider a simpler template mechanism to be justification,
>> when GSP's existing indirection mechanism was not.
>>
>>       -- Sandro
>>
>>
>>
>> On 09/04/2012 01:38 PM, James Leigh wrote:
>>> Hi Sandro,
>>>
>>> Thanks for resuming this. I have included more particulars as why this
>>> is important at the bottom of this message.
>>>
>>> On Tue, 2012-09-04 at 09:43 -0400, Sandro Hawke wrote:
>>>> Sorry I dropped this thread. Resuming...
>>>>
>>>> On 05/08/2012 01:11 PM, James Leigh wrote:
>>>>> On Tue, 2012-05-08 at 11:08 -0400, Sandro Hawke wrote:
>>>>>> James, I must apologize for failing to get back to you with an official
>>>>>> response to your comment [1].   Since then, a new draft has been
>>>>>> published:
>>>>>>
>>>>>>            http://www.w3.org/TR/2012/WD-sparql11-http-rdf-update-20120501/
>>>>>>            
>>>>>> ... which may not address your comment, but does change somewhat how
>>>>>> indirect graph identification is done.
>>>>>>
>>>>>> Please let us know if you're satisfied with this response.  If not,
>>>>>> please try to suggest some specific change to the document which would
>>>>>> address your concern.   Thank you!
>>>>>>
>>>>> Thanks Sandro, but I am not satisfied. I would like to be assured that
>>>>> the working group has discussed generalizing indirect requests to allow
>>>>> indirect graph requests to co-exist with other types of indirect
>>>>> requests. I believe indirect requests are going to become a significant
>>>>> social issue for our community and need a common approach.
>>>> I don't think the group has talked about it.  I can request it be on the
>>>> agenda, but I'd like to understand the case better, first.
>>>>
>>>>> I suggest that the working group consider providing a way for the
>>>>> indirect graph URL template to be discoverable and avoid explicitly hard
>>>>> coding a template pattern in the specification.
>>>>>
>>>>> I suggest that the graph protocol adopt a service description that will
>>>>> include a URL to the default graph and a URL template that can be used
>>>>> for indirect graph identification, if the service supports such
>>>>> indirection.
>>>>>
>>>>> For example a request like the one below would return a service
>>>>> description that would include triples that point to the default graph
>>>>> URL and a URL template for indirect graph identification.
>>>>>
>>>>>
>>>>> Given the HTTP request:
>>>>>
>>>>>            GET /rdf-graph-store HTTP/1.1
>>>>>            Host: example.com
>>>>>            Accept: text/turtle; charset=utf-8
>>>>>
>>>>> the graph service responds with a service description.
>>>>>
>>>>>            HTTP/1.1 200 OK
>>>>>            Date: Fri, 09 Oct 2009 17:31:12 GMT
>>>>>            Server: Apache/1.3.29 (Unix) PHP/4.3.4 DAV/1.0.3
>>>>>            Connection: close
>>>>>            Content-Type: text/turtle
>>>>>            
>>>>>            @prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
>>>>>            @prefix ent: <http://www.w3.org/ns/entailment/> .
>>>>>            @prefix prof: <http://www.w3.org/ns/owl-profile/> .
>>>>>            @prefix scovo: <http://purl.org/NET/scovo#> .
>>>>>            @prefix void: <http://rdfs.org/ns/void#> .
>>>>>            
>>>>>            [] a sd:Service ;
>>>>>                sd:defaultGraphIdentification </rdf-graph-store?default> ;
>>>>>                sd:indirectGraphIdentification "http://www.example.com/rdf-graph-store?graph={sd:graphIri}" ;
>>>>>                sd:endpoint <http://www.example.com/sparql/> ;
>>>>>                sd:supportedLanguage sd:SPARQL11Query ;
>>>>>                sd:resultFormat <http://www.w3.org/ns/formats/RDF_XML>, <http://www.w3.org/ns/formats/Turtle> ;
>>>>>                sd:feature sd:DereferencesURIs ;
>>>>>                sd:defaultEntailmentRegime ent:RDFS  .
>>>>>
>>>>>
>>>>> The predicate sd:defaultGraphIdentification would point to the URL that
>>>>> should be used in operations to indirectly identify the default graph
>>>>> in the Graph Store.
>>>>>
>>>>> The predicate sd:indirectGraphIdentification would have a URL template
>>>>> similar to the URL patterns from the OpenSearch specification[2]. The
>>>>> fully qualified parameter name "{sd:graphIri}" would be replaced with
>>>>> the percent encoded graph IRI that needs to be identified.
>>>>>
>>>>> This would give implementations much more freedom in how they handle
>>>>> indirect requests, if the implementation supports indirect requests at
>>>>> all.
>>>>>
>>>>> Regards,
>>>>> James
>>>> I agree this design would work, technically, but it seems like in the
>>>> common case this requires more work for the client folks (people writing
>>>> GSP clients would need to understand SD), more work for the server folks
>>>> (people running SPARQL servers would have to set up the right SD).   A
>>>> little more load: GSP clients would need to do an extra round-trip once
>>>> in a while.   And generally more complexity and possibilities for things
>>>> to go wrong.
>>>>
>>>> And what does it buy us?    Clearly, as you say, more flexibility; but I
>>>> can't think of a plausible situation where that flexibility would be
>>>> important (and yet the other SPARQL protocol elements would be
>>>> acceptable.)    Can you give me some stories that might convince the WG
>>>> and everyone struggling with the additional complexity that this was all
>>>> worth it?
>>>>
>>> In a roundabout way I am arguing that the URL of the RDF Graph Store
>>> service is not needed at all. The URL pattern is all the clients need to
>>> make indirect requests.
>>>
>>> Perhaps the specification might be a tad more complicated with what I am
>>> proposing (we are talking indirectly here ;), but the client and server
>>> implementations would have the same complexity. What it comes down to is
>>> should the client use string concatenation or string substitution to
>>> build the indirect graph URL?
>>>
>>> There is no extra round trip with what I am proposing. The current spec
>>> does not address how the client obtains the RDF Graph Store URL.
>>> However, I propose that the spec make it explicit on how an indirect
>>> graph URL pattern can be obtained. This does not mean there is an extra
>>> request as the client might already know the URL pattern, just as a
>>> client might already know the RDF Graph Store URL. If the client does
>>> NOT know the URL pattern and does need to discover it, well then at
>>> least there would be a standard way to discover it (as opposed to the
>>> current spec which appears not to address the issue).
>>>
>>> The current version of the spec says the client should concatenate the
>>> URL with "?graph=" and the percent encoded graph URI. I am arguing that
>>> instead, the client should replace "{sd:graphIri}" in the URL pattern
>>> with the percent encoded graph URI.
>>>
>>> The current specification describes query string parameter names and a
>>> range of values, that when appended to a graph store URL, identify
>>> different resources. This is less explicit, less flexible, and less
>>> discoverable than publishing the URL pattern outright.
>>>
>>> Encoding URIs into indirect URL patterns is not uncommon. However, most
>>> cases to date have been about encoding resource URIs into indirect URL
>>> descriptions.
>>>
>>> One such case can be found in void:uriLookupEndpoint, where the pattern
>>> is published in a void description. In this case the W3C Interest Group
>>> Note says the object of the void:uriLookupEndpoint should be prefixed to
>>> the URL-encoded resource URI. This is very different from the current
>>> Graph Store spec, as one does not need to consult the spec to find out
>>> the query parameter names (everything is there in the term itself).
>>> http://www.w3.org/TR/void/#lookup
>>>
>>> Virtuoso documents the following patterns (among others) as a way to
>>> retrieve indirect resource descriptions:
>>> "http://linkeddata.uriburner.com/about/html/{URIscheme}/{authority}/{local-path}"
>>> "http://linkeddata.uriburner.com/about/data/{URIscheme}/{authority}/{local-path}"
>>>
>>> Some of the WG members may already be familiar with DOI identifiers.
>>> These are URIs that identify resources using the doi scheme. However,
>>> these URIs are not resolvable directly (they do not start with http:).
>>> Instead to resolve a doi URI one must take the URL "http://dx.doi.org/"
>>> and append the scheme specific part of the URI to create a URL that will
>>> resolve to a description of the resource.
>>>
>>> Many PURL (Persistent URL) Servers support what are called partial PURLs
>>> that allow indirect resource resolution.
>>> http://purl.oclc.org/docs/long_intro.html#partial
>>>
>>>           As a side note: In a new implementation of the PURL server
>>>           that my team and I are working on, the partial PURLs will use
>>>           a set of patterns, each consisting of an optional regular
>>>           expression (to select and extract URI components) followed by
>>>           a URL pattern with substitutions to construct the resulting URL.
>>>
>>> In Callimachus (an open source project I lead) to resolve any resource
>>> URI indirectly, one takes the URL of the Callimachus origin and appends
>>> "/diverted;" + percent encoded URI.
>>>
>>> The following pattern should be familiar to all the members and in
>>> practice is an indirect way to describe a resource:
>>> "http://dbpedia.org/sparql?query=DESCRIBE%3C{resource-uri}%3E".
>>>
>>> I propose SPARQL 1.1 Graph Store protocol should not create a new
>>> protocol (for indirect graphs), but instead provide a way for publishers
>>> to optionally publish how agents can create indirect graph requests
>>> (possibly using the same underlying mechanisms as indirect resource
>>> requests).
>>>
>>> The RDF Graph Store URL is used for two things in the current spec: 1)
>>> to provide indirect access to graphs in a store and 2) provide indirect
>>> access to the default graph in a store.
>>>
>>> I did not consider this above, but the indirect access to the default
>>> graph could use the same pattern URL as other indirect graphs (if the
>>> default graph is named in some way).
>>>
>>> So, all that is needed to support indirect graphs is 1) a way to
>>> identify the default graph and 2) a URL pattern to do
>>> GET/PUT/DELETE/POST operations on graphs, indirectly.
>>>
>>> Perhaps the sd:name (or a new property) in the Service Description could
>>> be used to identify the default dataset.
>>>
>>> Assuming the default graph has a name, we only need the URL pattern for
>>> indirect graph identification.
>>>
>>> I propose, instead of clients taking a URL and appending '?graph=' +
>>> encoded graph name, that they take the URL pattern and substitute the
>>> encoded graph name.
>>>
>>> As stated above a new property like sd:indirectGraphIdentification could
>>> be used to identify the indirect graph pattern. Alternatively, a name
>>> like sd:indirectGraphPattern might be more self explanatory.
>>>
>>> It is not clear to me how one would discover the current RDF Graph Store
>>> URL from the current spec. Here I assume the Service Description is an
>>> appropriate way to discover the indirect graph URL pattern.
>>>
>>> The main advantage of this approach is a clear connection between the
>>> Service Description and the indirect graph URL pattern. So much so, that
>>> the indirect graph URL pattern is almost self descriptive. That is, many
>>> developers looking at the URL pattern will know what to do with it right
>>> away, even if they haven't read the spec. I believe this would make the
>>> specification even easier for clients to implement (vs the current
>>> spec).
>>>
>>> By using URL pattern substitution (as opposed to concatenation) the
>>> pattern could be open to supporting other forms of substitution. In the
>>> future it may be desirable to support DOI and Virtuoso's form of
>>> indirect pattern (scheme, scheme specific part, authority, and path are
>>> substituted separately) or potentially even partial PURL's pattern,
>>> which will include a regex to extract segments of the URI for
>>> substitution.
>>>
>>> This approach also allows server implementations to utilize the
>>> same/existing resource description mechanisms, which have been around
>>> since at least 1995, when OCLC created the first redirecting service,
>>> now known as purl.org.
>>>
>>> Thank you for considering my suggestion.
>>>
>>> Regards,
>>> James
>>>
>>>
>>>>>> [1]
>>>>>> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2011Nov/0048
>>>>>>
>>>>> [2] http://www.opensearch.org/Specifications/OpenSearch/1.1#OpenSearch_URL_template_syntax
>>>>>
>>>>>
>>>
>
>
Received on Tuesday, 11 September 2012 13:51:48 UTC