Re: Graph store protocol editor's draft updated from Sandro Hawke on 2012-02-14 (public-rdf-dawg@w3.org from January to March 2012)

From: Sandro Hawke <sandro@w3.org>
Date: Tue, 14 Feb 2012 10:56:23 -0500
To: Chimezie Ogbuji <chimezie@gmail.com>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <1329234983.16744.349.camel@waldron>
On Tue, 2012-02-14 at 00:33 -0500, Chimezie Ogbuji wrote:
> We discussed these changes in our last teleconference, so I'm
> surprised that you weren't expecting them.  See my response below.

I must have either misunderstood or forgotten, sorry.

> On Mon, Feb 13, 2012 at 2:34 PM, Sandro Hawke <sandro@w3.org> wrote:
> >>  - Removed Protocol service discovery section 5.8 (addressing issue of
> >> confusion regarding SPARQL protocol URL and that of a GSP
> >> implementation)
> >>  - Changed URL used for indirect identification to reflect that it
> >> identifies a graph store (removed all references to 'service')
> >
> > I wasn't expecting these changes together, like this.   Like this, it is
> > impossible for a client to construct indirect graph IRIs, since (as
> > the spec says) the graph store IRI needs to be known "a priori".
> 
> As I've said (many times) before:  without a discovery method, this
> will remain a problem.  I don't see how using the service as the
> 'base' for constructing indirect IRIs makes this problem go away since
> the 'service' URL will still need to be known 'a priori' and a) there
> is no mechanism to discover it and b) the GSP is explicitly divorced
> from the SPARQL protocol.  As you allude to below, this also muddles
> the protocol model. The combination is more problematic than these
> changes, in my opinion.

Let's back up a step.    The reason we need Indirect Identification is
because Direct Identification often doesn't work.  As you explain in
the draft, DI fails in the cases where people name graphs with URIs
they don't control.    Now, this violates webarch, but the SPARQL
specifications, implementations and community allow it.  For local
deployments, it's useful, so it's become common, and supporting these
private namespaces is now seen as a design requirement.

But private namespaces leave folks unable to refer to graphs in other
contexts; by violating webarch, they've lost some advantages of the
Web. 

Indirect Identification addresses this.  It's a great workaround,
putting us back in line with webarch, without users having to change
anything.

This allows metadata to work again.  It allows provenance.  It allows
access control.   And it allows GSP.

The key point here is that GSP isn't the only thing that needs this.  It
could be fixed in the GSP spec, but to me it makes a lot more sense to
fix it in core SPARQL itself.    I think of it as SPARQL's responsibility
to provide global identifiers, since SPARQL opened the door to local
identifiers.   If the two are divorced, as you say, SPARQL should
logically get this issue in the divorce.

And the solution is trivial.  We just say that every graph behind an
endpoint is identified by <endpoint>?graph=<name>, or <endpoint>?default
for the default graph.
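
As a rough sketch of that convention (not text from any of the drafts), a
client could build these identifiers as below.  The function name and the
example.com / example.org addresses are made up for illustration, and the
sketch assumes the endpoint address carries no query string of its own and
that the graph name is percent-encoded before being embedded.

```python
from urllib.parse import quote


def indirect_graph_iri(endpoint, graph_name=None):
    """Build <endpoint>?graph=<name>, or <endpoint>?default if no name is given.

    Hypothetical helper: assumes `endpoint` has no query component already.
    """
    if graph_name is None:
        return endpoint + "?default"
    # Percent-encode the whole graph name (safe="" also encodes '/' and ':')
    # so its own '?', '#' and '&' cannot be mistaken for part of the
    # endpoint's query string.
    return endpoint + "?graph=" + quote(graph_name, safe="")


if __name__ == "__main__":
    endpoint = "http://example.com/sparql"  # hypothetical endpoint address
    print(indirect_graph_iri(endpoint, "http://example.org/g1"))
    # http://example.com/sparql?graph=http%3A%2F%2Fexample.org%2Fg1
    print(indirect_graph_iri(endpoint))
    # http://example.com/sparql?default
```

The only input is what a SPARQL client already has: the endpoint address
and the graph name.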

I wish I'd understood this long ago (back in the days of SPARQL 1.0,
even!), or at least been able to convey this understanding before Last
Call for the other SPARQL specs.  Procedurally, however, I think it's
okay to put it into the GSP draft, since that's the document we can
still edit and it is certainly related.  Once all the documents are at
CR, we can move text between the drafts if we want, I believe.

If, for some reason, you all are still not convinced, here is the other option:

> > I really liked indirect IRIs, and I think the Provenance WG was counting
> > on them, since they let folks use an IRI to talk about a graph in
> > SPARQL-land.  But now they don't do that any more.
> >
> > I'd be okay with either:
> >  (1) putting 5.8 back
> 
> Note, Greg is not okay with this.

Looking more closely, it's not 5.8 that I want back, it's this sentence:

        Within a service description document for an implementation of
        this protocol, the object of an sd:defaultDataset statement is
        understood to be the identifier of the Graph Store

and the accompanying illustration in section 5.5, showing how
sd:defaultDataset is used (but modified to be clear it's the endpoint,
not the GSP service, which no longer exists).   To clarify, I mean this
to be talking about a SPARQL SD, not some kind of graphstore SD.

As far as I can tell this is conceptually redundant.  I believe the WG
agrees that the object of an sd:defaultDataset statement *is* understood
to be the identifier of the Graph Store, it's just... we don't actually
say that in any specs any more.

And by not saying that, we make it so clients who know how to get to a
graph via SPARQL 1.1 (given the endpoint address and the graph name)
cannot get at that same graph via the HTTP Graph Store Protocol.

So there's no way to do metadata.  There's no way to do provenance.
There's no way for a client doing SPARQL protocol to move over to GSP.
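
To make the "move over to GSP" step concrete, here is a minimal sketch of
what a client could do once it knows the Graph Store identifier, for
instance by reading it as the object of sd:defaultDataset in the SPARQL
SD.  The helper name, the example addresses and the choice of text/turtle
are assumptions for illustration only; the request is built but never
sent.

```python
from urllib.parse import quote
from urllib.request import Request


def gsp_get_request(graph_store_iri, graph_name):
    """Build (without sending) an HTTP GET for one named graph.

    Hypothetical helper: `graph_store_iri` is assumed to be the value the
    client found as the object of sd:defaultDataset, with no query string.
    """
    # Indirect identification: embed the percent-encoded graph name in the
    # query component of the Graph Store IRI.
    url = graph_store_iri + "?graph=" + quote(graph_name, safe="")
    return Request(url, headers={"Accept": "text/turtle"}, method="GET")


if __name__ == "__main__":
    req = gsp_get_request("http://example.com/rdf-graph-store",
                          "http://example.org/g1")
    print(req.get_method(), req.full_url)
```

Without some statement tying sd:defaultDataset to the Graph Store, the
client has nothing to pass as the first argument.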

Again, this second option is an editorial question, not a design
question, but without the editorial fix, no one will be able to figure
it out from the specs.

    -- Sandro

> > or
> >  (2) building the indirect IRIs off the endpoint address instead of the
> > graphstore address.   (I think this is the vastly preferable solution,
> > BTW, because it's just so simple, even if the modeling isn't quite as
> > elegant.)
> 
> Unfortunately, I don't agree for the reasons stated above.
Received on Tuesday, 14 February 2012 15:56:38 UTC