Re: PROV-AQ SPARQL endpoint discovery (ISSUE-609) from Graham Klyne on 2012-12-16 (public-prov-wg@w3.org from December 2012)

From: Graham Klyne <GK@ninebynine.org>
Date: Sun, 16 Dec 2012 13:08:44 +0000
To: Timothy Lebo <lebot@rpi.edu>
CC: W3C provenance WG <public-prov-wg@w3.org>
Message-ID: <50CDC7DC.6010406@ninebynine.org>
Tim,

This has been a while coming.  I wanted to have time to fully review and 
understand your position - I think I'm coming to better understand it, but I'm 
still concerned about the client complexity incurred by using a single link 
relation, for which I am not entirely seeing adequate benefits.

I'm also wondering if we shouldn't present access and query as distinct *kinds* 
of service, which I think they are, but that doesn't entirely resolve the main 
point of our discussion here.

On 11/12/2012 17:10, Timothy Lebo wrote:
> Graham,
>
> On Dec 10, 2012, at 9:32 AM, Graham Klyne <GK@ninebynine.org> wrote:
>
>> Indicating SPARQL endpoints for provenance information is one of the points in PROV-AQ that I was actioned to raise for discussion as of the last teleconference.
>>
>> See also: http://www.w3.org/2011/prov/track/issues/609
>>
>> The question is asked:  how do we discover a SPARQL endpoint for provenance querying?
>>
>> The latest version has removed all text about using SPARQL to query provenance, as this has been considered more appropriate for the FAQ (it adds no new specification).  But the question of discovering an endpoint remains on the table.
>>
>> Here are some options (not necessarily exhaustive):
>> 1. Say nothing: treat it as out of scope
>
> +.5 (since HTTP GET'ing the service would give me an RDF description that says it's a SPARQL endpoint)
> -.5 (since what I just said would be a best practice that people would need to figure out themselves).
>
>> 2. Use a new link relation
>
> -1 it's superfluous proliferation that is more eloquently handled.
>
>> 3. Make it part of the provenance service description (different URI from current REST provenance service)
>> 4. Make it part of the provenance service as an option (same URI as current REST provenance service)
>
> I don't see how the distinction of "same or different" URI applies.

I meant one URI as the entry point for both the REST and SPARQL options vs separate
URIs for these cases (separately from whether or not the different URIs are
presented with different link relations).

> I think you need to demote your "REST provenance service" as an option among many, instead of the "requirement" that you're framing it as now.

I don't think the "REST provenance service" is presented as a requirement - is 
there text that implies that it is?

I would certainly agree that the REST service and SPARQL endpoint are both 
options that MAY be provided independently of each other, and there is no 
requirement that either MUST be provided.  (Indeed, this is part of what drives 
my proposal to use different link relations, so a client can see immediately 
what options it has.)

> When you do, "same or different URI" becomes moot, and it would either report back that it's a URITemplate service or a SPARQL endpoint service or a JDBC service or a TheNextWay service.

That is a possible design, which I'm trying to avoid.

My reason is that it requires a client to dereference the service document in 
order to know if it is able to use the provenance service provided.  IMO, this 
leads to more complex client implementation code paths compared with having this 
level of information conveyed directly in the link relation.

(E.g. if a client has only SPARQL query capability, and a resource has both 
SPARQL and REST options, the client must potentially retrieve multiple service 
descriptions to decide which one it can use.)
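
A minimal sketch of the two code paths, in Python. Everything here is an
assumption made for illustration: the example.org URIs, the in-memory table
standing in for dereferenceable service descriptions, and the relation name
"provenance-query-service" (no such relation has been agreed). It only counts
how many descriptions a SPARQL-only client would have to retrieve in each design.

```python
# Sketch only: relation names, URIs and the in-memory "web" are hypothetical.
SD_SERVICE = "sd:Service"                 # type used by a SPARQL service description
PROV_SERVICE = "prov:ProvenanceService"   # type used by the REST provenance service

# Stand-in for dereferencing: service URI -> set of RDF types in its description.
DESCRIPTIONS = {
    "http://example.org/prov/rest": {PROV_SERVICE},
    "http://example.org/prov/sparql": {SD_SERVICE},
}

def choose_single_relation(links, wanted_type, fetch):
    """One shared relation: dereference each candidate until one matches."""
    fetched = 0
    for rel, uri in links:
        if rel != "provenance-service":
            continue
        fetched += 1
        if wanted_type in fetch(uri):
            return uri, fetched
    return None, fetched

def choose_distinct_relations(links, wanted_rel):
    """One relation per kind of service: pick by relation, no dereferencing."""
    for rel, uri in links:
        if rel == wanted_rel:
            return uri, 0
    return None, 0

single = [("provenance-service", "http://example.org/prov/rest"),
          ("provenance-service", "http://example.org/prov/sparql")]
distinct = [("provenance-service", "http://example.org/prov/rest"),
            ("provenance-query-service", "http://example.org/prov/sparql")]

# A client that can only speak SPARQL:
result_single = choose_single_relation(single, SD_SERVICE, DESCRIPTIONS.__getitem__)
result_distinct = choose_distinct_relations(distinct, "provenance-query-service")
print(result_single)    # (chosen URI, number of descriptions retrieved)
print(result_distinct)
```

In this toy setup the single-relation client retrieves two descriptions before
it finds the SPARQL endpoint, while the distinct-relation client retrieves none.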

[...]

>> I think option 3, or something very similar, is proposed by Tim at http://lists.w3.org/Archives/Public/public-prov-wg/2012Nov/0324.html
>>
>> The idea as I understand it is that on retrieving the service description indicated by a provenance-service link, the description indicates a SPARQL endpoint.
>>
>> Tim proposes that the service description returned should be (per http://www.w3.org/TR/sparql11-service-description/):
>>
>>   <service-URI> a prov:ProvenanceService, sd:Service ;
>>     ...
>>
>> I could go with this with one change: DO NOT return prov:ProvenanceService as a type for this endpoint, as that causes confusion with the RESTful service specification we already have; i.e. return just:
>>
>>   <service-URI> a sd:Service ;
>>     …
>
>
> Then I think prov:ProvenanceService as it stands is over specified.
> prov:ProvenanceService should be defined as "A service that provides provenance information",
> and a SPARQL endpoint that returns PROV-O should be a legitimate instance of this class.
> (THIS has the beginnings of a blocker issue)
> Otherwise, you're demanding *how* it should behave, instead of permitting anyone's service to
> indicate that it is indeed something that PROV-savvy clients can "poke around in".
> I suggest you relax your "RESTful service specification" to a suggestion among many options and
> make it less of a requirement (similar to how I'm suggesting you treat the
> URI template in another thread).

Well, I thought the whole point of introducing prov:ProvenanceService as a type 
was to indicate this level of specification.

I don't see any problem with the content of the service description telling a 
client HOW to behave (or, more strictly, providing information that a client can 
use to guide its behaviour).  I thought that was the very point of HATEOAS.  And 
I see the service type here is just part of the content of the service description.

At some point, someone has to specify what the service description actually 
means, from which may be derived what options for subsequent processing are 
available to a client.  Just saying 'here's something that PROV-savvy clients
can "poke around in"' doesn't really provide a basis for interoperability.

It's entirely possible that further options may be added in future, and that 
possibility should be part of the description of the service description format, 
which I would say includes things like describing the role of RDF types in an 
RDF-based service description.

[...]

>> It seems to me that link relations are cheap and easy to decode, so I don't fully go with this argument.  But there's a comment by Erik Wilde that I think is worth thinking about:
>>
>> [[
>> while this may be a bit fuzzy, typically link rels and media types serve
>> different needs:
>>
>> - a link rel allows a client to understand why it might want to follow
>> some link from a resource ("get a picture of the product here").
>
>
> The "why" should be "there's a service offering PROV over ?there" (*independent* of implementation),
> not "There's a SPARQL (i.e., specific implementation) endpoint over there".
> Let linked data and REST do the work. If you want PROV, go request the server's URI and find out about
> how it accepts requests.

As a principle, this is fine.

But what I'm not seeing is why this has to be done to the exclusion of using 
link relations to guide a client's choice between possible multiple services, 
particularly when using link relations makes for simpler client implementation.

Or: I am seeing the choice between SPARQL and REST services as part of "why?" a 
client might choose one service over another, not merely an elaboration of how a 
client interacts when it gets there.

>> - a media type then governs the actual interaction, where client and
>> server need to agree on how to interact when the client has made the
>> choice to engage in the interaction ("here's an image/gif, because you
>> told me you know how to handle this media type").
>> ]]
>
>
> Exactly. Follow the rel="provenance" and GET with HTTP Accept application/rdf+xml (or text/turtle)
> to get an RDF description of the service.
> Find out it's a SPARQL endpoint (or JDBC, or uritemplated, or whatever) and then follow suit
> if you know how to talk SPARQL, JDBC, uritemplated, or whatever.
> If the server uses another technology to describe its interface, then it can return that
> regardless of the HTTP Accept. It's just not a great citizen of the
> [semantic] web.

>> -- http://lists.w3.org/Archives/Public/public-ldp/2012Nov/0030.html

Yes, I accept this can work.  It's the basis on which I could go with option 3,
despite my reservations about client complexity.  (I think the use of
prov:ProvenanceService as a service type is a relative detail.)

BUT, and this is where I want to continue the engagement with the LDP group and
other REST experts, I don't see why also using a link relation to select a
suitable service URI is a problem, particularly when doing so makes the client
code simpler.

>> Which suggests a decomposition into Why? (link rel) and How? (resource type - which might conceivably be extended to resource content)
>>
>> But see also some discussion that sheds doubt on the roles of media type:
>> [[
>> Ah, I think I finally understand why you talk about different media
>> types. I've never seen the need, and still can't quite say that I do,
>> at least in the sense of "need" that derives from working within the
>> constraints of REST.
>> ]]
>> -- http://lists.w3.org/Archives/Public/public-ldp/2012Nov/0014.html
>
>
> Yes, REST's dependence on MIME is unfortunately working at the format == frbr:Manifestation level,
> not the content == frbr:Expression level.
> If one wants the same behavior for "all RDF formats", then they need to map those MIMEs into
> "abstract RDF" and behave appropriately regardless of format.

Hmmm... is that a proposal?  I'm not sure I see it.

>>
>> Option 2 also has the advantage of being direct and simple, even though it may somewhat go against the spirit
>> of the why/how principle noted by Erik Wilde.
>
>
> Yes, breaking how and why would be pretty bad.

Hmmm... but, as Erik points out, the distinction is not entirely clear cut, and
I'm concerned that this becomes a point of dogmatism rather than practicality.

>>
>> From a purely pragmatic viewpoint, I'd make a case that finer-grained link relations are more efficient:  if there are (say) a provenance REST service (as defined by PROV-AQ) and a provenance query service following the SPARQL service description, different link relations would allow me to pick the one I want and get on with accessing the provenance.  But if the same link relation is used for each, I have to read the service documents to decide which one to use.  This has two problems:
>> 1. Extra round-trips.
>> 2. Increased client code complexity.
>>
>> Of these, I think the latter is more compelling.
>
>
> Yes, this is a good argument for superfluously proliferating your link types.
> But you're just pushing the logic from one place to another, so why not keep it where it is most natural?

It's not just pushing logic from place to place that I'm seeing, it's creating
the requirement for additional logic (to read multiple service descriptions and
make a choice based on what one finds there).

> Practically, how many different service implementations would one resource have? I would think few.

Yes.  Ideally just one (for each *kind* of service)

> Also, a well designed service could handle multiple service request types,
> so it's not clear that the service implementations would proliferate.

I agree, and that wasn't my concern, so I'm not sure what point you're making here.

...

Reflecting on all this,

(1) the main advantage of the approach you propose is that additional access 
mechanisms can be added to the service description, all through a common link 
relation.

(2) the disadvantage is that one must retrieve the service description to find 
out if it matches a client capability

...

I think part of the problem here is due to overloading of the media type 
parameter for both format specification and content interpretation.

AIUI, the "traditional" approach to REST is to use content type to signal the 
service description type, and possibly use negotiation to get an appropriate 
form.  When RDF is used for multiple service types, this no longer works and it 
becomes necessary to actually interpret the content retrieved to find out what 
service is being described.  (Noting that RDF has multiple formats that are 
semantically equivalent, so to use media types in the same way would require a 
new media type for each combination of service description type and RDF syntax 
used.)
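
To make the counting concrete, here is a small Python sketch. The labels are
hypothetical and used for counting only; none of them is a registered media
type. It assumes just two service description types and two RDF syntaxes.

```python
from itertools import product

# Hypothetical labels for illustration; not registered media types.
service_description_types = ["rest-provenance-service", "sparql-service-description"]
rdf_syntaxes = ["rdf+xml", "turtle"]

# One distinct media type would be needed per (description type, syntax) pair.
combinations = [f"{kind} as {syntax}"
                for kind, syntax in product(service_description_types, rdf_syntaxes)]

for label in combinations:
    print(label)
print(len(combinations))
```

Each additional service description type or RDF syntax multiplies the number
of media types that would have to be defined.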

Maybe falling back to the link relation is not the ideal option, but at this 
stage I'm not seeing anything else that can keep the client side code really simple.

...

I think there's a finesse (or fudge, depending on your PoV) possible
here:  what we are talking about are actually two distinct kinds of service:
access (or lookup) and query.  The REST service is about access to provenance 
about a resource, and the SPARQL service allows provenance, including about a 
given resource, to be queried.  On this basis, there's some justification for 
using two link relations because we're talking about two different *kinds* of 
service, not just two mechanisms.  In future, if new access and/or query 
mechanisms are introduced, they would be introduced through extensions to the
corresponding description formats (vocabularies).  As long as there's only one
access and one query mechanism, things remain simple for clients, but as more
are introduced then clients may need to get more sophisticated in their handling 
of the service descriptions (or just fail).
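
As a rough illustration of the two-relation idea, the Python sketch below picks
a service straight from an HTTP Link header. The relation name
"provenance-query-service" and the example.org URIs are assumptions made up for
this sketch, and the parser handles only the simple `<uri>; rel="..."` form, not
the full Link header grammar.

```python
import re

# Matches only the simple form <target>; rel="relation(s)". Sketch, not a full parser.
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')

def links_by_relation(header_value):
    """Map each link relation found in a Link header value to its target URIs."""
    found = {}
    for uri, rels in LINK_RE.findall(header_value):
        for rel in rels.split():          # rel may hold several space-separated names
            found.setdefault(rel, []).append(uri)
    return found

# Hypothetical header: one access (REST) service and one query (SPARQL) service.
header = ('<http://example.org/prov/rest>; rel="provenance-service", '
          '<http://example.org/prov/sparql>; rel="provenance-query-service"')

links = links_by_relation(header)

# A client that only implements SPARQL looks for the query relation alone.
client_capabilities = ["provenance-query-service"]
usable = [links[rel][0] for rel in client_capabilities if rel in links]
print(usable)
```

A client with no matching relation ends up with an empty list and can fail
early, without retrieving any service description.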

#g
--
Received on Sunday, 16 December 2012 13:14:17 UTC