Re: RDF dataset/SPARQL endpoint descriptions from Gregory Williams on 2009-09-14 (public-rdf-dawg-comments@w3.org from September 2009)

From: Gregory Williams <greg@evilfunhouse.com>
Date: Mon, 14 Sep 2009 17:50:13 -0400
To: Andreas Langegger <al@jku.at>
Cc: RDF Data Access Working Group <public-rdf-dawg-comments@w3.org>, axel@polleres.net, michael.hausenblas@deri.org
Message-Id: <2E14337C-184F-46F8-85F0-FFB9139DA900@evilfunhouse.com>

On Sep 11, 2009, at 6:47 AM, Andreas Langegger wrote:

> Hi Greg,
>
> Gregory Williams wrote:
>> It looks like a "DESCRIBE DATASET/SERVICE" won't be the path taken,  
>> as there are some concerns about this operating at the query engine  
>> level, when it's really a protocol operation. The exact method for  
>> how to do it hasn't been nailed down yet, but some of the options  
>> under discussion are:
>
> I also had some concerns when changing the grammar, because parts of  
> a description may be "protocol"-specific. Why should a non-HTTP  
> query engine (e.g. RDF store) provide SPARQL endpoint descriptions  
> that are only relevant when using the HTTP protocol? I'll explain  
> what I've experienced.
>
> Many features are query engine specific (e.g. full-text search, query  
> language, initial bindings, etc.) and should be announced in a query  
> engine specific way and not via the SPARQL HTTP protocol.
>
> I'd advise not to announce them as OPTIONS/X-Headers/etc. since they  
> are relevant to any client using the query engine even without a  
> HTTP endpoint (e.g. via in-process API, ODBC/Virtuoso, etc.) - Can  
> you give me a reason why HTTP OPTIONS/X-Headers makes more sense?

There are two issues here. The first is the argument for why it should  
be a protocol level feature and not a query engine feature. In  
general, many endpoints may use the same query engine, and it will be  
easier for protocol-level code to discover the features of the  
underlying endpoint for inclusion in a service description than for  
the query engine to discover which endpoint has called it.

The second is non-HTTP access to a SPARQL engine. The thinking there is  
that there can be specific API calls for discovering service  
descriptions, depending on the protocol used. I'm not familiar enough  
with Virtuoso to know what that would look like, but a general  
"in-process API" can presumably have implementation-specific calls for  
service description.
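
The division of labour described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class names, the supported_features() call, and the dictionary layout are all hypothetical and do not come from any real SPARQL implementation. The point is that one engine can sit behind several endpoints, and each endpoint combines the engine's feature list with details that only the endpoint knows.

```python
class QueryEngine:
    """Hypothetical in-process query engine shared by several endpoints."""

    def __init__(self, features):
        self._features = frozenset(features)

    def supported_features(self):
        # Implementation-specific discovery call: no HTTP and no RDF involved.
        return self._features


class Endpoint:
    """Hypothetical protocol-level wrapper around a shared engine."""

    def __init__(self, uri, engine, default_graphs):
        self.uri = uri
        self.engine = engine
        self.default_graphs = list(default_graphs)

    def service_description(self):
        # The endpoint asks the engine what it supports, then adds the
        # details that only the endpoint itself knows about.
        return {
            "endpoint": self.uri,
            "features": sorted(self.engine.supported_features()),
            "defaultGraphs": list(self.default_graphs),
        }


# One engine, two endpoints (all URIs are made up for the example).
engine = QueryEngine({"full-text-search", "initial-bindings"})
public = Endpoint("http://example.org/sparql", engine, ["http://example.org/public"])
private = Endpoint("http://example.org/internal/sparql", engine, ["http://example.org/staff"])
```

An in-process client would call engine.supported_features() directly, while an HTTP client would receive whatever serialization the endpoint builds from service_description().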


>> - an HTTP response header linking to a service description document
>> - the use of the HTTP OPTIONS verb on the endpoint URI
>> - using content negotiation on the endpoint URI to request RDF (or  
>> possibly having the endpoint URI return RDFa)
>> - a new protocol operation (/sparql?serviceDescription)
>
> I would also prefer an approach which allows querying endpoint  
> descriptions with sub queries or FROM <uri>. It would allow clients  
> to read descriptions without the need for parsing them, they may not  
> have a SPARQL engine themselves. That won't be possible with HTTP  
> OPTIONS, nor with X-Headers. I'd prefer a new protocol operation, as  
> you suggested: ?serviceDescription or maybe just ?desc.

Agreed that this is a desirable feature. I have the same worries about  
HTTP OPTIONS, but a header-based method will give you back a URI that  
could presumably be used in a FROM clause. Again, much of this is  
still under discussion, but the group is aware of these concerns.
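
To make the options under discussion more concrete, here is a minimal Python sketch of what each request might look like from the client side. Nothing here is settled: the endpoint URI, the "describedby" link relation, the RDF/XML media type, and the exact query-string form are assumptions made for the example. The requests are only constructed, not sent, and the Link header parser is deliberately simplistic.

```python
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint URI


def options_request(endpoint):
    """HTTP OPTIONS on the endpoint URI."""
    return Request(endpoint, method="OPTIONS")


def conneg_request(endpoint):
    """Content negotiation: ask the endpoint URI itself for RDF."""
    return Request(endpoint, headers={"Accept": "application/rdf+xml"}, method="GET")


def protocol_operation_request(endpoint):
    """A new protocol operation, written here as ?serviceDescription."""
    return Request(endpoint + "?serviceDescription", method="GET")


def description_uri_from_link_header(link_header):
    """Extract a description URI from a header of the assumed form
    <uri>; rel="describedby". Returns None if no such link is present.
    Simplistic: does not handle commas inside URIs or unquoted rel values."""
    for part in link_header.split(","):
        target, *params = part.split(";")
        if any(p.strip() == 'rel="describedby"' for p in params):
            return target.strip().strip("<>")
    return None


def query_over_description(description_uri, pattern):
    """Use the discovered URI in a FROM clause, as suggested in the thread."""
    return f"SELECT * FROM <{description_uri}> WHERE {{ {pattern} }}"
```

Only the header-based variant hands the client a separate URI, which is what makes the FROM <uri> approach possible.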

> I think there are many non-LOD applications using SPARQL without  
> HTTP. They should also be able to check out if a query engine  
> supports full-text search, etc.!

Again, if this is via an in-process API, this is something that will  
either be known ahead of time or could be discovered without needing  
to deal with an RDF-based service description.

> Dataset descriptions such as voiD are not protocol specific either.  
> They exclusively relate to the dataset served. Why not provide such  
> meta data in a more generic form than via HTTP SPARQL? A DESCRIBE  
> DATASET would really make sense. If the query engine has no such  
> information it would return an empty model, which would be more than  
> correct. Are there any concerns with that?

Again, it's probably easier for the endpoint to know where to find  
statistics about a dataset than for the engine to do it. I can imagine  
implementations for which this would be relatively simple at the  
engine level, but I suspect that's not the general case.

> I would keep in mind that dataset descriptions may become large  
> because users want to include statistics, summaries, etc. Since it is  
> cumbersome to send HTTP cache headers upon specific queries  
> (DESCRIBE DATASET), it may be better to just return a voiD dataset  
> URI which can be retrieved (or not if it hasn't changed).

Understood, and I think we'll be discussing this. The same argument  
could be made for the service description as well.
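
The caching argument can be illustrated with an ordinary conditional GET on a description document. This is a sketch under assumptions: the document URI and the ETag value are made up, and it presumes the server hosting the description supports ETags and If-None-Match. The request is only constructed, not sent.

```python
from urllib.request import Request


def conditional_request(description_uri, cached_etag=None):
    """Build a GET for a description document, adding If-None-Match
    when an earlier copy (and its ETag) is already held locally."""
    headers = {"Accept": "application/rdf+xml"}
    if cached_etag is not None:
        headers["If-None-Match"] = cached_etag
    return Request(description_uri, headers=headers)


def resolve_body(status, body, cached_body):
    """304 Not Modified means the cached copy is still current."""
    return cached_body if status == 304 else body


# Hypothetical voiD document URI and ETag from a previous fetch.
req = conditional_request("http://example.org/void.rdf", cached_etag='"abc123"')
```

A large description with statistics would then be transferred only when it has actually changed, which is hard to arrange when the description is the result of a query.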

>>> For query federation, it would be very useful if the future SPARQL  
>>> REC supports BINDINGS such as introduced by Eric [2] before. My  
>>> proposal works with a set of bindings with a special "null"  
>>> keyword for unbound variables, e.g.:
>> ...
>>> It is not much effort for implementers and a federated query  
>>> processor can then process pipelined blocks of queries more  
>>> efficiently.
>
>> Unfortunately, this won't be part of the next SPARQL version, but  
>> service descriptions should allow any implementations to declare  
>> that they support such an extension.
>
> Well, no good news but I understand. Can I find some chat log about  
> that? Just would like to get a picture of the reasons apart from  
> lack of time (it's a fairly easy feature and simple to implement).

This was briefly discussed in [1] in the context of the Parameters  
feature[2], but I think it came down to time constraints, more  
important features, and the lack of existing implementations of this  
feature.

> The main bottleneck for large scale query federation is lack of  
> statistics anyway. But these can be generated periodically remotely.  
> If we add support for initial bindings to the SPARQL spec it would  
> be much better than advertising it as a feature; nobody will do that  
> (lack of incentives), and thus large scale query federation will be  
> impossible in the end.


I'm not convinced of this. If it's a compelling extension, getting  
implementations to support it isn't impossible. It just didn't seem as  
ready for standardization as other features.

.greg

[1] http://lists.w3.org/Archives/Public/public-rdf-dawg/2009JanMar/0128.html
[2] http://www.w3.org/2009/sparql/wiki/Feature:Parameters
Received on Monday, 14 September 2009 21:50:51 UTC