Re: RDF dataset/SPARQL endpoint descriptions from Andreas Langegger on 2009-09-29 (public-rdf-dawg-comments@w3.org from September 2009)

From: Andreas Langegger <al@jku.at>
Date: Tue, 29 Sep 2009 20:41:56 +0200
To: Gregory Williams <greg@evilfunhouse.com>, Axel Polleres <axel.polleres@deri.org>
Cc: RDF Data Access Working Group <public-rdf-dawg-comments@w3.org>
Message-Id: <15C81366-976F-4D58-AD87-ACC4C71704A2@jku.at>
Hello again,

Sorry for my late reply. Will there be a dedicated meeting at ISWC to
discuss the future of SPARQL?

Regards,
AndyL

> There are two issues here. The first is the argument for why it
> should be a protocol-level feature and not a query engine feature.
> In general, many endpoints may use the same query engine, and it  
> will be easier for protocol-level code to discover the features of  
> the underlying endpoint for inclusion in a service description than  
> for the query engine to discover which endpoint has called it.
>
> For non-HTTP access to a SPARQL engine, the thinking is that there
> can be specific API calls for discovering service descriptions,
> depending on the protocol used. I'm not familiar enough with
> Virtuoso to know what that would look like, but a general "in-process
> API" can presumably have one or more implementation-specific calls
> for service description.
>
>
>>> - an HTTP response header linking to a service description document
>>> - the use of the HTTP OPTIONS verb on the endpoint URI
>>> - using content negotiation on the endpoint URI to request RDF (or  
>>> possibly having the endpoint URI return RDFa)
>>> - a new protocol operation (/sparql?serviceDescription)
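The four candidate mechanisms differ mainly in how a client would phrase its request. As a rough illustration (a minimal Python sketch, not anything the working group specified), the requests could be built as follows. The endpoint URI, the preferred RDF media types, and the `discovery_requests` helper are hypothetical; the requests are only constructed, not sent.

```python
from typing import Dict
from urllib.request import Request

# Hypothetical endpoint URI, for illustration only.
ENDPOINT = "http://example.org/sparql"

# Assumed preference order of RDF media types for content negotiation.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def discovery_requests(endpoint: str) -> Dict[str, Request]:
    """Build (but do not send) one request per candidate discovery mechanism."""
    return {
        # Response header: fetch only the headers, then look for a link
        # to the service description document in the response.
        "response_header": Request(endpoint, method="HEAD"),
        # HTTP OPTIONS on the endpoint URI.
        "options": Request(endpoint, method="OPTIONS"),
        # Content negotiation: ask the endpoint URI itself for RDF.
        "conneg": Request(endpoint, headers={"Accept": RDF_ACCEPT}),
        # New protocol operation, in the style of /sparql?serviceDescription.
        "operation": Request(endpoint + "?serviceDescription"),
    }


if __name__ == "__main__":
    for name, req in discovery_requests(ENDPOINT).items():
        print(name, req.get_method(), req.full_url)
```

Sending any of these with `urllib.request.urlopen` would still leave the client to interpret the answer: a link header for the HEAD request, or the media type of the body for the content-negotiated GET.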
>>
>> I would also prefer an approach that allows querying endpoint
>> descriptions with subqueries or FROM <uri>. That would let clients
>> read descriptions without having to parse them, since they may not
>> have a SPARQL engine themselves. That won't be possible with HTTP
>> OPTIONS or with X-Headers, so I'd prefer a new protocol operation,
>> as you suggested: ?serviceDescription, or maybe just ?desc.
>
> Agreed that this is a desirable feature. I have the same worries  
> about HTTP OPTIONS, but a header-based method will give you back a  
> URI that could presumably be used in a FROM clause. Again, much of  
> this is still under discussion, but the group is aware of these  
> concerns.
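Greg's point that a header-based method still yields a URI usable in a FROM clause can be sketched as follows. Purely for illustration, this assumes the endpoint advertises its description in an HTTP `Link` header with `rel="describedby"`; neither that relation name nor the URIs come from the discussion, and `find_link` / `description_query` are hypothetical helpers. The parser is deliberately naive (it splits on commas and does not handle quoted commas).

```python
import re
from typing import Optional

# One link-value: <URI> followed by its ;-separated parameters up to the
# next comma. Naive: quoted commas inside parameters are not supported.
LINK_VALUE = re.compile(r"<([^>]*)>([^,]*)")


def find_link(link_header: str, rel: str = "describedby") -> Optional[str]:
    """Return the first target URI whose rel parameter includes `rel`."""
    for match in LINK_VALUE.finditer(link_header):
        target, params = match.group(1), match.group(2)
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "rel" and rel in value.strip().strip('"').split():
                return target
    return None


def description_query(description_uri: str) -> str:
    """Build a SPARQL query that reads the description through FROM <uri>."""
    return (
        "SELECT ?s ?p ?o\n"
        f"FROM <{description_uri}>\n"
        "WHERE { ?s ?p ?o }"
    )


# Hypothetical header value an endpoint might send.
header = '<http://example.org/sparql/desc>; rel="describedby", <http://example.org/>; rel="home"'
uri = find_link(header)
if uri is not None:
    print(description_query(uri))
```

Because the result is an ordinary query string, a client without its own SPARQL engine can send it back to the endpoint instead of parsing the description locally.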
>
>> I think there are many non-LOD applications using SPARQL without
>> HTTP. They should also be able to check whether a query engine
>> supports full-text search, etc.!
>
> Again, if this is via an in-process API, this is something that will  
> either be known ahead of time or could be discovered without needing  
> to deal with an RDF-based service description.
>
>> Dataset descriptions such as voiD are not protocol-specific either.
>> They relate exclusively to the dataset being served. Why not provide
>> such metadata in a more generic form than via the SPARQL HTTP
>> protocol? A DESCRIBE DATASET query would really make sense. If the
>> query engine has no such information, it would return an empty
>> model, which would be perfectly correct. Are there any concerns
>> with that?
>
> Again, it's probably easier for the endpoint to know where to find  
> statistics about a dataset than for the engine to do it. I can  
> imagine implementations for which this would be relatively simple at  
> the engine level, but I suspect that's not the general case.
>
>> I would keep in mind that dataset descriptions may become large
>> because users want to include statistics, summaries, etc. Since it
>> is cumbersome to send HTTP cache headers for specific queries
>> (DESCRIBE DATASET), it may be better to just return a voiD dataset
>> URI, which can be retrieved (or not, if it hasn't changed).
>
> Understood, and I think we'll be discussing this. The same argument
> could be made for the service description as well.
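Returning a voiD dataset URI instead of the description itself lets clients rely on ordinary HTTP caching. A minimal sketch, assuming the server sends an `ETag` for the voiD document, is a client-side cache that revalidates with `If-None-Match` and reuses its local copy on `304 Not Modified`. The `DescriptionCache` class and the injected `fetch` function are hypothetical; `urllib_fetch` shows one way to back it with the Python standard library.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# (uri, request headers) -> (status, response headers, body).
# Injected so the cache logic can be exercised without a network.
Fetch = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str], bytes]]


def urllib_fetch(uri: str, headers: Dict[str, str]) -> Tuple[int, Dict[str, str], bytes]:
    """Standard-library fetch; urllib reports 304 as an HTTPError."""
    try:
        with urlopen(Request(uri, headers=headers)) as resp:
            return resp.status, dict(resp.headers.items()), resp.read()
    except HTTPError as err:
        if err.code == 304:
            return 304, dict(err.headers.items()), b""
        raise


def _etag(headers: Dict[str, str]) -> Optional[str]:
    # Header names are case-insensitive in HTTP.
    return next((v for k, v in headers.items() if k.lower() == "etag"), None)


@dataclass
class DescriptionCache:
    """Hypothetical client-side cache for voiD (or service) descriptions."""
    fetch: Fetch = urllib_fetch
    entries: Dict[str, Tuple[str, bytes]] = field(default_factory=dict)  # uri -> (etag, body)

    def get(self, uri: str) -> bytes:
        headers: Dict[str, str] = {}
        cached = self.entries.get(uri)
        if cached is not None:
            # Revalidate instead of downloading the whole description again.
            headers["If-None-Match"] = cached[0]
        status, resp_headers, body = self.fetch(uri, headers)
        if status == 304 and cached is not None:
            return cached[1]  # unchanged since last time: reuse the local copy
        etag = _etag(resp_headers)
        if etag is not None:
            self.entries[uri] = (etag, body)
        return body
```

A large description with statistics and summaries is then transferred once and afterwards only revalidated, which is the benefit of handing out a URI rather than answering a specific query.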
>
>>>> For query federation, it would be very useful if the future
>>>> SPARQL REC supported BINDINGS as introduced earlier by Eric [2].
>>>> My proposal works with a set of bindings with a special "null"
>>>> keyword for unbound variables, e.g.:
>>> ...
>>>> It is not much effort for implementers, and a federated query
>>>> processor can then process pipelined blocks of queries more
>>>> efficiently.
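The pipelining idea can be sketched as splitting the intermediate solutions of a federated query into fixed-size blocks and shipping each block with the remote query. Only the use of a `null` keyword for unbound variables comes from the proposal; its exact syntax was elided from this quote, so the `BINDINGS ?var { (...) }` shape, the block size, and all helper names below are assumptions for illustration.

```python
from typing import Dict, Iterator, List, Sequence

# Variable name -> RDF term, already written in SPARQL syntax.
Solution = Dict[str, str]


def blocks(solutions: Sequence[Solution], size: int) -> Iterator[Sequence[Solution]]:
    """Yield consecutive blocks of at most `size` solutions."""
    for start in range(0, len(solutions), size):
        yield solutions[start:start + size]


def bindings_clause(variables: List[str], block: Sequence[Solution]) -> str:
    """Serialize one block (assumed syntax); a missing variable becomes null."""
    header = " ".join("?" + v for v in variables)
    rows = ["  (" + " ".join(sol.get(v, "null") for v in variables) + ")" for sol in block]
    return "BINDINGS " + header + " {\n" + "\n".join(rows) + "\n}"


def federated_queries(base_query: str, variables: List[str],
                      solutions: Sequence[Solution], size: int) -> List[str]:
    """One remote query per block, each carrying its own bindings."""
    return [base_query + "\n" + bindings_clause(variables, b)
            for b in blocks(solutions, size)]
```

Each query can be sent as soon as its block is ready, so remote evaluation overlaps with local processing; the block size trades the number of requests against the size of each request.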
>>
>>> Unfortunately, this won't be part of the next SPARQL version, but
>>> service descriptions should allow implementations to declare
>>> that they support such an extension.
>>
>> Well, not good news, but I understand. Is there a chat log of that
>> discussion somewhere? I'd just like to get a picture of the reasons
>> apart from lack of time (it's a fairly simple feature to implement).
>
> This was briefly discussed in [1] in the context of the Parameters
> feature [2], but I think it came down to time constraints, more
> important features, and the lack of existing implementations of this  
> feature.
>
>> The main bottleneck for large-scale query federation is the lack of
>> statistics anyway, but these can be generated remotely on a
>> periodic basis. Adding support for initial bindings to the SPARQL
>> spec would be much better than advertising it as an optional
>> feature: nobody will implement it (lack of incentives), and thus
>> large-scale query federation will be impossible in the end.
>
>
> I'm not convinced of this. If it's a compelling extension, getting  
> implementations to support it isn't impossible. It just didn't seem  
> as ready for standardization as other features.
>
> .greg
>
> [1] http://lists.w3.org/Archives/Public/public-rdf-dawg/2009JanMar/0128.html
> [2] http://www.w3.org/2009/sparql/wiki/Feature:Parameters
>
>


http://www.langegger.at
----------------------------------------------------------------------
Dipl.-Ing.(FH) Andreas Langegger
FAW - Institute for Application-oriented Knowledge Processing
Johannes Kepler University Linz
A-4040 Linz, Altenberger Straße 69
Received on Tuesday, 29 September 2009 18:42:49 UTC