Re: RDF dataset/SPARQL endpoint descriptions from Andreas Langegger on 2009-09-11 (public-rdf-dawg-comments@w3.org from September 2009)

From: Andreas Langegger <al@jku.at>
Date: Fri, 11 Sep 2009 12:47:19 +0200
To: Gregory Williams <greg@evilfunhouse.com>
CC: RDF Data Access Working Group <public-rdf-dawg-comments@w3.org>, axel@polleres.net, michael.hausenblas@deri.org
Message-ID: <4AAA2AB7.8080005@jku.at>

Hi Greg,

Gregory Williams wrote:
> It looks like a "DESCRIBE DATASET/SERVICE" won't be the path taken, as 
> there are some concerns about this operating at the query engine level, 
> when it's really a protocol operation. The exact method for how to do it 
> hasn't been nailed down yet, but some of the options under discussion are:

I also had some concerns when changing the grammar, because parts of a 
description may be "protocol"-specific. Why should a non-HTTP query 
engine (e.g. RDF store) provide SPARQL endpoint descriptions that are 
only relevant when using the HTTP protocol? I'll explain what I've 
experienced.

Many features are query-engine-specific (e.g. full-text search, query 
language, initial bindings, etc.) and should be announced in a 
query-engine-specific way and not via the SPARQL HTTP protocol.

I'd advise against announcing them via OPTIONS/X-Headers/etc., since 
they are relevant to any client using the query engine, even without an 
HTTP endpoint (e.g. via in-process API, ODBC/Virtuoso, etc.). Can you 
give me a reason why HTTP OPTIONS/X-Headers make more sense?

> - an HTTP response header linking to a service description document
> - the use of the HTTP OPTIONS verb on the endpoint URI
> - using content negotiation on the endpoint URI to request RDF (or 
> possibly having the endpoint URI return RDFa)
> - a new protocol operation (/sparql?serviceDescription)

I would also prefer an approach that allows endpoint descriptions to be 
queried with subqueries or FROM <uri>. It would allow clients to read 
descriptions without having to parse them; they may not have a SPARQL 
engine themselves. That won't be possible with HTTP OPTIONS or with 
X-Headers. I'd prefer a new protocol operation, as you suggested: 
?serviceDescription or maybe just ?desc.

e.g.
http://example.com/sparql?query=select+*+from+<http%3A%2F%2Fexample.com%2Fsparql%3Fdesc>+where+{+%3Fs+%3Fp+%3Fo+}
would work (if FROM is allowed at all, local URIs should always be 
allowed)
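
A minimal sketch of how a client could build such a request URL. The 
endpoint address is the example.com placeholder from above, and "?desc" 
is only the operation name proposed in this mail, not an existing part 
of the SPARQL protocol; the function name is made up for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint; "?desc" is the operation proposed in this mail,
# not something defined by the SPARQL protocol.
ENDPOINT = "http://example.com/sparql"
DESCRIPTION_URI = ENDPOINT + "?desc"


def description_query_url(endpoint: str, description_uri: str) -> str:
    """Return a GET URL that queries the description graph via FROM."""
    query = f"select * from <{description_uri}> where {{ ?s ?p ?o }}"
    # urlencode percent-encodes the query text and turns spaces into '+'.
    return endpoint + "?" + urlencode({"query": query})


url = description_query_url(ENDPOINT, DESCRIPTION_URI)
print(url)
```

The client only needs to understand the result format of the endpoint; 
it never has to parse the description document itself.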

Some features are SPARQL protocol-only, such as result formats. I would 
suggest a way for the SPARQL endpoint to inject statements into the 
description originally generated by the query engine and pass it 
through. Those parts of the description will only be provided when a 
client connects via the HTTP/SPARQL protocol.
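
The layering could be sketched roughly as follows. Triples are plain 
tuples here, and every name (the "ex:" terms, the two constants, the 
describe function) is hypothetical; no existing vocabulary or engine API 
is implied.

```python
# Triples are (subject, predicate, object) tuples; the "ex:" terms are
# hypothetical placeholders, not an existing vocabulary.
ENGINE_DESCRIPTION = {
    ("ex:engine", "ex:supportsFeature", "ex:FullTextSearch"),
    ("ex:engine", "ex:supportsFeature", "ex:InitialBindings"),
}

# Statements that only make sense when the SPARQL protocol over HTTP is used.
PROTOCOL_STATEMENTS = {
    ("ex:endpoint", "ex:resultFormat", "application/sparql-results+xml"),
    ("ex:endpoint", "ex:resultFormat", "application/rdf+xml"),
}


def describe(via_http: bool) -> set:
    """Return the engine description, plus protocol statements over HTTP."""
    description = set(ENGINE_DESCRIPTION)  # copy, so the original stays intact
    if via_http:
        # The endpoint injects its own statements and passes the rest through.
        description |= PROTOCOL_STATEMENTS
    return description


print(len(describe(via_http=False)), len(describe(via_http=True)))
```

An in-process or ODBC client would see only the engine part, while an 
HTTP client would see both.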

I think there are many non-LOD applications using SPARQL without HTTP. 
They should also be able to find out whether a query engine supports 
full-text search, etc.!

Dataset descriptions such as voiD are not protocol-specific either. They 
relate exclusively to the dataset served. Why not provide such metadata 
in a more generic form than via HTTP SPARQL? A DESCRIBE DATASET would 
really make sense. If the query engine has no such information, it would 
return an empty model, which would be perfectly correct. Are there any 
concerns with that?

I would keep in mind that dataset descriptions may become large because 
users want to include statistics, summaries, etc. Since it is cumbersome 
to send HTTP cache headers for specific queries (DESCRIBE DATASET), it 
may be better to just return a voiD dataset URI which can be retrieved 
(or not, if it hasn't changed).
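
To illustrate the "retrieve only if changed" part: a server behind the 
voiD URI could rely on ordinary HTTP validators. The sketch below 
assumes ETag / If-None-Match is used and models only the decision logic, 
without any network code; the function names and the sample document are 
made up.

```python
import hashlib
from typing import Optional, Tuple


def etag_for(document: bytes) -> str:
    """Derive a strong ETag from the document content."""
    return '"' + hashlib.sha256(document).hexdigest() + '"'


def serve_void(document: bytes,
               if_none_match: Optional[str]) -> Tuple[int, dict, bytes]:
    """Return (status, headers, body) for a GET on the voiD URI."""
    tag = etag_for(document)
    if if_none_match == tag:
        # The client's copy is current: send no body at all.
        return 304, {"ETag": tag}, b""
    return 200, {"ETag": tag, "Content-Type": "text/turtle"}, document


# Made-up sample description; a real one would carry statistics etc.
VOID_DOC = b"<http://example.com/void> a <http://rdfs.org/ns/void#Dataset> ."

status, headers, body = serve_void(VOID_DOC, None)            # first fetch
status_again, _, body_again = serve_void(VOID_DOC, headers["ETag"])
print(status, status_again)
```

A large description is then transferred once and revalidated cheaply 
afterwards, independent of any SPARQL query.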

> Can you explain why returning a dataset description as SPARQL results 
> would be better than returning it as RDF?

I meant SPARQL DESCRIBE results, which are RDF (not XML results).

>> For query federation, it would be very useful if the future SPARQL REC 
>> supports BINDINGS such as introduced by Eric [2] before. My proposal 
>> works with a set of bindings with a special "null" keyword for unbound 
>> variables, e.g.:
> ...
>> It is not much effort for implementers and a federated query processor 
>> can then process pipelined blocks of queries more efficiently.

> Unfortunately, this won't be part of the next SPARQL version, but 
> service descriptions should allow any implementations to declare that 
> they support such an extension.

Well, not good news, but I understand. Can I find a chat log about 
that? I would just like to get a picture of the reasons apart from lack 
of time (it's a fairly easy feature and simple to implement). The main 
bottleneck for large-scale query federation is the lack of statistics 
anyway, but these can be generated remotely on a periodic basis. Adding 
support for initial bindings to the SPARQL spec would be much better 
than advertising it as a feature: nobody will implement it (lack of 
incentives), and thus large-scale query federation will be impossible in 
the end. Is there any chance to still talk about it?

Regards,
AndyL
Received on Friday, 11 September 2009 10:48:04 UTC