Re: Is there a way to automatically distinguish SPARQL endpoint and LDF server?

Hi Maxim,

Glad you're asking about this!

> Is there a way (more reliable than the one described below) to
> distinguish SPARQL endpoint and LDF server based on URL or an response
> to a request sent to this URL?

First of all, a little nitpick:
you probably want to distinguish SPARQL endpoints from TPF servers.
With "LDF", we mean all servers that publish Linked Data in some way
(so this also includes SPARQL endpoints).
A "TPF" (Triple Pattern Fragments) server is a specific kind of LDF server
that offers access to triples by triple pattern.

The reliable way to detect a TPF interface is to look inside the response.
The TPF interface is self-describing; it literally tells clients what it does.
For example, take the resource with URL http://bit.ly/1I0eNgt
(I purposely used a URL shortener here so the URL itself gives nothing away).
If you request an RDF-based representation with
    curl -L -H "Accept: text/turtle" http://bit.ly/1I0eNgt
it will contain the following triples (reformatted for readability):

    <http://fragments.dbpedia.org/2015/en#dataset> a void:Dataset, hydra:Collection;
        void:subset <>;
        hydra:search [
          hydra:template "http://fragments.dbpedia.org/2015/en{?subject,predicate,object}";
          hydra:mapping [
            hydra:variable "subject";
            hydra:property rdf:subject
          ],[
            hydra:variable "predicate";
            hydra:property rdf:predicate
          ],[
            hydra:variable "object";
            hydra:property rdf:object
          ]
        ].

Or, in human language:
"This resource is a subset of the DBpedia 2015 dataset.
 You can search it by RDF subject, predicate, and object."
In other words: "this server supports the TPF interface".
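
If it helps, here is a rough Python sketch of that check.
It assumes the response has already been parsed into plain
(subject, predicate, object) tuples by whatever RDF parser you use;
the function name find_tpf_templates and the blank node labels
such as "_:form" are made up for illustration and come from no library.
It only looks for a hydra:search form whose mappings cover
rdf:subject, rdf:predicate, and rdf:object, so treat it as a starting point,
not as a complete TPF conformance test.

```python
HYDRA = "http://www.w3.org/ns/hydra/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def find_tpf_templates(triples):
    """Return the hydra:template values of every search form that maps
    variables to rdf:subject, rdf:predicate and rdf:object."""
    # Normalise terms to plain strings so parser-specific term objects
    # compare equal to the IRI constants above (assumption: str(term)
    # yields the IRI or blank node label).
    triples = [(str(s), str(p), str(o)) for s, p, o in triples]

    def objects(subject, predicate):
        return [o for s, p, o in triples if s == subject and p == predicate]

    wanted = {RDF + "subject", RDF + "predicate", RDF + "object"}
    templates = []
    for _, predicate, form in triples:
        if predicate != HYDRA + "search":
            continue
        mapped = {
            prop
            for mapping in objects(form, HYDRA + "mapping")
            for prop in objects(mapping, HYDRA + "property")
        }
        if wanted <= mapped:
            templates.extend(objects(form, HYDRA + "template"))
    return templates


# The triples from the example above, written out by hand
# (blank node labels are hypothetical).
TEMPLATE = "http://fragments.dbpedia.org/2015/en{?subject,predicate,object}"
example_triples = [
    ("http://fragments.dbpedia.org/2015/en#dataset", HYDRA + "search", "_:form"),
    ("_:form", HYDRA + "template", TEMPLATE),
    ("_:form", HYDRA + "mapping", "_:m1"),
    ("_:m1", HYDRA + "variable", "subject"),
    ("_:m1", HYDRA + "property", RDF + "subject"),
    ("_:form", HYDRA + "mapping", "_:m2"),
    ("_:m2", HYDRA + "variable", "predicate"),
    ("_:m2", HYDRA + "property", RDF + "predicate"),
    ("_:form", HYDRA + "mapping", "_:m3"),
    ("_:m3", HYDRA + "variable", "object"),
    ("_:m3", HYDRA + "property", RDF + "object"),
]

if __name__ == "__main__":
    print(find_tpf_templates(example_triples))
```

An empty result means the response did not advertise a triple pattern search form,
which matches the "not a TPF interface" case.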

A SPARQL endpoint would not tell you any of this,
because its interface is not self-describing.

Summarizing: if a server replies with the above, it supports the TPF interface.
If responses do not contain this, it is certainly not a TPF interface.
It might be a SPARQL endpoint, or it might be something else.

> One heuristic which could help is the status code of the response to a
> request with empty query parameter. If the server responded with 5xx
> or 4xx code then it's a SPARQL endpoint, because it expects non-empty
> query parameter.

So what we're discussing here is how to test whether something is a SPARQL endpoint.
According to the SPARQL 1.1 Protocol (http://www.w3.org/TR/sparql11-protocol/#query-operation):
    Client requests for this operation must include
    exactly one SPARQL query string (parameter name:query)
So when no query is specified, the server should give an error
(which, *if* RFC2616 is followed, should be 400, not 5xx).

However, any non-SPARQL server is free to respond with any status code
when an empty "query" parameter is appended to any of its URLs.
For example, nothing in the TPF spec stops a server at
    http://example.org/fragments
from giving a 404 error if a user tries
    http://example.org/fragments?query=
because that behavior is (purposely) unspecified.

So finding out whether something is a SPARQL endpoint
with 100% certainty is not possible with the current SPARQL 1.1 spec.
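
To make the limits of that heuristic concrete, here is a small Python sketch
of the empty-query probe. The function names (with_empty_query, interpret_status,
probe) and the wording of the verdicts are my own assumptions, not part of any spec.
It deliberately reports a 400 only as "consistent with" a SPARQL endpoint,
since a non-SPARQL server may return the same code.

```python
import urllib.error
import urllib.request
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_empty_query(url):
    """Return url with an empty "query" parameter, replacing any existing one."""
    parts = urlsplit(url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "query"
    ]
    params.append(("query", ""))
    return urlunsplit(parts._replace(query=urlencode(params)))


def interpret_status(status):
    """Translate a status code into a hedged verdict (wording is hypothetical)."""
    if status == 400:
        return "consistent with a SPARQL endpoint, but not proof"
    if 400 <= status < 600:
        return "error, but not the one a SPARQL endpoint should give"
    return "inconclusive"


def probe(url, timeout=10):
    """Send the empty-query request and interpret the status code."""
    try:
        with urllib.request.urlopen(with_empty_query(url), timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx; the code is still what we want.
        status = error.code
    return interpret_status(status)
```

For instance, with_empty_query("http://example.org/fragments") gives
http://example.org/fragments?query= as in the example above,
and whatever probe returns for it tells you nothing definite about the server.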

Hope this helps, don't hesitate to ask more!

Best,

Ruben

Received on Friday, 21 August 2015 20:17:47 UTC