service descriptions: comparison of VoiD, DARQ, and SADDLE from Gregory Williams on 2009-03-31 (public-rdf-dawg@w3.org from January to March 2009)

From: Gregory Williams <greg@evilfunhouse.com>
Date: Tue, 31 Mar 2009 09:48:43 -0400
To: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <F9FEAB30-39F7-4C52-825A-BCA99B205B0B@evilfunhouse.com>

Below is a brief summary of the three main vocabularies discussed last
week in the context of service descriptions.

.greg



===Summary===

SADDLE is the vocabulary most relevant to formalizing SPARQL service
descriptions, but it is mostly just a sketch (it lacks a formal spec
and any code that uses it). DARQ and VoiD both have terms that would
complement those in SADDLE by describing statistical properties of the
underlying data. VoiD's terms are the most relevant, but may not be
usable as-is because VoiD focuses on datasets rather than endpoints.

===VoiD===

Link: http://semanticweb.org/wiki/VoiD

Primary focus:

VoiD is a vocabulary for describing linked datasets, including  
prototypical resources in a dataset, the topic(s) of a dataset, what  
terms and how many triples are used to connect two datasets, and  
various other facts about the dataset (homepage, SPARQL endpoint,
etc.). It can also be used to describe statistical information about a  
dataset such as total number of triples, resources, subjects,  
predicates, and objects, and the number of triples per class (this can  
be extended to talk about more complex slices of data).

Coverage of "service descriptions":

VoiD has two primary terms that support service descriptions:  
void:feature (Technical Description), and void:sparqlEndpoint.

void:feature may be used to describe, for example, that the dataset is  
available in certain RDF serializations (DBpedia void:feature  
[ dcterms:format "application/rdf+xml" ; ]), but is meant to describe  
features of the dataset, not a SPARQL endpoint. As such, I'm not sure  
this would be a general enough term to describe supported features of  
an endpoint (supported functions or syntax extensions).

void:sparqlEndpoint can be used to link a dataset to the URI of a  
SPARQL endpoint (with SPARQL protocol support). With a specific  
endpoint in mind, then, it is possible to discover datasets that the  
endpoint provides, and look up information about those datasets (such  
as statistical information described with void:statItem). However,  
since such information would be defined in the context of a dataset,  
it would represent only a subset (possibly a proper subset) of the  
data provided by the endpoint.
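
To make the subset caveat concrete, here is a rough Python sketch. The
triples, the ex: names, the endpoint URIs, and the ex:tripleCount
property are all made up for illustration (ex:tripleCount only stands
in for a statistic that VoiD would express with void:statItem); the
one real vocabulary term is void:sparqlEndpoint.

```python
# Toy description as (subject, predicate, object) tuples. All ex: names,
# URIs, and counts are invented; ex:tripleCount is a stand-in for a
# void:statItem statistic, not a real VoiD term.
ENDPOINT = "http://example.org/sparql"

triples = [
    ("ex:people", "void:sparqlEndpoint", ENDPOINT),
    ("ex:people", "ex:tripleCount", 120000),
    ("ex:places", "void:sparqlEndpoint", ENDPOINT),
    ("ex:places", "ex:tripleCount", 45000),
    ("ex:other", "void:sparqlEndpoint", "http://example.com/sparql"),
    ("ex:other", "ex:tripleCount", 900),
]


def datasets_for(endpoint, triples):
    """Return the datasets that link to `endpoint` via void:sparqlEndpoint."""
    return sorted(s for s, p, o in triples
                  if p == "void:sparqlEndpoint" and o == endpoint)


def described_triple_count(endpoint, triples):
    """Sum the triple counts of the datasets described for `endpoint`.

    The result covers only the described datasets; anything else the
    endpoint serves is invisible to this lookup.
    """
    datasets = set(datasets_for(endpoint, triples))
    return sum(o for s, p, o in triples
               if p == "ex:tripleCount" and s in datasets)


print(datasets_for(ENDPOINT, triples))
print(described_triple_count(ENDPOINT, triples))
```

The lookup starts from the endpoint URI and works backwards to
datasets, which is why the statistics it finds say nothing about data
the endpoint serves outside those datasets.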

Status:

VoiD is actively maintained, and has seen the most widespread  
adoption. It has a spec and proper RDFS/OWL schema[1], documenting the  
VoiD classes and properties. There is also a VoiD guide[2] describing  
how to use the features of the vocabulary.

The VoiD wiki lists a number of projects that are using VoiD[3],
including the OpenLink Virtuoso SPARQL endpoint for DBpedia.

===SADDLE===

Link: http://www.w3.org/2001/sw/DataAccess/proto-wd/saddle.html

Primary focus:

SADDLE is a vocabulary for SPARQL service descriptions. It has terms  
for describing a SPARQL endpoint, its URI, supported query languages,  
result formats, datasets (here identified by individual RDF files, but  
might also be suitable for linking to VoiD datasets), extension  
functions, and "vocabularies". Of the saddle:vocabulary term, the
SADDLE webpage explains:

	this service invites queries that use predicates and classes that
	start with <...foaf/> (aka "in the foaf namespace")
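
Read that way, a client-side check against saddle:vocabulary is just a
namespace prefix test. A minimal Python sketch, assuming a client has
already collected the predicate and class URIs of a query; the
http://example.org/vocab/ namespace and the function names are
hypothetical and not part of SADDLE:

```python
# Hypothetical namespace an endpoint might advertise via saddle:vocabulary.
INVITED = ["http://example.org/vocab/"]


def in_invited_vocabulary(term_uri, namespaces):
    """True if the URI starts with one of the advertised namespaces."""
    return any(term_uri.startswith(ns) for ns in namespaces)


def invited_terms(term_uris, namespaces):
    """Keep only the query terms that fall inside an advertised namespace."""
    return [t for t in term_uris if in_invited_vocabulary(t, namespaces)]


query_terms = [
    "http://example.org/vocab/name",
    "http://example.org/vocab/Person",
    "http://example.net/other/age",
]
print(invited_terms(query_terms, INVITED))
```

Since the service only "invites" such queries, a check like this is
advisory: it can guide source selection, but it does not say that
queries using other terms will fail.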
	
Coverage of "service descriptions":

SADDLE contains many of the terms important for basic service  
descriptions. The core terms for describing supported languages,  
result formats, and extension functions seem particularly important as  
a point of extensibility for SPARQL as a spec. With these in place, it  
would be possible for implementations to converge on future language  
extensions in an interoperable way.

The biggest gap in SADDLE is describing the data provided by the
endpoint (obvious link to VoiD here). It has a basic saddle:dataSet
term to point to RDF data present in the underlying store, but has
very little in the way of terms for describing commonly used terms,
classes, or more general statistical properties of the data (with the
aforementioned saddle:vocabulary term being the one exception). Such
terms are important for work on federated queries, but may be outside
the scope of the current DAWG work (terms for statistical information
could be developed outside the DAWG process but within a
DAWG-supported framework for service descriptions).
	
Status:

SADDLE was described by Kendall Clark during the previous round of  
DAWG development work (at which time the group postponed the service  
descriptions issue). There is no formal spec or detailed description,  
only a brief introduction, a namespace declaration, and sample RDF  
which demonstrates use of terms in that namespace.

===DARQ===

Link: http://darq.sourceforge.net/

Primary focus:

DARQ is a vocabulary for describing basic statistics of an endpoint's  
dataset and basic requirements for queries over the dataset (beyond  
what SPARQL can enforce).

The DARQ vocabulary has terms for describing a SPARQL endpoint and its  
URI. Other terms describe the total number of triples, the number of
triples with specific predicates, and the selectivity of subjects
(respectively, objects) of triples with bound predicates and objects
(respectively, subjects).
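
As a rough illustration of why such numbers matter to a federated
query planner, the Python sketch below estimates how many triples a
pattern would match. The class, the field names, and the figures are
hypothetical and are not DARQ terms, and multiplying the two factors
when both positions are bound assumes they are independent, which is
my simplification rather than anything DARQ specifies.

```python
from dataclasses import dataclass


@dataclass
class PredicateStats:
    """Hypothetical per-predicate statistics for one endpoint."""
    triples: int                        # triples using this predicate
    factor_subject_bound: float = 1.0   # fraction kept when the subject is bound
    factor_object_bound: float = 1.0    # fraction kept when the object is bound


def estimate_matches(stats, subject_bound=False, object_bound=False):
    """Scale the predicate's triple count by the factor for each bound position."""
    estimate = float(stats.triples)
    if subject_bound:
        estimate *= stats.factor_subject_bound
    if object_bound:
        estimate *= stats.factor_object_bound
    return estimate


# Invented numbers for a single predicate.
mbox = PredicateStats(triples=2000,
                      factor_subject_bound=0.5,
                      factor_object_bound=0.25)
print(estimate_matches(mbox))
print(estimate_matches(mbox, object_bound=True))
```

A planner could compare such estimates across endpoints to decide
which triple patterns to send where, and in what order.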

DARQ also has terms for describing basic requirements for graph  
patterns in a query. For example, an endpoint may be described as  
requiring a triple pattern with foaf:mbox as predicate and a bound  
object.
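
The foaf:mbox example can be sketched as a check over a query's triple
patterns. This is a minimal Python illustration, not DARQ's own
representation: patterns are plain (subject, predicate, object) string
tuples, the function names are hypothetical, and only terms starting
with "?" or "$" are treated as unbound.

```python
def is_bound(term):
    """A term counts as bound unless it is a variable such as ?x or $x."""
    return not term.startswith(("?", "$"))


def meets_requirement(patterns, predicate, object_must_be_bound=True):
    """True if some pattern uses `predicate` and, if required, has a bound object."""
    for _subject, pred, obj in patterns:
        if pred == predicate and (is_bound(obj) or not object_must_be_bound):
            return True
    return False


# Looks up a person by a known mailbox: the object of foaf:mbox is bound.
by_mbox = [
    ("?person", "foaf:mbox", "<mailto:alice@example.org>"),
    ("?person", "foaf:name", "?name"),
]
# Asks for every mailbox: the object of foaf:mbox is a variable.
all_mboxes = [("?person", "foaf:mbox", "?mbox")]

print(meets_requirement(by_mbox, "foaf:mbox"))
print(meets_requirement(all_mboxes, "foaf:mbox"))
```

A federation layer could run a check like this before dispatching a
subquery, and skip or rewrite queries that the endpoint's description
says it will not answer.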

Coverage of "service descriptions":

DARQ's terms for describing triple counts and selectivities are useful  
for federated query work, but probably less immediately needed than  
the terms for extensions and features. The ability to describe  
statistical information about an endpoint's dataset (and not a  
specific subset in the VoiD "dataset" sense) is useful.

Status:

Like SADDLE, DARQ lacks a spec, but it does have code that uses it.
The webpage is a bit dated (listed as updated in 2006), but DARQ was
published at ESWC last year[4].



[1] http://rdfs.org/ns/void/
[2] http://rdfs.org/ns/void-guide
[3] http://semanticweb.org/wiki/VoiD#Examples_in_the_Wild
[4] http://www.eswc2008.org/final-pdfs-for-web-site/qpII-2.pdf
Received on Tuesday, 31 March 2009 13:49:21 UTC