- From: Gregory Williams <greg@evilfunhouse.com>
- Date: Tue, 31 Mar 2009 09:48:43 -0400
- To: SPARQL Working Group <public-rdf-dawg@w3.org>
Below, a brief summary of the three main vocabularies discussed last week in the context of service descriptions.

.greg

===Summary===

SADDLE is the vocabulary most relevant to the desire to formalize SPARQL service descriptions, but is mostly just a sketch (lacking a formal spec or any code that uses it). DARQ and VoiD both have terms that would complement those in SADDLE by allowing description of statistical properties of the underlying data. VoiD has the terms that are most relevant, but they are possibly not usable as-is due to VoiD's focus on datasets rather than endpoints.

===VoiD===

Link: http://semanticweb.org/wiki/VoiD

Primary focus: VoiD is a vocabulary for describing linked datasets, including prototypical resources in a dataset, the topic(s) of a dataset, what terms and how many triples are used to connect two datasets, and various other facts about the dataset (homepage, SPARQL endpoint, etc.). It can also be used to describe statistical information about a dataset such as total number of triples, resources, subjects, predicates, and objects, and the number of triples per class (this can be extended to talk about more complex slices of data).

Coverage of "service descriptions": VoiD has two primary terms that support service descriptions: void:feature (Technical Description), and void:sparqlEndpoint. void:feature may be used to describe, for example, that the dataset is available in certain RDF serializations (DBpedia void:feature [ dcterms:format "application/rdf+xml" ; ]), but is meant to describe features of the dataset, not a SPARQL endpoint. As such, I'm not sure this would be a general enough term to describe supported features of an endpoint (supported functions or syntax extensions). void:sparqlEndpoint can be used to link a dataset to the URI of a SPARQL endpoint (with SPARQL protocol support).
With a specific endpoint in mind, then, it is possible to discover datasets that the endpoint provides, and look up information about those datasets (such as statistical information described with void:statItem). However, since such information would be defined in the context of a dataset, it would represent only a subset (possibly a proper subset) of the data provided by the endpoint.

Status: VoiD is actively maintained, and has seen the most widespread adoption of the three. It has a spec and proper RDFS/OWL schema[1], documenting the VoiD classes and properties. There is also a VoiD guide[2] describing how to use the features of the vocabulary. The VoiD wiki lists a number of projects that are using VoiD[3], including the OpenLink Virtuoso SPARQL endpoint for DBpedia.

===SADDLE===

Link: http://www.w3.org/2001/sw/DataAccess/proto-wd/saddle.html

Primary focus: SADDLE is a vocabulary for SPARQL service descriptions. It has terms for describing a SPARQL endpoint, its URI, supported query languages, result formats, datasets (here identified by individual RDF files, but might also be suitable for linking to VoiD datasets), extension functions, and "vocabularies" (of the saddle:vocabulary term, the SADDLE webpage explains: this service invites queries that use predicates and classes that start with <...foaf/> (aka "in the foaf namespace")).

Coverage of "service descriptions": SADDLE contains many of the terms important for basic service descriptions. The core terms for describing supported languages, result formats, and extension functions seem particularly important as a point of extensibility for SPARQL as a spec. With these in place, it would be possible for implementations to converge on future language extensions in an interoperable way. The biggest area SADDLE does not cover is in describing the data provided by the endpoint (obvious link to VoiD here).
It has a basic saddle:dataSet term to point to RDF data present in the underlying store, but has very little in the way of terms for describing commonly used terms, classes, or more general statistical properties of the data (with the aforementioned saddle:vocabulary term being the one exception). Such terms are important for work on federated queries, but may be outside the scope of the current DAWG work (with terms for statistical information able to develop outside the DAWG process but within a DAWG-supported framework for service descriptions).

Status: SADDLE was described by Kendall Clark during the previous round of DAWG development work (at which time the group postponed the service descriptions issue). There is no formal spec or detailed description, only a brief introduction, a namespace declaration, and sample RDF which demonstrates use of terms in that namespace.

===DARQ===

Link: http://darq.sourceforge.net/

Primary focus: DARQ is a vocabulary for describing basic statistics of an endpoint's dataset and basic requirements for queries over the dataset (beyond what SPARQL can enforce). The DARQ vocabulary has terms for describing a SPARQL endpoint and its URI. Other terms describe the total number of triples, the total number of triples with specific predicates, and the selectivity of subjects (objects, resp.) of triples with bound predicates and objects (subjects). DARQ also has terms for describing basic requirements for graph patterns in a query. For example, an endpoint may be described as requiring a triple pattern with foaf:mbox as predicate and a bound object.

Coverage of "service descriptions": DARQ's terms for describing triple counts and selectivities are useful for federated query work, but probably less immediately needed than the terms for extensions and features. The ability to describe statistical information about an endpoint's dataset (and not a specific subset in the VoiD "dataset" sense) is useful.
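
To make the DARQ-style description above concrete, here is a rough Python sketch of how a federated query planner might use such a description. It is only an illustration under stated assumptions: the TriplePattern and Capability classes, their field names, and the two functions are hypothetical and are not DARQ's actual terms or code, and multiplying the triple count by a selectivity is an assumed way of applying the statistics. Only the foaf:mbox requirement (predicate foaf:mbox with a bound object) comes from the example above.

```python
from dataclasses import dataclass
from typing import Optional

FOAF_MBOX = "http://xmlns.com/foaf/0.1/mbox"


@dataclass(frozen=True)
class TriplePattern:
    """A triple pattern; None marks an unbound position (a variable)."""
    subject: Optional[str]
    predicate: Optional[str]
    obj: Optional[str]


@dataclass(frozen=True)
class Capability:
    """Hypothetical stand-in for one per-predicate entry in an endpoint description."""
    predicate: str
    triples: int                     # triples at the endpoint using this predicate
    object_selectivity: float = 1.0  # assumed fraction left once the object is bound
    require_bound_subject: bool = False
    require_bound_object: bool = False


def can_answer(cap: Capability, pattern: TriplePattern) -> bool:
    """True if the pattern uses the described predicate and meets its binding requirements."""
    if pattern.predicate != cap.predicate:
        return False
    if cap.require_bound_subject and pattern.subject is None:
        return False
    if cap.require_bound_object and pattern.obj is None:
        return False
    return True


def estimate_results(cap: Capability, pattern: TriplePattern) -> float:
    """Rough result-size estimate: triple count, scaled down if the object is bound."""
    estimate = float(cap.triples)
    if pattern.obj is not None:
        estimate *= cap.object_selectivity
    return estimate


# The endpoint requires foaf:mbox patterns to have a bound object (figures are made up).
mbox_cap = Capability(FOAF_MBOX, triples=1000, object_selectivity=0.001,
                      require_bound_object=True)
bound = TriplePattern(None, FOAF_MBOX, "mailto:alice@example.org")
unbound = TriplePattern(None, FOAF_MBOX, None)

print(can_answer(mbox_cap, bound), estimate_results(mbox_cap, bound))
print(can_answer(mbox_cap, unbound))
```

The point of the sketch is that the binding requirement acts as a filter on which patterns may be sent to an endpoint at all, while the counts and selectivities only inform how to order or cost the patterns that pass.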

Status: Like SADDLE, DARQ lacks a spec, but does have code that uses it. The webpage is a bit dated (listed as updated in 2006), but DARQ was published at ESWC last year[4].

[1] http://rdfs.org/ns/void/
[2] http://rdfs.org/ns/void-guide
[3] http://semanticweb.org/wiki/VoiD#Examples_in_the_Wild
[4] http://www.eswc2008.org/final-pdfs-for-web-site/qpII-2.pdf

Received on Tuesday, 31 March 2009 13:49:21 UTC