- From: Andreas Langegger <al@jku.at>
- Date: Fri, 10 Jul 2009 19:03:13 +0200
- To: public-rdf-dawg-comments@w3.org, public-lod-request@w3.org
Hello,

(CCed to public-lod; I suggest continuing the discussion on the DAWG list.)

I would like to kick off a general discussion regarding RDF dataset metadata (e.g. voiD [1]), SPARQL endpoint descriptions [2], and the possible integration of statistics (e.g. RDFStats [3]) to support Semantic Web middleware/applications.

The general opinion regarding SPARQL extensions for the next REC seems to be to stay as simple as possible (e.g. the recent discussion regarding full-text search [4]). I think that, while the standard should remain simple (SPARQL-compliant implementations should be possible with little effort and small size), it would be very useful to provide at least extension points that can be further standardized by the community. For instance, full-text search capability, aggregates, initial bindings, etc. could be announced by the endpoint, as is done here:

  http://kasei.us/sparql?about=1

Similarly, it should be possible to announce the availability of voiD metadata and statistics.

While [2] targets SPARQL endpoints only, including those using non-HTTP protocols such as ODBC (Virtuoso), voiD [1] targets LOD datasets in general, which may be available in different forms (single RDF documents, data dumps, RDFa, and SPARQL endpoints). I think it is very important to find a best-practice solution which integrates voiD, future SPARQL endpoint descriptions, and a consensus on statistics (possibly based on SCOVO [5], which should be improved, see [6]).

Main questions include:

Q1) How to provide/consume endpoint descriptions in general?

The authors of voiD [1] suggest back-linking from resources (documents, dumps, etc.) to a voiD dataset. In the case of a SPARQL endpoint they suggest discovery via sitemap.xml ([1] 5.2). Problem: this only works via HTTP, and only for one endpoint per domain.

Sub-questions:
  a) Should non-HTTP protocols be supported?
  b) Should multiple SPARQL endpoints per domain be possible? - In my opinion, it should.

Other suggestions based on [2]:
  1. SPARQL extension, like "DESCRIBE SELF" (by AndyS)
     1.1. could return a resolvable URI of the void:Dataset
     1.2. could return the URI of a named graph to query (works with non-HTTP protocols)
  2. HTTP header, e.g. X-endpoint-description: http://kasei.us/sparql?about=1
  3. New protocol operation: HTTP OPTIONS for returning the description
  4. Named graph
     4.1. graph IRI retrieved with DESCRIBE SELF (see 1.2.)
     4.2. graph IRI == SPARQL endpoint URI

Q2) Which metadata (w/o statistics) to include?
  1. Is there any problem with voiD, or what would be missing in voiD for SPARQL endpoints?

Q3) Which statistics to include?
  1. simple counts for resources in total, per class / untyped
  2. number of documents in the case of data dumps
  3. selectivities for properties (untyped and with given class)
  4. histograms for property values (untyped and with given class, can be generated with [3])
  5. Is SCOVO sufficient?
     5.1. A bit verbose: dimensions should be simplified [6].
     5.2. Encoding histograms is not trivial: SCOVO min/max is only useful for integers and nominal scales; RDFStats uses base64-encoded literals.

Q4) Some metadata are SPARQL-endpoint specific and irrelevant for datasets in general (i.e. collections/dumps/etc.), but common metadata should be reused from voiD for maximum interoperability. We should define a separate vocabulary for SPARQL endpoint descriptions, but reuse common properties from voiD. Which metadata is SPARQL specific?
  a) full-text search
  ...

Since this may become a larger mindmap, it would be better to work it out in a wiki. Any suggestions where to continue?
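To make the Q1 options a bit more concrete, here is a minimal client-side sketch in Python of how option 2 (HTTP header) could be consumed, with option 4.2 (graph IRI == endpoint URI) as a fallback. This is only an illustration under assumptions: the header name X-endpoint-description is just the example from the list above and is not standardized, and the function names (fetch_headers, find_description_uri) as well as the example.org URIs are made up for this sketch.

```python
import urllib.request
from typing import Dict, Mapping

# Assumption: header name taken from the example in option 2; not a standard.
DESCRIPTION_HEADER = "X-endpoint-description"


def fetch_headers(endpoint_uri: str, timeout: float = 10.0) -> Dict[str, str]:
    """Issue a HEAD request and return the response headers as a dict.

    Only applicable to HTTP endpoints; non-HTTP protocols would need
    another discovery mechanism (e.g. option 1 or 4.1).
    """
    request = urllib.request.Request(endpoint_uri, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return dict(response.headers.items())


def find_description_uri(endpoint_uri: str, headers: Mapping[str, str]) -> str:
    """Return the URI where the endpoint description is expected.

    If the (hypothetical) description header is present, use its value
    (option 2). Otherwise fall back to the endpoint URI itself, which a
    client could then try as a named graph IRI (option 4.2).
    """
    wanted = DESCRIPTION_HEADER.lower()
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == wanted and value.strip():
            return value.strip()
    return endpoint_uri


# Usage with hand-written headers (no network access needed):
example_headers = {"X-Endpoint-Description": "http://example.org/sparql?about=1"}
print(find_description_uri("http://example.org/sparql", example_headers))
```

Keeping the header lookup separate from the HTTP request means the same lookup logic could be reused if the description pointer were delivered some other way, e.g. in the response to an HTTP OPTIONS request (option 3).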

Regards,
AndyL

[1] http://rdfs.org/ns/void-guide
[2] http://www.w3.org/2009/sparql/wiki/Feature:ServiceDescriptions
[3] http://rdfstats.sourceforge.net
[4] http://www.nabble.com/Free-text-search-and-SPARQL-New-Features-and-Rationale-draft-to24324606.html
[5] http://purl.org/NET/scovo#
[6] http://code.google.com/p/void-impl/issues/detail?id=18

http://www.langegger.at
----------------------------------------------------------------------
Dipl.-Ing.(FH) Andreas Langegger
FAW - Institute for Application-oriented Knowledge Processing
Johannes Kepler University Linz
A-4040 Linz, Altenberger Straße 69
Received on Friday, 10 July 2009 17:04:02 UTC