- From: Andreas Langegger <al@jku.at>
- Date: Fri, 10 Jul 2009 19:03:13 +0200
- To: public-rdf-dawg-comments@w3.org, public-lod-request@w3.org
Hello,
(CCed to public-lod; I suggest continuing the discussion on the DAWG
list.)
I would like to kick off a general discussion regarding RDF dataset
metadata (e.g. voiD [1]), SPARQL endpoint descriptions [2] and the
possible integration of statistics (e.g. RDFStats [3]) to support
Semantic Web middleware/applications. The general opinion regarding
SPARQL extensions for the next REC seems to be to stay as simple as
possible (e.g. the recent discussion regarding full-text search [4]). I
think that, while the standard should remain simple (SPARQL-compliant
implementations should be possible with little effort and small size),
it would be very useful to provide at least extension points that can
be further standardized by the community. For instance, full-text
search capability, aggregates, initial bindings, etc. could be
announced by the endpoint, as is done here: http://kasei.us/sparql?about=1
Similarly, it should be possible to announce the availability of voiD
metadata and statistics. While [2] targets SPARQL endpoints only
(including those reached over non-HTTP protocols such as ODBC, as with
Virtuoso), voiD [1] targets LOD datasets in general, which may be
available in different forms (single RDF documents, data dumps, RDFa,
and SPARQL endpoints). I think it is very important to find a
best-practice solution which integrates voiD, future SPARQL endpoint
descriptions, and a consensus on statistics (possibly based on SCOVO
[5], which should be improved, see [6]).
Main questions include:
Q1) How to provide/consume endpoint descriptions in general?
The authors of voiD [1] suggest back-linking from resources
(documents, dumps, etc.) to a voiD dataset.
In case of a SPARQL endpoint they suggest discovery via
sitemap.xml ([1] 5.2.)
Problem:
This only works via HTTP, and only for one endpoint per domain.
Sub-questions:
a) Should non-HTTP protocols be supported?
b) Should multiple SPARQL endpoints per domain be possible? -
In my opinion it should.
Other suggestions based on [2]:
1. SPARQL extension, like "DESCRIBE SELF" (by AndyS)
1.1. could return a resolvable URI of the void:Dataset
1.2. could return the URI of a named graph to query (works
with non-HTTP protocols)
2. HTTP header, e.g. X-endpoint-description: http://kasei.us/sparql?about=1
3. new protocol operation: HTTP OPTIONS for returning the
description
4. Named graph
4.1. graph IRI retrieved with DESCRIBE SELF (see 1.2.)
4.2. graph IRI == SPARQL endpoint URI
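On the client side, suggestions 2 and 4.2 could be combined roughly as
in the sketch below. It only illustrates the lookup order: the header
name is the example from suggestion 2 (a proposal, not a registered
header), and the function name, the return values and the fallback
order are illustrative assumptions, not part of any specification.

```python
from typing import Mapping, Tuple

# Header name taken from suggestion 2; it is a proposal, not a registered header.
DESCRIPTION_HEADER = "x-endpoint-description"


def description_location(endpoint_url: str,
                         headers: Mapping[str, str]) -> Tuple[str, str]:
    """Return (kind, iri) saying where an endpoint description may be found.

    kind is "document" if the (hypothetical) header announced a URL.
    Otherwise it is "named-graph", falling back to suggestion 4.2
    (graph IRI == SPARQL endpoint URI).
    """
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    announced = normalised.get(DESCRIPTION_HEADER)
    if announced:
        return ("document", announced)
    return ("named-graph", endpoint_url)


if __name__ == "__main__":
    # Made-up endpoint and response headers, for illustration only.
    print(description_location(
        "http://example.org/sparql",
        {"X-Endpoint-Description": "http://example.org/sparql?about=1"},
    ))
```

The function takes the response headers as a plain mapping, so it does
not depend on a particular HTTP library; a non-HTTP client would skip
the header step and use the named-graph fallback directly.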
Q2) Which metadata (w/o statistics) to include?
1. Is there any problem with voiD or what would be missing in
voiD for SPARQL endpoints?
Q3) Which statistics to include?
1. simple counts for resources in total, per class / untyped
2. number of documents in case of data dumps
3. selectivities for properties (untyped and with given class)
4. histograms for property values (untyped and with given class,
can be generated with [3])
5. Is SCOVO sufficient?
5.1. A bit verbose: dimensions should be simplified [6].
5.2. Encoding histograms is not trivial: SCOVO min/max is only
useful for integers and nominal scales; RDFStats uses base64-encoded
literals
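To make items 1 and 3 more concrete, the sketch below computes resource
counts (total, per class, untyped) and a naive per-property selectivity
over an in-memory list of triples. The triple representation, the
function names, and the definition of selectivity as the fraction of
all triples that use a given property are assumptions for illustration;
this is not how voiD, SCOVO or RDFStats define or compute them.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def resource_counts(triples: Iterable[Triple]) -> Dict[str, object]:
    """Count distinct subjects in total, per rdf:type class, and untyped."""
    subjects: Set[str] = set()
    classes_of: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        subjects.add(s)
        if p == RDF_TYPE:
            classes_of.setdefault(s, set()).add(o)
    # A subject with several types is counted once for each of its classes.
    per_class = Counter(c for classes in classes_of.values() for c in classes)
    return {
        "total": len(subjects),
        "per_class": dict(per_class),
        # Every typed subject is also in `subjects`, so the difference is safe.
        "untyped": len(subjects) - len(classes_of),
    }


def property_selectivity(triples: List[Triple]) -> Dict[str, float]:
    """Fraction of all triples that use each property (naive selectivity)."""
    usage = Counter(p for _, p, _ in triples)
    total = len(triples)
    return {p: n / total for p, n in usage.items()} if total else {}


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        ("ex:a", RDF_TYPE, "ex:Person"),
        ("ex:a", "ex:name", "Alice"),
        ("ex:b", RDF_TYPE, "ex:Person"),
        ("ex:c", "ex:name", "Carol"),
    ]
    print(resource_counts(sample))
    print(property_selectivity(sample))
```

Against a live endpoint the same numbers would more likely come from
aggregate queries or from a precomputed statistics document, which is
the reason for wanting a common vocabulary for publishing them.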
Q4) Some metadata is SPARQL-endpoint specific and irrelevant for
datasets in general (e.g. collections, dumps, etc.), but common metadata
should be reused from voiD for maximum interoperability. We should
define a separate vocabulary for SPARQL endpoint descriptions, but
reuse common properties from voiD.
Which metadata is SPARQL specific?
a) full-text search...
Since this may become a larger mindmap, it would be better to work it
out in a wiki. Any suggestions where to continue?
Regards,
AndyL
[1] http://rdfs.org/ns/void-guide
[2] http://www.w3.org/2009/sparql/wiki/Feature:ServiceDescriptions
[3] http://rdfstats.sourceforge.net
[4] http://www.nabble.com/Free-text-search-and-SPARQL-New-Features-and-Rationale-draft-to24324606.html
[5] http://purl.org/NET/scovo#
[6] http://code.google.com/p/void-impl/issues/detail?id=18
http://www.langegger.at
----------------------------------------------------------------------
Dipl.-Ing.(FH) Andreas Langegger
FAW - Institute for Application-oriented Knowledge Processing
Johannes Kepler University Linz
A-4040 Linz, Altenberger Straße 69
Received on Friday, 10 July 2009 17:04:02 UTC