Re: Metadata about single triples from David Booth on 2012-02-22 (public-lod@w3.org from February 2012)

From: David Booth <david@dbooth.org>
Date: Wed, 22 Feb 2012 13:39:44 -0500
To: carsten.kessler@uni-muenster.de
Cc: public-lod@w3.org, Chad Hendrix <hendrix@un.org>
Message-ID: <1329935984.6353.52290.camel@dbooth-laptop>
Carsten,

On Wed, 2012-02-22 at 12:02 +0100, Carsten Keßler wrote:
[ . . . ]
> The aspect we are currently working on is a metadata section that will
> include classes and properties to state who has reported a certain
> piece of information, when it was reported, whether it was approved
> (and at which level), and so forth. The current idea is to create
> named graphs that can be described by these metadata elements. 

FWIW, I absolutely agree that named graphs are the way to go.  There may
be some circumstances in which you'll have only one triple per graph,
but I would think that in most cases you would have a collection of
triples that resulted from a single approval action.

The big issue in my view is how to deal with all of the resulting named
graphs, since a typical query will need to use data from a number of
named graphs, but not all of them.  This means that you need the ability
to define sets of named graphs and then query them.  The SPARQL 1.1
Service Description Language 
http://www.w3.org/TR/sparql11-service-description/ 
can be used to define sets of named graphs, but the query part is
harder.

In principle, SPARQL 1.1 allows you to specify any number of named
graphs to be used in a query, using the "FROM NAMED" syntax.  However,
this is not likely to work well when the number of named graphs gets
large.  It would be nice to be able to define a virtual graph as the
union of an arbitrary set of named graphs.  (Some SPARQL servers define
their "default graph" to be the union of all named graphs in the store,
and this is the basic idea, but we need to be able to more selectively
specify which named graphs should be included in a particular virtual
graph, and we need to be able to define multiple virtual graphs -- not
just one per SPARQL server.)  I described this need at the recent W3C
Linked Data workshop (see slide 21):
http://tinyurl.com/7fnlpmb
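
To make the "FROM NAMED" approach concrete, here is a minimal sketch in
Python that assembles such a query from a list of graph IRIs.  The
function name and the example.org IRIs are hypothetical, chosen only for
illustration; the point is that the query text grows by one clause per
graph, which is why this is unlikely to scale to a large number of named
graphs.

```python
def from_named_query(graph_iris, pattern="?s ?p ?o"):
    """Build a SELECT query with one FROM NAMED clause per graph IRI.

    Hypothetical helper for illustration only; it does no IRI escaping.
    """
    if not graph_iris:
        raise ValueError("need at least one graph IRI")
    # One FROM NAMED clause per graph that the query may consult.
    clauses = "\n".join(f"FROM NAMED <{iri}>" for iri in graph_iris)
    # Named graphs are only reachable through a GRAPH pattern.
    return (
        "SELECT *\n"
        f"{clauses}\n"
        f"WHERE {{ GRAPH ?g {{ {pattern} }} }}"
    )


if __name__ == "__main__":
    print(from_named_query([
        "http://example.org/graphs/report-1",
        "http://example.org/graphs/report-2",
    ]))
```

A virtual graph, if the server supported one, would let the query name a
single graph IRI instead of repeating this list in every query.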

When I discussed this need with Andy Seaborne at the last SemTech
Conference in San Francisco, he mentioned that some RDF stores do
support this capability.  (I don't know offhand which ones.)  However,
it is not (yet) standardized.  I requested this feature in SPARQL 1.1
http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2011Jul/0017.html
but the working group does not seem inclined to include it "at this late
stage".  However, AFAICT the WG does not appear to have closed the issue
either:
http://www.w3.org/2009/sparql/wiki/CommentResponse:DB-5
So maybe there is still some hope of this feature being added.

In the meantime, you still need to build a working system using
available tools.  So I see two ways to go: (1) choose a SPARQL server
that does support virtual graphs; or (2) create a data production
pipeline that will dynamically merge the set of named graphs that you
want to query, into another graph.  These two approaches can also be
used in combination.  If you pursue the virtual graph approach, you'll
want to do some stress testing to find out whether the SPARQL server
really will perform the way you need.  Even if you go with the virtual
graph approach, I think it is likely that you'll also want to use the
data pipeline approach for some aspects, so that you can cache commonly
needed graph combinations.

If you use the data pipeline approach, the SPARQL 1.1 graph operations
can help (CREATE, DROP, COPY, MOVE, ADD).  I've also been working on a
data production pipeline framework ("RDF Pipeline") that will
automatically cache and refresh data in a data production pipeline.
The ideas were described at my last SemTech SF talk:
http://dbooth.org/2011/pipeline/
and I'll be speaking again about this at the upcoming SemTech SF
conference.  An open source implementation has been started on Google
Code:
http://code.google.com/p/rdf-pipeline/
However, it is only at the proof-of-concept stage thus far.  I am hoping
it will reach the beta testing stage soon.  Anyone interested in using
it or helping with its development is invited to contact me.
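
As a rough illustration of the merge step using the SPARQL 1.1 graph
operations mentioned above (this is not how RDF Pipeline does it), the
Python sketch below generates an update request that rebuilds a merged
graph from a chosen set of named graphs.  The function name and the
example.org IRIs are hypothetical, and sending the request to a server
is left out.

```python
def merge_update(source_iris, target_iri):
    """Build a SPARQL 1.1 Update request that rebuilds target_iri as the
    union of the graphs in source_iris.

    Hypothetical helper for illustration only; it does no IRI escaping.
    """
    if not source_iris:
        raise ValueError("need at least one source graph IRI")
    # DROP SILENT succeeds even if the target graph does not exist yet.
    ops = [f"DROP SILENT GRAPH <{target_iri}>"]
    # ADD copies triples into the target and leaves each source intact.
    ops += [f"ADD GRAPH <{src}> TO GRAPH <{target_iri}>" for src in source_iris]
    # Operations in a single update request are separated by ";".
    return " ;\n".join(ops)


if __name__ == "__main__":
    print(merge_update(
        ["http://example.org/graphs/report-1",
         "http://example.org/graphs/report-2"],
        "http://example.org/graphs/approved-merged",
    ))
```

Re-running the request whenever a source graph changes would refresh the
cached combination, at the cost of recopying all of the source graphs.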

Good luck on your project.  It sounds like you are doing great things,
and it sounds like you are on the right path technically.


-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.
Received on Wednesday, 22 February 2012 18:40:13 UTC