Re: Metadata about single triples from Carsten Keßler on 2012-02-23 (public-lod@w3.org from February 2012)

From: Carsten Keßler <carsten.kessler@uni-muenster.de>
Date: Thu, 23 Feb 2012 13:20:51 +0100
To: David Booth <david@dbooth.org>
Cc: public-lod@w3.org, Chad Hendrix <hendrix@un.org>
Message-ID: <CANqMnYOYP9GtfeLReUBTHTLibQe19KJL7whwB80E7=82aF=WUw@mail.gmail.com>

Hi David,

> The big issue in my view is how to deal with all of the resulting named
> graphs, since a typical query will need to use data from a number of
> named graphs, but not all of them.  This means that you need the ability
> to define sets of named graphs and then query them.  The SPARQL 1.1
> Service Description Language
> http://www.w3.org/TR/sparql11-service-description/
> can be used to define sets of named graphs, but the query part is
> harder.

Yes, building those queries will be a bit trickier. However, those
SPARQL queries are something that we'll want to hide from the common
user anyway, so that does not really worry me so much.

> In principle, SPARQL 1.1 allows you to specify any number of named
> graphs to be used in a query, using the "FROM NAMED" syntax.  However,
> this is not likely to work well when the number of named graphs gets
> large.

What exactly do you mean by "not likely to work well"? Is there any
evidence that this is really the case?
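
For reference, the "FROM NAMED" approach discussed above can be sketched
as follows. This is only a minimal illustration of how such a query
could be assembled for a chosen set of named graphs; the function name
build_from_named_query and the example.org graph IRIs are hypothetical
and not taken from any particular store or library. Whether a given
SPARQL server copes well with a long list of FROM NAMED clauses is
exactly the open question here.

```python
from typing import Iterable


def build_from_named_query(graph_iris: Iterable[str]) -> str:
    """Build a SPARQL SELECT query restricted to the given named graphs.

    Hypothetical helper: one FROM NAMED clause is emitted per graph IRI,
    and GRAPH ?g reports which named graph each triple came from.
    """
    lines = ["SELECT ?g ?s ?p ?o"]
    lines += [f"FROM NAMED <{iri}>" for iri in graph_iris]
    lines += ["WHERE {", "  GRAPH ?g { ?s ?p ?o }", "}"]
    return "\n".join(lines)


# Example IRIs are placeholders (assumptions), not real datasets.
query = build_from_named_query([
    "http://example.org/graph/a",
    "http://example.org/graph/b",
])
print(query)
```

The query text grows by one line per named graph, so the list would
normally be generated from whatever defines the set of graphs rather
than written by hand.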

> It would be nice to be able to define a virtual graph as the
> union of an arbitrary set of named graphs.  (Some SPARQL servers define
> their "default graph" to be the union of all named graphs in the store,
> and this is the basic idea, but we need to be able to more selectively
> specify which named graphs should be included in a particular virtual
> graph, and we need to be able to define multiple virtual graphs -- not
> just one per SPARQL server.)  I described this need at the recent W3C
> Linked Data workshop (see slide 21):
> http://tinyurl.com/7fnlpmb
>
> When I discussed this need with Andy Seaborne at the last SemTech
> Conference in San Francisco, he mentioned that some RDF stores do
> support this capability.  (I don't know offhand which ones.)

> However,
> it is not (yet) standardized.  I requested this feature in SPARQL 1.1
> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2011Jul/0017.html
> but the working group does not seem inclined to include it "at this late
> stage".  However, AFAICT the WG does not appear to have closed the issue
> either:
> http://www.w3.org/2009/sparql/wiki/CommentResponse:DB-5
> So maybe there is still some hope of this feature being added.
>
> In the meantime, you still need to build a working system using
> available tools.  So I see two ways to go: (1) choose a SPARQL server
> that does support virtual graphs; or (2) create a data production
> pipeline that will dynamically merge the set of named graphs that you
> want to query, into another graph.  These two approaches can also be
> used in combination.  If you pursue the virtual graph approach, you'll
> want to do some stress testing to find out whether the SPARQL server
> really will perform the way you need.

Yes, we'll need to do some stress testing anyway, no matter which way
we choose to go.

> Even if you go with the virtual
> graph approach, I think it is likely that you'll also want to use the
> data pipeline approach for some aspects, so that you can cache commonly
> needed graph combinations.
> If you use the data pipeline approach, the SPARQL 1.1 graph operations
> can help (CREATE, DROP, COPY, MOVE, ADD).  I've also been working on a
> data production pipeline framework ("RDF Pipeline") that will
> automatically cache and refresh data in a data production pipeline.
> The ideas were described at my last SemTech SF talk:
> http://dbooth.org/2011/pipeline/
> and I'll be speaking again about this at the upcoming SemTech SF
> conference.  An open source implementation has been started on Google
> Code:
> http://code.google.com/p/rdf-pipeline/

I'll have a look at this, thanks!
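
As a rough illustration of the data pipeline approach with the SPARQL
1.1 graph operations mentioned above, the sketch below generates a
SPARQL Update request that rebuilds a merged graph from a set of named
graphs using DROP and ADD. It is not how RDF Pipeline works, just one
possible way to cache a graph combination; the function name
build_merge_update and all example.org IRIs are hypothetical, and
sending the request to a store's update endpoint is left out.

```python
from typing import Iterable


def build_merge_update(source_iris: Iterable[str], target_iri: str) -> str:
    """Build a SPARQL 1.1 Update request that merges graphs into target_iri.

    Hypothetical helper: the target graph is dropped first (SILENT, so a
    missing graph is not an error) and each source graph is then copied
    into it with ADD, which leaves the source graphs untouched.
    """
    operations = [f"DROP SILENT GRAPH <{target_iri}>"]
    operations += [
        f"ADD GRAPH <{source}> TO GRAPH <{target_iri}>"
        for source in source_iris
    ]
    # Operations in one update request are separated by semicolons.
    return " ;\n".join(operations)


# Example IRIs are placeholders (assumptions), not real datasets.
update = build_merge_update(
    ["http://example.org/graph/a", "http://example.org/graph/b"],
    "http://example.org/graph/merged",
)
print(update)
```

Queries can then target the merged graph alone instead of listing every
source graph, at the cost of re-running the update whenever one of the
source graphs changes.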

Carsten

Received on Thursday, 23 February 2012 12:21:23 UTC