- From: Carsten Keßler <carsten.kessler@uni-muenster.de>
- Date: Thu, 23 Feb 2012 13:20:51 +0100
- To: David Booth <david@dbooth.org>
- Cc: public-lod@w3.org, Chad Hendrix <hendrix@un.org>
Hi David,

> The big issue in my view is how to deal with all of the resulting named
> graphs, since a typical query will need to use data from a number of
> named graphs, but not all of them. This means that you need the ability
> to define sets of named graphs and then query them. The SPARQL 1.1
> Service Description Language
> http://www.w3.org/TR/sparql11-service-description/
> can be used to define sets of named graphs, but the query part is
> harder.

Yes, building those queries will be a bit trickier. However, those SPARQL queries are something that we'll want to hide from the common user anyway, so that does not really worry me so much.

> In principle, SPARQL 1.1 allows you to specify any number of named
> graphs to be used in a query, using the "FROM NAMED" syntax. However,
> this is not likely to work well when the number of named graphs gets
> large.

What exactly do you mean by "not likely to work well"? Is there any evidence that this is really the case?

> It would be nice to be able to define a virtual graph as the
> union of an arbitrary set of named graphs. (Some SPARQL servers define
> their "default graph" to be the union of all named graphs in the store,
> and this is the basic idea, but we need to be able to more selectively
> specify which named graphs should be included in a particular virtual
> graph, and we need to be able to define multiple virtual graphs -- not
> just one per SPARQL server.) I described this need at the recent W3C
> Linked Data workshop (see slide 21):
> http://tinyurl.com/7fnlpmb
>
> When I discussed this need with Andy Seaborne at the last SemTech
> Conference in San Francisco, he mentioned that some RDF stores do
> support this capability. (I don't know offhand which ones.)
> However,
> it is not (yet) standardized. I requested this feature in SPARQL 1.1
> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2011Jul/0017.html
> but the working group does not seem inclined to include it "at this late
> stage". However, AFAICT the WG does not appear to have closed the issue
> either:
> http://www.w3.org/2009/sparql/wiki/CommentResponse:DB-5
> So maybe there is still some hope of this feature being added.
>
> In the meantime, you still need to build a working system using
> available tools. So I see two ways to go: (1) choose a SPARQL server
> that does support virtual graphs; or (2) create a data production
> pipeline that will dynamically merge the set of named graphs that you
> want to query, into another graph. These two approaches can also be
> used in combination. If you pursue the virtual graph approach, you'll
> want to do some stress testing to find out whether the SPARQL server
> really will perform the way you need.

Yes, we'll need to do some stress testing anyway, no matter which way we choose to go.

> Even if you go with the virtual
> graph approach, I think it is likely that you'll also want to use the
> data pipeline approach for some aspects, so that you can cache commonly
> needed graph combinations.
> If you use the data pipeline approach, the SPARQL 1.1 graph operations
> can help (CREATE, DROP, COPY, MOVE, ADD). I've also been working on a
> data production pipeline framework ("RDF Pipeline") that will
> automatically cache and refresh data in a data production pipeline.
> The ideas were described at my last SemTech SF talk:
> http://dbooth.org/2011/pipeline/
> and I'll be speaking again about this at the upcoming SemTech SF
> conference. An open source implementation has been started on Google
> Code:
> http://code.google.com/p/rdf-pipeline/

I'll have a look at this, thanks!

Carsten
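
For reference, the two options discussed in the thread (listing graphs with "FROM NAMED", and merging a chosen set of named graphs into another graph with the SPARQL 1.1 graph operations) could be sketched roughly as follows. This is only an illustrative sketch, not code from either correspondent: the helper names `select_from_named` and `merge_into` and the `http://example.org/...` graph IRIs are hypothetical, and the functions only build query strings; sending them to a particular SPARQL endpoint is left out.

```python
from typing import Iterable


def select_from_named(graphs: Iterable[str], pattern: str = "?s ?p ?o") -> str:
    """Build a SELECT query whose dataset is restricted to the given named graphs.

    Hypothetical helper: emits one FROM NAMED clause per graph IRI.
    """
    from_clauses = "\n".join(f"FROM NAMED <{g}>" for g in graphs)
    return (
        "SELECT ?g ?s ?p ?o\n"
        f"{from_clauses}\n"
        f"WHERE {{ GRAPH ?g {{ {pattern} }} }}"
    )


def merge_into(target: str, sources: Iterable[str]) -> str:
    """Build a SPARQL 1.1 Update request that rebuilds `target` as the
    union of `sources` (the "data pipeline" option).

    Hypothetical helper: DROP SILENT clears any previous copy of the
    merged graph, then each ADD copies one source graph's triples into it.
    """
    ops = [f"DROP SILENT GRAPH <{target}>"]
    ops += [f"ADD GRAPH <{s}> TO GRAPH <{target}>" for s in sources]
    return " ;\n".join(ops)


if __name__ == "__main__":
    # Assumed example IRIs, for illustration only.
    wanted = ["http://example.org/graph/a", "http://example.org/graph/b"]
    print(select_from_named(wanted))
    print()
    print(merge_into("http://example.org/graph/merged", wanted))
```

The first form grows by one clause per graph, which is where the concern about large numbers of named graphs comes from; the second trades that for the cost of keeping the merged graph up to date.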
Received on Thursday, 23 February 2012 12:21:23 UTC