- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Fri, 28 Jan 2005 19:48:58 +0000
- To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
rq23 defines the term "RDF dataset" as being the thing that a query is against. I chose the term as it isn't "knowledge base" or other term that has been widely used. There is never a perfect choice of word/phrase but I hope this terminology gives us the chance to define what SPARQL does with less baggage.

rq23 provides both a formal description of SPARQL/query and an informal description through examples. They should agree although the examples can never be complete.

Please be warned that the editors' draft (v1.181) is not fully up to date with the F2F but I have started by restructuring and put the definition of "RDF Dataset" in a section by itself.

http://www.w3.org/2001/sw/DataAccess/rq23/#rdfDataset

------------------------------------------------
An RDF dataset is defined as a set of background graph and a number of named graphs.

RDF dataset = { G , zero or more (<Ui>, Gi) }

G - background graph
Gi - an RDF graph
<Ui> - URI reference
------------------------------------------------

that is, a set of a graph and a number of pairs of URIref and graph. There may be no named graphs. The background graph may be empty.

"background graph" was the agreed term from the F2F. It was called the unnamed graph before.

In querying an RDF dataset, the pattern "(s p o)" accesses G, the background graph, the form "GRAPH ?g (s p o)" accesses the (<Ui>, Gi)'s, and the form "GRAPH <U> (s p o)" applies the pattern to just the graph with name <U>. ("GRAPH" was "SOURCE"). This is described in:

http://www.w3.org/2001/sw/DataAccess/rq23/#queryDataset

There is a section after that to define how the query itself can describe the dataset. That section is what I think we are mainly discussing. It uses the keywords from the F2F but I regard it as unfinished - I will not cover this in this message in order to concentrate on the idea of "RDF dataset".

In the idea of "RDF dataset", there is no assumption about the setup of the dataset. G may include the union of the Gi.
G may be disjoint from any Gi or subset of Gi. Both are allowed - the definition of RDF dataset does not imply one way over another; it is more general, which we may choose to restrict.

The concept "RDF dataset" does not say how the dataset is built. Therefore, it should be able to express all the various different use cases we have.

I have avoided use of the terms "trusted" and "untrusted" to concentrate on naming. As SteveH points out, an approach to make these orthogonal is to have a flag associated with graphs.

Examples of RDF datasets:

Suppose we have graphs A and B. These are some of the possible datasets that can be made up from A and B.

-- Dataset example 1:

A single background graph is the RDF merge of A and B. (I use "RDF merge" to be clear that the background graph is accessed as an RDF graph. It also makes it clear what happens about bNodes.)

The RDF dataset is { merge[A,B] } with no pairs of name/graph.

-- Dataset example 2:

A background graph that is the merge of A and B, together with access to the graphs by names URI <u1> and URI <u2>. An implementation could avoid copies but exactly how will depend on the implementation.

The RDF dataset is { merge[A,B], (<u1>, A), (<u2>, B) }

This is the example of (s p o) accessing all triples, and further having access to the individual named graphs.

-- Dataset example 3:

The background graph has some provenance information, graph P, about A and B. This is the example in rq23 "GRAPH and a single, unnamed graph":

http://www.w3.org/2001/sw/DataAccess/rq23/#sourcePlainGraph

where A and B are the same URL read at different times, obtaining different graphs.

The RDF dataset is { P, (<u1>, A), (<u2>, B) } and <u1>, <u2> are internal names: P records the mapping of internal names to original location (the same location in the example).

-- Dataset example 4:

The background graph is empty. A plain "(s p o)" does not match - the background graph has no triples.
The RDF dataset is { empty, (<u1>, A), (<u2>, B) } or, omitting the background graph, { (<u1>, A), (<u2>, B) }. "(s p o)" does not match in either case.

-- Dataset example 5:

The background graph contains A, but not B. Both A and B are available as named graphs.

The RDF dataset is { A, (<u1>, A), (<u2>, B) }.

-- Dataset example 6:

The graph A is known by <u1> and also by <u3> and is in the background graph. B is not in the dataset.

The RDF dataset is { A, (<u1>, A), (<u3>, A) }.

-- Dataset example 7:

If we have a third graph C, then we can have the RDF merge of A and C, give it a name <u4> and still retain access to just A:

{ empty, (<u4>, merge[A, C]), (<u1>, A), (<u2>, B) }

End of examples.

There is also the matter of how, if at all, the query itself can describe a dataset. I think this is the main area of difference and there is subtly different terminology, so I have avoided that matter here and await people's responses to the idea of "RDF dataset".

If we have a common definition of "RDF dataset", then we can write the various options out for query keywords in terms of the effect on the RDF dataset for the query.

Does this definition of RDF dataset, and the relation to graph patterns, form a basis for defining how a query might define a dataset within the SPARQL syntax?

Is this definition of RDF dataset missing anything?

Andy
Received on Friday, 28 January 2005 19:49:13 UTC
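
A minimal sketch of the "RDF dataset" idea from the message above, in Python, covering the definition { G, zero or more (<Ui>, Gi) } and the three access forms "(s p o)", "GRAPH ?g (s p o)" and "GRAPH <U> (s p o)". It is an illustration under stated assumptions, not the rq23 semantics: every name here (Dataset, merge, in_background, in_any_named, in_named, the sample triples) is hypothetical, graphs are assumed to contain no blank nodes so that RDF merge reduces to set union, a pattern term of None stands for a variable, and a name that is not in the dataset is assumed to give no matches.

```python
from dataclasses import dataclass, field

# A triple is a (subject, predicate, object) tuple; a graph is a frozenset of triples.
EMPTY = frozenset()


def merge(*graphs):
    """Stand-in for RDF merge.

    ASSUMPTION: the graphs share no blank nodes, so merge is plain set union.
    """
    return frozenset().union(*graphs)


@dataclass(frozen=True)
class Dataset:
    """RDF dataset = { G, zero or more (<Ui>, Gi) }."""
    background: frozenset = EMPTY               # G, may be empty
    named: dict = field(default_factory=dict)   # <Ui> -> Gi, may be empty


def match(graph, pattern):
    """Triples of `graph` matching `pattern`; None in the pattern matches anything."""
    return {t for t in graph
            if all(want is None or want == got for want, got in zip(pattern, t))}


def in_background(ds, pattern):
    """Form "(s p o)": matched against the background graph G only."""
    return match(ds.background, pattern)


def in_any_named(ds, pattern):
    """Form "GRAPH ?g (s p o)": (name, triple) pairs over all named graphs."""
    return {(uri, t) for uri, g in ds.named.items() for t in match(g, pattern)}


def in_named(ds, uri, pattern):
    """Form "GRAPH <U> (s p o)": matched against the graph named `uri` only.

    ASSUMPTION: a name that is not in the dataset yields no matches.
    """
    return match(ds.named.get(uri, EMPTY), pattern)


# Hypothetical sample graphs A and B, then dataset examples 2, 4 and 5.
A = frozenset({("ex:a", "ex:p", "1")})
B = frozenset({("ex:b", "ex:p", "2")})
ANY = (None, None, None)

example2 = Dataset(merge(A, B), {"u1": A, "u2": B})  # { merge[A,B], (<u1>, A), (<u2>, B) }
example4 = Dataset(EMPTY, {"u1": A, "u2": B})        # { empty, (<u1>, A), (<u2>, B) }
example5 = Dataset(A, {"u1": A, "u2": B})            # { A, (<u1>, A), (<u2>, B) }
```

With these definitions, in_background(example4, ANY) finds nothing because the background graph is empty, while in_any_named(example4, ANY) still reaches both A and B through their names. Nothing in Dataset ties the background graph to the named graphs, which matches the point in the message that G may include, overlap with, or be disjoint from the Gi.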