- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Mon, 29 Nov 2004 09:53:19 +0000
- To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
-- Situation

This note describes one way in which we might change the SPARQL query spec to handle untrusted graphs. It does not provide everything - graphs cannot be loaded mid-query based on earlier parts of the pattern matching process. Queries only execute in an environment set at the start of query execution.

See also: UC&R: http://www.w3.org/2001/sw/DataAccess/UseCases#d4.2

There are some straw poll questions at the end of this message.

-- Conceptual changes

A query is executed against a single unnamed graph (the default graph) and a collection of named graphs. The change is that there is no merge relationship between the default graph and the named graphs. Call this default graph + named graphs an "RDF dataset".

UC&R Design Objective 4.2 points 1 & 2 are covered. The 3rd point suggests a single (trusted) interpretation when specifying multiple graphs (while it does not actually say that the sources need be available in the aggregation/merge case, I think this was the intent).

The test framework needs changing to express this but it needs changing anyway.

SOURCE is unchanged: much of this message is examples of changes to FROM.

-- Usages

There are two classes of use I have in mind: querying against existing datasets and querying against ad hoc datasets.

Large data providers may well have a fixed dataset, and it is implicit in using the query service. The dataset is not named in the query - no FROM needed. The only possible use of FROM is restriction within the graphs already available at the service point, and even this is optional.

The other case is more query-as-script: the dataset is built for the query and so there is a need for some construction mechanism to describe it.

-- SOURCE Changes

None. SOURCE works as it always does - it accesses labels of graphs. It's the thing being accessed that has changed.

The examples in rq23 9.1 and 9.2 need to change though. The current example in 9.3 looks OK.
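As a rough illustration of the dataset model described under "Conceptual changes" (one default graph plus named graphs, with no merge between them), here is a minimal Python sketch. The class name `Dataset`, the method names, the `ex:` terms and the graph name are all assumptions made for illustration; none of them come from rq23.

```python
from dataclasses import dataclass, field


def _match(graph, s=None, p=None, o=None):
    """Return triples in `graph` matching the pattern; None is a wildcard."""
    return [t for t in sorted(graph)
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


@dataclass
class Dataset:
    """Hypothetical RDF dataset: an unnamed default graph plus named graphs."""
    default_graph: set = field(default_factory=set)
    named_graphs: dict = field(default_factory=dict)

    def match_default(self, s=None, p=None, o=None):
        # Only the default graph is consulted; named graphs are NOT merged in.
        return _match(self.default_graph, s, p, o)

    def match_source(self, s=None, p=None, o=None):
        # Roughly what SOURCE does: match inside each named graph and
        # report the graph label alongside each matching triple.
        return [(name, t)
                for name in sorted(self.named_graphs)
                for t in _match(self.named_graphs[name], s, p, o)]


# Illustrative data only.
ds = Dataset(
    default_graph={("ex:a", "ex:knows", "ex:b")},
    named_graphs={"http://example/w1": {("ex:a", "ex:knows", "ex:c")}},
)
```

With this data, `ds.match_default(p="ex:knows")` sees only the triple placed in the default graph, while `ds.match_source(p="ex:knows")` returns the named-graph triple together with its label.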
-- FROM Changes

If we want syntax for construction of the dataset, then we have to consider placing graphs in the dataset and defining the default graph.

Version 1: Uses new keywords to define the dataset.
  FROM to add to the default graph
  GRAPH to add a (named) graph

Version 2: Uses a compact syntax in the FROM clause.

The design should work well in the simple cases, and be tolerable for more complex examples (and it need not cover all cases) - I'm assuming that the more complicated setups would be the ones where the dataset is passed in from the query context, and less often defined by the query. In other words, if your dataset definitions are longer than your query patterns, it may be time for a redesign :-)

Use of a URI means "the graph associated with" - not necessarily "load current"; it does not imply access at query time. It may be a restriction over the graphs in the query context and causes an error if it can't be satisfied.

-- Keyword syntax for datasets

Examples:

# Ex1 - put the graph identified by <u1> in the default graph
FROM <u1>

# Ex2 - put the graphs identified by <u1> and <u2> in the default graph
FROM <u1> <u2>

That is, merge them into the default graph. Unnamed. Other RDF triples may be present in the default graph.

# Ex3 - use graphs associated with <w1> and <w2>
#       as named graphs with names <w1> and <w2>
GRAPH <w1> <w2>

-- Compact syntax for datasets

Compact representation: a dataset written as

FROM <u1> (<w1> <w2>)

is the same as:

FROM <u1>
GRAPH <w1> <w2>

I think this is rather cryptic when URIs are long and prefer (mildly) the keyword form.

-- What's lost

In the trusted graph (the default graph) there is no tracking of where triples came from. The data provider should publish a dataset and let the client decide whether they trust the named graphs or not. If the publisher is publishing the believed aggregation, it should put its name on it.
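To make the keyword form and the "What's lost" point concrete, here is a small Python sketch of dataset construction. It assumes a hypothetical `store` mapping from URI to "the graph associated with" it in the query context; `build_dataset` and the sample triples are invented for illustration. The merge is a plain set union, which ignores blank node renaming.

```python
def build_dataset(store, from_uris=(), graph_uris=()):
    """Build (default_graph, named_graphs) from FROM and GRAPH URI lists.

    `store` is a hypothetical mapping of URI -> set of triples already
    available in the query context; nothing is fetched at query time.
    """
    missing = [u for u in (*from_uris, *graph_uris) if u not in store]
    if missing:
        # The request cannot be satisfied from the query context.
        raise LookupError(f"no graph associated with: {missing}")

    # FROM: merge into the single unnamed default graph.
    # After the union there is no record of which URI a triple came from.
    default_graph = set()
    for uri in from_uris:
        default_graph |= store[uri]

    # GRAPH: keep each graph separate, labelled by its URI.
    named_graphs = {uri: set(store[uri]) for uri in graph_uris}
    return default_graph, named_graphs


# Illustrative data only.
store = {
    "u1": {("ex:a", "ex:p", "ex:b")},
    "u2": {("ex:c", "ex:p", "ex:d")},
    "w1": {("ex:e", "ex:p", "ex:f")},
    "w2": {("ex:g", "ex:p", "ex:h")},
}

# Corresponds to:  FROM <u1> <u2>  GRAPH <w1> <w2>
default_graph, named_graphs = build_dataset(
    store, from_uris=["u1", "u2"], graph_uris=["w1", "w2"]
)
```

Here `default_graph` holds the triples of both `u1` and `u2` with no trace of their origin, while `named_graphs` keeps `w1` and `w2` apart under their own names.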
-- Protocol

The protocol will need to reflect the construction of datasets or leave handling of it to the query language. There isn't a protocol proper in local use so it would be useful in the query language.

This suggests a service-oriented protocol paradigm. Either the dataset is implicit because the request was directed to a particular service instance, or the query language expresses the dataset and the service offers various degrees of dataset formation.

-- Summary

This note outlines a possible solution so that we are deciding between two proposals: the current situation with merge and a change to no default merge.

(please express your opinion here)
+1 from Andy to change to no default merge

Within that we can choose to drop FROM and protocol graph naming. At the moment, I think we should explore a mechanism for dataset building (FROM and in the protocol).

(please express your opinion here)
+1 from Andy to at least attempt a design here. If it does not look like converging by LC, we drop it.

-- Next steps

If this looks plausible, based on WG members' opinions (and anyone else reading this), then we can start with alternative versions of rq23 sections 8 and 9 and work on test cases. Both versions of sections 8 & 9 could be published in the next working draft, then we pick one and go with that.

Received on Monday, 29 November 2004 09:53:33 UTC