- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Thu, 31 Mar 2005 10:17:43 +0100
- To: Dan Connolly <connolly@w3.org>
- CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Dan Connolly wrote:
> Here's another comment that I'm not quite sure what
> to do with...
>
> Named- and background graphs, triples vs quads, trust, etc.
> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Mar/0097.html

See also:
http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2004Nov/0020.html

>
> It is perhaps a request that we reconsider the SOURCE issue...
> http://www.w3.org/2001/sw/DataAccess/issues#SOURCE
>
> I'm not in a good position to advocate the WG's decision on that issue;
> that was the first of N issues that I tried, without success, to get
> the WG to postpone. (hmm... I'm not on record as abstaining on the
> decision we took... I wonder why not...)
>
> The comment suggests "move the choice of arrangement into the
> query language," which I don't think we considered. Perhaps that's
> sufficient new information to re-open the issue.

I read that as a request for FROM/WITH in the query language which we
decided not to do.

In another comments list message, they were pointed at the protocol spec:
http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Mar/0072.html

>
> The comment says it's a follow-up from discussion with Andy, so I doubt
> he's in a position to defend the current design to the satisfaction
> of the commentor; it seems he's already tried.

The discussions were mainly about counting and what it means to count
bNodes - the only mention of named graphs was for attaching probabilities
to triples (statings, not statements, presumably). We haven't been talking
much about datasets.

>
> DaveB, you were involved in some proposals that led up to the WG's
> decision... you're more than welcome to give it a try.
>
> The comment is also perhaps input to our most long-standing open issue
> fromUnionQuery.
> http://www.w3.org/2001/sw/DataAccess/issues#fromUnionQuery

Without FROM/WITH in the query language, I think this is a protocol issue
(sorry Kendall!).

>
> I don't have any actions assigned about that one...
> I don't really have any plan for addressing it. I'm all ears.
>
>

-------------------------------------------------------------

I believe that the message from Arjohn (2005Mar/0097.html) does not take
into account the difference between a closed system and a web system.
There is no mention of the information publisher, just the tool maker and
the query.

Arjohn wrote:
> My main concern with the current spec is that it leaves the choice of
> the arrangement for RDF Datasets up to the implementer of the query
> engine.

The choice of the arrangement is up to the person/organisation publishing
the data, not the query engine. If the publisher is building a system
where they wish to have all triples in the background graph, they will
choose their query engine provider in one way; if they wish not to make
any trust claim about the triples in the named graphs, they will choose
their query engine another way.

Arjohn wrote:
> Keeping the
> SPARQL spec as it is today can have disastrous effects on the
> interoperability of SPARQL-aware tools.

Interoperability is about the same dataset behaving the same. If one
system automatically merges all the named graphs and one doesn't, it
isn't the same dataset.

If we have a query like:

   SELECT * WHERE { ?s ?p ?o }

then whether it is answered from information that the publisher asserts,
or from something the publisher is just serving up, should not depend on
whether there are any named graphs in the dataset.

In a closed system, the application and the publisher are often the same
or part of the same organisation. So saying "you must check the origin of
all triples" can be applied. On the web, this is not true. The
user/application/client can be unconnected to the publisher/server. By
defaulting to accessing all triples, all queries are "caveat emptor" - no
client can rely on trusting any publisher.
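To make the interoperability point concrete, here is a rough sketch in
Python (not SPARQL, and not the behaviour of any real query engine).
Triples are plain tuples; the graph name http://example.org/g1, the
:alice/:bob/:carol data, and the helpers select_all and merged_default
are all made up for illustration. It shows the same dataset giving
different answers to SELECT * WHERE { ?s ?p ?o } depending on whether the
engine folds the named graphs into the default graph.

```python
# Hypothetical dataset: one background (default) graph plus one named graph.
# Triples are plain (subject, predicate, object) tuples.
background = {(":alice", ":knows", ":bob")}
named = {
    "http://example.org/g1": {(":bob", ":knows", ":carol")},
}


def select_all(default_graph):
    """Answer SELECT * WHERE { ?s ?p ?o } against a single default graph."""
    return sorted(default_graph)


def merged_default(background, named):
    """Assumed alternative arrangement: every named graph is folded
    into the default graph before the query is answered."""
    merged = set(background)
    for triples in named.values():
        merged |= triples
    return merged


# Arrangement A: the background graph is kept separate from named graphs.
separate_answer = select_all(background)

# Arrangement B: the engine automatically merges all named graphs.
merged_answer = select_all(merged_default(background, named))

print(len(separate_answer), len(merged_answer))  # prints: 1 2
```

Under arrangement B the second row comes from a graph the publisher may
only be serving up, not asserting, and the client has no way to tell from
the result alone.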
> I see two possible ways to solve this issue:
> 1/ standardize on a single arrangement (preferably the latter), or
> 2/ move the choice of arrangement into the query language.

We do have a single arrangement - the background graph is separate from
the named graphs. The publisher is free to create a background graph
based on their beliefs of who to trust and who not to. Maybe I should
make one of the examples in rq23 have no background graph.

2/ places the choice with the application, not the information publisher.
But it's the information publisher who is asserting the statements.

> If the first arrangement of named and background graphs is considered,
> then this query mechanism essentially is a mechanism for querying quads,
> not triples! The graph name is no longer just an ignorable attribute of
> triples, but is now an essential part of it. It appears to me that there
> is a mismatch between RDF and SPARQL here.

This seems key - the graph name is not ignorable.

If a data provider publishes an RDF graph without further information,
then that data provider is responsible for that information. That is the
background graph (default knowledge base). The unnamed graph is being
published without further information (it's just a graph on the web) and
as such it is the data provider who is publishing it. By providing named
graphs, we provide a way to export a graph without it going under the
label of coming from the data provider.

So the two choices are to require all information to be checked ("caveat
emptor" - or trust until proven not to be trustworthy, default is to
trust) or to not trust information until its provenance is verified
(publishers are responsible for information they publish - applications
add things into the space of things they trust, not remove them later).
[[
To refer to a different area: The Guardian newspaper's style guide:
http://www.guardian.co.uk/styleguide/article/0,5817,354123,00.html
for a discussion on naming sources in the newspaper: the question is how
the reader can evaluate who to believe without information about the
source.
]]

Automatically putting all triples in the unnamed graph is defaulting to
trusting them, because

   SELECT * WHERE { ?s ?p ?o }

is taking the default for triples. That query should work the same way
whether or not there are additional named graphs in the dataset (which
the application may not be aware of), and whether or not named graphs are
added to the dataset later. Having it vary by whether the publisher has
chosen not to place the necessary information for checking in the dataset
is very dangerous.

It then comes down to whether the application writer is responsible for
checking all triples (the legal principle of caveat emptor) or whether
the publisher is responsible for the background graphs they publish.

There is a further technical issue as well:

   SELECT * WHERE { :foo :p ?x . :foo :q ?v }

may find solutions, but if the first triple pattern matches only in one
graph and the second triple pattern matches only in a second graph, then
there is no graph that matches the full pattern and you can't ask where
it came from, yet the query returns variable bindings. Why is the
combined pattern a graph match? Because the publisher put all the triples
together.

	Andy
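
The further technical issue above (a pattern that matches only across
graphs) can be sketched the same way. This is an illustrative Python
model under assumed data, not engine code: the graphs g1 and g2, the
objects :a and :b, and the helper match_foo are hypothetical.

```python
# Two hypothetical named graphs; each holds only one of the two triples
# needed by the pattern { :foo :p ?x . :foo :q ?v }.
g1 = {(":foo", ":p", ":a")}
g2 = {(":foo", ":q", ":b")}


def match_foo(graph):
    """Return the (?x, ?v) solutions of { :foo :p ?x . :foo :q ?v }
    when both triple patterns must match inside the one given graph."""
    xs = [o for (s, p, o) in graph if s == ":foo" and p == ":p"]
    vs = [o for (s, p, o) in graph if s == ":foo" and p == ":q"]
    return sorted((x, v) for x in xs for v in vs)


# Matching each graph on its own: neither graph satisfies the full pattern.
per_graph = {name: match_foo(g) for name, g in {"g1": g1, "g2": g2}.items()}

# Matching the automatic merge: the pattern now has a solution,
# but no single source graph can be named for it.
merged = match_foo(g1 | g2)

print(per_graph)  # prints: {'g1': [], 'g2': []}
print(merged)     # prints: [(':a', ':b')]
```

The merged match exists only because the triples were put together, which
is why the solution has no graph it can be attributed to.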
Received on Thursday, 31 March 2005 09:18:36 UTC