- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Thu, 27 Jan 2005 10:30:25 -0500
- To: public-rdf-dawg@w3.org
- Message-ID: <20050127153025.GA28735@w3.org>
You may want COFFEE before plowing through this mail on LOAD, FROM and
GRAPH.

We have decided as a WG that we need access to provenance information in
the knowledge base (KB). Day 1 of the Helsinki (Espoo) face to face ended
with a bit of education and a debate on how to use that provenance
information. Here are, I believe, both sides of the argument. I favor the
SINGLE KB option outlined first:

== SINGLE KB ==

The simplest way to model provenance is to tag triples in the KB with
their origin. One can do this with a KB containing potentially
overlapping sets of triples [FORMULAS], a single set of quads [QUADS], or
a set of triples with a provenance list associated with each triple.
Regardless of the implementation, there is a single KB that knows
everything the system knows. (There are probably other practical ways to
do this as well.)

A query like

DEFAULT TRUST
-------------
  PREFIX f : <http://accounting.example/schema#>
  SELECT ?payee, ?amount, ?refStr
  LOAD <http://accountant.example/bobsBills>
       <http://joe.example/accounts/Bob>
  WHERE { (?check f:payTo ?payee)
          (?check f:amount ?amount)
          (?check f:reference ?refStr) }

reads bobsBills and accounts/Bob into the KB, where they are available
for matching the graphPattern

  (?check f:payTo ?payee)
  (?check f:amount ?amount)
  (?check f:reference ?refStr) .

Per discussions earlier in this WG, the graphPattern still matches if
(?check f:payTo ?payee) and (?check f:amount ?amount) come from bobsBills
and (?check f:reference ?refStr) comes from accounts/Bob.

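To make the quad flavor of the single KB concrete, here is a minimal
sketch in Python. It is an illustration, not a description of any
implementation the WG discussed: the check data, the "check1" node name
and the match() helper are all hypothetical. It shows only that a
graphPattern with no origin constraint can be satisfied by triples that
come from different documents.

```python
# Hypothetical sample data: each quad is (subject, predicate, object, origin).
F = "http://accounting.example/schema#"
BILLS = "http://accountant.example/bobsBills"
ACCOUNTS = "http://joe.example/accounts/Bob"

KB = {
    ("check1", F + "payTo", "Acme Power", BILLS),
    ("check1", F + "amount", "120.00", BILLS),
    ("check1", F + "reference", "January invoice", ACCOUNTS),
}

PATTERN = [
    ("?check", F + "payTo", "?payee"),
    ("?check", F + "amount", "?amount"),
    ("?check", F + "reference", "?refStr"),
]


def match(patterns, kb, binding=None):
    """Yield every variable binding that satisfies all triple patterns.

    The origin column is ignored, which is the "default trust" behavior:
    any triple in the single KB may satisfy any pattern.
    """
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for s, p, o, _origin in kb:
        candidate = dict(binding)
        consistent = True
        for term, value in zip(first, (s, p, o)):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    consistent = False
                    break
            elif term != value:
                consistent = False
                break
        if consistent:
            yield from match(rest, kb, candidate)


results = list(match(PATTERN, KB))
```

With this sample data, results holds a single solution whose ?payee and
?amount come from bobsBills and whose ?refStr comes from accounts/Bob.
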
If one only trusts statements from the accountant, one can phrase the
question as

SINGLE TRUST DOMAIN
-------------------
  PREFIX f : <http://accounting.example/schema#>
  PREFIX a : <http://accountant.example/>
  PREFIX h : <http://joe.example/accounts/>
  SELECT ?payee, ?amount, ?refStr
  LOAD a:bobsBills h:Bob
  WHERE { GRAPH a:bobsBills { (?check f:payTo ?payee)
                              (?check f:amount ?amount)
                              (?check f:reference ?refStr) } }

If, as is more likely, Bob trusts his accountant to write the name and
amount on the checks but lets Joe specify what is in the memo field, he
can write the query to reflect that predicated trust:

PREDICATED TRUST
----------------
  PREFIX f : <http://accounting.example/schema#>
  PREFIX a : <http://accountant.example/>
  PREFIX h : <http://joe.example/accounts/>
  SELECT ?payee, ?amount, ?refStr
  LOAD a:bobsBills h:Bob
  WHERE { GRAPH a:bobsBills { (?check f:payTo ?payee)
                              (?check f:amount ?amount) }
          GRAPH h:Bob { (?check f:reference ?refStr) } }

SIDE EFFECTS
------------
In systems that continually learn (Ontaria, the Annotea database, various
Googles of the semantic web), the notion of a single trust domain is
dangerous. Without controlling what sort of data is in the KB, one
shouldn't let it write checks without checking the data oneself. The
users of the KBs listed above have practical queries ("give me the
annotations for page X" or "tell me about schema Y") that are well served
by "trusting" everyone for the application's notion of trust. Users can
rely on predicated trust when they require more security.

== MULTIPLE KBS ==

TimBL raised the default trust issue [TIMBL]. The basic issue was that
importing a resource into the KB implies trust in the assertions from
that document. We can presume one would not import a document with no
potentially interesting statements. Multiple KBs provide a way to query a
subset of the statements in a resource without having other statements in
that resource give us potentially misleading information.
For instance, a semantic google query of documents about X should not
cause us to believe everything we read on the net. This is accomplished
by having a verb FROM <resource> that imports data that is only matchable
by graphPatterns that explicitly identify that resource. Thus

  FROM h:Bob
  WHERE { (?check f:reference ?refStr) }

will not match any statements from h:Bob. Only

  FROM h:Bob
  WHERE { GRAPH h:Bob { (?check f:reference ?refStr) } }

will match those statements.

DEFAULT TRUST
-------------
Without using FROM and explicit GRAPH constraints, queries behave the
same as in the single KB model. The default trust query above will still
match (?check f:payTo ?payee), (?check f:amount ?amount) and
(?check f:reference ?refStr) coming from any combination of a:bobsBills
and h:Bob .

SINGLE TRUST DOMAIN
-------------------
  PREFIX f : <http://accounting.example/schema#>
  PREFIX a : <http://accountant.example/>
  PREFIX h : <http://joe.example/accounts/>
  SELECT ?payee, ?amount, ?refStr
  LOAD a:bobsBills
  FROM h:Bob
  WHERE { (?check f:payTo ?payee)
          (?check f:amount ?amount)
          (?check f:reference ?refStr) }

This simplifies trusting a single document. In addition, it makes it
possible to trust the interaction between triples in a LOAD'd document
and the triples in the default KB, while still not trusting triples from
FROM'd documents.

PREDICATED TRUST
----------------
Partial trust of a set of documents behaves the same way in either
approach.

SIDE EFFECTS
------------
One can't rely on the single trust domain model if the database allows
side effects. In fact, the user must specifically know that nothing in
the database could be harmful. One query could LOAD <X> into the database
and a subsequent query could use FROM <X>, expecting the data from <X> to
*not* be in the database.

== COMMENTS ==

The Multiple KBs approach creates alternate KBs, or, if you will, creates
a subset of the KB which graphPatterns without a GRAPH target can match.
(The difference is just a matter of what you call the KB.)

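The "subset of the KB" reading can be sketched in a few lines of Python.
This is one possible interpretation under stated assumptions, not a
proposal for how a service must implement it: the KB class, its load(),
from_() and match() methods, and the sample check data are all
hypothetical. The KB is a single set of quads plus a set of origins that
patterns without a GRAPH target are allowed to see.

```python
F = "http://accounting.example/schema#"
BILLS = "http://accountant.example/bobsBills"
ACCOUNTS = "http://joe.example/accounts/Bob"


def _unify(term, value, binding):
    """Match one pattern term against one value, recording variables."""
    if term.startswith("?"):
        return binding.setdefault(term, value) == value
    return term == value


class KB:
    """Hypothetical quad store with a default-visible subset of origins."""

    def __init__(self):
        self.quads = set()
        self.default_graphs = set()  # origins visible without GRAPH

    def load(self, origin, triples):
        # LOAD: data becomes matchable by any graphPattern.
        self.from_(origin, triples)
        self.default_graphs.add(origin)

    def from_(self, origin, triples):
        # FROM: data is stored but only matchable via GRAPH <origin>.
        self.quads.update((s, p, o, origin) for s, p, o in triples)

    def match(self, pattern, graph=None):
        """Return bindings for a single triple pattern."""
        results = []
        for s, p, o, origin in self.quads:
            if graph is not None:
                visible = origin == graph
            else:
                visible = origin in self.default_graphs
            if not visible:
                continue
            binding = {}
            if all(_unify(t, v, binding) for t, v in zip(pattern, (s, p, o))):
                results.append(binding)
        return results


kb = KB()
kb.load(BILLS, [("check1", F + "payTo", "Acme Power"),
                ("check1", F + "amount", "120.00")])
kb.from_(ACCOUNTS, [("check1", F + "reference", "January invoice")])

memo = ("?check", F + "reference", "?refStr")
without_graph = kb.match(memo)                # FROM'd data is hidden
with_graph = kb.match(memo, graph=ACCOUNTS)   # GRAPH names the resource

# Side effect: some other query LOADs the same resource into a shared KB.
kb.load(ACCOUNTS, [])
after_load = kb.match(memo)                   # no longer hidden
```

In this sketch without_graph is empty and with_graph holds the memo
binding. After the extra load() call the same unconstrained pattern
matches, which is the side-effect hazard described above.
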
Multiple DB++
The single trust domain case is terser and more expressive in the
multiple DB approach. In order to access the interaction between LOAD'd
triples and the default DB, the service provider would have to provide a
GRAPH name for those triples.

Multiple DB--
It is either impossible to query interaction between triples from FROM'd
documents, or it is at least ill-defined. Should

  PREFIX f : <http://accounting.example/schema#>
  PREFIX a : <http://accountant.example/>
  PREFIX h : <http://joe.example/accounts/>
  SELECT ?payee, ?amount, ?refStr
  FROM a:bobsBills h:Bob
  WHERE { GRAPH ?d { (?check f:payTo ?payee)
                     (?check f:amount ?amount)
                     (?check f:reference ?refStr) }
          (?d URI= a:bobsBills || ?d URI= h:Bob) }

ask the graphPattern of both documents, or of the aggregation of those
documents? If the former, users will have to be aware that the
interaction behavior of FROM is different and less expressive. If that
idiom forces the aggregation of the two documents, the burden on
implementations is much higher, as they need to both detect the
aggregation patterns and create an arbitrary number of aggregate KBs
(rather than have a single KB and enforce GRAPH constraints as simple row
restrictions, also inscrutably called "SELECT" in relational algebra).

In short, I don't think that the multiple DB approach is worth the
implementation/specification burden. I doubt that the aggregation of
FROM'd graphs is the only screw case I can come up with.

[FORMULAS] http://www.w3.org/2001/12/attributions/#formulas
[QUADS] http://www.w3.org/2001/12/attributions/#quads
[TIMBL] http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2004Nov/0020.html

--
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +81.90.6533.3882

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Thursday, 27 January 2005 15:30:26 UTC