Re: LOAD, FROM, GRAPH and COFFEE from Steve Harris on 2005-01-27 (public-rdf-dawg@w3.org from January to March 2005)

From: Steve Harris <S.W.Harris@ecs.soton.ac.uk>
Date: Thu, 27 Jan 2005 20:25:47 +0000
To: DAWG public list <public-rdf-dawg@w3.org>
Message-ID: <20050127202547.GA27719@login.ecs.soton.ac.uk>
Nice summary.

There is a 3rd option, that I mentioned briefly at the FTF:

EXPLICIT TRUST

Graphs must be marked trusted or untrusted at load time. Graphs may or
may not be labelled regardless of their trust status* (personally I don't
see the need for unlabelled graphs, but it's not important).

* I think Andy argued against untrusted, unlabelled graphs - that seems
  reasonable.

By default, query results will only be returned for trusted graphs; the
UNTRUSTED keyword may be added to include untrusted results, as in:

    SELECT ?check ?refStr
    FROM UNTRUSTED h:Bob
    WHERE UNTRUSTED { (?check f:reference ?refStr) }

or

    SELECT ?check ?refStr
    FROM TRUSTED h:Bob
    WHERE (?check f:reference ?refStr)

which is equiv. to

    SELECT ?check ?refStr
    FROM TRUSTED h:Bob
    WHERE UNTRUSTED { TRUSTED { (?check f:reference ?refStr) } }

This has the advantage of the multi KB system, in that you can have a
default trust level where appropriate, but does so without conflating the
graph identification issue. The downside is that it adds another two
keywords to the language, though we can lose one of FROM/LOAD.

FROM could have some default behaviour of (UN)TRUSTED to make things less
verbose, but I don't know what the default should be.
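
A minimal sketch of the explicit trust idea, in Python rather than the
draft query syntax. The Store class, its method names and the sample data
are all made up for illustration; the only assumption taken from the
proposal is that trust is fixed per graph at load time and that untrusted
graphs are skipped unless the query asks for them.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    graph: str
    s: str
    p: str
    o: str


class Store:
    """Hypothetical store: every graph is marked trusted/untrusted at load time."""

    def __init__(self):
        self.quads = []
        self.trusted = {}  # graph name -> bool, set once by load()

    def load(self, graph, triples, trusted):
        self.trusted[graph] = trusted
        self.quads.extend(Quad(graph, *t) for t in triples)

    def match(self, p, include_untrusted=False):
        """Return (s, o) pairs for predicate p; untrusted graphs are skipped by default."""
        return [(q.s, q.o) for q in self.quads
                if q.p == p and (self.trusted[q.graph] or include_untrusted)]


store = Store()
store.load("a:bobsBills", [("check1", "f:payTo", "Acme")], trusted=True)
store.load("h:Bob", [("check1", "f:reference", "invoice 42")], trusted=False)

# Plain WHERE: only trusted graphs contribute.
default_rows = store.match("f:reference")
# WHERE UNTRUSTED { ... }: untrusted graphs are included as well.
untrusted_rows = store.match("f:reference", include_untrusted=True)
```

Here the trust flag lives beside the graph label rather than being derived
from it, which is the sense in which labelling and trust stay separate.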


I'm not sure of the value of untrusted graphs in the general case; it
seems somewhat simplistic, but I can imagine particular cases where it's
convenient.

- Steve

On Thu, Jan 27, 2005 at 10:30:25AM -0500, Eric Prud'hommeaux wrote:
> You may want COFFEE before plowing through this mail on LOAD, FROM and
> GRAPH.
> 
> We have decided as a WG that we need access to provenance information
> in the knowledge base (KB). Day 1 of the Helsinki (Espoo) face to face
> ended with a bit of education and a debate on how to use that
> provenance information. Here are, I believe, both sides of the
> argument. I favor the SINGLE KB option outlined first:
> 
> 
> == SINGLE KB ==
> 
> The simplest way to model provenance is to tag triples in the KB with
> their origin. One can do this with a KB containing potentially
> overlapping sets of triples [FORMULAS], a single set of quads [QUADS],
> or a set of triples with a provenance list associated with each
> triple. Regardless of the implementation, there is a single KB that
> knows everything the system knows. (There are probably other practical
> ways to do this as well.) A query like
> 
> 
>   DEFAULT TRUST
>   -------------
>   PREFIX f : <http://accounting.example/schema#>
>   SELECT ?payee, ?amount, ?refStr
>     LOAD <http://accountant.example/bobsBills>
>          <http://joe.example/accounts/Bob>
>    WHERE { (?check f:payTo ?payee)
> 	   (?check f:amount ?amount)
> 	   (?check f:reference ?refStr) }
> 
> reads bobsBills and accounts/Bob into the KB where it is available for
> matching the graphPattern
>   (?check f:payTo ?payee)
>   (?check f:amount ?amount)
>   (?check f:reference ?refStr) .
> 
> Per discussions earlier in this WG, the graphPattern still matches
> if (?check f:payTo ?payee) and (?check f:amount ?amount) come from
> bobsBills and (?check f:reference ?refStr) comes from accounts/Bob.
> 
> If one only trusts statements from the accountant, one can phrase the
> question as
> 
> 
>   SINGLE TRUST DOMAIN
>   -------------------
>   PREFIX f : <http://accounting.example/schema#>
>   PREFIX a : <http://accountant.example/>
>   PREFIX h : <http://joe.example/accounts/>
>   SELECT ?payee, ?amount, ?refStr
>     LOAD a:bobsBills h:Bob
>    WHERE { GRAPH a:bobsBills { (?check f:payTo ?payee)
>                                (?check f:amount ?amount)
>                                (?check f:reference ?refStr) } }
> 
> If, as is more likely, Bob trusts his accountant to write the name and
> amount on the checks but lets Joe specify what is in the memo
> field, he can write the query to reflect that predicated trust:
> 
> 
>   PREDICATED TRUST
>   ----------------
>   PREFIX f : <http://accounting.example/schema#>
>   PREFIX a : <http://accountant.example/>
>   PREFIX h : <http://joe.example/accounts/>
>   SELECT ?payee, ?amount, ?refStr
>     LOAD a:bobsBills h:Bob
>    WHERE { GRAPH a:bobsBills { (?check f:payTo ?payee)
>                                (?check f:amount ?amount) }
>            GRAPH h:Bob       { (?check f:reference ?refStr) } }
> 
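A rough sketch of the three single-KB cases, treating the KB as a list of
quads and a GRAPH constraint as a plain row restriction. The data and the
helper names (values, query) are invented for illustration, and the
conflicting f:amount from h:Bob is an assumption added to show why the
restriction matters.

```python
# Quads: (graph, subject, predicate, object). Data is invented for illustration.
KB = [
    ("a:bobsBills", "check1", "f:payTo", "Acme"),
    ("a:bobsBills", "check1", "f:amount", "100"),
    ("h:Bob", "check1", "f:reference", "invoice 42"),
    ("h:Bob", "check1", "f:amount", "9999"),
]


def values(check, predicate, graph=None):
    """Objects for (check, predicate, ?o); graph=None means any origin."""
    return [o for g, s, p, o in KB
            if s == check and p == predicate and graph in (None, g)]


def query(graph_for):
    """graph_for maps each predicate to a required graph, or None for 'any'."""
    rows = []
    for check in sorted({s for _, s, _, _ in KB}):
        for payee in values(check, "f:payTo", graph_for["f:payTo"]):
            for amount in values(check, "f:amount", graph_for["f:amount"]):
                for ref in values(check, "f:reference", graph_for["f:reference"]):
                    rows.append((payee, amount, ref))
    return rows


ANY, A, H = None, "a:bobsBills", "h:Bob"
default_trust = query({"f:payTo": ANY, "f:amount": ANY, "f:reference": ANY})
single_domain = query({"f:payTo": A, "f:amount": A, "f:reference": A})
predicated = query({"f:payTo": A, "f:amount": A, "f:reference": H})
```

With this data, default trust returns both amounts, the single trust
domain returns nothing (the accountant's document has no f:reference), and
predicated trust returns only the accountant's amount with Joe's memo.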
> 
>   SIDE EFFECTS
>   ------------
> In systems that continually learn (Ontaria, the Annotea database,
> various Googles of the semantic web), the notion of a single trust
> domain is dangerous. Without controlling what sort of data is in the
> KB, one shouldn't let it write checks without you checking the
> data. The users of the KBs listed above have practical queries "give
> me the annotations for a page X, or, tell me about schema Y" that are
> well served by "trusting" everyone for the application's notion of
> trust. Users can rely on predicated trust when they require more
> security.
> 
> 
> 
> == MULTIPLE KBS ==
> 
> TimBL raised the default trust issue [TIMBL]. The basic issue was that
> the import of a resource into the KB implied trust in the
> assertions from that document. We can presume one would not import a
> document with no potentially interesting statements. Multiple KBs
> provides a way to query a subset of the statements in a resource
> without having other statements in that resource give us potentially
> misleading information. For instance, a semantic google query of
> documents about X should not cause us to believe everything we read on
> the net.
> 
> This is accomplished by having a verb FROM <resource> that imports
> data that is only matchable by graphPatterns that explicitly identify
> that resource. Thus
>     FROM h:Bob
>    WHERE { (?check f:reference ?refStr) }
> will not match any statements from h:Bob. Only
>     FROM h:Bob
>    WHERE { GRAPH h:Bob { (?check f:reference ?refStr) } }
> will match those statements.
> 
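The LOAD/FROM distinction can be sketched as follows. MultiKB and its
methods are hypothetical names; the sketch also assumes that LOAD'd
documents stay addressable by GRAPH name, which the text implies but does
not state outright.

```python
class MultiKB:
    """Hypothetical model: a default KB plus per-document named KBs."""

    def __init__(self):
        self.default = []  # triples visible to patterns without GRAPH
        self.named = {}    # graph name -> triples, reachable only via GRAPH

    def load(self, name, triples):
        # LOAD: merge into the default KB (assumed to stay addressable by name).
        self.default.extend(triples)
        self.named.setdefault(name, []).extend(triples)

    def from_(self, name, triples):
        # FROM: keep the document out of the default KB entirely.
        self.named.setdefault(name, []).extend(triples)

    def match(self, predicate, graph=None):
        """(s, o) pairs for predicate; graph=None queries the default KB only."""
        source = self.default if graph is None else self.named.get(graph, [])
        return [(s, o) for s, p, o in source if p == predicate]


kb = MultiKB()
kb.load("a:bobsBills", [("check1", "f:payTo", "Acme")])
kb.from_("h:Bob", [("check1", "f:reference", "invoice 42")])

without_graph = kb.match("f:reference")               # no GRAPH: FROM'd data is invisible
with_graph = kb.match("f:reference", graph="h:Bob")   # GRAPH h:Bob: data is matchable
```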
> 
>   DEFAULT TRUST
>   -------------
> Without using FROM and explicit GRAPH constraints, queries behave the
> same as in the single KB model. The default trust query above will
> still match (?check f:payTo ?payee), (?check f:amount ?amount) and
> (?check f:reference ?refStr) coming from any combination of
> a:bobsBills and h:Bob .
> 
> 
>   SINGLE TRUST DOMAIN
>   -------------------
>   PREFIX f : <http://accounting.example/schema#>
>   PREFIX a : <http://accountant.example/>
>   PREFIX h : <http://joe.example/accounts/>
>   SELECT ?payee, ?amount, ?refStr
>     LOAD a:bobsBills
>     FROM h:Bob
>    WHERE { (?check f:payTo ?payee)
> 	   (?check f:amount ?amount)
> 	   (?check f:reference ?refStr) }
> 
> This simplifies trusting a single document. In addition, it makes it
> possible to trust the interaction between triples in a LOAD'd document
> with the triples in the default KB, while still not trusting triples
> from FROM'd documents.
> 
> 
>   PREDICATED TRUST
>   ----------------
> Partial trust of a set of documents behaves the same way in either
> approach.
> 
> 
>   SIDE EFFECTS
>   ------------
> 
> One can't rely on the single trust domain model if the database allows
> side effects. In fact, the user must specifically know that nothing
> in the database could be harmful. One query could LOAD <X> into
> the database and a subsequent query could use FROM <X>, expecting the
> data from <X> to *not* be in the database.
> 
> 
> 
> == COMMENTS ==
> 
> The Multiple KBs approach creates alternate KBs, or, if you will, creates
> a subset of the KB which graphPatterns without a GRAPH target can match.
> (The difference is just a matter of what you call the KB.)
> 
> Multiple DB++
> The single trust domain case is terser and more expressive in the
> multiple DB. In order to access the interaction between LOAD'd triples
> and the default DB, the service provider would have to provide a GRAPH
> name for those triples.
> 
> Multiple DB--
> 
> It is either impossible to query interaction between triples from
> FROM'd documents, or it is at least ill-defined. Should 
>   PREFIX f : <http://accounting.example/schema#>
>   PREFIX a : <http://accountant.example/>
>   PREFIX h : <http://joe.example/accounts/>
>   SELECT ?payee, ?amount, ?refStr
>     FROM a:bobsBills h:Bob
>    WHERE { GRAPH ?d { (?check f:payTo ?payee)
> 		      (?check f:amount ?amount)
> 		      (?check f:reference ?refStr) }
>     (?d URI= a:bobsBills || ?d URI= h:Bob) }
> ask the graphPattern of both documents, or of the aggregation of
> those documents? If the former, users will have to be aware that
> the interaction behavior of FROM is different and less expressive.
> If that idiom forces the aggregation of the two documents, the
> burden on implementations is much higher as they need to both
> detect the aggregation patterns and create an arbitrary number of
> aggregate KBs (rather than have a single KB and enforce GRAPH
> constraints as simple row restrictions (also inscrutably called
> "SELECT" in relational algebra)).
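
The two readings of GRAPH ?d give different answers on the same data, as
this small sketch suggests. The documents, the PATTERN list and the matches
helper are invented for illustration; the pattern is reduced to "one
subject carries all three predicates".

```python
# Two FROM'd documents; contents are invented for illustration.
DOCS = {
    "a:bobsBills": [("check1", "f:payTo", "Acme"), ("check1", "f:amount", "100")],
    "h:Bob": [("check1", "f:reference", "invoice 42")],
}
PATTERN = ["f:payTo", "f:amount", "f:reference"]


def matches(triples):
    """Subjects that carry every predicate in PATTERN within the given triples."""
    subjects = {s for s, _, _ in triples}
    return sorted(
        s for s in subjects
        if all(any(s2 == s and p == want for s2, p, _ in triples) for want in PATTERN)
    )


# Reading 1: ask the pattern of each document on its own.
per_document = {name: matches(triples) for name, triples in DOCS.items()}
# Reading 2: ask the pattern of the aggregation of the documents.
aggregate = matches([t for triples in DOCS.values() for t in triples])
```

Under the first reading neither document matches; under the second,
check1 matches because the triples are allowed to interact across
documents.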
> 
> In short, I don't think the multiple DB approach is worth the
> implementation/specification burden. I doubt that the aggregation of
> FROM'd graphs is the only screw case I can come up with.
> 
> 
> [FORMULAS] http://www.w3.org/2001/12/attributions/#formulas
> [QUADS] http://www.w3.org/2001/12/attributions/#quads
> [TIMBL] http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2004Nov/0020.html
> -- 
> -eric
> 
> office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
>                         Shonan Fujisawa Campus, Keio University,
>                         5322 Endo, Fujisawa, Kanagawa 252-8520
>                         JAPAN
>         +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
> cell:   +81.90.6533.3882
> 
> (eric@w3.org)
> Feel free to forward this message to any list for any purpose other than
> email address distribution.
Received on Thursday, 27 January 2005 20:25:52 UTC