Re: Named graphs etc from Jeremy Carroll on 2004-03-12 (www-archive@w3.org from March 2004)

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Fri, 12 Mar 2004 14:43:29 +0000
To: Patrick Stickler <patrick.stickler@nokia.com>
Cc: ext Pat Hayes <phayes@ihmc.us>, www-archive@w3.org, chris@bizer.de
Message-ID: <4051CC91.3090909@hplb.hpl.hp.com>
I am having a lot of difficulty in understanding the problems that Pat and 
Patrick see with bootstrapping, and needing to distinguish asserting (by 
the publisher) from affirming (by anyone). I don't see why bootstrapping 
stands outside the normal MT.

Here is my understanding of a SemWeb agent A007

A007 has access to a set of named graphs.
(e.g. from a SemWeb Crawler where each name is a URL and each graph is 
given by the RDF/XML at that URL, or from some Trix files or whatever).


For example, using some of Patrick's or Chris's examples:

> :X
> {
>   :X trix:assertedBy ex:Bob .
>   :X trix:signature "..." .                   -> verifiable signature for :X+ex:Bob
> }
>
> :Y
> {
>   :X trix:assertedBy ex:Jane .
>   :Y trix:asserted "true"^^xsd:boolean .      -> authoritative assertion of :Y
>   :Y trix:assertedBy ex:Jane .
>   :Y trix:signature "..." .                   -> verifiable signature for :Y+ex:Jane
> }
>
> :Z
> {
>   :X trix:asserted "false"^^xsd:boolean .     -> third-party non-assertion of :X
>   :Z trix:asserted "true"^^xsd:boolean .      -> authoritative assertion of :Z
>   :Z trix:assertedBy ex:Bill .                -> authority for :Z
>   :Z trix:signature "..." .                   -> verifiable signature for :Z+ex:Bill
> }

etc.

I would ideally want the trix:asserted predicate to be, say, trix:affirmed,
and to have an agent URL as its object.

We also want to have signatures in there, which, as Patrick points out, are 
functions of agents+graphs.

We may also want to have verification chains of agent identities (including 
their public keys, as already available as part of public key infrastructure).

We may also want to have some verifiable relationships between URLs and 
agents, indicating ability to publish. To some extent this information is 
already public - registries of who owns which domain name linked with 
public key registries. To some extent this is simply more meta-information.

In keeping with Chris's view, just how much of this A007 chooses to use is 
A007's business, and not fundamental.

So

Assume A007 has a policy of trusting anything for which he can identify a 
party to sue. (I think this is implicit in Pat's view of publication.)

The algorithm used by A007 may work like this:

1) Nondeterministically choose a named graph g from the input.

2) Hypothesise g, provisionally adding it to A007's knowledge base KB.

3) If "g trix:affirmedBy UUU" is a consequence of KB, where UUU is an 
identifiable party, and all the signatures are good (a signature by UUU 
affirming g, and a trusted chain of signatures, from some root body such as 
Verisign or Microsoft, affirming the public key and identity of UUU), then 
the hypothesis is good and we confirm g in the knowledge base.
    Otherwise fail, and go back to 1 for a different choice of g.

4) If the knowledge base is contradictory then someone is lying and A007 
engages lawyers; otherwise repeat from 1 to add more graphs to the 
knowledge base.

5) Terminate when no more graphs can be added.


(Sorry, the algorithm is somewhat unpolished.)
Note: the only place where graphs have any meaning is step 3, which uses 
the RDF MT.
Also note: actually having signed graphs in there is not going to happen 
by accident.
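
To make the steps above concrete, here is a rough sketch in Python. It is 
only an illustration under several assumptions of mine: graphs are plain 
sets of (subject, predicate, object) string triples, "is a consequence of 
KB" is reduced to simple triple membership rather than full RDF entailment, 
the nondeterministic choice is replaced by a loop that retries until 
nothing more can be added, and signatures_ok and is_contradictory are 
hypothetical callbacks standing in for the PKI checks and the 
contradiction test.

```python
from typing import Callable, Dict, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]
AFFIRMED_BY = "trix:affirmedBy"  # predicate name as used in step 3


class Contradiction(Exception):
    """Raised when the accepted graphs contradict each other (step 4)."""


def build_kb(
    graphs: Dict[str, FrozenSet[Triple]],
    signatures_ok: Callable[[str, str], bool],         # hypothetical PKI check
    is_contradictory: Callable[[Set[Triple]], bool],   # hypothetical consistency check
) -> Tuple[Set[Triple], Set[str]]:
    kb: Set[Triple] = set()
    accepted: Set[str] = set()
    progress = True
    while progress:                    # step 5: stop when nothing more can be added
        progress = False
        for name in sorted(graphs):    # step 1: deterministic stand-in for the choice
            if name in accepted:
                continue
            trial = kb | graphs[name]  # step 2: hypothesise g
            # step 3: look for "g trix:affirmedBy UUU" in the hypothesised KB
            affirmers = [o for (s, p, o) in trial if s == name and p == AFFIRMED_BY]
            if not any(signatures_ok(name, agent) for agent in affirmers):
                continue               # hypothesis fails; try a different g
            kb = trial                 # confirm g
            accepted.add(name)
            progress = True
            if is_contradictory(kb):   # step 4: someone is lying
                raise Contradiction(name)
    return kb, accepted


# Made-up data: :X carries no affirmation of its own, :Y affirms both :X and :Y.
graphs = {
    ":X": frozenset({(":X", "ex:says", "something")}),
    ":Y": frozenset({(":X", AFFIRMED_BY, "ex:Jane"), (":Y", AFFIRMED_BY, "ex:Jane")}),
    ":W": frozenset({(":W", AFFIRMED_BY, "ex:Mallory")}),
}
good_signatures = {(":X", "ex:Jane"), (":Y", "ex:Jane")}
kb, accepted = build_kb(
    graphs,
    lambda g, agent: (g, agent) in good_signatures,
    lambda triples: False,
)
```

In this made-up input :X is only accepted on a second pass, after :Y (which 
affirms it on Jane's behalf) is already in the knowledge base; :W is never 
accepted because its signature does not check out.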

A more conservative A008 can require at step 3 that the agent UUU has 
publication rights over the URL naming g. (And that these publication 
rights are known, signed and verified).
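
As a sketch of how A008's extra requirement might be layered on top of the 
signature check: the function below wraps an existing check and additionally 
demands a verified (agent, URL) publication right. The names a008_check and 
verified_rights, and the example URLs, are mine and purely illustrative; how 
such rights would actually be signed and verified is left open here.

```python
from typing import Callable, Set, Tuple


def a008_check(
    signatures_ok: Callable[[str, str], bool],   # hypothetical PKI check, as for A007
    verified_rights: Set[Tuple[str, str]],       # assumed: verified (agent, graph URL) pairs
) -> Callable[[str, str], bool]:
    """Return a stricter check: good signatures AND publication rights over the URL."""

    def check(graph_url: str, agent: str) -> bool:
        return signatures_ok(graph_url, agent) and (agent, graph_url) in verified_rights

    return check


# Made-up example: Bob may publish at g1 but not at g2.
rights = {("ex:Bob", "http://example.org/g1")}
check = a008_check(lambda url, agent: True, rights)
bob_g1 = check("http://example.org/g1", "ex:Bob")
bob_g2 = check("http://example.org/g2", "ex:Bob")
```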

The whole thing gets off the ground by the usual public key trick of 
having some well-known facts, like the public key of Verisign.


I also believe that, in practice, most SemWeb applications can be less 
paranoid, and could use a policy of, say, believing anything your friends 
say, where your friends are as defined in your own local FOAF file. Also, 
the digital signature stuff is only relevant when handling financially 
relevant material, and even then not very: just knowing that the URL is 
appropriate is typically enough (e.g. I just spent £100 at www.ryanair.com, 
with only DNS to convince me that I was really dealing with the same people 
who have previously carried me on an aeroplane. That is a possible target 
of fraud, but in practice fraudsters find easier or bigger targets).
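
A less paranoid policy of this kind could look roughly like the sketch 
below. It assumes the local FOAF file has already been parsed into 
(subject, predicate, object) string triples, takes foaf:knows as the 
definition of "friend", and believes any graph that contains a single 
triple saying a friend affirms it, with no signature checking at all. The 
function names and the sample data are made up for illustration.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
FOAF_KNOWS = "foaf:knows"
AFFIRMED_BY = "trix:affirmedBy"  # assumed predicate name


def friends_of(me: str, foaf_triples: Iterable[Triple]) -> Set[str]:
    """Everyone that `me` foaf:knows according to the local FOAF data."""
    return {o for (s, p, o) in foaf_triples if s == me and p == FOAF_KNOWS}


def believed_graphs(graphs: Dict[str, FrozenSet[Triple]], friends: Set[str]) -> Set[str]:
    """Names of graphs that say of themselves that a friend affirms them."""
    return {
        name
        for name, triples in graphs.items()
        if any(s == name and p == AFFIRMED_BY and o in friends for (s, p, o) in triples)
    }


# Made-up local FOAF data and input graphs.
foaf = [("ex:Me", FOAF_KNOWS, "ex:Jane"), ("ex:Jane", FOAF_KNOWS, "ex:Bill")]
graphs = {
    ":Y": frozenset({(":Y", AFFIRMED_BY, "ex:Jane")}),
    ":Z": frozenset({(":Z", AFFIRMED_BY, "ex:Bill")}),
}
friends = friends_of("ex:Me", foaf)
believed = believed_graphs(graphs, friends)
```

Here only :Y is believed: Bill is a friend of a friend, which this 
particular policy does not count.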

In this framework asserting some RDF is simply about how much trouble you 
are prepared to go to in order to convince your reader.

If the RDF is not intended as commercially relevant, the answer is probably 
very little, so just adding a single triple saying you affirm it is enough 
for a minimalist trusting algorithm.
If the RDF is advertising a web service with Ts&Cs then we wheel out the 
PKI machinery, and make sure that a paranoid customer can check everything.

So every act of publishing is an assertion, but we can make more forceful 
assertions or less forceful ones depending on how we do it. I guess it is 
useful to have some way of explicitly marking a graph as false (seems a bit 
strong) or unaffirmed by the author (my preference). Notice that the 
algorithm above works fine to permit third-party affirmation even with 
author denial, and the third party is liable, not the author.

Jeremy
Received on Friday, 12 March 2004 09:44:29 UTC