Re: Named graphs etc from Patrick Stickler on 2004-03-15 (www-archive@w3.org from March 2004)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Mon, 15 Mar 2004 09:42:35 +0200
To: "ext Jeremy Carroll" <jjc@hplb.hpl.hp.com>
Cc: ext Pat Hayes <phayes@ihmc.us>, www-archive@w3.org, chris@bizer.de
Message-Id: <5361C0A8-7654-11D8-A56E-000A95EAFCEA@nokia.com>
On Mar 12, 2004, at 16:43, ext Jeremy Carroll wrote:

>
> I am having a lot of difficulty in understanding the problems that Pat 
> and Patrick see with bootstrapping, and needing to distinguish 
> asserting (by the publisher) from affirming (by anyone). I don't see 
> why bootstrapping stands outside the normal MT.
>
> Here is my understanding of a SemWeb agent A007:
>
> A007 has access to a set of named graphs.
> (e.g. from a SemWeb Crawler where each name is a URL and each graph is 
> given by the RDF/XML at that URL, or from some Trix files or 
> whatever).
>
> e.g. using some of Patrick's examples, or Chris's examples, ...
>
>> :X
>> {
>>   :X trix:assertedBy ex:Bob .
>>   :X trix:signature "..." .                   -> verifiable signature for :X+ex:Bob
>> }
>> :Y
>> {
>>   :X trix:assertedBy ex:Jane .
>>   :Y trix:asserted "true"^^xsd:boolean .      -> authoritative assertion of :Y
>>   :Y trix:assertedBy ex:Jane .
>>   :Y trix:signature "..." .                   -> verifiable signature for :Y+ex:Jane
>> }
>> :Z
>> {
>>   :X trix:asserted "false"^^xsd:boolean .     -> third-party non-assertion of :X
>>   :Z trix:asserted "true"^^xsd:boolean .      -> authoritative assertion of :Z
>>   :Z trix:assertedBy ex:Bill .                -> authority for :Z
>>   :Z trix:signature "..." .                   -> verifiable signature for :Z+ex:Bill
>> }
>
> etc.
>
> I would ideally want the trix:asserted predicate to be, say,
> trix:affirmed, and to have an agent URL as its object.


I think that it is important to speak of assertion, as defined/understood
per the RDF MT, rather than introduce yet another term which essentially
is a synonym for "assert".

If the connotation of "affirmation" is not clear in the definition of
"assertion" in the RDF specs and elsewhere, then we simply have to address
that in the documentation -- but keep the vocabulary as tightly aligned
as possible with the terms in the RDF specs.

>
> We also want to have signatures in there, which, as Patrick points out,
> are functions of agents+graphs.
>
> We also may want to have verification chains of agent identities
> (including their public keys, as already available as part of the Public
> Key Infrastructure).

Right. I deliberately avoided discussion of how a given graph is
authenticated, working on the presumption that the signature would
contain everything we'd need to do that -- though the additional external
machinery is of course as important as the signature in the graph.

>
> We may also want to have some verifiable relationships between URLs 
> and agents, indicating ability to publish. To some extent this 
> information is already public - registries of who owns which domain 
> name linked with public key registries. To some extent this is simply 
> more meta-information.
>
> In keeping with Chris's view, just how much of this A007 chooses to 
> use is A007's business, and not fundamental.
>
> So
>
> Assume A007 has a policy of trusting anything for which he can identify
> a party to sue. (I think this is implicit in Pat's view of publication.)
>
> The algorithm used by A007 may work like this:
>
> 1) Non-deterministically choose a named graph g from the input.
>
> 2) Hypothesise g, provisionally adding it to A007's knowledge base KB.
>
> 3) If g trix:affirmedBy UUU (is a consequence of KB), where UUU is an
> identifiable party, and all the signatures are good (a signature by
> UUU affirming g, and a trusted chain of signatures, from some root
> body such as Verisign or Microsoft, affirming the public key and
> identity of UUU), then the hypothesis is good and we confirm g in the
> knowledge base.
>    Otherwise fail, and go back to 1, for a different choice of g.
>
> 4) If the knowledge base is contradictory, then someone is lying and
> A007 engages lawyers; otherwise repeat from 1, to add more graphs to
> the knowledge base.
>
> 5) Terminate when no more graphs can be added.
>
>
> (Sorry, the algorithm is somewhat unpolished.)
> Note - the only way that graphs have any meaning is in step 3, which 
> uses RDF MT.
> Also note - actually having signed graphs in there is not going to 
> happen by accident.
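As a rough illustration only, Jeremy's five steps might be sketched in
Python as below. This is a hypothetical sketch, not anyone's actual
implementation: graphs are plain sets of (subject, predicate, object)
string triples, set membership stands in for "is a consequence of KB" (a
real agent would use RDF entailment), the non-deterministic choice is
replaced by repeated ordered passes until a fixpoint, and the identity
and signature checks are caller-supplied stubs. trix:affirmedBy is the
predicate Jeremy proposes, not an established term.

```python
from typing import Callable, Dict, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]

AFFIRMED_BY = "trix:affirmedBy"        # Jeremy's proposed predicate (assumption)
ASSERTED = "trix:asserted"
TRUE = '"true"^^xsd:boolean'
FALSE = '"false"^^xsd:boolean'


def affirmers(kb: Set[Triple], name: str) -> Set[str]:
    """Parties UUU with (name trix:affirmedBy UUU) in the KB.

    Plain membership stands in for RDF entailment here.
    """
    return {o for (s, p, o) in kb if s == name and p == AFFIRMED_BY}


def contradictory(kb: Set[Triple]) -> bool:
    """Toy test: some graph is marked both asserted and not asserted."""
    return any(p == ASSERTED and o == TRUE and (s, ASSERTED, FALSE) in kb
               for (s, p, o) in kb)


def run_a007(graphs: Dict[str, Graph],
             identifiable: Callable[[str], bool],
             signatures_ok: Callable[[str, str], bool]) -> Tuple[Set[Triple], Set[str]]:
    kb: Set[Triple] = set()
    accepted: Set[str] = set()
    progress = True
    while progress:                         # step 5: stop when nothing more can be added
        progress = False
        for name, g in graphs.items():      # step 1: ordered passes replace the non-deterministic choice
            if name in accepted:
                continue
            trial = kb | g                  # step 2: provisionally add g
            if any(identifiable(u) and signatures_ok(name, u)
                   for u in affirmers(trial, name)):
                kb = trial                  # step 3: hypothesis confirmed
                accepted.add(name)
                progress = True
                if contradictory(kb):       # step 4: someone is lying
                    raise RuntimeError("contradictory knowledge base: engage lawyers")
            # otherwise the trial KB is simply discarded and the next graph is tried
    return kb, accepted
```

Because accepted graphs stay in the KB, a graph that was rejected on an
earlier pass can be accepted later once another graph supplies its
trix:affirmedBy triple; the outer loop exists to catch exactly that case.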

I think this is essentially compatible with what I've been thinking, but
also seems to miss the key point I've been trying to highlight: that for
step 3, *if* all of the qualifications of the graphs (authority,
assertiveness, signature, etc.) are expressed in RDF, then you *have* to
have some mechanism to terminate those signature chains.

Either you have to have some special graph, as part of the configuration
of the agent, that is taken as trusted, and which identifies certain
other known graphs as asserted/authenticated/trusted -- or you have to
have some way to interpret the qualifications in terms of an individual
"terminal" graph.

I find the former approach to be fragile, non-scalable, and impractical
in an application context where agents will need to consider graphs
from arbitrary sources which have never before been encountered. If they
take the latter approach, whereby they can test the authenticity and
assertiveness of the graph at face value, using a standardized
methodology for signature/key/graph verification, then we have a fully
generic, flexible architecture that does not require special
pre-knowledge to terminate those signature chains.
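To make the "terminal graph" alternative concrete, here is a minimal,
hypothetical Python sketch of checking a single graph at face value: the
graph names exactly one authority (trix:assertedBy) and carries exactly
one trix:signature over a canonical form of its other triples, and the
key is obtained from the authority by a pluggable resolver. HMAC-SHA256
from the standard library is only a stand-in for a real public-key
signature scheme, and the sorted-triples "canonicalization" is a toy;
neither is a proposed standard.

```python
import hashlib
import hmac
from typing import Callable, FrozenSet, Optional, Tuple

Triple = Tuple[str, str, str]
ASSERTED_BY = "trix:assertedBy"
SIGNATURE = "trix:signature"


def canonical_bytes(name: str, graph: FrozenSet[Triple]) -> bytes:
    """Toy canonical form: graph name plus sorted triples, excluding the signature."""
    body = sorted(t for t in graph if not (t[0] == name and t[1] == SIGNATURE))
    lines = [name] + [" ".join(t) for t in body]
    return "\n".join(lines).encode("utf-8")


def sign(name: str, graph: FrozenSet[Triple], key: bytes) -> FrozenSet[Triple]:
    """Add a trix:signature triple (HMAC stand-in for a real signature)."""
    digest = hmac.new(key, canonical_bytes(name, graph), hashlib.sha256).hexdigest()
    return graph | {(name, SIGNATURE, digest)}


def verify_at_face_value(name: str, graph: FrozenSet[Triple],
                         resolve_key: Callable[[str], Optional[bytes]]) -> Optional[str]:
    """Return the authority if the graph authenticates itself, else None.

    No separately configured trusted graph is consulted; only the key
    resolver is external to the graph being checked.
    """
    authorities = [o for (s, p, o) in graph if s == name and p == ASSERTED_BY]
    signatures = [o for (s, p, o) in graph if s == name and p == SIGNATURE]
    if len(authorities) != 1 or len(signatures) != 1:
        return None
    key = resolve_key(authorities[0])
    if key is None:
        return None
    expected = hmac.new(key, canonical_bytes(name, graph), hashlib.sha256).hexdigest()
    return authorities[0] if hmac.compare_digest(expected, signatures[0]) else None
```

Where resolve_key gets its keys (a local cache, a lookup derived from the
authority URI, or a central registry) is deliberately left to the agent.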

>
> A more conservative A008 can require at step 3 that the agent UUU has 
> publication rights over the URL naming g. (And that these publication 
> rights are known, signed and verified).
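A008's extra condition could be sketched as below. The rights registry is
a hypothetical dict from agent to hostnames; in A008's setting each entry
would itself need to be signed and verified, which the sketch does not
attempt. Matching on the URL's hostname (subdomains included) is an
assumption about what "publication rights over the URL" means.

```python
from typing import Dict, Set
from urllib.parse import urlparse


def may_publish(agent: str, graph_url: str, rights: Dict[str, Set[str]]) -> bool:
    """True if `agent` holds publication rights for the host serving `graph_url`.

    `rights` is a hypothetical registry: agent -> hostnames it controls.
    A subdomain of a registered host is treated as covered.
    """
    host = urlparse(graph_url).hostname
    if host is None:                      # e.g. a URN has no host to check
        return False
    return any(host == d or host.endswith("." + d) for d in rights.get(agent, set()))
```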

This seems to me to be a kind of centralized approach requiring known
registries of publishing agents, etc., which I'd prefer to avoid if at
all possible.

IMO the only reason why the web (or the internet, for that matter) is
successful is because it is decentralized. We'd want the key SW machinery
to be likewise.

>
> The whole thing gets off the ground by the usual public key trick of
> having some well-known facts, like the public key of Verisign.

The needed facts could be in the graph itself. I.e., the URI of the
authority could provide a means of determining where/how the graph would
be authenticated, given the graph URI and signature.

I would hope that knowledge of some fixed, centralized agency for
verification of signatures would not be mandatory, *even if* some folks
choose to use such centralized services. The architecture should itself
be as agnostic as possible.
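The bootstrap Jeremy describes amounts to following certification links
until one reaches a fact known in advance. A hypothetical Python sketch
follows, in which each link is assumed to be an already verified
certificate (the cryptography is omitted) and the set of roots is a
parameter supplied by the agent rather than a fixed central agency.

```python
from typing import Dict, List, Optional, Set


def chain_to_root(subject: str, issued_by: Dict[str, str], roots: Set[str],
                  max_depth: int = 10) -> Optional[List[str]]:
    """Follow 'subject was certified by issuer' links up to a known root.

    Returns the chain (subject first, root last), or None if the chain
    breaks, loops, or grows longer than max_depth.
    """
    chain = [subject]
    seen = {subject}
    current = subject
    while current not in roots:
        if len(chain) > max_depth:
            return None
        issuer = issued_by.get(current)
        if issuer is None or issuer in seen:   # broken link or cycle
            return None
        chain.append(issuer)
        seen.add(issuer)
        current = issuer
    return chain
```

Swapping the roots set is all it takes to trust a different anchor, which
is one way of keeping the architecture agnostic about who the roots are.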

>
>
> I also believe that, in practice, most SemWeb applications can be less
> paranoid, and could use a policy of, say, believe anything your friends
> say, where your friends are as defined in your own local FOAF file.
> Also the dig sig stuff is only relevant if handling financially
> relevant material; and even then not very - just knowing that the URL
> was appropriate is typically enough (e.g. I just spent £100 at
> www.ryanair.com, with only DNS to convince me that I was really dealing
> with the same people who have previously carried me on an aeroplane -
> a possible target of fraud, but in practice fraudsters find easier or
> bigger targets).
>
> In this framework asserting some RDF is simply about how much trouble 
> you are prepared to go to in order to convince your reader.
>
> If the RDF is not intended as commercially relevant, the answer is
> probably very little, so just adding a single triple saying you affirm
> it is enough for a minimalist trusting algorithm.
> If the RDF is advertising a web service with Ts&Cs, then we wheel out
> the PKI machinery, and make sure that a paranoid customer can check
> everything.

Agreed.
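The two ends of Jeremy's spectrum might look like this as reader-side
policies. This is an illustrative sketch only: graphs are sets of string
triples, foaf:knows in a local FOAF graph defines "friends",
trix:affirmedBy is Jeremy's proposed predicate, and signature_ok is a
caller-supplied stand-in for the full PKI check.

```python
from typing import Callable, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]
AFFIRMED_BY = "trix:affirmedBy"   # Jeremy's proposed predicate (assumption)


def friends_from_foaf(me: str, foaf: FrozenSet[Triple]) -> Set[str]:
    """People I foaf:knows, according to my own local FOAF graph."""
    return {o for (s, p, o) in foaf if s == me and p == "foaf:knows"}


def accept(name: str, graph: FrozenSet[Triple], friends: Set[str],
           commercial: bool, signature_ok: Callable[[str, str], bool]) -> bool:
    affirmers = {o for (s, p, o) in graph if s == name and p == AFFIRMED_BY}
    if commercial:
        # Paranoid: any affirming party will do, but only with a verified signature chain.
        return any(signature_ok(name, u) for u in affirmers)
    # Minimalist: a single affirming triple from a friend is enough.
    return bool(affirmers & friends)
```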

>
> So every act of publishing is an assertion, but we can make more 
> forceful assertions or less forceful ones depending on how we do it.

Or, rather, every act of publishing may be taken to be an assertion by
SW agents, but determination of assertion, authority, and authenticity
can be made more precisely by agents if certain key information is
provided in the graph.

I don't think it's fair to equate the act of publication with assertion,
even if that is the most common presumption, since that suggests that
publishers are liable for making claims that they did not intend to make;
they were simply making "data" available on the web for some other
intended purpose.

Intent of use needs to be kept somewhat separate from publication
(accessibility) in general, I think.

> I guess it is useful to have some way of explicitly marking a graph as
> false (seems a bit strong) or unaffirmed by the author (my preference).

Or simply as a means of quoting. I don't think that we'd want to say that
an unasserted graph is "false" per se -- but rather that the
source/publisher is making available statements which it does not itself
claim to be true. It's a subtle distinction, but I think an important
one.

Not equating "unasserted" with "false" also keeps the door open for
conditional assertion, where the default interpretation/intent of the
publisher is not to assert some graph, but particular qualifications of
the graph may indicate scenarios where an agent might choose to consider
the graph as asserted -- e.g. in some hypothetical scenario, or e.g. if
some system enters some particular state -- such that in that state the
statements are true, but out of that state they are not (necessarily)
true, etc.

E.g.

:X {
    ...
    :X ex:scopeOfAssertion ex:EmergencyCondition7218 .
    ex:WidgetA2819 ex:fooThreshhold "8500"^^xsd:integer .
    ...
}

I.e., graph :X is not normally asserted (claimed to reflect truth)
except in the context of Emergency Condition #7218, in which case it is
to be considered true.
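An agent honouring such scoped graphs might decide assertion roughly as
follows. This is a hypothetical sketch: ex:scopeOfAssertion is the
illustrative property from the example above, the set of currently
active conditions is assumed to come from the agent's own state, and a
graph without a scope falls back to an explicit trix:asserted flag.

```python
from typing import FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]
SCOPE = "ex:scopeOfAssertion"          # illustrative property from the example
ASSERTED = "trix:asserted"
TRUE = '"true"^^xsd:boolean'


def is_asserted(name: str, graph: FrozenSet[Triple], active: Set[str]) -> bool:
    """Decide whether graph `name` counts as asserted in the current state."""
    scopes = {o for (s, p, o) in graph if s == name and p == SCOPE}
    if scopes:
        # Conditional assertion: asserted only while one of its scopes holds.
        return bool(scopes & active)
    # No scope: only an explicit, unconditional assertion counts.
    return (name, ASSERTED, TRUE) in graph
```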

We may, though, still want to include a property to allow agents to
explicitly oppose assertion of a given graph (for whatever reason), e.g.

    trix:rejectedBy
       a trix:GraphQualificationProperty ;
       rdfs:comment "The specified authority rejects the statements expressed in the indicated graph" ;
       rdfs:domain trix:Graph ;
       rdfs:range trix:Authority .

etc.

This leaves it open why the authority/agent rejects the graph (perhaps
the rejection was inferred from distrust of the authority of the graph
in question, etc., rather than from any explicit claim that the
particular graph is false).
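How an agent weighs trix:assertedBy against trix:rejectedBy remains its
own policy. One hypothetical Python sketch, in which only parties the
agent already trusts count, and a trusted rejection yields the label
"rejected" rather than "false":

```python
from typing import FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]
ASSERTED_BY = "trix:assertedBy"
REJECTED_BY = "trix:rejectedBy"


def stance(name: str, statements: FrozenSet[Triple], trusted: Set[str]) -> str:
    """Classify a graph from trusted parties' assertions and rejections.

    Third parties count just like the author; which of them the agent
    trusts decides the outcome.
    """
    def parties(prop: str) -> Set[str]:
        return {o for (s, p, o) in statements if s == name and p == prop} & trusted

    asserters, rejecters = parties(ASSERTED_BY), parties(REJECTED_BY)
    if asserters and rejecters:
        return "conflict"
    if asserters:
        return "asserted"
    if rejecters:
        return "rejected"      # deliberately not "false"
    return "unknown"
```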


> Notice that the algorithm above works fine to permit third party 
> affirmation even with author denial, and the third party is liable not 
> the author.

I agree, and think it's essential that we allow this.

It seems we are pretty much in agreement about the key issues (unless
I've missed or misunderstood something).

Patrick


--

Patrick Stickler
Nokia, Finland
patrick.stickler@nokia.com
Received on Monday, 15 March 2004 02:42:56 UTC