Re: Provenance and reification from Graham Klyne on 2001-11-14 (w3c-rdfcore-wg@w3.org from November 2001)

From: Graham Klyne <Graham.Klyne@MIMEsweeper.com>
Date: Wed, 14 Nov 2001 11:35:33 +0000
To: Pat Hayes <phayes@ai.uwf.edu>
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <5.1.0.14.2.20011114104427.045deec0@joy.songbird.com>
At 11:43 AM 11/13/01 -0600, Pat Hayes wrote:
>>Following today's teleconference, I was thinking some more about 
>>provenance (statements like document X says Y, possibly with other 
>>qualifications).
>>
>>The question raised was whether the statement (Y) referenced in an 
>>assertion of provenance was a statement token, or some lexically-based 
>>value, or an interpretation of (meaning of) the statement.
>>
>>Consider the case of a contract written in a foreign language.  My lawyer 
>>may tell me that "the contact with abc, dated dmy, that I am about to 
>>sign commits me to pay P pounds in return for some good Q". This is a 
>>statement of provenance, but it is useless to me if it simply quotes the 
>>content of the contract -- I want to know the meaning (expressed in some 
>>language that I understand) of the content of the contract.
>>
>>My point is that there is a clear argument for suggesting that assertions 
>>of provenance should reference the meaning of the referenced statements, 
>>not their lexical form.
>
>Good point. On the other hand, you do want to say it was that particular 
>document you signed, right, not some other document that just happened to 
>mean the same thing (still less, *all* other documents that mean the same 
>thing.) So I think that in a case like this we need at least two notions: 
>the physical (token) document you actually signed, and the content 
>(interpretation) of that token.

You may be right, but let me play devil's advocate...

I have taken a view that it doesn't make sense to have a signature that 
applies to an RDF graph;  in every case I can think of, what one signs is a 
physical manifestation of the graph:  a digital signature on a 
serialization of the graph, or a written signature on a drawing of a 
graph.  In these cases, it seems to me that it is exactly the physical 
representation that is signed.  Thus, if one wanted to embed an actual 
signature (token?) in an RDF graph, one would also need to embed the 
original signed document, complete with all it's syntactic oddities (maybe 
including whitespace, XML comments, etc.);  e.g. the full text of an 
RDF/XML document might appear as a rdf:parseType='literal'.

Of course, this does not (of itself) tell us what the signature actually 
means about the signer's intent.  Assuming one can know what the signer 
meant to convey by the act of signing, I think one would also need a very 
generalized truth function over RDF/XML text (or other RDF syntax) to 
convey such meaning of intent.

To me, it seems more tractable and reasonable that the meaning of the 
signature is encoded.  Suppose:

   Document X says "Y is true".
   Document X is signed by Z with the intent of assuring the accuracy of 
its content.

Then I think what is useful to capture is:

   Z assures { the content of document X }.

which further reduces to:

   Z assures Y

I think this is what is called a modality?  If we trust assurances given by 
Z then from the above we may assume the truth of Y.

>>Coming back to RDF:  the expression of provenance that I favour is one 
>>along the lines of:
>>
>>     X contains statements Y
>>
>>meaning
>>
>>     the content of X entails assert(Y)
>
>Hmm, what does that 'assert' mean? (Why didn't you just say, X entails Y ?
>(Is the idea that assert(Y) means the RDF triples you would get by 
>de-reifying Y, so that Y is an object that can be unpacked, as it were, 
>into triples ?)

Yes... I didn't know how I should present this.  In conventional logic 
notation, X entails Y would work just fine, but I am still a bit hung-up by 
this RDF idea that a statement expressed is usually presumed to be asserted.

>>(there is no interpretation in which I(content of X) is true, and 
>>I(assert(Y)) is false.)
>>
>>where X is an identifiable resource to which other properties can be added:
>>
>>     X saidBy Person .
>>     X saidOn Date .
>>     X approvedBy Authority .
>>
>>etc.
>
>This raises a classical issue in belief/attribution logics. Suppose that X 
>said P and Q is entailed by P; did X say Q as well? The objection is that 
>X may not have realized that P entails Q, and if he had realized it he may 
>not have said P in the first place. We could take the position that it is 
>up to X to check all the entailments of anything he claims to be true, but 
>still I bet some lawyer is going to want to distinguish the case where X 
>actually *said* Q from the case where X said something that entails Q.

Yes, I think all of those are issues if we try to map the entire realm of 
human interaction into RDF.  My feeling is that the most useful bit here is 
to be able to convey the intended meaning separately (or differently) from 
the syntax originally used to express it.  My take on the above is that if 
two agents are to exchange information and accept provable inferences based 
on that information, they will also need to refer (implicitly or 
explicitly) to some commonly accepted baseline.

So in the above case, suppose Y is the agent to whom X says P entails P AND 
Q.  Y may be entitled to deduce that X has thereby asserted Q if it can 
also find in some commonly agreed baseline that P is true.  Further, I 
think the semantics of RDF would have it that:

   there is no interpretation in which P is true and P AND Q is false

also means

   there is no interpretation in which P is true and Q is false

i.e. according to the semantics of RDF, X already asserted that P entails 
Q, and (in using RDF) had no place to make such a statement without 
realizing it.

..

In summary:  I see a clear use-case for a flavour of reification that can 
be used to express modality (if that's the right term?).  I'm not so clear 
about the requirement for conveying signature information at the token level.

..

Background:  my comments here are based on past thinking about two 
different requirement areas:


(a) in CC/PP, we discussed early on how to convey trust in client 
profiles.  E.g. if a client says one thing about their capabilities and an 
intermediary says something else, who is the origin server to trust when 
generating content?  There are two threats to consider here:  (i) 
information gets corrupted in transit by an unauthorized party.  (ii) 
incorrect information is supplied to a party that is authorized to modify 
the profile according top defined rules.

Case (i) is handled by ordinary crypto protocols -- channel-based (e.g. 
SSL) or object-based (e.g. S/MIME or XML signature).  One is forced to 
assume that anyone with the keys needed to participate in the protocol is 
truested to play by the rules.  No special RDF support is required here.

Case (ii) needs to be handled at the semantic level, taking account of 
which parties are trusted to accurately supply what kinds of 
information.  E.g. a proxy may be trusted to accurately describe 
communication capabilities, but a client may be trusted to accurately 
convey rendering capabilities or user preferences.  To make such decisions, 
provenance information needs to be provided along with the actual 
information.  (The parties involved are trusted, per (i), to provide 
accurate provenance information --i.e. to identify themselves properly-- 
and to not meddle with other provenance information or content.)  I think 
this is exactly the kind of information that RDF should be able to handle 
in its stride.


(b) I was involved some time ago in some discussions with folks at 
Rutherford Appleton Laboratory who are interested in trust modelling and 
management for Grid computing, and who were looking to RDF as a way to 
convey this kind of information.  They have an interest in modelling trust 
as a shades-of-grey kind of thing, not simply belief or non-belief.  I have 
also considered that one could combine information from other sources (e.g. 
protocol parameters, data mining, etc.) to provide supporting evidence for 
or against an putative assertion.  All this works at the "semantic" 
level;  some of the information may be signed, some may just be gathered 
from a range of sources.  As in case (a) above, signatures are applied to a 
specific serialization, for communication between systems;  semantic 
processing is presumed to take place within a trusted environment (i.e. 
trusted to faithfully process the information according to agreed rules).

..

#g



------------------------------------------------------------
Graham Klyne                    MIMEsweeper Group
Strategic Research              <http://www.mimesweeper.com>
<Graham.Klyne@MIMEsweeper.com>
       __
      /\ \
     /  \ \
    / /\ \ \
   / / /\ \ \
  / / /__\_\ \
/ / /________\
\/___________/
Received on Wednesday, 14 November 2001 07:37:06 UTC