Re: Trust in statements (still is BioRDF Brainstorming) from Chris Mungall on 2008-02-13 (public-semweb-lifesci@w3.org from February 2008)

From: Chris Mungall <cjm@fruitfly.org>
Date: Wed, 13 Feb 2008 14:45:46 -0800
To: "M. Scott Marshall" <marshall@science.uva.nl>
Cc: Matt Williams <matthew.williams@cancer.org.uk>, Alan Ruttenberg <alanruttenberg@gmail.com>, public-semweb-lifesci hcls <public-semweb-lifesci@w3.org>
Message-Id: <1CCFE387-0C03-4221-A04D-1168250ED0F6@fruitfly.org>
On Feb 13, 2008, at 2:14 PM, M. Scott Marshall wrote:

>
> Dear Matt,
>
> I see 'trust' as a 'view' that can be produced by running a filter over
> the data (provenance). The filter would implement my trust policy, or
> one of them. In other words, my trust in a given 'agent' can be due to
> the fact that it produces data using a certain algorithm. I also place a
> certain level of trust in the instrumentation that produced the data,
> the p-values of an analysis in the processing pipeline, human operators
> involved, etc. So, the weights or confidence measures that you are
> describing and that Alan is qualifying would be the *output* of such a
> trust policy or filter. I would not besmirch the data with my own
> personal trust models nor easily trust those of others. ;) I guess that
> what I'm trying to say is equivalent to Alan's point: I would prefer to
> keep facts and their evidence disclosed symbolically in the data so that
> different 'views' can take them into account.
>
> But, before I go to build such 'views' or filters, I will wait for that
> sort of information to become machine-readable as data provenance. :)
>
> However, I *can* try to make that sort of information available for data
> that I am helping to manage or produce. It seems that having a triple
> store (such as Virtuoso) with named graph support would make it possible
> to produce several types of potentially useful data provenance.

The problem with NGs (and especially with existing RDF support) is  
the close coupling between provenance and the URI from which the  
triples were obtained.

If I wish to make available a collection of triples t1...tn where  
each triple has its own provenance information tp1...tpn then I have  
to have n URIs. If I serve up those triples through a SPARQL endpoint  
then the act of creating a new graph will lose all the original NG  
information.
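
To make the n-URIs point concrete, here is a minimal sketch in Python. It uses plain tuples rather than any real RDF library or SPARQL engine, and every name in it (the ex: identifiers, the graph URIs, the provenance strings, the two helper functions) is made up for illustration.

```python
# Hypothetical data: each quad is (subject, predicate, object, graph_uri).
# With named graphs, the graph URI is the only hook for provenance, so
# n triples with distinct provenance need n distinct graph URIs.
quads = [
    ("ex:geneA", "ex:interactsWith", "ex:geneB", "ex:graph/t1"),
    ("ex:geneA", "ex:locatedIn", "ex:nucleus", "ex:graph/t2"),
    ("ex:geneB", "ex:expressedIn", "ex:brain", "ex:graph/t3"),
]

# Provenance is attached to the graph URI, not to the triple itself.
provenance = {
    "ex:graph/t1": "curated from literature",
    "ex:graph/t2": "predicted by algorithm",
    "ex:graph/t3": "imported from another database",
}


def graph_uris_needed(quads):
    """Return the set of graph URIs used to carry provenance."""
    return {g for (_s, _p, _o, g) in quads}


def construct_new_graph(quads, predicate):
    """Mimic building a fresh graph of plain triples from query results.

    The result holds triples only, so the graph URI (and with it the
    link to the provenance table) is gone.
    """
    return [(s, p, o) for (s, p, o, _g) in quads if p == predicate]


uris = graph_uris_needed(quads)
result = construct_new_graph(quads, "ex:interactsWith")
print(len(uris))  # number of graph URIs needed for these triples
print(result)     # plain triples, with no graph URI left to look up
```

In this toy model nothing in `result` can be used as a key into `provenance` any more, which is the loss described above.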

NGs are not directly supported in the RDF model and it's not clear  
how NGs would be accessed from an OWL-level API such as the OWLAPI.

There are proposed extensions such as TriX/TriG - and there may be  
some relation between NGs and quoting in N3. However, AFAIK the  
meaning of these extensions in the OWL-DL formalism is not clear.

I don't think NGs are so useful beyond SPARQL. I think the only  
option here is to embrace RDF reification (and to push for better  
syntax, query and tool support). After all, this is how provenance at  
the OWL level will work in OWL 1.1 (i.e. annotating axioms).
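
As a rough illustration of the reification route, the Python sketch below expands one triple into the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object) and hangs provenance off the statement node. It uses plain tuples and prefixed names instead of a real RDF library; the `reify` helper, the ex: identifiers and the provenance predicates are assumptions made up for the example.

```python
def reify(statement_id, triple, annotations):
    """Return the reification triples for `triple`, plus provenance triples.

    statement_id: node naming the statement (hypothetical, e.g. "ex:stmt1")
    triple:       (subject, predicate, object)
    annotations:  dict mapping a provenance predicate to its value
                  (hypothetical names, not a standard vocabulary)
    """
    s, p, o = triple
    triples = [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", s),
        (statement_id, "rdf:predicate", p),
        (statement_id, "rdf:object", o),
    ]
    # Provenance is ordinary RDF about the statement node, so it needs
    # no graph URI and survives being copied into another graph.
    triples += [(statement_id, k, v) for k, v in annotations.items()]
    return triples


reified = reify(
    "ex:stmt1",
    ("ex:geneA", "ex:interactsWith", "ex:geneB"),
    {"ex:source": "ex:someDatabase", "ex:evidence": "ex:curatedFromLiterature"},
)
for t in reified:
    print(t)
```

The cost is visible here too: one asserted triple becomes four reification triples plus one per annotation, which is part of why better syntax and tool support would be needed.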

> -scott
>
> -- 
> M. Scott Marshall
> http://staff.science.uva.nl/~marshall
> http://adaptivedisclosure.org
>
> Matt Williams wrote:
>>
>> Dear Alan,
>>
>> Thank you for making my point much more clearly than I managed. I'm a
>> little wary of probabilities in situations like the one you describe, as
>> it always seems a little hard to pin down what is meant by them. At
>> least with the symbolic approach, you can give a short paragraph saying
>> what you mean.
>>
>> I'll try and find a paper on the "p-modals" (possible, probable, etc.)
>> and ways of combining them tomorrow and put a paragraph on the wiki.
>>
>> Matt
>>
>> Alan Ruttenberg wrote:
>>> I'm personally fond of the symbolic approach - I think it is more
>>> direct and easier to explain what is meant. It's harder to align
>>> people to a numerical system, I would think, and it also provides a
>>> false sense of precision. Explanations are easier to understand as
>>> well: "2 sources thought this probable, and 1 thought it doubtful" can
>>> be grokked more easily than score: 70%
>>>
>>> -Alan
>>>
>>> On Feb 12, 2008, at 4:03 PM, Matt Williams wrote:
>>>
>>>>
>>>> Just a quick note that the 'trust' we place in an agent /could/ be
>>>> described probabilistically, but could also be described logically.
>>>> I'm assuming that the probabilities in the trust annotations are
>>>> likely to be subjective probabilities (as we're unlikely to have enough
>>>> data to generate objective probabilities for the degree of trust).
>>>>
>>>> If you ask people to annotate with probabilities, the next thing you
>>>> might want to do is to define a set of common probabilities (10 - 90,
>>>> in 10% increments, for example).
>>>>
>>>> The alternative is that one could annotate a source, or agent, with
>>>> our degree of belief, chosen from some dictionary of options
>>>> (probable, possible, doubtful, implausible, etc.).
>>>>
>>>> Although there are some formal differences, the two approaches end up
>>>> as something very similar. There is of course a great deal of work on
>>>> managing conflicting annotations and levels of belief in the literature.
>>>>
>>>> Matt
>>>>
>>>> -- 
>>>> http://acl.icnet.uk/~mw
>>>> http://adhominem.blogsome.com/
>>>> +44 (0)7834 899570
>>>>
>>>
>>
>
>
>
>
Received on Wednesday, 13 February 2008 22:46:36 UTC