Re: RE: BioRDF Brainstorming from Peter Ansell on 2008-02-12 (public-semweb-lifesci@w3.org from February 2008)

From: Peter Ansell <ansell.peter@gmail.com>
Date: Wed, 13 Feb 2008 06:37:15 +1000
To: public-semweb-lifesci@w3.org
Cc: "Colin Batchelor" <BatchelorC@rsc.org>, "samwald@gmx.at" <samwald@gmx.at>, holger.stenzhorn@deri.org, p.roe@qut.edu.au, j.hogan@qut.edu.au
Message-ID: <a1be7e0e0802121237s5059bbcdx31d2715615622b4e@mail.gmail.com>

On 12/02/2008, samwald@gmx.at <samwald@gmx.at> wrote:
>
> > Good point.  What I was sort of driving at (and failing) was the context
> > in which the facts are mentioned---are they the aim of the paper,
> > background information, mentioned as results and so forth?
>
> In what I see as the ideal scenario, each text/database entry would only be
> annotated with the results and not with background information or the
> citation of secondary sources. The annotations should only consist of what
> the creators of the annotated text/database entry considered to be 'true', and what has not already been stated in other texts/database entries.

I don't think there is any harm in including a list of references with
each entry; it gives a publication a context within the overall body of
knowledge. I agree that it is not necessary to put every detail into
publication annotations, but if text-mining software is going to help
with the annotation process, it is likely to attempt to annotate
everything. It seems unavoidable that the end product will be
annotations which contain all of the knowledge in a text, in the
expectation that the unique part can be identified by another method,
or worked out dynamically when one wants to view the information.

> For example, article pmid:123 contains the text
> "We found that bananas are yellow. This is in conflict with article
> pmid:456, which states that bananas are pink".
>
> Article pmid:123 should only be annotated with
> "banana has_quality yellow .
> pmid:123 in_conflict_with pmid:456 ."
> Article pmid:456 should be annotated with
> "banana has_quality pink ."
>
> "banana has_quality pink" should not be re-iterated in the annotation of
> pmid:123.
>
> For widely-accepted and deployed biomedical annotation it really does not need to be more complicated than that. Trying to capture the whole narrative of each publication in detail in such a scenario is very hard and might actually make the annotation as a whole less usable, since it becomes harder to understand, query, integrate and maintain.
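
To illustrate my point above about annotating everything and working out
the unique part dynamically, here is a rough Python sketch using the
banana example. The data structures and the unique_triples function are
made up for illustration, and I am assuming that a text miner has
annotated pmid:123 with everything it mentions, including the restated
claim from pmid:456.

```python
# Hypothetical store: article id -> set of (subject, predicate, object)
# triples, as an over-eager text miner might produce them.
annotations = {
    "pmid:123": {
        ("banana", "has_quality", "yellow"),
        ("banana", "has_quality", "pink"),  # restated from pmid:456
        ("pmid:123", "in_conflict_with", "pmid:456"),
    },
    "pmid:456": {
        ("banana", "has_quality", "pink"),
    },
}

# Hypothetical reference lists: article id -> articles it cites.
references = {
    "pmid:123": ["pmid:456"],
    "pmid:456": [],
}


def unique_triples(article):
    """Return the triples of `article` not already stated by articles it cites."""
    inherited = set()
    for cited in references.get(article, []):
        inherited |= annotations.get(cited, set())
    return annotations[article] - inherited


for triple in sorted(unique_triples("pmid:123")):
    print(triple)
```

This only looks at directly cited articles; following references
transitively would be a straightforward extension.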

On a broader brainstorming note, it would also be nice to have a way of
specifying that a certain dc:Agent thinks one annotation is better than
another, with the user deciding to trust certain Agents to give them
useful knowledge or, inversely, not to trust specific Agents who they
find do not annotate things well.
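
As a minimal sketch of what I mean (the Preference record, the agent
identifiers and the trusted_preferences function are all hypothetical,
not part of any existing vocabulary), a user-side filter might look like
this in Python:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preference:
    agent: str   # stand-in for a dc:Agent identifier (hypothetical)
    better: str  # annotation the agent prefers
    worse: str   # annotation the agent considers weaker


def trusted_preferences(preferences, trusted, distrusted=()):
    """Keep only opinions from agents the user trusts and has not excluded."""
    allowed = set(trusted) - set(distrusted)
    return [p for p in preferences if p.agent in allowed]


prefs = [
    Preference("agent:alice", "banana has_quality yellow", "banana has_quality pink"),
    Preference("agent:bob", "banana has_quality pink", "banana has_quality yellow"),
]

# The user trusts both agents in general but has explicitly excluded bob.
kept = trusted_preferences(
    prefs,
    trusted=["agent:alice", "agent:bob"],
    distrusted=["agent:bob"],
)
```

Here an explicit distrust overrides trust, which is only one possible
policy.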

I think the probabilistic ontology concept (links below) is a useful
starting point for determining trust at the annotation level, by
pooling metadata from comments about annotations to provide a somewhat
democratic distribution. Hopefully this would not conflict with the
"simple is best" principle; rather, it would allow for multiple
opinions about whether a certain scientific statement is valid.

http://www.pr-owl.org/pr-owl.owl

http://ite.gmu.edu/~klaskey/papers/FOIS2006_CostaLaskey.pdf
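
To make the pooling idea a bit more concrete, here is a naive Python
sketch. It is not PR-OWL and makes no use of that ontology; it is just a
weighted tally of agree/disagree comments per statement, and the
function name, the comment format and the per-agent weights are all
assumptions of mine.

```python
from collections import defaultdict


def pooled_support(comments, weights=None):
    """Return statement -> weighted fraction of comments that agree with it.

    comments: iterable of (agent, statement, agrees) tuples, agrees a bool.
    weights:  optional agent -> trust weight; unlisted agents count as 1.0.
    """
    weights = weights or {}
    agree = defaultdict(float)
    total = defaultdict(float)
    for agent, statement, agrees in comments:
        w = weights.get(agent, 1.0)
        total[statement] += w
        if agrees:
            agree[statement] += w
    # Skip statements whose comments all carry zero weight.
    return {s: agree[s] / total[s] for s in total if total[s] > 0}


comments = [
    ("agent:a", "banana has_quality yellow", True),
    ("agent:b", "banana has_quality yellow", True),
    ("agent:c", "banana has_quality yellow", True),
    ("agent:d", "banana has_quality yellow", False),
    ("agent:d", "banana has_quality pink", True),
]
support = pooled_support(comments)
```

Setting an agent's weight to 0 has the same effect as not trusting that
agent at all.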

Peter

Received on Tuesday, 12 February 2008 20:37:36 UTC