
RE: Evidence

From: Kashyap, Vipul <VKASHYAP1@PARTNERS.ORG>
Date: Thu, 21 Jun 2007 10:47:45 -0400
Message-ID: <DBA3C02EAD0DC14BBB667C345EE2D1248402E2@PHSXMB20.partners.org>
To: "M. Scott Marshall" <marshall@science.uva.nl>, "Alan Ruttenberg" <alanruttenberg@gmail.com>
Cc: <public-semweb-lifesci@w3.org>, "Pat Hayes" <phayes@ihmc.us>


I was wondering if you could summarize your points and post them on the wiki.




Vipul Kashyap, Ph.D.
Senior Medical Informatician
Clinical Informatics R&D, Partners HealthCare System
Phone: (781)416-9254
Cell: (617)943-7120
To keep up you need the right answers; to get ahead you need the right questions
---John Browning and Spencer Reiss, Wired 6.04.95
> -----Original Message-----
> From: M. Scott Marshall [mailto:marshall@science.uva.nl]
> Sent: Thursday, June 21, 2007 10:24 AM
> To: Alan Ruttenberg
> Cc: Kashyap, Vipul; public-semweb-lifesci@w3.org; Pat Hayes
> Subject: Re: Evidence
> I see evidence as a special type of provenance for "facts",
> "observations", and "conclusions" in a knowledgebase.
> Motivation for evidence is the desire to represent information about an
> experiment, such as the hypothesis. If we want to work with hypotheses,
> then we need to represent hypothetical information. But how? A uniform
> approach would treat all information as propositional or hypothetical,
> rather than having a separate class, so that a "hypothesis" can be
> promoted to "fact", but I digress. :) However we represent it, we would
> like to know how our hypothetical fact is supported by evidence, such as
> protocols and methods.
> Alan Ruttenberg wrote:
> > Maybe we can bring this back to the main subject: What problems are we
> > trying to solve by recording evidence? What are the ways we would know
> > that we've made a mistake?
> >
> > (I suspect that there will be a variety of answers to this, and I'm very
> > curious to hear what people think)
> I'll try to answer this:
> We want to record evidence in order to evaluate and weigh the quality of
> data/information, as well as steer and/or evaluate any conclusions that
> are made on the basis of that data. This is especially important in an
> environment for computational experiments. My test: If we can apply our
> own criteria to evaluate our confidence in a given fact, even when it
> is in someone else's knowledgebase, we have succeeded with our
> representation of the evidence. So, an example of how to represent such
> criteria and reason with them about example evidence would be nice.
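
One way to picture the test described above is a minimal Python sketch in which each fact carries its evidence as plain key/value pairs and a reader applies a locally defined criterion to facts taken from someone else's knowledgebase. Everything named here (`Fact`, `my_criterion`, the evidence keys, and the scores) is hypothetical and chosen only for illustration; the thread does not define a representation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Fact:
    """A statement plus whatever evidence its publisher attached to it."""
    statement: str
    evidence: Dict[str, str] = field(default_factory=dict)

# A criterion maps evidence to a confidence score in [0, 1].
Criterion = Callable[[Dict[str, str]], float]

def my_criterion(evidence: Dict[str, str]) -> float:
    # Hypothetical local policy: the scores are illustrative, not recommended values.
    if evidence.get("curated_by") == "human_expert":
        return 0.9
    if evidence.get("method") == "text_mining":
        return 0.5
    return 0.1  # no usable evidence recorded

def confidence(fact: Fact, criterion: Criterion) -> float:
    """Apply the reader's own criterion to a fact from any knowledgebase."""
    return criterion(fact.evidence)

# Facts as they might arrive from someone else's knowledgebase.
their_kb = [
    Fact("article is about proteinX", {"method": "text_mining"}),
    Fact("proteinX is relevant to the article", {"curated_by": "human_expert"}),
    Fact("statement with no evidence"),
]

scores = {f.statement: confidence(f, my_criterion) for f in their_kb}
```

The point of the sketch is that the criterion lives with the reader while the evidence travels with the fact, so a different reader can score the same facts differently.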
> Evidence in Text mining
> -----------------------
> Suppose that we are trying to distill knowledge provided by a
> scientific article into some representation. Example: "Is the article
> about proteinX?". If so, "How relevant is proteinX to the article?" and
> so forth. If the distillation process is carried out by a person, then
> who? In the case of text mining, we might like to know what algorithms
> and techniques, queries, pattern recognizers (Bayesian or lexical
> patterns?), threshold values, etc. were used to extract knowledge. If a
> person used a text mining workflow to support the distillation process,
> then we would like the URL to the workflow WSDL (from which we can
> usually discover the other details) and to know who the person was.
> In general, we would like to know the resources involved in producing a
> particular piece of data (or "fact"). We would like to know the actors,
> roles, conditions, algorithms, program versions, what rules were fired,
> and information resources.
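
The items listed above could be collected in a single record per fact. The following Python sketch is one possible shape, not a proposed schema: the class name `Provenance`, its field names, and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Provenance:
    """Resources involved in producing one piece of data (hypothetical layout)."""
    actors: Dict[str, str] = field(default_factory=dict)            # actor -> role
    conditions: Dict[str, str] = field(default_factory=dict)        # e.g. threshold values
    algorithms: List[str] = field(default_factory=list)
    program_versions: Dict[str, str] = field(default_factory=dict)  # program -> version
    rules_fired: List[str] = field(default_factory=list)
    information_resources: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Names of the fields left empty, in declaration order."""
        return [name for name, value in vars(self).items() if not value]

# Example record for a text-mining-supported annotation (values are made up).
record = Provenance(
    actors={"curator-1": "annotator"},
    conditions={"threshold": "0.8"},
    algorithms=["lexical pattern matcher"],
)

gaps = record.missing_fields()
```

Reporting the empty fields makes it visible which parts of the evidence were never recorded, which is itself useful when weighing a fact.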
> An important challenge in the future will be to combine results from
> manual and automated processes. Most of us would tend to view "facts"
> that result from an automated process as more hypothetical or
> questionable than the same coming from a human expert. On the road to
> automation, however, we should eventually reach the point that the
> quality of "text mining"-supported (i.e. not generated!) annotations
> will be generally higher than manual-only annotation.
> Evidence in Microarrays
> -----------------------
> I don't intend to start a debate about the particulars of microarrays
> but I think that evidence comes up in practice here throughout the
> entire process of measurement and analysis. Gene expression, as measured
> by microarrays, is actually a measurement of changes in mRNA levels at a
> particular time, which *indicates* how much change in the process of
> expression has occurred under *specific* *conditions*. So, already we
> have an example of terminology that is not ontologically accurate when
> incorrectly applied (to microarrays) - technically, measuring mRNA
> levels is not equivalent to measuring the quantity of protein product
> ("expression"). But the term has been in use for so long that it remains
> acceptable to refer to microarray analysis as "expression analysis". :)
> In the case of "gene expression", the statistical process of microarray
> analysis only provides a probability that a gene is up- or down-regulated
> (e.g. in the common reference model). However, there is a series of
> decisions and conditions that lead up to the "call" (up, down,
> unchanged) for a particular gene and thus the resulting set of
> differentially expressed genes for the array. The following conditions
> can all be relevant to decisions in how much weight to give to the
> resulting data:
> * Experimental design - organism, conditions, disease, phenotype, ..
> * Source of cells, enzymes, ..
> * Materials handling (thawed? how often?)
> * Protocols used such as RNA extraction
> * Operator
> * Array layout and design - including choice of oligos
> * Instrumentation details - array spotter/printer, laser type and
> calibration, ..
> * Ozone levels (I'm not kidding!)
> * Image analysis ("Feature Extraction") software and settings
> * Type of normalization
> * Criteria for discarding data as "outliers"
> * Criteria for classifying gene as differentially expressed (p-value
> cutoff, ANOVA, ..)
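
As a small illustration of the last bullet, the Python sketch below keeps the criteria used for a call attached to the call itself, so that someone reading the result later can see which cutoff and normalization produced it. The names (`CallCriteria`, `GeneCall`, `make_call`), the use of a log ratio for direction, and all values are assumptions for the example, not a description of any particular analysis package.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CallCriteria:
    """Settings that decide a call (hypothetical subset of the list above)."""
    p_value_cutoff: float
    normalization: str

@dataclass(frozen=True)
class GeneCall:
    gene: str
    call: str            # "up", "down", or "unchanged"
    p_value: float
    criteria: CallCriteria  # evidence for how the call was made

def make_call(gene: str, log_ratio: float, p_value: float,
              criteria: CallCriteria) -> GeneCall:
    """Classify a gene and record the criteria alongside the result."""
    if p_value > criteria.p_value_cutoff or log_ratio == 0:
        call = "unchanged"
    elif log_ratio > 0:
        call = "up"
    else:
        call = "down"
    return GeneCall(gene, call, p_value, criteria)

# Illustrative values only.
strict = CallCriteria(p_value_cutoff=0.01, normalization="lowess")
lenient = CallCriteria(p_value_cutoff=0.05, normalization="lowess")

call_strict = make_call("geneA", log_ratio=1.2, p_value=0.03, criteria=strict)
call_lenient = make_call("geneA", log_ratio=1.2, p_value=0.03, criteria=lenient)
```

The same measurement yields different calls under the two cutoffs, which is why the criteria need to be recorded with the call if the data are ever to be pooled or re-evaluated.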
> Again, the point that I'm trying to make about microarrays is that
> evidence (as well as uncertainty), can be represented and used, even for
> the measurements ("observations") themselves. But this is not done in
> practice. Even if you wanted to simply "pool" microarray data (most
> people don't), it is very difficult to do because some of the most
> important metadata (e.g. experimental design), if available, is often in
> free text format.
> -scott
> p.s. My introduction to HCLS summarizes the way that I look at evidence
> a lot more succinctly than the above:  ;)
> http://lists.w3.org/Archives/Public/public-semweb-lifesci/2006Feb/0131.html
> --
> M. Scott Marshall
> http://staff.science.uva.nl/~marshall
> http://adaptivedisclosure.org

Received on Thursday, 21 June 2007 14:52:01 UTC
