RE: Size estimates of current LS space (and Introductions) from Nigam Shah on 2006-08-04 (public-semweb-lifesci@w3.org from August 2006)

From: Nigam Shah <nigam@stanford.edu>
Date: Thu, 3 Aug 2006 17:16:07 -0700
To: "'Jeremy Zucker'" <zucker@research.dfci.harvard.edu>
Cc: "'Skinner, Karen (NIH/NIDA) [E]'" <kskinner@nida.nih.gov>, "'Eric Neumann'" <eneumann@teranode.com>, "'public-semweb-lifesci hcls'" <public-semweb-lifesci@w3.org>
Message-ID: <007801c6b75b$2ed5fa80$8119fea9@stanford.edu>

Hi Jeremy,

Please see my inline comments below.

> The semantic web interests me for several reasons.  For one, I
> believe it will be a solid substrate for distributed curation,
> which is a necessary part of the ongoing effort to improve the
> quality of the biological data we use.
> Like Wikipedia, we need a way to exploit the wisdom of crowds
> to discover, cross-validate, and annotate the biological data
> that we are currently using.

I would like to get some feedback on the feasibility of distributed
curation. PIs who have years of experience managing curation
projects are not that enthusiastic about its role. It seems the CS
community is all for it, but the actual *users* haven't really bought
in. For example, there is a great tool called CBioC, developed by
Chitta Baral's group at ASU:

http://cbioc.eas.asu.edu/

When you search the PubMed database and display a particular abstract,
CBioC will automatically display the interactions found in the CBioC
database related to the abstract you are viewing. If the abstract has
not been processed by CBioC before, the automatic extraction system
will run "on the fly".

CBioC runs as a web browser extension, not as a stand-alone
application. When you visit the Entrez (PubMed) web site, CBioC
automatically opens within a "web band" at the bottom of the main
browser window in either IE or Firefox. 

To me it appears to be a great tool, something that can actually
exploit the wisdom of the crowds without much effort required (other
than saying yes/no to an automatically extracted interaction).

What do others on the list think about such projects?

> Third, with semantic web technologies such as description
> logics and rules, it should be possible to infer when two data
> sets are really talking about the same biological object, even
> if they use different identifiers to describe the thing.
> To that end, I have been working with Alan Ruttenberg and
> others at York University, UCSD and SRI to develop an
> OWL/Description-logic based method to automate the integration
> of two E. coli databases.

I think with SW technologies it should be possible to go beyond
integration. At Stanford, we did a test project integrating EcoCyc,
Reactome and KEGG using BioPAX to create the Pathway Knowledge Base
(PKB) at http://pkb.stanford.edu [it's currently down because we are
moving machines].

Some time back, we also "proofread" Reactome (v.10 to v.14) to look
for four types of errors. More details are in "A case study in pathway
knowledgebase verification" (http://www.biomedcentral.com/1471-2105/7/196)
and at http://www.hybrow.org/Reactome

Now, putting these two projects together, it is possible to see how SW
technologies can be leveraged for both automated integration AND
proofreading, and maybe for even more sophisticated analyses. It would
be great if people on the HCLSIG could provide comments and suggestions
on such possible efforts.

Regards,
Nigam.

Received on Friday, 4 August 2006 00:16:45 UTC