- From: Jeremy Zucker <zucker@research.dfci.harvard.edu>
- Date: Wed, 2 Aug 2006 05:28:06 -0400
- To: Jeremy Zucker <zucker@research.dfci.harvard.edu>
- Cc: "Skinner, Karen ((NIH/NIDA)) [E]" <kskinner@nida.nih.gov>, "Eric Neumann" <eneumann@teranode.com>, "public-semweb-lifesci hcls" <public-semweb-lifesci@w3.org>
Hello folks,

It appears that I forgot to put the URL in that last email about the pathguide:

http://www.pathguide.org

Well, since I've already managed to embarrass myself publicly, I figure I might as well introduce myself formally.

My name is Jeremy Zucker, and I am a bioinformatics specialist at the Dana-Farber Cancer Institute and a research fellow at Harvard Medical School in George Church's lab. I have been working mainly with data integration [1] issues that arise from automating the metabolic reconstruction of pathway/genome databases [2] for the purpose of generating flux balance models [3]. I also work with Joanne Luciano and others on a pathway exchange format in OWL/RDF called BioPAX.

The semantic web interests me for several reasons. For one, I believe it will be a solid substrate for distributed curation, which is a necessary part of the ongoing effort to improve the quality of the biological data we use. Like Wikipedia, we need a way to exploit the wisdom of crowds to discover, cross-validate, and annotate the biological data that we are currently using.

Second, the semantic web should make it easier to do distributed "pathway data mashups", such as overlaying expression data onto metabolic, signal transduction, and gene regulation pathways, to understand how the cell controls the production of itself, how certain disease states form, how to alter metabolic pathways to remove toxins from the environment, and how to optimize the metabolic fluxes to produce useful biomolecules.

Third, with semantic web technologies such as description logics and rules, it should be possible to infer when two data sets are really talking about the same biological object, even if they use different identifiers to describe the thing. To that end, I have been working with Alan Ruttenberg and others at York University, UCSD and SRI to develop an OWL/Description-logic based method to automate the integration of two E. coli databases.
The first database has an extremely well-developed ontology [2]. The other has a highly curated data set specifically tuned for flux balance analysis [3]. By merging them, it should be possible to automatically generate metabolic flux models for any sequenced organism [4].

There, now that I have introduced myself and my interests, let's try to estimate the number of javabeans in the Life sciences jar!

Sincerely,

Jeremy

[1] http://www.freebiology.org/wiki/Debugging_the_bug
[2] http://biocyc.org
[3] http://gcrg.ucsd.edu
[4] http://prelude.bu.edu/publications/Segre_etal_OMICS_2003.pdf

On Aug 2, 2006, at 1:17 AM, Jeremy Zucker wrote:

> Hi folks,
>
> One resource that is likely to be of use in the pathway space is
> the pathguide:
> It has detailed statistics about the size of each database and
> other metadata for about 222 biological pathway databases.
> This is the target space for conversion to BioPAX.
>
> Sincerely,
>
> Jeremy
>
> On Jul 31, 2006, at 6:35 PM, Skinner, Karen ((NIH/NIDA)) [E] wrote:
>
>> These may be helpful resources:
>>
>> The Nucleic Acids Research Public Links Directory
>> See:
>> http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&dopt=AbstractPlus&list_uids=16845014&query_hl=6&itool=pubmed_docsum
>>
>> And the Nucleic Acids 2006 Molecular Biology Database Collection
>> http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?itool=abstractplus&db=pubmed&cmd=Retrieve&dopt=abstractplus&list_uids=16381871
>>
>> Karen Skinner, Ph.D.
>> Deputy Director for Science and Technology Development
>> Division of Basic Neuroscience and Behavior Research
>> National Institute on Drug Abuse
>> Room 4243
>> 6001 Executive Boulevard
>> Bethesda, Maryland 20892-9651
>> 301-435-0886 or 301-443-1887
>> ks79x@nih.gov
>>
>> -----Original Message-----
>> From: Eric Neumann [mailto:eneumann@teranode.com]
>> Sent: Monday, July 31, 2006 10:07 AM
>> To: public-semweb-lifesci hcls
>> Subject: Size estimates of current LS space
>>
>> As per today's Telcon, does any person with genomics knowledge (that
>> includes you too Carole) have estimates for the following numbers:
>>
>> 1. How many bio-molecular and organism-anatomical-functional entities
>> and records (broad sense) are currently accessible through the web
>> (excluding LIMS entities, such as samples, for now)?
>>
>> 2. Does this number grow substantially when it is allowed to include
>> every variant of protein, gene, etc. per species (i.e., not
>> instances of real molecules or organisms)?
>>
>> I think these would be quite useful for other W3C members to be aware
>> of, since some proposed mechanisms would require their global
>> indexing...
>>
>> Eric
Received on Wednesday, 2 August 2006 09:28:28 UTC