- From: William Bug <William.Bug@DrexelMed.edu>
- Date: Fri, 9 Jun 2006 16:57:51 -0400
- To: "Bob Futrelle" <bob.futrelle@gmail.com>
- Cc: public-semweb-lifesci@w3.org, Rob Williams <s2g2@mycingular.blackberry.net>
Those references would be really wonderful to have in hand. Many thanks, Bob.

Given the direction we are taking on the BIRN project, this project really made my ears prick up when I saw the post on Slashdot. In BIRN we are making very extensive use of FuGO (http://fugo.sourceforge.net/) and PATO (http://obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value), both of which are to be included in the OBO Foundry (http://obofoundry.org/), to create formal, computable descriptions of PRIMARY experimental data. Given the critical role ontologies have played in robotics over the last few decades, I'm not surprised something like this should come from a researcher in that field, especially one who works with laboratory robotics vendors such as TECAN.

By the way, I really liked your commentary in response to AJ's request, and your work in this area, especially as it relates to knowledge extraction from figures and figure legends in STM publications. In my humble opinion this work is extremely valuable, and we should all get familiar with it.

Having said that, I believe the proposal AJ has put out there on behalf of the Sci Pub Task Force addresses a related yet distinct aspect of formalizing descriptions of scientific results. As the Wiki page says at the outset, "(the objective is) to develop a general purpose ontology for self-publishing (a) single experiment in RDF format that will facilitate data sharing, discovery and integration."

Within the community of researchers who curate and use biomedical ontologies, there is an effort to develop deterministically precise, semantically rich formal descriptions of primary research data. The idea, which is critical to some of the objectives we have in the neuro-centric BIRN project, is to use formal descriptions drawn from elemental biomedical knowledge domains (as opposed to complex, pre-coordinated descriptions that can't easily be decomposed by humans or algorithms) in order to start building a web of semantic information tied as closely to the primary data as possible.

If one is to perform large-scale data integration and meta-analysis on data derived from disparate studies (as BLAST and HMM gene-finding algorithms can do with genomic sequence data), one will often have to go right back to the primary data and have complete, formal descriptions of data acquisition provenance and of all the processing done on the data prior to any significant reduction or analysis. This is certainly true for neuroimaging data sets derived from all the imaging modalities used in neuroscience, as well as for microarray data. The degree to which a system can be expected to re-analyze pooled results from separate studies will often depend on the completeness and computability of this provenance and initial processing metadata.

In BIRN, we are thinking of adopting a standard not unlike the one the NCI caBIG project has developed for certifying software: a compliance standard for formal descriptions of primary data. In other words, if you provide your data in this form, with this level of detailed quantitative and semantic description, you can expect your data to be usable by laundry list A of integration and pooled-analysis tools. If you conform to only SOME of the requirements, your data will be available only to a shorter list of more general meta-analysis tools and procedures.

When I say this "form", the "ideal" standard we in BIRN are working toward is to use FuGO-related ontologies for the formal descriptions of devices, assays, and reagents (and ultimately of the environmental factors and subject history required to complete the context for the experimental design and appropriate analysis).
We would then use PATO to link these descriptions of primary data to the ontological descriptions of the phenotypes they are intended to represent. For instance, we would take a formal description of data collected using an assay for left-handedness (BIRN has many behavioral and cognitive assays linked to functional brain imaging experiments to contend with) and link it to a formal description of the trait (handedness in primates) using PATO (Phenotype Attribute and Trait Ontology).

How does this relate to KR and the expression of experimental data in STM articles? As I see it, we would hope that publishers (and/or the research societies associated with various STM publications, and/or NCBI, and the device and reagent vendors) will gradually develop tools that make it TRIVIAL for researchers to represent primary data in this form.

To my mind, the best alternative business model the commercial publishers could propose in light of the Open Access debate would be to focus on this critical technology-development issue, one closely related to their core business of publishing scientific information and knowledge. They could provide this very valuable service and work with groups such as the W3C SW HCLSIG SciPub Task Force to establish the relevant formats, standards, and KR resources. Certainly, the "open" STM publishers that have emerged in the last 5+ years, BioMed Central and PLoS in particular, have shown they recognize the importance of this effort.

Taking this tack, focusing on technological support that would be difficult or prohibitively expensive for the community of grant-supported researchers to build themselves, was exactly the argument that got commercial publishers into the STM business in a big way back in the late '50s and early '60s.
As has happened about every 20 to 30 years since the late 1900's, the quantity of published manuscripts was taking a qualitative leap forward and beginning to outstrip the capacity of the "old-fashioned" publishing technology in widespread use by the vendors serving the society-based publishers. Certain commercial publishers were beginning to computerize their operations, providing efficiencies and economies of scale that put them in a position to offer a good deal to societies that were increasingly strapped for $$$ and having to put severe limits on the number of manuscripts they published. This is a condition that will be very familiar to anyone who's been involved with STM publishing since the mid-'90s (I guess we're nearing another 20 to 30 year qualitative jump forward). Anyway, I don't see why they shouldn't drop the idiotic practice of tying their revenue to frozen IP and go back to their original "value add" proposition: better technology to serve the evolving publishing needs of the STM community.

Enough of my screed on Open Access. As I see it, the NLP approach and efforts such as the one being proposed here for more formally precise and complete descriptions of the primary data itself can run in parallel, each working the niche it handles most effectively, but both ultimately converging on a much more complete, formal, and machine-parsable representation of research data (including the more reduced forms your work addresses when working on the current representation of data in the STM literature).

Cheers,
Bill

On Jun 9, 2006, at 2:34 PM, Bob Futrelle wrote:

>
> You'd have to download EXPO to see what it contains. My guess is that it's a continuation of the work that King has been doing for some time now. He works on robot experimental configurations for bio expts. and wants to represent a structured version of the output (or drivers?). He has a student, I think, who should be producing some papers on EXPO fairly soon.
> There may be some PowerPoint floating around that tells more.
>
> I'll write King and alert him to our discussion and ask him, point blank, where we can get an explanation of what EXPO is and what it does. (He's on my editorial board, for Biological Knowledge, so I'm in touch with him.)
>
> - Bob Futrelle
>
> On 6/9/06, William Bug <William.Bug@drexelmed.edu> wrote:
>>
>> This was a new one on me too, Mark. It was posted to Slashdot the other day, and the SourceForge site the article points to is essentially empty.
>>
>> http://sourceforge.net/projects/expo/
>>
>> As you might gather, EXPO is not a very good term to search for in all the usual-suspect search engines: INSPEC, PubMed, IEEE Xplore, CiteSeer.IST, and Google/Google Scholar. Only a very few specific studies with EXPO in the title came up in:
>>
>> PubMed:
>>
>> CT-expo--a novel program for dose evaluation in CT
>> Rofo. 2002 Dec;174(12):1570-6.
>>
>> INSPEC:
>>
>> The extended Poincare generating function type (EXPO)
>>
>> Extrasolar Planet Observatory (ExPO)
>>
>> EXPO is the integration of two programs, EXTRA and SIRPOW.92 and is a program for full powder decomposition and crystal structure solution.
>>
>> ACL Anthology of research papers in Comp. Linguistics:
>>
>> A FORMAL GRAMMAR OF EXPRESSIVENESS FOR SACRED LEGENDS
>> acl.ldc.upenn.edu/C/C80/C80-1023.pdf
>>
>> (an absolutely fascinating manuscript in no way related to this research project)
>>
>> There is certainly much interesting and relevant research going on in this center at the University of Aberystwyth (http://www.aber.ac.uk/compsci/Research/bio/grants.shtml), but I wasn't able to find any specific reference to EXPO anywhere, though clearly it could be the result of research in any one of several of the projects listed.
>>
>> In the end, I just gave up.
>>
>> Cheers,
>> Bill
>>
>> On Jun 9, 2006, at 1:29 PM, Mark Musen wrote:
>>
>> > On Jun 8, 2006, at 10:09 PM, AJ Chen wrote:
>> >> The first task is to develop an ontology for self-publishing of experiment. I have proposed a list of objects and properties related to self-publishing experiment. Please download the attached file under Task Status and review the proposal. Your feedback and comments will be greatly appreciated. You may also edit the file directly and email me the edited file.
>> >
>> > A colleague just pointed me to this (rather vacuous) article. Does anyone know more about this work?
>> >
>> > http://www.newscientisttech.com/article/dn9288-translator-lets-computers-understand-experiments-.html
>> >
>> > Mark
>>
>> Bill Bug
>> Senior Analyst/Ontological Engineer
>>
>> Laboratory for Bioimaging & Anatomical Informatics
>> www.neuroterrain.org
>> Department of Neurobiology & Anatomy
>> Drexel University College of Medicine
>> 2900 Queen Lane
>> Philadelphia, PA 19129
>> 215 991 8430 (ph)
>> 610 457 0443 (mobile)
>> 215 843 9367 (fax)
>>
>> Please Note: I now have a new email - William.Bug@DrexelMed.edu
>>
>> This email and any accompanying attachments are confidential. This information is intended solely for the use of the individual to whom it is addressed. Any review, disclosure, copying, distribution, or use of this email communication by others is strictly prohibited. If you are not the intended recipient please notify us immediately by returning this message to the sender and delete all copies. Thank you for your cooperation.
>
> --
> Robert P. Futrelle
> Associate Professor
> Biological Knowledge Laboratory
> College of Computer and Information Science
> Northeastern University MS WVH202
> 360 Huntington Ave.
> Boston, MA 02115
>
> Office: (617)-373-4239
> Fax: (617)-373-5121
> http://www.ccs.neu.edu/home/futrelle
> http://www.bionlp.org
> http://www.diagrams.org
> http://biologicalknowledge.com
Received on Friday, 9 June 2006 20:58:27 UTC