Re: scientific publishing task force update from William Bug on 2006-06-09 (public-semweb-lifesci@w3.org from June 2006)

From: William Bug <William.Bug@DrexelMed.edu>
Date: Fri, 9 Jun 2006 16:57:51 -0400
To: "Bob Futrelle" <bob.futrelle@gmail.com>
Cc: public-semweb-lifesci@w3.org, Rob Williams <s2g2@mycingular.blackberry.net>
Message-Id: <B8F4D593-E215-43A9-B68C-5170EF84BCE0@DrexelMed.edu>
Those references would be really wonderful to have in hand.  Many  
thanks, Bob.

Given the direction we are trying to go in on the BIRN project - very  
extensive use of FuGO (http://fugo.sourceforge.net/) & PATO
(http://obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value) - both of  
which are to be included in the OBO Foundry (http://obofoundry.org/)  
- for creating formal, computable descriptions of PRIMARY  
experimental data, this project very much pricked up my ears when I  
saw the post on Slashdot.  Given the critical role ontologies have  
played in robotics over the last few decades, I'm not surprised  
something like this should come from a researcher in that field,  
especially one who works with the laboratory robotics vendors such as  
TECAN and others.

By the way, I really liked your commentary in response to AJ's  
request - and your work in this area - especially as relates to  
knowledge extraction from figures/figure legends in STM  
publications.  This work is extremely valuable, in my humble opinion  
- work we should all get familiar with.

Having said that, I believe the proposal AJ has put out there on  
behalf of the Sci Pub Task Force addresses a related yet distinct  
aspect of formalizing descriptions of scientific results.  As the  
Wiki page says at the outset, "(the objective is) to develop a  
general purpose ontology for self-publishing (a) single experiment in  
RDF format that will facilitate data sharing, discovery and  
integration."

There is an effort among the researchers who curate and use  
biomedical ontologies to develop deterministically precise and  
semantically rich formal descriptions of primary research data.  
The idea - very much critical to  
some of the objectives we have in the neuro-centric BIRN project - is  
to use formal descriptions from elemental biomed. knowledge domains  
(as opposed to complex, pre-coordinated descriptions that can't be  
easily decomposed by humans or algorithms) in order to start building  
a web of semantic information as closely tied to the primary data as  
possible.  If one is to perform large-scale data integration and  
meta-analysis on data derived from disparate studies (as BLAST and  
HMM gene finding algorithms can do with genomic sequence data), one  
will often have to go right back to the primary data - and have  
complete, formal descriptions of data acquisition provenance and all  
the processing done on the data prior to any significant  
reduction/analysis.  This is certainly true both for neuroimaging  
data sets derived from all imaging modalities used in neuroscience  
and for microarray data.  The degree to which a system can be  
expected to re-analyze pooled results from separate studies will  
often depend on the completeness and computability of this  
provenance and initial process-related metadata.  In BIRN, we are  
thinking of adopting a  
standard not unlike what the NCI caBIG project has developed for  
certifying software - a compliance standard for formal descriptions  
of primary data.  In other words, if you provide your data in this  
form with this level of detailed quantitative and semantic  
description you can expect your data will be utilized by laundry list  
A of integration and pooled analysis tools.  If you only conform to  
SOME of the requirements, your data will only be available to a  
lesser list of more general meta-analysis tools and procedures.
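
To make the tiered idea concrete, here is a rough sketch in Python. Every field name and tool name below is a placeholder invented for illustration; none of them comes from BIRN, caBIG, FuGO, or PATO, and a real compliance standard would certainly be richer than a set of required keys.

```python
# Hypothetical sketch of a tiered compliance check.
# All field names and tool names are illustrative placeholders only.

# Fields a description must carry to qualify for each tier (assumed).
REQUIRED_FULL = {
    "acquisition_provenance",
    "processing_history",
    "quantitative_description",
    "semantic_description",
}
REQUIRED_PARTIAL = {"acquisition_provenance", "semantic_description"}

# "Laundry list A" versus the lesser list of more general tools (assumed).
TOOLS_BY_TIER = {
    "full": ["pooled_reanalysis", "cross_study_integration",
             "general_meta_analysis"],
    "partial": ["general_meta_analysis"],
    "none": [],
}


def compliance_tier(description):
    """Return 'full', 'partial', or 'none' for a dataset description dict."""
    # Only fields with a non-empty value count as present.
    present = {key for key, value in description.items() if value}
    if REQUIRED_FULL <= present:
        return "full"
    if REQUIRED_PARTIAL <= present:
        return "partial"
    return "none"


def eligible_tools(description):
    """List the tools a dataset can expect to be usable with."""
    return TOOLS_BY_TIER[compliance_tier(description)]
```

A description that supplies only provenance and semantic terms would land in the "partial" tier and be offered the shorter tool list, which is the trade-off described above.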

When I say this "form", what we - in BIRN - are looking to work  
toward as the "ideal" standard is to use FuGO-related ontologies for  
the formal descriptions of devices, assays, reagents (and ultimately  
environmental factors/subject history required to complete the  
context for the experimental design/appropriate analysis).  We would  
then use PATO to link these descriptions of primary data to the  
ontological descriptions of phenotype they are intended to  
represent.  For instance, we would take a formal description of data  
collected using an assay for left-handedness (BIRN has many  
behavioral & cognitive assays linked to functional brain imaging  
experiments to contend with) and link that to a formal description of  
the trait (handedness in primates) using PATO (Phenotype Attribute  
and Trait Ontology).
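
As a minimal sketch of that linkage, the handedness example could be written as subject/predicate/object triples. This is Python with plain tuples rather than a real RDF library, and all identifiers and predicate names (the "ex:" terms, "produced_by_assay", "measures_trait", "has_trait_value") are made up for illustration; they are not actual FuGO or PATO terms.

```python
# Hypothetical sketch: linking a primary dataset to the trait it measures.
# The "ex:" identifiers and predicate names are placeholders, NOT real
# FuGO or PATO terms.
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "obj"])


def describe_assay_result(dataset_id, assay_term, trait_term, value_term):
    """Build triples tying a dataset to its assay, trait, and observed value."""
    return [
        Triple(dataset_id, "produced_by_assay", assay_term),
        Triple(assay_term, "measures_trait", trait_term),
        Triple(dataset_id, "has_trait_value", value_term),
    ]


def traits_for(dataset_id, triples):
    """Follow dataset -> assay -> trait links to find the traits involved."""
    assays = {t.obj for t in triples
              if t.subject == dataset_id and t.predicate == "produced_by_assay"}
    return sorted(t.obj for t in triples
                  if t.subject in assays and t.predicate == "measures_trait")


triples = describe_assay_result(
    "ex:dataset-001",        # the primary data
    "ex:handedness-assay",   # assay description (FuGO-style, assumed)
    "ex:handedness",         # trait description (PATO-style, assumed)
    "ex:left-handed",        # observed value
)
```

Calling traits_for("ex:dataset-001", triples) walks from the dataset through the assay to the trait, which is the kind of decomposable link a pre-coordinated description would not offer.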

How does this relate to KR and the expression of experimental data in  
STM articles?

As I see it, we would hope publishers (and/or the relevant research  
societies associated with various STM publications - and/or NCBI -  
and the device & reagent vendors) will gradually develop tools to  
make it TRIVIAL for researchers to represent primary data in this  
form.  To my mind, the best alternative business model the commercial  
publishers can propose in light of the Open Access debate would be to  
focus on this critical technology development issue - one very much  
related to their core business of publishing scientific info &  
knowledge.  They would provide a very valuable service by working  
with groups such as the W3C SW HCLSIG SciPub Task Force to  
establish relevant formats, standards, and KR resources.  Certainly,  
the "open" STM publishers who have evolved in the last 5+ years -  
Biomed Central & PLoS, in particular - have shown they recognize the  
importance of this effort.  Taking this tack - focussing on providing  
technological support difficult or prohibitively expensive for the  
community of grant-supported researchers to build themselves - was  
exactly the argument that got commercial publishers into the STM  
business in a big way back in the late 50's, early 60's.  As has  
happened about every 20 - 30 years since the late 1900's, the  
quantity of published manuscripts was taking a qualitative leap  
forward, and beginning to outstrip the capacity of the  
"old-fashioned" publishing technology in widespread use by the vendors  
serving the society-based publishers.  Certain commercial publishers  
were beginning to computerize their operations providing efficiencies  
and economies of scale that put them in the position to offer a good  
deal to societies who were increasingly strapped for $$$ and having  
to put severe limits on the number of manuscripts they publish.  This  
is a condition that will be very familiar to anyone who's been  
involved with STM publishing since the mid-90's (I guess we're  
nearing another 20 - 30 year qualitative jump forward).  Anyway - I  
don't see why they shouldn't drop the idiotic practice of tying their  
revenue to frozen IP and go back to their original "value add"  
proposition - better technology to provide for the evolving  
publishing needs of the STM community.

Enough of my screed on Open Access -

As I see it, the NLP approach and efforts such as are being proposed  
here for more formally precise and complete descriptions of the  
primary data itself can run in parallel - working the niche they are  
most effective at handling but both ultimately converging on a much  
more complete, formal, and machine-parsable representation of  
research data (including the more reduced forms your work addresses  
when working on the current representation of data in the STM  
literature).

Cheers,
Bill

On Jun 9, 2006, at 2:34 PM, Bob Futrelle wrote:

>
> You'd have to download EXPO to see what it contains.  My guess is that
> it's a continuation of the work that King has been doing for some time
> now.  He works on robot experimental configurations for bio expts. and
> wants to represent a structured version of the output (or drivers?).
> He has a student, I think, who should be producing some papers on EXPO
> fairly soon.  There may be some powerpoint floating around that tells
> more.
>
> I'll write King and alert him to our discussion and ask him, point
> blank, where can we get an explanation of what EXPO is and what it
> does.  (He's on my editorial board, for Biological Knowledge, so I'm
> in touch with him.)
>
>  - Bob Futrelle
>
> On 6/9/06, William Bug <William.Bug@drexelmed.edu> wrote:
>>
>> This was a new one on me too, Mark.  It was posted to Slashdot the
>> other day, and the SourceForge site the article points to is
>> essentially empty.
>>
>> http://sourceforge.net/projects/expo/
>>
>> As you might gather, EXPO is not a very good term to search in all
>> the usual suspect search engines - INSPEC, PubMed, IEEE XPlore,
>> CiteSeer.IST, and Google/Google Scholar.  Only a very few specific
>> studies using EXPO in the title came up in:
>>
>> PubMed:
>>
>> CT-expo--a novel program for dose evaluation in CT
>> Rofo. 2002 Dec;174(12):1570-6.
>>
>>
>>
>>   INSPEC:
>>
>> The extended Poincare generating function type (EXPO)
>>
>> Extrasolar Planet Observatory (ExPO)
>>
>> EXPO is the integration of two programs, EXTRA and SIRPOW.92 and is a
>> program for full powder decomposition and crystal structure solution.
>>
>>
>>
>> ACL Anthology of research papers in Comp. Linguistics
>>
>> A FORMAL GRAMMAR OF EXPRESSIVENESS FOR SACRED LEGENDS
>> acl.ldc.upenn.edu/C/C80/C80-1023.pdf
>>
>> (an absolutely fascinating manuscript in no way related to this
>> research project)
>>
>>
>> There is certainly much interesting and relevant research going on in
>> this center at the University of Aberystwyth
>> (http://www.aber.ac.uk/compsci/Research/bio/grants.shtml), but I
>> wasn't able to find any specific reference to EXPO anywhere, though
>> clearly it could be the
>> result of research in any one of several of the projects listed.
>>
>> In the end, I just gave up.
>>
>> Cheers,
>> Bill
>>
>>
>> On Jun 9, 2006, at 1:29 PM, Mark Musen wrote:
>>
>> >
>> > On Jun 8, 2006, at 10:09 PM, AJ Chen wrote:
>> >> The first task is to develop an ontology for self-publishing of
>> >> experiment. I have proposed a list of objects and properties
>> >> related to self-publishing experiment. Please download the
>> >> attached file under Task Status and review the proposal. Your
>> >> feedback and comments will be greatly appreciated.  You may also
>> >> edit the file directly and email me the edited file.
>> >>
>> >
>> > A colleague just pointed me to this (rather vacuous) article.  Does
>> > anyone know more about this work?
>> >
>> > http://www.newscientisttech.com/article/dn9288-translator-lets-computers-understand-experiments-.html
>> >
>> > Mark
>> >
>>
>> Bill Bug
>> Senior Analyst/Ontological Engineer
>>
>> Laboratory for Bioimaging  & Anatomical Informatics
>> www.neuroterrain.org
>> Department of Neurobiology & Anatomy
>> Drexel University College of Medicine
>> 2900 Queen Lane
>> Philadelphia, PA    19129
>> 215 991 8430 (ph)
>> 610 457 0443 (mobile)
>> 215 843 9367 (fax)
>>
>>
>> Please Note: I now have a new email - William.Bug@DrexelMed.edu
>>
>> This email and any accompanying attachments are confidential. This  
>> information is intended solely for the use of the individual to  
>> whom it is addressed. Any review, disclosure, copying,  
>> distribution, or use of this email communication by others is  
>> strictly prohibited. If you are not the intended recipient please  
>> notify us immediately by returning this message to the sender and  
>> delete all copies. Thank you for your cooperation.
>>
>>
>
>
> -- 
> Robert P. Futrelle
>    Associate Professor
> Biological Knowledge Laboratory
> College of Computer and Information Science
> Northeastern University MS WVH202
> 360 Huntington Ave.
> Boston, MA 02115
>
> Office: (617)-373-4239
> Fax:    (617)-373-5121
> http://www.ccs.neu.edu/home/futrelle
> http://www.bionlp.org
> http://www.diagrams.org
> http://biologicalknowledge.com
>

Bill Bug
Senior Analyst/Ontological Engineer

Laboratory for Bioimaging  & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA    19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)


Please Note: I now have a new email - William.Bug@DrexelMed.edu
Received on Friday, 9 June 2006 20:58:27 UTC