RE: HCLS Scientific Discourse Call Monday, May 2nd, 10 am EST - minutes from Waard, Anita de A (ELS-AMS) on 2011-05-03 (public-semweb-lifesci@w3.org from May 2011)

From: Waard, Anita de A (ELS-AMS) <A.dewaard@elsevier.com>
Date: Tue, 3 May 2011 22:45:32 +0200
To: "M. Scott Marshall" <mscottmarshall@gmail.com>
Cc: "Alexander Garcia Castro" <alexgarciac@gmail.com>, "Jodi Schneider" <jodi.schneider@deri.org>, "barend mons" <barend.mons@nbic.nl>, "Tim Clark" <tim_clark@harvard.edu>, "HCLS IG" <public-semweb-lifesci@w3.org>, "Alberto Accomazzi" <aaccomazzi@cfa.harvard.edu>, "Sophia Ananiadou" <Sophia.Ananiadou@manchester.ac.uk>, "Philip Bourne" <bourne@sdsc.edu>, "Gully Burns" <gully@usc.edu>, "Daniel, Ronald (ELS-SDG)" <R.Daniel@elsevier.com>, "Rahul Dave" <rahuldave@gmail.com>, "Alf Eaton" <A.Eaton@nature.com>, "Matthew Gamble" <matthew.gamble@gmail.com>, "Yolanda Gil" <gil@isi.edu>, "Alyssa Goodman" <agoodman@cfa.harvard.edu>, "Paul Groth" <pgroth@gmail.com>, "Tudor Groza" <tudor.groza@deri.org>, "Hays, Ellen (ELS-BUR)" <E.Hays@elsevier.com>, "Maryann Martone" <maryann@ncmir.ucsd.edu>, "David R Newman" <drn05r@ecs.soton.ac.uk>, "Scerri, Antony (ELS-CAM)" <A.scerri@elsevier.com>, "Jack Park" <jackpark@gmail.com>, "Silvio Peroni" <speroni@cs.unibo.it>, "Steve Pettifer" <steve.pettifer@manchester.ac.uk>, "Philippe Rocca-Serra" <proccaserra@googlemail.com>, "Cartic Ramakrishnan" <cartic@isi.edu>, "RebholzSchuhmann" <d.rebholz.schuhmann@gmail.com>, "David Shotton" <david.shotton@zoo.ox.ac.uk>, "Kaitlin Thaney" <k.thaney@digital-science.com>, "Karin Verspoor" <Karin.Verspoor@ucdenver.edu>, "Lynette Hirschman" <lynette@mitre.org>, "Susanna-Assunta Sansone" <sa.sansone@gmail.com>, "Kees van Bochove" <business@keesvanbochove.nl>, "Katy Wolstencroft" <katy@cs.man.ac.uk>, "Jun Zhao" <jun.zhao@zoo.ox.ac.uk>, "Paul Groth" <pgroth@few.vu.nl>, "Marco Roos" <M.Roos1@uva.nl>
Message-ID: <994C62F4D342094CB0A2A3431B8ED8500A9D5BFD@ELSAMSEXCP02VA.science.regn.net>

Dear Scott, all:

We had a most productive call yesterday, largely echoing your thoughts, below. Two points were covered: a discussion of the BioRDF demonstrator, and a proposal to make a joint demonstrator - see also http://www.w3.org/wiki/HCLSIG/SWANSIOC/Actions/RhetoricalStructure/meetings/20110502. 

1. BioRDF Demonstrator: 

BioRDF group: http://www.w3.org/wiki/HCLSIG_BioRDF_Subgroup
Demo: http://biordfmicroarray.googlecode.com/hg/sparql_endpoint.html
Annotated corpus with triples: http://biordfmicroarray.googlecode.com/hg/all3_genelists_provenance.ttl

Corpus: 
[13] Dunckley T, Beach TG, et al. (2006). Gene expression correlates of neurofibrillary tangles in Alzheimer's disease. Neurobiol Aging 27: 1359-71. http://www.ncbi.nlm.nih.gov/pubmed/16242812
[14] Liang WS, Dunckley T, et al. (2007). Gene expression profiles in anatomically and functionally distinct regions of the normal aged human brain. Physiol Genomics 28: 311-22. http://www.ncbi.nlm.nih.gov/pubmed/17077275
[15] Liang WS, Reiman EM, et al. (2008). Alzheimer's disease is associated with reduced expression of energy metabolism genes in posterior cingulate neurons. Proc Natl Acad Sci U S A 105: 4441-6. http://www.ncbi.nlm.nih.gov/pubmed/18332434

Need help to automate: 
1) Institution provenance and PIs etc. 
2) Experimental context: which platform was used (e.g. for microarray experiments, which company's arrays); which disease the patients have; where in the brain the samples were collected; and how far along the disease was when each sample was collected.
3) From this: generate the list of genes. This needs details of the statistical methods (which algorithm was used etc.), the analysis provenance, and the confidence in the statistical results.

Current use case: cancer; previous use case: Alzheimer's

2. BioRDF-Scientific Discourse Joint Demonstrator proposal

The scientific discourse group (in particular: Jodi, Anita and Paolo) will mark up the corpus that the BioRDF group has worked on. 
We will mark up these documents with
a) ORB 
b) Annotation Ontology 
within the Harvard Annotation Framework, and link the BioRDF triples to specific locations in the text. 

This serves three purposes:
1) It allows the Scientific Discourse group to test if ORB + AO is enough to mark a given location in the document. If so - that concludes the deliverables of the subtask; if not, we need and will define a 'medium-grained' ontology. 
2) It provides the BioRDF group with more detailed, location-linked annotations to their test corpus.
3) This can help them in their quest to automate the mining of these triples.

After this markup is done, the evaluations will be: 
1) Is ORB + AO enough? Is the SciDisc/Rhetorical structure group done?
2) Can this be a useful start towards automating the mining of the triples the BioRDF group wants to extract?

If anyone from either group is interested in participating in this exercise, please let us know. 

Best, 

- Anita.  

Anita de Waard
Disruptive Technologies Director, Elsevier Labs
http://elsatglabs.com/labs/anita/
a.dewaard@elsevier.com

-----Original Message-----
From: M. Scott Marshall [mailto:mscottmarshall@gmail.com]
Sent: Mon 5/2/2011 9:51
To: Waard, Anita de A (ELS-AMS)
Cc: Alexander Garcia Castro; Jodi Schneider; barend mons; Tim Clark; HCLS IG; Alberto Accomazzi; Sophia Ananiadou; Philip Bourne; Gully Burns; Daniel, Ronald (ELS-SDG); Rahul Dave; Alf Eaton; Matthew Gamble; Yolanda Gil; Alyssa Goodman; Paul Groth; Tudor Groza; Hays, Ellen (ELS-BUR); Maryann Martone; David R Newman; Scerri, Antony (ELS-CAM); Jack Park; Silvio Peroni; Steve Pettifer; Philippe Rocca-Serra; Cartic Ramakrishnan; RebholzSchuhmann; David Shotton; Kaitlin Thaney; Karin Verspoor; Lynette Hirschman; Susanna-Assunta Sansone; Kees van Bochove; Katy Wolstencroft; Jun Zhao; Paul Groth; Marco Roos
Subject: Re: HCLS Scientific Discourse Call Monday, May 2nd, 10 am EST - the real invite

Regrets - have a meeting during that time.

Addressing overlap with other task forces: at last year's C-SHALS,
Tim, Kei, and I noticed some overlap in the RDF representation of
experiments and started teleconferences in which BioRDF and SciDisc
(Sudeshna Das) could stay coordinated. We are continuing that work in
the form of a W3C note about RDF for expression studies (i.e.
microarrays but not necessarily excluding other forms of expression
data). We are hoping to find common (stable) ground for the W3C note
by comparing / contrasting a number of existing approaches. We have
presented our approaches in BioRDF telcons and a 'Metadata Capture'
meeting in the Netherlands organized by Kees van Bochove.

About ways to further combine across the overlaps:

I see a lot of potential to combine approaches from the task forces by
putting together several common elements (that already exist to some
extent):

* representation of provenance for text-mined assertions

* representation of microarray experiment results in RDF with
provenance information about the RDF itself, in addition to experiment
provenance and RDF representation of experiment metadata

A) performing microarray analysis in a workflow
B) performing text mining in a workflow
C) linking microarray analysis results with literature (linking RDF
output from A & B)

The above (A - C) combines a microarray experiment with a
computational experiment (in the form of a workflow in which the
analysis is done), which makes it important to clearly delineate
different types of provenance - that of the microarray experiment,
that of the workflow, and that of the RDF production.
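
A minimal Python sketch of the delineation above, assuming one record per
gene list with the three kinds of provenance kept apart. All names here
(GeneListRecord, KINDS, attach, missing) are hypothetical illustrations and
not part of any BioRDF or SciDisc artifact; the actual work would express
this in RDF.

```python
from dataclasses import dataclass, field

# The three kinds of provenance named in the paragraph above.
KINDS = ("experiment", "workflow", "rdf_production")


@dataclass
class GeneListRecord:
    """A gene list with separately tracked provenance (hypothetical model)."""

    genes: list
    provenance: dict = field(default_factory=dict)

    def attach(self, kind, source):
        """Record a provenance source under exactly one of the three kinds."""
        if kind not in KINDS:
            raise ValueError(f"unknown provenance kind: {kind!r}")
        self.provenance.setdefault(kind, []).append(source)

    def missing(self):
        """Return the provenance kinds that have no source attached yet."""
        return [kind for kind in KINDS if kind not in self.provenance]
```

For example, attaching only an article URL as "experiment" provenance leaves
missing() reporting "workflow" and "rdf_production", which makes the gaps
between the three kinds visible.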

Perhaps there's a way to use a medium granularity markup of the
microarray study article to enhance the literature mining, or use it
together with the results RDF somehow.

BTW, if one considers a nanopublication as a sort of prescribed
provenance for a particular type of data, then you could consider the
gene list (or each gene on it) to be a nanopublication.

Cheers,
Scott

P.S. I had the above combination of ideas in mind for my presentation
to SciDisc. Sorry to blurt it out without being able to stick around
to explain what I mean. Maybe at the next telcon.

P.P.S. There is an EU project Workflow4ever that has some common
interests in the above.

-- 
M. Scott Marshall, W3C HCLS IG co-chair, http://www.w3.org/blog/hcls
http://staff.science.uva.nl/~marshall

On Mon, May 2, 2011 at 2:41 PM, Waard, Anita de A (ELS-AMS)
<A.dewaard@elsevier.com> wrote:
> Apologies for my confusing mail this weekend: obviously I'm not quite ready to send real emails from my iPhone! The goal was to discuss our conclusions from the April 18th meeting, which I copied below, and look back on our use cases and the work from other HCLS subgroups.
>
> Please find an improved version below or at http://www.w3.org/wiki/HCLSIG/SWANSIOC/Actions/RhetoricalStructure/meetings/20110502.
>
> Best,
>
> - Anita.
>
> Anita de Waard
> Disruptive Technologies Director, Elsevier Labs
> http://elsatglabs.com/labs/anita/
> a.dewaard@elsevier.com
>
> http://www.w3.org/wiki/HCLSIG/SWANSIOC/Actions/RhetoricalStructure/meetings/20110502
>
> Please join the HCLS Scientific Discourse concall on Monday, May 2 10 am EST, 3 pm BST, 7 am Pacific
>
> Agenda:
>
> 1) Timeframe and names for plans below
> 2) How close are we to fulfilling our original use cases? (http://www.w3.org/wiki/HCLSIG/SWANSIOC/Actions/RhetoricalStructure/)
> 3) Overlap with other HCLS subgroups (see http://www.w3.org/wiki/HCLSIG for a listing)
> 4) Next steps.
>
> Conclusions meeting April 18:
>
> 1) Joint work on annotating a corpus of documents with links to workflow components and data. This will allow a concrete instantiation of the medium-grained ontology, and offer a discussion point for describing the experiment/paper link which we are approaching from many different sides. Alex Garcia will jumpstart this process by making a collection of full-text Elsevier documents available which he has annotated with RDF; after seeing these, we will select a subcorpus in which to mark up a) data, b) experimental model and c) key discourse components, and work to make a demonstrator.
>
> 2) A paper. Discussing our various models, and ways to integrate; include discussion re. overlap/difference between (explicit, personal) knowledge in discourse and (implicit, shared) knowledge that underlies experimental models. Could be possible outcome of demo.
>
> 3) A face-to-face meeting. Kees van Bochove has kindly agreed to organise this. Possible venues: ISMB in Vienna, ICBO in Buffalo, or a one-off workshop in the Netherlands. Topic: Experiment/discourse integration: models, examples, and next steps.
>
>
> Dial-in & IRC Information
>
>    * Dial-In #: +1.617.761.6200 (Cambridge, MA)
>    * Dial-In #: +33.4.26.46.79.03 (Paris, France)
>    * Dial-In #: +44.203.318.0479 (London, UK)
>    * Participant Access Code: 42572 ("HCLS2")
>    * IRC Channel: irc.w3.org port 6665 channel #HCLS2 (see W3C IRC page for details)
>    * Mibbit quick start: Click on mibbit for instant IRC access
>    * Duration: 1hr
> Elsevier B.V. Registered Office: Radarweg 29, 1043 NX Amsterdam, The Netherlands, Registration No. 33156677 (The Netherlands)
>
>

Received on Tuesday, 3 May 2011 20:47:06 UTC