Re: HCLS Scientific Discourse Call Monday, May 2nd, 10 am EST - minutes

hi jodi, 



one of the great successes of FGED (www.fged.org) was to get journals to require that data be made public for published articles. so i didn't do more than go to the pubmed link and click on the GEO DataSets link. then, if it is in GEO, it has usually been imported into ArrayExpress using a common pattern for the ArrayExpress accession.
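
as a rough sketch of that pattern, assuming the GSE4757 / E-GEOD-4757 pairing listed below generalizes (the helper name is made up, and the result only says what the ArrayExpress accession would look like, not that the import has actually happened):

```python
import re


def geo_to_arrayexpress(geo_accession: str) -> str:
    """Hypothetical helper: map a GEO series accession (GSE<number>)
    to the ArrayExpress-style accession E-GEOD-<number>.

    Assumption: the pattern seen for GSE4757 -> E-GEOD-4757 holds in general.
    """
    match = re.fullmatch(r"GSE(\d+)", geo_accession.strip())
    if match is None:
        raise ValueError(f"not a GEO series accession: {geo_accession!r}")
    return f"E-GEOD-{match.group(1)}"


print(geo_to_arrayexpress("GSE4757"))
```

for GSE5281 this gives E-GEOD-5281, though that series is listed below as ArrayExpress: N/A, so the result still needs checking against ArrayExpress itself.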



cheers, 

michael 


----- Original Message ----- 
From: "Jodi Schneider" <jodi.schneider@deri.org> 
To: "Michael Miller" <Michael.Miller@systemsbiology.org> 
Cc: "Anita de A Waard (ELS-AMS)" <A.dewaard@elsevier.com>, "M. Scott Marshall" <mscottmarshall@gmail.com>, "Alexander Garcia Castro" <alexgarciac@gmail.com>, "barend mons" <barend.mons@nbic.nl>, "Tim Clark" <tim_clark@harvard.edu>, "HCLS IG" <public-semweb-lifesci@w3.org>, "Alberto Accomazzi" <aaccomazzi@cfa.harvard.edu>, "Sophia Ananiadou" <Sophia.Ananiadou@manchester.ac.uk>, "Philip Bourne" <bourne@sdsc.edu>, "Gully Burns" <gully@usc.edu>, "Ronald Daniel (ELS-SDG)" <R.Daniel@elsevier.com>, "Rahul Dave" <rahuldave@gmail.com>, "Alf Eaton" <A.Eaton@nature.com>, "Matthew Gamble" <matthew.gamble@gmail.com>, "Yolanda Gil" <gil@isi.edu>, "Alyssa Goodman" <agoodman@cfa.harvard.edu>, "Paul Groth" <pgroth@gmail.com>, "Tudor Groza" <tudor.groza@deri.org>, "Ellen Hays (ELS-BUR)" <E.Hays@elsevier.com>, "Maryann Martone" <maryann@ncmir.ucsd.edu>, "David R Newman" <drn05r@ecs.soton.ac.uk>, "Antony Scerri (ELS-CAM)" <A.scerri@elsevier.com>, "Jack Park" <jackpark@gmail.com>, "Silvio Peroni" <speroni@cs.unibo.it>, "Steve Pettifer" <steve.pettifer@manchester.ac.uk>, "Philippe Rocca-Serra" <proccaserra@googlemail.com>, "Cartic Ramakrishnan" <cartic@isi.edu>, "RebholzSchuhmann" <d.rebholz.schuhmann@gmail.com>, "David Shotton" <david.shotton@zoo.ox.ac.uk>, "Kaitlin Thaney" <k.thaney@digital-science.com>, "Karin Verspoor" <Karin.Verspoor@ucdenver.edu>, "Lynette Hirschman" <lynette@mitre.org>, "Susanna-Assunta Sansone" <sa.sansone@gmail.com>, "Kees van Bochove" <business@keesvanbochove.nl>, "Katy Wolstencroft" <katy@cs.man.ac.uk>, "Jun Zhao" <jun.zhao@zoo.ox.ac.uk>, "Paul Groth" <pgroth@few.vu.nl>, "Marco Roos" <M.Roos1@uva.nl> 
Sent: Sunday, May 8, 2011 6:50:40 AM 
Subject: Re: HCLS Scientific Discourse Call Monday, May 2nd, 10 am EST - minutes 

Thanks for this, Michael. 


I've added it as context to our notes on these articles: 
http://www.w3.org/wiki/HCLSIG/SWANSIOC/Actions/RhetoricalStructure/meetings/20110502#1._BioRDF_Demonstrator: 


Can you tell me where you searched for this data? I'm not very familiar with this domain. 


-Jodi 



On 4 May 2011, at 16:47, Michael Miller wrote: 



hi all, 

i did a quick search on the articles cited to find their associated data. 
two of the articles (14 and 15) appear to be different takes on the same 
data: 



[13] Dunckley T, Beach TG, et al. (2006). Gene expression correlates
of neurofibrillary tangles in Alzheimer's disease. Neurobiol
Aging;27:1359-71. http://www.ncbi.nlm.nih.gov/pubmed/16242812
GEO: GSE4757 ArrayExpress: E-GEOD-4757 



[14] Liang WS, Dunckley T, et al. (2007). Gene expression profiles in
anatomically and functionally distinct regions of the normal aged human
brain. Physiol Genomics 28: 311-22.
http://www.ncbi.nlm.nih.gov/pubmed/18332434
GEO: GSE5281 (same as below) ArrayExpress: N/A



[15] Liang WS, Reiman EM, et al. (2008). Alzheimer's disease is
associated with reduced expression of energy metabolism genes in
posterior cingulate neurons. Proc Natl Acad Sci U S A 2008;105:4441-6.
http://www.ncbi.nlm.nih.gov/pubmed/17077275
GEO: GSE5281 (same as above) ArrayExpress: N/A

cheers, 
michael 



-----Original Message-----
From: public-semweb-lifesci-request@w3.org
[mailto:public-semweb-lifesci-request@w3.org] On Behalf Of Waard, Anita de A (ELS-AMS)
Sent: Tuesday, May 03, 2011 1:46 PM
To: M. Scott Marshall
Cc: Alexander Garcia Castro; Jodi Schneider; barend mons; Tim Clark;
HCLS IG; Alberto Accomazzi; Sophia Ananiadou; Philip Bourne; Gully
Burns; Daniel, Ronald (ELS-SDG); Rahul Dave; Alf Eaton; Matthew Gamble;
Yolanda Gil; Alyssa Goodman; Paul Groth; Tudor Groza; Hays, Ellen
(ELS-BUR); Maryann Martone; David R Newman; Scerri, Antony (ELS-CAM);
Jack Park; Silvio Peroni; Steve Pettifer; Philippe Rocca-Serra; Cartic
Ramakrishnan; RebholzSchuhmann; David Shotton; Kaitlin Thaney; Karin
Verspoor; Lynette Hirschman; Susanna-Assunta Sansone; Kees van Bochove;
Katy Wolstencroft; Jun Zhao; Paul Groth; Marco Roos
Subject: RE: HCLS Scientific Discourse Call Monday, May 2nd, 10 am EST - minutes





Dear Scott, all:

We had a most productive call yesterday, largely echoing your thoughts,
below. Two points were covered: a discussion of the BioRDF
demonstrator, and a proposal to make a joint demonstrator - see also
http://www.w3.org/wiki/HCLSIG/SWANSIOC/Actions/RhetoricalStructure/meetings/20110502.





1. BioRDF Demonstrator:

BioRDF group: http://www.w3.org/wiki/HCLSIG_BioRDF_Subgroup
Demo: http://biordfmicroarray.googlecode.com/hg/sparql_endpoint.html
Annotated corpus with triples:
http://biordfmicroarray.googlecode.com/hg/all3_genelists_provenance.ttl

Corpus:
[13] Dunckley T, Beach TG, et al. (2006). Gene expression correlates
of neurofibrillary tangles in Alzheimer's disease. Neurobiol
Aging;27:1359-71. http://www.ncbi.nlm.nih.gov/pubmed/16242812
[14] Liang WS, Dunckley T, et al. (2007). Gene expression profiles in
anatomically and functionally distinct regions of the normal aged human
brain. Physiol Genomics 28: 311-22.
http://www.ncbi.nlm.nih.gov/pubmed/18332434
[15] Liang WS, Reiman EM, et al. (2008). Alzheimer's disease is
associated with reduced expression of energy metabolism genes in
posterior cingulate neurons. Proc Natl Acad Sci U S A 2008;105:4441-6.
http://www.ncbi.nlm.nih.gov/pubmed/17077275

Need help to automate:
1) Institution provenance and PIs etc.
2) Experimental context: what platform (e.g. microarray experiments -
what company etc); what disease the patients have; where in the brain
the samples were collected; how far along the disease was when the
sample was collected.
3) From this: generate list of genes; need details of statistical
methods (what the algorithm was etc.), analysis provenance, and
confidence in statistical results.

Current use case: cancer; previous use case: Alzheimer's





2. BioRDF-Scientific Discourse Joint Demonstrator proposal

The scientific discourse group (in particular: Jodi, Anita and Paolo)
will mark up the corpus that the BioRDF group has worked on.
We will mark up these documents with
a) ORB
b) Annotation Ontology
within the Harvard Annotation Framework, and link the BioRDF triples to
specific locations in the text.

This serves three purposes:
1) It allows the Scientific Discourse group to test if ORB + AO is
enough to mark a given location in the document. If so - that concludes
the deliverables of the subtask; if not, we need, and will define, a
'medium-grained' ontology.
2) It provides the BioRDF group with more detailed, location-linked
annotations to their test corpus.
3) This can help them in their quest to automate the mining of these
triples.

After this markup is done, the evaluations will be:
1) Is ORB + AO enough? Is the SciDisc/Rhetorical structure group done?
2) Can this be a useful start towards automating the knowledge the
BioRDF group wants to automate?

If anyone from either group is interested in participating in this
exercise, please let us know.








Best,

- Anita.

Anita de Waard
Disruptive Technologies Director, Elsevier Labs
http://elsatglabs.com/labs/anita/
a.dewaard@elsevier.com











-----Original Message-----
From: M. Scott Marshall [mailto:mscottmarshall@gmail.com]
Sent: Mon 5/2/2011 9:51
To: Waard, Anita de A (ELS-AMS)
Cc: Alexander Garcia Castro; Jodi Schneider; barend mons; Tim Clark;
HCLS IG; Alberto Accomazzi; Sophia Ananiadou; Philip Bourne; Gully
Burns; Daniel, Ronald (ELS-SDG); Rahul Dave; Alf Eaton; Matthew Gamble;
Yolanda Gil; Alyssa Goodman; Paul Groth; Tudor Groza; Hays, Ellen
(ELS-BUR); Maryann Martone; David R Newman; Scerri, Antony (ELS-CAM);
Jack Park; Silvio Peroni; Steve Pettifer; Philippe Rocca-Serra; Cartic
Ramakrishnan; RebholzSchuhmann; David Shotton; Kaitlin Thaney; Karin
Verspoor; Lynette Hirschman; Susanna-Assunta Sansone; Kees van Bochove;
Katy Wolstencroft; Jun Zhao; Paul Groth; Marco Roos
Subject: Re: HCLS Scientific Discourse Call Monday, May 2nd, 10 am EST - the real invite





Regrets - have a meeting during that time.

Addressing overlap with other task forces: at last year's C-SHALS,
Tim, Kei, and I noticed some overlap in the RDF representation of
experiments and started teleconferences in which BioRDF and SciDisc
(Sudeshna Das) could stay coordinated. We are continuing that work in
the form of a W3C note about RDF for expression studies (i.e.
microarrays but not necessarily excluding other forms of expression
data). We are hoping to find common (stable) ground for the W3C note
by comparing / contrasting a number of existing approaches. We have
presented our approaches in BioRDF telcons and a 'Metadata Capture'
meeting in the Netherlands organized by Kees van Bochove.

About ways to further combine across the overlaps:

I see a lot of potential to combine approaches from the task forces by
putting together several common elements (that already exist to some
extent):

* representation of provenance for text-mined assertions

* representation of microarray experiment results in RDF with
provenance information about the RDF itself, in addition to experiment
provenance and RDF representation of experiment metadata

A) performing microarray analysis in a workflow
B) performing text mining in a workflow
C) linking microarray analysis results with literature (linking RDF
output from A & B)

The above (A - C) combines a microarray experiment with a
computational experiment (in the form of a workflow in which the
analysis is done), which makes it important to clearly delineate
different types of provenance - that of the microarray experiment,
that of the workflow, and that of the RDF production.

Perhaps there's a way to use a medium granularity markup of the
microarray study article to enhance the literature mining, or use it
together with the results RDF somehow.

BTW, if one considers a nanopublication as a sort of prescribed
provenance for a particular type of data, then you could consider the
gene list (or each gene on it) to be a nanopublication.

Cheers,
Scott

P.S. I had the above combination of ideas in mind for my presentation
to SciDisc. Sorry to blurt it out without being able to stick around
to explain what I mean. Maybe at the next telcon.

P.P.S. There is an EU project Workflow4ever that has some common
interests in the above.





--
M. Scott Marshall, W3C HCLS IG co-chair, http://www.w3.org/blog/hcls
http://staff.science.uva.nl/~marshall





On Mon, May 2, 2011 at 2:41 PM, Waard, Anita de A (ELS-AMS)
<A.dewaard@elsevier.com> wrote:

Apologies for my confusing mail this weekend: obviously I'm not quite
ready to send real emails from my iPhone! The goal was to discuss our
conclusions from the April 18th meeting, which I copied below, and look
back on our use cases and the work from other HCLS subgroups.

Please find an improved version below or at
http://www.w3.org/wiki/HCLSIG/SWANSIOC/Actions/RhetoricalStructure/meetings/20110502.

Best,

- Anita.

Anita de Waard
Disruptive Technologies Director, Elsevier Labs
http://elsatglabs.com/labs/anita/
a.dewaard@elsevier.com

http://www.w3.org/wiki/HCLSIG/SWANSIOC/Actions/RhetoricalStructure/meetings/20110502









Please join the HCLS Scientific Discourse concall on Monday, May 2 10
am EST, 3 pm BST, 7 am Pacific

Agenda:

1) Timeframe and names for plans below
2) How close are we to fulfilling our original use cases?
(http://www.w3.org/wiki/HCLSIG/SWANSIOC/Actions/RhetoricalStructure/)
3) Overlap with other HCLS subgroups (see
http://www.w3.org/wiki/HCLSIG for a listing)
4) Next steps.









Conclusions meeting April 18:

1) Joint work on annotating a corpus of documents with links to
workflow components and data. This will allow a concrete instantiation
of the medium-grained ontology, and offer a discussion point for
describing the experiment/paper link which we are approaching from many
different sides. Alex Garcia will jumpstart this process by making a
collection of full-text Elsevier documents available which he has
annotated with RDF; after seeing these, we will select a subcorpus in
which to mark up a) Data, b) Experimental model and c) Key discourse
components, and work to make a demonstrator.

2) A paper. Discussing our various models, and ways to integrate;
include discussion re. overlap/difference between (explicit, personal)
knowledge in discourse and (implicit, shared) knowledge that underlies
experimental models. Could be possible outcome of demo.

3) A face-to-face meeting. Kees van Bochove has kindly agreed to
organise this. Possible venues: ISMB in Vienna, ICBO in Buffalo, or a
one-off workshop in the Netherlands. Topic: Experiment/discourse
integration: models, examples, and next steps.














Dial-in & IRC Information

   * Dial-In #: +1.617.761.6200 (Cambridge, MA)
   * Dial-In #: +33.4.26.46.79.03 (Paris, France)
   * Dial-In #: +44.203.318.0479 (London, UK)
   * Participant Access Code: 42572 ("HCLS2")
   * IRC Channel: irc.w3.org port 6665 channel #HCLS2 use IRC direct
     link (or see W3C IRC page for details, or see Web IRC)
   * Mibbit quick start: Click on mibbit for instant IRC access
   * Duration: 1hr




Elsevier B.V. Registered Office: Radarweg 29, 1043 NX Amsterdam, The
Netherlands, Registration No. 33156677 (The Netherlands)



















Received on Sunday, 8 May 2011 15:36:35 UTC