Re: BioRDF Telcon from Kei Cheung on 2010-06-14 (public-semweb-lifesci@w3.org from June 2010)

From: Kei Cheung <kei.cheung@yale.edu>
Date: Mon, 14 Jun 2010 14:41:21 -0400
To: HCLS <public-semweb-lifesci@w3.org>
Message-id: <4C1677D1.6040201@yale.edu>

The minutes for today's BioRDF call are available at:

http://esw.w3.org/HCLSIG_BioRDF_Subgroup/Meetings/2010/06-14_Conference_Call

Thanks to Matthias for scribing. Below are some excerpts of the minutes.

I'll be away for the next 5 weeks. Jun has agreed to convene the BioRDF 
calls on June 21 and July 19.

Cheers,

-Kei

*****Excerpts Begin*******

kei: i want to give a bit of context. part of the agenda is to have jeff 
and stephen give a description of the new NIF sparql endpoints.
... this is related to our broader query federation use-case.
... more recently we also looked at a more specific use case, microarray 
data.
... we have looked at some examples of microarray results in the area of 
neurological diseases.
... from gene expression data we could also link to other kinds of data, 
including imaging.
... let us start with the description of NIF endpoints.

jeff: we can divide it into two types of content: the entities in NIF, 
and the properties that are entered by the community.

<slars0n> NIF SPARQL Endpoint: 
https://confluence.crbs.ucsd.edu/display/NIF/Sparql+endpoint

jeff: this is available in several ways. first, a SPARQL endpoint.
... second, extracted data from literature, and making it query-able. 
this data will also be available through the SPARQL endpoint.
... this content will be available in September.

kei: for the microarray use-case, we have looked at some examples,
such as Alzheimer's disease. Information about different types of 
neurons, brain regions etc. would be very helpful for annotation.

kei: you also mentioned the literature aspect. one of the challenges we 
encountered was extracting gene lists from papers.

stephen: to get a sense of the basic structure of what we are doing 
here: we are going through a loop between an OWL file which contains the 
NIF content, and a Semantic MediaWiki, which has every entity in that 
ontology rendered as a page.
... an ontology engineer can track the changes in the wiki and update
the OWL ontology.

<slars0n> http://neurolex.org/wiki/Main_Page

stephen: as the OWL file changes, the engineers will update the wiki.
... neurolex is easily accessible through the web browser.
... our goal with the ontology was to be very comprehensive. instead of 
linking out, we brought everything in.
... now that semantic web is growing, we are evaluating ways of linking out.
... the SPARQL endpoint i sent before contains a lot of OWL statements
(restrictions etc.)

<slars0n> http://neurolex.org/wiki/SparqlEndPoint

stephen: the SPARQL endpoint i sent just now comes from the Semantic 
MediaWiki export.
... this version has less OWL (restrictions etc.) in it.
... the two endpoints are on different servers.
... the ontology endpoint is on a virtuoso server. advantage: can do 
transitive queries.
... the performance of transitive queries is good.

scott: did you run rules / pre-inferencing?

stephen: the transitive operation does not require rulesets as far as i 
know, you just add it to the query.
... don't know about internals.
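
As an illustration of putting transitivity "in the query" rather than in a ruleset, here is a minimal sketch. It uses the SPARQL 1.1 property path operator `+`, not any Virtuoso-specific syntax, since the minutes do not spell out what was actually used. The endpoint URL and class URI are hypothetical placeholders, and nothing is sent over the network.

```python
from urllib.parse import urlencode


def transitive_superclass_query(class_uri: str) -> str:
    """Build a query for all direct and indirect superclasses of a class.

    The `+` after rdfs:subClassOf asks the endpoint to follow the
    property one or more times, so no pre-computed inference is needed.
    """
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?ancestor WHERE {\n"
        f"  <{class_uri}> rdfs:subClassOf+ ?ancestor .\n"
        "}"
    )


def request_url(endpoint: str, query: str) -> str:
    """Encode the query as a GET request URL (SPARQL protocol style)."""
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical values, for illustration only.
query = transitive_superclass_query("http://example.org/ontology#Neuron")
url = request_url("http://example.org/sparql", query)
```

Whether a given endpoint accepts property paths depends on the server and its version, so this should be checked against the endpoint's own documentation.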

stephen: we used a cloud-based service that lets you do SPARQL
... has well-documented update facilities.
... you can even have a 'history' of updates.

<slars0n> http://n2.talis.com/wiki/Main_Page

stephen: (N2 by Talis)

kei: in HCLS we have two instances of Knowledge Bases: the one at DERI 
(based on Virtuoso), one at University of Berlin (based on AllegroGraph).
... we have the endpoints, but users still need to know detailed graph 
structure. it would be helpful to have some high-level metadata that 
would help users know what information is contained in endpoint, what 
information can be interrelated between endpoints...
... at the moment we have to develop federated queries at a very low level.

scott: at the moment we have a few, nice, useful SPARQL endpoints, but 
in the future there could be thousands of endpoints to choose from
... the ultimate form of federation would be asking the question at one 
place and having it automatically distributed to the right places.
... OWL, SKOS? is it exposed via D2R or SWObjects? Licensing information?
... you also need to know the contents. having very condensed 
information about what is contained in the named graph.
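
The kind of condensed endpoint description discussed here could look roughly like the sketch below. It is only an assumption about what such metadata might hold (vocabularies, exposure method, license, topics); the field names, endpoint names, URLs and topics are all hypothetical and do not describe the real NIF or HCLS endpoints.

```python
from dataclasses import dataclass, field


@dataclass
class EndpointDescription:
    """High-level metadata a user could read before writing a query."""
    name: str
    url: str
    vocabularies: set = field(default_factory=set)  # e.g. {"OWL", "SKOS"}
    exposed_via: str = "unknown"                    # e.g. "D2R", "SWObjects"
    license: str = "unknown"
    topics: set = field(default_factory=set)        # what the graph is about


def select_endpoints(catalog, needed_topics):
    """Return names of endpoints that cover at least one needed topic."""
    needed = set(needed_topics)
    return [e.name for e in catalog if needed & e.topics]


# Hypothetical catalog, for illustration only.
CATALOG = [
    EndpointDescription(
        name="endpoint-a",
        url="http://example.org/a/sparql",
        vocabularies={"OWL"},
        topics={"brain regions", "neuron types"},
    ),
    EndpointDescription(
        name="endpoint-b",
        url="http://example.org/b/sparql",
        vocabularies={"SKOS"},
        topics={"neuron types", "community annotations"},
    ),
]

candidates = select_endpoints(CATALOG, {"brain regions"})
```

A federation layer could use such a catalog to decide where to send each part of a question, instead of requiring the user to know every graph structure in advance.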

jeff: we are extracting data from tables, we have a curator working on that.
... e.g., how up- and down-regulation is represented. we use a mixture 
of automated tools and manual curation.
... the tables usually come from HTML/PDF version of papers. sometimes 
also from supplemental material.
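
To make the question of representing up- and down-regulation concrete, here is a minimal sketch that serializes a curated gene list as N-Triples. It is not the representation used by NIF or the BioRDF group; the namespace, predicate names and example values are hypothetical, chosen only to show one possible shape.

```python
from dataclasses import dataclass

EX = "http://example.org/genelist#"  # hypothetical namespace


@dataclass(frozen=True)
class GeneEntry:
    symbol: str
    regulation: str  # "up" or "down"


def _literal(text: str) -> str:
    """Quote a plain string literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_ntriples(list_id: str, source: str, entries) -> str:
    """Serialize one gene list; each entry gets its own node so the
    regulation direction stays attached to this list, not to the gene."""
    list_uri = f"<{EX}{list_id}>"
    lines = [f"{list_uri} <{EX}extractedFrom> {_literal(source)} ."]
    for e in entries:
        if e.regulation not in ("up", "down"):
            raise ValueError(f"unknown regulation: {e.regulation!r}")
        entry_uri = f"<{EX}{list_id}/entry/{e.symbol}>"
        lines.append(f"{list_uri} <{EX}hasEntry> {entry_uri} .")
        lines.append(f"{entry_uri} <{EX}gene> {_literal(e.symbol)} .")
        lines.append(f"{entry_uri} <{EX}regulation> {_literal(e.regulation)} .")
    return "\n".join(lines)


# Made-up example values, not taken from any paper.
triples = to_ntriples(
    "list1",
    "supplemental table of an example paper",
    [GeneEntry("GENE1", "up"), GeneEntry("GENE2", "down")],
)
```

Keeping the regulation on a per-list entry node is a design choice of this sketch: the same gene can then be up-regulated in one list and down-regulated in another without contradiction.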

scott: another aspect (having spoken to chris stoeckert)... if we take
this not only to MGED, but also to the publishers, and try to get them to
have researchers submit gene lists, that would solve this problem in the
future.

kei: the NIF ontology will also be deposited in NCBO BioPortal
... BioPortal has its own SPARQL endpoint, too
... will there be redundancy? which endpoints / URIs will I use?

jeff: Neurolex is the 'working draft', before it goes through the
rigours of ontology engineering.
... NCBO is a community place.

scott: i suppose that some of the data released in september will also 
contain the data that was annotated

jeff: yes

scott: you could also make that data available from NCBO

kei: another topic: gene lists. a number of us have been working on how 
to represent gene lists.
... we could look at Neurolex to see which neuroscience terms we can 
extract from these endpoints that would be relevant for annotation.
... matthias has also been working with aTags, used NCBO resources.
... we need an iterated process of debugging, based on use-cases
... i will be away, jun will convene some of the calls

stephen: we would be happy to receive feedback, suggestions for links.

scott: one potential use-case would be EHRs, helping clinicians with 
certain tasks through integrated information.

*****Excerpts End*******

Kei Cheung wrote:
> This is a reminder that the next BioRDF telcon call will be held at 
> 11 am EDT (4 pm CET) on Monday, June 14 (see details below).
>
> Jeff Grethe and Stephen Larson will join the call to talk to us about
> NIF SPARQL endpoints.
>
> Cheers,
>
> -Kei
>
>
> == Conference Details ==
> * Date of Call: Monday, June 14, 2010
> * Time of Call: 11:00 am Eastern Time (4 pm CET)
> * Dial-In #: +1.617.761.6200 (Cambridge, MA)
> * Dial-In #: +33.4.89.06.34.99 (Nice, France)
> * Dial-In #: +44.117.370.6152 (Bristol, UK)
> * Participant Access Code: 4257 ("HCLS")
> * IRC Channel: irc.w3.org port 6665 channel #HCLS (see W3C IRC page for
> details, or see Web IRC), Quick Start: Use
> http://www.mibbit.com/chat/?server=irc.w3.org:6665&channel=%23hcls for
> IRC access.
> * Duration: ~1 hour
> * Frequency: bi-weekly
> * Convener: Kei
> * Scribe: to-be-determined
>
> ==Agenda==
> * Introduction (Kei)
> * NIF SPARQL endpoints (Jeff, Stephen)
> * Gene list RDF representation (Lena, Satya, Jun, Scott)
>
Received on Monday, 14 June 2010 18:42:07 UTC