Re: BioRDF call -- update about aTags, text extraction

I cannot join the call today, so let me summarize recent progress here:

Extending the Science Commons Text Annotation Service with support for additional entities and aTag output format
aTags can now be generated automatically from Pubmed abstracts and from the result sets of Pubmed queries. To do this, I extended the Science Commons Text Annotation Service (http://whatizit.neurocommons.org), which uses the 'Whatizit' entity recognition web service [1] provided by the Rebholz group at EBI. These extensions are NOT available on the public website yet (sorry), but I will make them available online in the next 2-3 days.
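How the service fetches the result set of a Pubmed query is not described here. One possible way is NCBI's public E-utilities `esearch` endpoint. The sketch below is only a minimal example under that assumption: the function names are hypothetical, and the network call is left out. The caller would fetch the URL (for example with `urllib.request.urlopen`) and pass the response text to `pmids_from_esearch`.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# NCBI E-utilities esearch endpoint (public NCBI service, not part of the annotation service).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(query: str, retmax: int = 20) -> str:
    """Build an esearch URL that returns up to `retmax` PubMed IDs for `query`."""
    params = {"db": "pubmed", "term": query, "retmax": str(retmax)}
    return ESEARCH + "?" + urllib.parse.urlencode(params)


def pmids_from_esearch(xml_text: str) -> list:
    """Extract the PubMed IDs from an esearch XML response (<IdList><Id>...</Id></IdList>)."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.findall("./IdList/Id") if el.text]
```

Each returned PMID would then identify one abstract to send through the entity recognition and aTag annotation step.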

Sentences in Pubmed abstracts that contain one or more named entities are annotated with embedded RDFa, following the aTag convention, so the service can be used to create RDF/OWL statements from Pubmed content. Entities are identified with URIs from OBO and "linked data" resources. This makes the service a simple entry point for creating RDF/OWL data for arbitrary Pubmed queries.
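To illustrate the idea of keeping only sentences that mention at least one entity and wrapping each mention in RDFa, here is a minimal sketch. It is an assumption-laden stand-in. A small hypothetical term dictionary replaces the matches that the real service gets from Whatizit. The sentence splitter is naive. `ex:Sentence` and `ex:mentions` are placeholder CURIEs and not the actual aTag vocabulary. The example URIs follow the OBO PURL style.

```python
import html
import re

# Hypothetical term -> URI dictionary (keys must be lower-case); it stands in for
# the entity matches that the real service obtains from the EBI Whatizit web service.
TERMS = {
    "dopamine": "http://purl.obolibrary.org/obo/CHEBI_18243",
    "apoptosis": "http://purl.obolibrary.org/obo/GO_0006915",
}


def split_sentences(text: str) -> list:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def annotate(sentence: str, terms: dict):
    """Return the sentence as an RDFa-annotated <span>, or None if no term occurs.

    `ex:Sentence` and `ex:mentions` are placeholder CURIEs; the page would have to
    declare the `ex` prefix (xmlns:ex="...") for an RDFa parser to expand them.
    """
    # Try longer terms first, so multi-word terms win over their sub-terms.
    alternatives = sorted(terms, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b",
                         re.IGNORECASE)
    matches = list(pattern.finditer(sentence))
    if not matches:
        return None  # sentence has no named entity: not turned into an annotation
    parts, pos = [], 0
    for m in matches:
        parts.append(html.escape(sentence[pos:m.start()]))
        uri = terms[m.group(1).lower()]
        parts.append('<a rel="ex:mentions" href="%s">%s</a>'
                     % (html.escape(uri), html.escape(m.group(1))))
        pos = m.end()
    parts.append(html.escape(sentence[pos:]))
    return '<span typeof="ex:Sentence">' + "".join(parts) + "</span>"
```

In RDFa 1.0 terms, each `<span typeof=...>` introduces a blank node for the sentence. Each link inside it then yields one `ex:mentions` triple pointing at the entity URI.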

Currently, the service recognizes entities from Gene Ontology, ChEBI, some smaller OBO ontologies, UniProt and NCBI Taxonomy. It could potentially also link to DrugBank and to a disease ontology, but this is made difficult by the types of identifiers returned by the original EBI web service. For example, the EBI service identifies drugs by their secondary DrugBank identifiers, while the Linked Data version of DrugBank [2] mints URIs for database records from the primary DrugBank identifiers; therefore, I cannot currently use the EBI service to easily generate annotations with DrugBank URIs. I will look into solving this issue. I will also update the service when new URI schemes become available (new OBO URIs; URIs minted by the Shared Names initiative). Support for caching is also on my to-do list.
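One way around the secondary/primary DrugBank identifier mismatch is a lookup table that translates the IDs returned by the EBI service before URIs are minted. The sketch below is hypothetical throughout. The mapping keys are placeholders and not real DrugBank secondary IDs, since a real table would have to be built from DrugBank's own records. The URI pattern for the Linked Data DrugBank is also an assumption.

```python
import re

# Assumed URI pattern for drug records in the Linked Data version of DrugBank.
DRUGBANK_LD = "http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/"

# Hypothetical secondary -> primary accession table; the keys are placeholders,
# not real DrugBank secondary identifiers.
SECONDARY_TO_PRIMARY = {
    "SECONDARY-0001": "DB00001",
    "SECONDARY-0002": "DB00002",
}

PRIMARY_ID = re.compile(r"^DB\d{5}$")  # primary accessions look like DB00001


def drugbank_uri(ebi_id: str, mapping: dict = SECONDARY_TO_PRIMARY):
    """Map an identifier returned by the EBI service to a Linked Data DrugBank URI.

    Returns None when no primary ID is known, so that no wrong URI is minted.
    """
    if PRIMARY_ID.match(ebi_id):
        primary = ebi_id  # already a primary accession
    else:
        primary = mapping.get(ebi_id)
    return DRUGBANK_LD + primary if primary else None
```

Returning `None` for unmapped identifiers means such drugs are left unannotated instead of being linked to a URI that does not exist.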

Ligand-receptor interaction data now available in aTag format

I have converted (part of) the PDSP Ki database of ligand-receptor interactions into aTag format. It uses identifiers from NeuronDB and ChEBI for tagging. Another version that uses DBpedia identifiers is planned.

Ligand-receptor interaction aTags, collected in a single HTML file (you can extract the RDF with an RDFa/GRDDL parser of your choice):
http://hcls.deri.org/atag/data/kidb_atags.html

RDF in Turtle format, extracted from this HTML page:
http://hcls.deri.org/atag/data/kidb_atags_jun_2009.ttl
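For actual use, the aTags HTML page should be read with a complete RDFa processor (pyRdfa is one example). To show what such a processor pulls out of the markup, here is a toy extractor built on Python's standard `html.parser`. It covers only a small subset of RDFa 1.0, which is listed in the docstring. CURIEs such as `ex:tag` are left unexpanded. This is an illustrative sketch, not a replacement for a real parser.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# HTML elements that never have an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TinyRDFa(HTMLParser):
    """Collect (subject, predicate, object) triples from a small RDFa subset.

    Handled: `about` (new subject), `rel` + `href`/`resource` (URI object, which
    also becomes the subject for child elements), `property` + `content`
    (literal object). Not handled: CURIE/prefix expansion, `typeof` and blank
    nodes, `rev`, hanging rels, `href`/`resource` without `rel`, `property`
    without `content`, datatypes and languages. Unbalanced tags will confuse it.
    """

    def __init__(self, base: str = ""):
        super().__init__()
        self.base = base
        self.context = [base]  # subject inherited by child elements
        self.triples = []

    def _element(self, attrs, has_children):
        a = {k: v for k, v in attrs if v is not None}
        subject = urljoin(self.base, a["about"]) if "about" in a else self.context[-1]
        target = a.get("resource") or a.get("href")
        obj = urljoin(self.base, target) if target else None
        if "rel" in a and obj:
            for pred in a["rel"].split():
                self.triples.append((subject, pred, obj))
        if "property" in a and "content" in a:
            for pred in a["property"].split():
                self.triples.append((subject, pred, a["content"]))
        if has_children:
            self.context.append(obj if ("rel" in a and obj) else subject)

    def handle_starttag(self, tag, attrs):
        self._element(attrs, has_children=tag not in VOID)

    def handle_startendtag(self, tag, attrs):
        self._element(attrs, has_children=False)

    def handle_endtag(self, tag):
        if tag not in VOID and len(self.context) > 1:
            self.context.pop()
```

Usage: create `p = TinyRDFa()`, call `p.feed(html_text)` and then `p.close()`, and read `p.triples`.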

The Ki value (inhibition constant) measures how strongly a ligand and a receptor interact: smaller values mean stronger interaction.
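Because smaller Ki means tighter binding, Ki values are often reported as pKi = -log10(Ki), with Ki in mol/L, so that a larger pKi means stronger binding. The sketch below assumes the Ki values are given in nanomolar, so the units of the source data should be checked. The record layout `(ligand, receptor, ki_nM)` and the function names are hypothetical.

```python
import math


def pki(ki_nM: float) -> float:
    """pKi = -log10(Ki), with Ki converted from nanomolar to mol/L first."""
    if ki_nM <= 0:
        raise ValueError("Ki must be positive")
    return -math.log10(ki_nM * 1e-9)


def strongest_first(records):
    """Sort (ligand, receptor, ki_nM) tuples by ascending Ki, i.e. strongest binding first."""
    return sorted(records, key=lambda r: r[2])
```

For example, a Ki of 1 nM corresponds to a pKi of 9, and a Ki of 100 nM corresponds to a pKi of 7. Sorting by ascending Ki gives the same order as sorting by descending pKi.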

Further information:
http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/aTags/datasets

HCLS KB at DERI
We got a new server for the HCLS KB at DERI (with 6 separate hard drives, which should improve query performance considerably). I will also upgrade the HCLS KB from Virtuoso 5 to Virtuoso 6, which should bring further performance improvements. The only problem at the moment is finding the time for the server migration...

Cheers,
Matthias Samwald

[1] http://www.ebi.ac.uk/webservices/whatizit/
[2] http://www4.wiwiss.fu-berlin.de/drugbank/






--------------------------------------------------
From: "Kei Cheung" <kei.cheung@yale.edu>
Sent: Saturday, June 06, 2009 1:48 AM
To: "HCLS" <public-semweb-lifesci@w3.org>
Subject: BioRDF call

> This is a reminder that the next BioRDF teleconf. will be held at 11 am 
> EDT (5 pm CET) on Monday, June 8 (see details below).
> 
> Cheers,
> 
> -Kei
> 
> == Conference Details ==
> * Date of Call: Monday June 8, 2009
> * Time of Call: 11:00 am Eastern Time
> * Dial-In #: +1.617.761.6200 (Cambridge, MA)
> * Dial-In #: +33.4.89.06.34.99 (Nice, France)
> * Dial-In #: +44.117.370.6152 (Bristol, UK)
> * Participant Access Code: 4257 ("HCLS")
> * IRC Channel: irc.w3.org port 6665 channel #hcls (see 
> [http://www.w3.org/Project/IRC/ W3C IRC page] for details, or see 
> [http://cgi.w3.org/member-bin/irc/irc.cgi Web IRC])
> * Duration: ~1 hour
> * Frequency: bi-weekly
> * Convener: Kei Cheung
> 
> == Agenda ==
> * Provenance/workflow presentation (Satya)
> * HCLS KB update (Matthias, Adrian)
> * Image data (Rob)
> * SPARQL access control (Eric, Helena)
> * Shared Names -- pathway use case (Eric, Scott, Helena)
> * AIDA (Scott)
> * TCM data (Jun)
> * aTag (Matthias)
>

Received on Monday, 8 June 2009 10:50:09 UTC