[BIORDF] Status of the Parkinson's disease demo

Hello everyone,

sorry that I could not make it to the recent Telecons.

---
Summary and status of the tasks for the Parkinson's disease demo that we planned during the F2F (as I understand them):
---

* Convert Senselab/NeuronDB [1] to RDF (done by Kei and his group). STATUS: almost done. However, when viewing the OWL file in Protege at the F2F I was still missing a lot of the data that is available on the NeuronDB website -- it seemed that the file consisted only of classes and dummy instances, but not relations (which are the most important thing we can derive from NeuronDB). Maybe Kei could shed light on this issue? 

* Debug the Senselab/NeuronDB OWL file. STATUS: ?

* Convert PDSP KiDB to OWL (done by myself). STATUS: done.

* Debug the PDSP KiDB OWL [2] file with Pellet. STATUS: almost done. A single, elusive error remains, but I expect to find it soon. Pellet really is a great aid in debugging OWL -- at least much better than Protege (thanks for the tip, Alan).

* Convert MeSH [3] to OWL (done by myself). STATUS: done, already available as a SKOS file from [4].

* 'Convert' Pubchem in order to yield the relation between a CAS number from the PDSP KiDB and concepts from MeSH. This is more problematic than I thought. At the F2F, I suggested using Pubchem only to extract the relation between CAS numbers and MeSH annotations. Some people also suggested that as much of Pubchem as possible should be converted or made accessible via wrappers. However, I think I did not stress enough that this would be quite a demanding task, as Pubchem is not only quite complex but also very large - the XML export of Pubchem amounts to hundreds of gigabytes. Furthermore, it seems that the static exports available via the Pubchem FTP site do not contain all of the necessary information (e.g. MeSH annotations) - these are only contained in files that are the results of a search.
Therefore, I would still suggest focusing on simply extracting the CAS number - MeSH relation. I would also suggest limiting the conversion to a small, selected set of records that are useful for the demonstration.
STATUS: I queried the 'Pubchem Substance' database with the search string 'parkinson OR antiparkinsonian OR huntington OR dyskinesia OR hallucinogen OR neurotoxic OR serotonin OR dopamine OR glutamate', which gave over a thousand results. These results were saved as XML. XQuery was used to extract the CAS number - MeSH relations from the result set. Unfortunately, the end result turned out to be less useful than expected. This is partly because the metadata scheme of the Pubchem exports is not very concise, e.g. the MeSH terms are mixed with other kinds of annotations, and they are represented as strings (e.g. 'ANTIPARKINSONIAN AGENTS') rather than as MeSH IDs. Very annoying.
I will continue to explore the data in Pubchem, but the first explorations were a bit disappointing. I hope to find more useful results; otherwise we will need to rethink the structure of the demonstration a bit.

* Dissemination of the results, query mechanism, website and interface for the demonstration. STATUS: nothing done yet. I would suggest that, for the time being, we try to build a coherent semantic network out of all data sources and put it in a single triplestore. Once this seems to work, we should try to simulate a distributed environment in which each data source and the mappings between data sources are located on different SPARQL endpoints that can be queried via federated SPARQL queries. Many people at the F2F (Vipul and others) suggested another solution: a federated query based on the Parkinson seed ontology, without requiring a mapping of the original data sources. The query algorithms would have to be written by our group and would give the user only limited possibilities for making queries (at least that was my understanding of the issue, please correct me if I am wrong). This will probably lead to a heated discussion in a few months.
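
To make the single-triplestore step a bit more concrete, here is a minimal Python sketch of the idea: the triples from each source are simply unioned into one store, and a query then follows links across what used to be separate data sets. This is only an illustration, not a proposal for the actual implementation. All identifiers (the `kidb:`, `cas:`, `mesh:`, `neurondb:` and `ex:` names, the CAS placeholder) are made up, and a real setup would use a proper triplestore and SPARQL instead of Python sets.

```python
from typing import Iterable, Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def merge_sources(*sources: Iterable[Triple]) -> Set[Triple]:
    """Union the triples of all sources into one in-memory store."""
    store: Set[Triple] = set()
    for source in sources:
        store.update(source)
    return store


def match(store: Set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple matching the pattern; None acts as a wildcard."""
    for triple in store:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


# Hypothetical example data, one small set per source.
kidb = {
    ("kidb:ligand_1", "ex:hasCAS", "cas:00-00-0"),
    ("kidb:ligand_1", "ex:bindsTo", "kidb:receptor_A"),
}
cas_to_mesh = {
    ("cas:00-00-0", "ex:annotatedWith", "mesh:term_X"),
}
neurondb = {
    ("neurondb:neuron_N", "ex:hasReceptor", "kidb:receptor_A"),
}

store = merge_sources(kidb, cas_to_mesh, neurondb)


def mesh_terms_for_ligand(store: Set[Triple], ligand: str) -> Set[str]:
    """Follow ligand -> CAS number -> MeSH term across the merged sources."""
    terms: Set[str] = set()
    for _, _, cas in match(store, s=ligand, p="ex:hasCAS"):
        for _, _, term in match(store, s=cas, p="ex:annotatedWith"):
            terms.add(term)
    return terms
```

Calling `mesh_terms_for_ligand(store, "kidb:ligand_1")` walks from the KiDB record through the CAS mapping to the MeSH annotation, which is the kind of cross-source join the merged network is meant to support.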

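For the Pubchem item above, the following Python sketch shows one possible way to deal with MeSH terms that arrive as plain strings mixed with other annotations: pick out values that look like CAS numbers, look each annotation string up in a label-to-ID table, and set aside everything that does not match. I actually used XQuery, so this is a swapped-in illustration and not what I ran. The XML layout (`record`, `synonym`, `annotation`) is a simplified, hypothetical stand-in for the real Pubchem export, and the MeSH ID is a placeholder; the lookup table could in principle be built from the labels in the MeSH SKOS file.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified record layout; the real Pubchem schema differs.
SAMPLE_XML = """
<records>
  <record>
    <synonym>00-00-0</synonym>
    <synonym>some trade name</synonym>
    <annotation>ANTIPARKINSONIAN AGENTS</annotation>
    <annotation>not a MeSH term</annotation>
  </record>
</records>
"""

# Placeholder table: normalized label -> ID (the ID here is not a real MeSH ID).
MESH_LABEL_TO_ID = {"antiparkinsonian agents": "MESH-PLACEHOLDER-1"}

# CAS numbers have 2-7 digits, then 2 digits, then 1 check digit.
CAS_PATTERN = re.compile(r"\d{2,7}-\d{2}-\d")


def normalize(label):
    """Collapse whitespace and lowercase, so 'FOO  BAR' matches 'foo bar'."""
    return " ".join(label.split()).lower()


def extract_cas_mesh(xml_text, label_to_id):
    """Return (CAS number, MeSH ID) pairs plus the annotations left unmatched."""
    root = ET.fromstring(xml_text)
    pairs, unmatched = [], []
    for record in root.iter("record"):
        cas_numbers = [
            syn.text.strip()
            for syn in record.iter("synonym")
            if syn.text and CAS_PATTERN.fullmatch(syn.text.strip())
        ]
        for annotation in record.iter("annotation"):
            label = annotation.text or ""
            mesh_id = label_to_id.get(normalize(label))
            if mesh_id is None:
                unmatched.append(label)  # keep for manual inspection
                continue
            pairs.extend((cas, mesh_id) for cas in cas_numbers)
    return pairs, unmatched


pairs, unmatched = extract_cas_mesh(SAMPLE_XML, MESH_LABEL_TO_ID)
```

Keeping the unmatched strings in a separate list makes it easy to see how many annotations are not MeSH terms at all, or differ slightly from the official labels.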

kind regards,
Matthias Samwald



[1] http://senselab.med.yale.edu/senselab/
[2] http://pdsp.med.unc.edu/pdsp.php
[3] http://www.nlm.nih.gov/mesh/
[4] http://neuroscientific.net/index.php?id=download

Received on Friday, 13 October 2006 22:20:52 UTC