- From: Egon Willighagen <egon.willighagen@gmail.com>
- Date: Mon, 23 Nov 2009 21:29:23 +0100
- To: Susie Stephens <susie.stephens@gmail.com>
- Cc: public-semweb-lifesci hcls <public-semweb-lifesci@w3.org>, Bioclipse-devel ML <bioclipse-devel@lists.sourceforge.net>
Hi all,

next Wednesday I unfortunately cannot participate because of family obligations.

On Mon, Nov 23, 2009 at 5:19 PM, Susie Stephens <susie.stephens@gmail.com> wrote:
> Here's the reminder for Wednesday's LODD telcon.

I was up for a data update, so I will have to do it like this... My introduction to this list was ages ago, so I will start with that.

My background is cheminformatics and chemometrics (statistics/data analysis on chemical data). I'm a strong believer in Open Data, Open Source and Open Standards, and a (past) developer of several projects, including Strigi-chemical (a chemistry extension for the KDE desktop search engine), the Chemistry Development Kit (CDK), JChemPaint, Jmol, and several others.

Right now, I am a postdoc in a drug discovery group at Uppsala University (Prof. Wikberg), developing the use of cheminformatics at the department, which includes the Bioclipse workbench. Proteochemometrics is the main statistical method used in our group, and model validation is clearly important. This is where RDF comes in: aggregation of data before model building, and for model validation afterwards. The validation data will preferably be data that is related to the model, but not really of the same type. RDF is clearly one of the few methods up to this job.

When I first joined the HCLS mailing list and conference calls, I saw a strong focus on biological and clinical data, but a lack of focus on the molecular chemistry behind it all, which is actually crucial for cheminformatics and proteochemometrics. So, that more or less defines the area where I contribute to the RDF activities: the border of molecular data and drug-related properties.

So far, I have developed an extension for Bioclipse to deal with RDF. It currently supports an in-memory triple store, and SPARQL queries on the in-memory stores as well as on remote SPARQL end points.
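For readers unfamiliar with remote SPARQL end points, the following is a minimal Python sketch of how a query is sent to one over HTTP. It is not the Bioclipse scripting API. The end point URL, the deliberately generic query, and the helper names `sparql_get_url` and `run_query` are assumptions made for illustration only.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL; the query text goes in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> dict:
    """Send the query and parse a JSON result set (hypothetical helper, needs network access)."""
    request = Request(
        sparql_get_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Assumed end point and a generic query; replace both with real values.
ENDPOINT = "http://dbpedia.org/sparql"
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"

# Only the URL is built here, so the sketch runs without network access.
url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

Calling `run_query(ENDPOINT, QUERY)` would perform the actual request, provided the end point is reachable and returns JSON results.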
Like most of Bioclipse2, the RDF extension is scriptable, which allows easy building of small programs or workflows that integrate RDF with other Bioclipse extensions, including the cheminformatics functionality, but also Jmol. There is also an R interface, to bridge with statistical modeling. Last Friday, I gave a talk about this work at SWAT4LS in Amsterdam, and my slides are available on my blog [0].

Getting back to the data, I am working on making various unique molecular property resources available as RDF. This includes the GNU FDL-licensed NMRShiftDB data, which contains NMR spectra (mostly carbon-13) used for metabolite identification (think finding biomarkers). There are also two smaller CC0 data sets: one based on ChemPedia [1], a new crowd-sourcing endeavor for naming molecules (no i18n support yet, but requested), and the RDF Open Notebook Science Solubility project [2], which we described in a chapter in the recent Beautiful Data book from O'Reilly.

There are other things I am doing, which include an ontology for molecular (or QSAR) descriptors, and an RDF equivalent of the cheminformatics data model used by the CDK. The latter would allow serialization of full molecular structures as RDF data. I am myself not convinced this is really where we want to go, though parts of it may very well be useful in XHTML+RDFa for scientific publications, for example organic synthesis papers...

I'd very much like to help get these data sets into the LODD network (particularly the two CC0 data sets, which are easiest because of their license).

One thing I want to do soon (actually, as part of the SWAT4LS proceedings paper) is create a data set with CDK-based molecular similarities. The CDK can calculate various of these, and this will create a nice sparse matrix. I'm leaning towards doing the molecules in DBpedia, but am more than open to analysing other Open data sets too (bearing a proper license, or a proper Public Domain statement, like CC0).
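To illustrate what such a sparse similarity matrix could look like, here is a minimal Python sketch. It does not use the CDK and is not the script referred to in this message. The fingerprints (sets of "on" bit positions), the choice of the Tanimoto coefficient, the 0.5 cutoff, and the molecule identifiers are all assumptions for illustration.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def sparse_similarity(fingerprints: dict, cutoff: float = 0.5) -> dict:
    """Return {(id_a, id_b): similarity}, keeping only pairs at or above the cutoff."""
    matrix = {}
    # Each unordered pair is visited once; pairs below the cutoff are simply not stored.
    for (id_a, fp_a), (id_b, fp_b) in combinations(sorted(fingerprints.items()), 2):
        score = tanimoto(fp_a, fp_b)
        if score >= cutoff:
            matrix[(id_a, id_b)] = score
    return matrix


# Hypothetical fingerprints; real ones would come from a cheminformatics toolkit.
fingerprints = {
    "mol_a": frozenset({1, 2, 3, 4}),
    "mol_b": frozenset({2, 3, 4, 5}),
    "mol_c": frozenset({10, 11}),
}

similarities = sparse_similarity(fingerprints)
print(similarities)
```

Because most molecule pairs share few features, most scores fall below the cutoff and are never stored, which is what keeps the matrix sparse.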
I'll put up the final script on MyExperiment.org anyway, for others to analyze other data sets. No ETA for that, though. An example script downloads molecules from DBpedia and visualizes them in 2D in a molecule table [3,4].

I am looking forward to hearing your comments and ideas on this work.

Regards,

Egon

0. http://chem-bla-ics.blogspot.com/2009/11/swat4ls-linking-open-drug-data-to.html
1. http://chem-bla-ics.blogspot.com/2009/11/chempedia-rdf-1-sparql-end-point.html
2. http://chem-bla-ics.blogspot.com/2009/11/open-notebook-science-solubility-sparql.html
3. http://egonw.posterous.com/molecules-in-dbpedia-visualized-with-bioclips
4. http://www.myexperiment.org/workflows/927

--
Post-doc @ Uppsala University
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
Received on Monday, 23 November 2009 20:30:24 UTC