- From: Richard Boyce <rdb20@pitt.edu>
- Date: Thu, 5 Jul 2012 21:49:53 -0400
- To: <public-semweb-lifesci@w3.org>
- Message-ID: <4FF64441.8040801@pitt.edu>
Awesome start, Matthias. Glad to help with fixing the product label mappings (using LinkedSPLs, http://tinyurl.com/d46azay).

Re #4 below: I just mapped all product labels and active moieties in LinkedSPLs to RxNORM PURLs within the BioPortal SPARQL endpoint; maybe that will be helpful.

I will make time to look into this mid-next week. Do you have a repository where your code is located? Otherwise, please keep me in the loop.

-Rich

On 07/04/2012 05:21 PM, Matthias Samwald wrote:
> Dear all,
>
> I published a first prototype of the "Medical Microdata Compendium", a
> collection of open medical and pharmacological datasets with markup
> conforming to the recently updated schema.org and the microdata
> format. The *long-term goal* of this project is to provide structured
> medical and pharmacological information to search engines to enable
> better decision making by doctors and patients. The far more humble
> *short-term goal* is to research how microdata can be used for
> retrieving and querying biomedical information, and to come up with
> interesting demonstrations and use-cases.
>
> The data can be viewed here:
> http://samwald.info/medical_microdata/
>
> At the moment this is a flat list of web pages, with each page
> describing a formulated pharmaceutical or a substance. The data were
> derived from the DailyMed and DrugBank datasets from the LODD
> collection <http://www.w3.org/wiki/HCLSIG/LODD/Data>.
>
> Example of a DrugBank resource:
> http://samwald.info/medical_microdata/drugbank_resource_drugs_DB00175.html
>
> Example of a DailyMed resource:
> http://samwald.info/medical_microdata/dailymed_resource_drugs_3580.html
>
> You can extract the structured data from these pages with a variety of
> tools. For example, you can use the Sindice inspector:
> http://inspector.sindice.com/inspect?url=http%3A%2F%2Fsamwald.info%2Fmedical_microdata%2Fdrugbank_resource_drugs_DB00175.html
>
> At the moment I am evaluating how different search engines can cope
> with the data. For example, the microdata can already be used by
> Google Custom Search Engines. Other 'semantic' search engines such as
> http://sindice.com/ or the medical search engine developed by the
> http://khresmoi.eu/ project should also be evaluated.
>
> *If you are interested in joining the effort to evaluate how semantic
> markup can be used to improve medical information search and decision
> making, please send me an e-mail!* I would like to see this work
> published as a journal paper, and could use some co-authors. I
> appreciate all feedback and ideas!
>
> Regarding the Medical Microdata Compendium, there are several issues
> that still need to be taken care of:
>
> 1) The DailyMed resources are still riddled with character encoding
> issues -- this is a problem of the LODD data source and will be
> remedied by switching to a newer version of this dataset, Richard's
> 'Linked Structured Product Labels'.
>
> 2) Only a fraction of the properties of the source datasets have been
> mapped, namely those where a close fit between a property in the
> source dataset and schema.org could be found. This means that a lot of
> useful data is not captured. I will look into using the proposed
> schema.org extension mechanism
> <http://schema.org/docs/extension.html> to see if it could help to
> capture these additional properties and types.
>
> 3) More datasets need to be converted, such as ClinicalTrials.gov (and
> its linked data mirror http://linkedct.org/). This will also help to
> better demonstrate interlinking of different datasets (e.g., from
> disease to drug to ongoing clinical trials in the area).
>
> 4) The generation of http://schema.org/MedicalCode entities needs to
> be fixed. Also, we need to check how we can align with controlled
> vocabularies that already have URIs (e.g., to BioPortal
> <http://bioportal.bioontology.org/> taxonomies).
>
> 5) General clean-up, code formatting and improvement of web design
>
> Cheers,
> Matthias Samwald

--
Richard Boyce, PhD
Assistant Professor of Biomedical Informatics
Faculty, Geriatric Pharmaceutical Outcomes and Gero-Informatics Research and Training Program
Scholar, Comparative Effectiveness Research Program
University of Pittsburgh
rdb20@pitt.edu
412-648-9219 (W), 206-371-6186 (C)
Twitter: @bhaapgh
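
A note on the extraction step mentioned in the quoted message ("You can extract the structured data from these pages with a variety of tools"): the following is only a rough sketch of what such a tool does, using Python's standard-library HTML parser. It is not the code behind the Compendium or the Sindice inspector. The SAMPLE markup, the example.org URL, and the names MicrodataParser and extract_microdata are made up for illustration; the actual markup of the Compendium pages is not shown in this thread and may differ. The sketch also ignores parts of the microdata model (itemref, proper nesting of items, nested elements inside a property).

```python
from html.parser import HTMLParser


class MicrodataParser(HTMLParser):
    """Collect itemscope blocks and their itemprop values (simplified)."""

    def __init__(self):
        super().__init__()
        self.items = []        # one dict per itemscope element encountered
        self._prop = None      # itemprop whose text is currently being captured
        self._prop_tag = None  # tag name that opened that itemprop
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            # Simplification: nested items are recorded as separate items.
            self.items.append({"type": attrs.get("itemtype"), "properties": {}})
        elif "itemprop" in attrs and self.items:
            name = attrs["itemprop"]
            # Prefer an attribute value (meta/content, a/href) over text.
            value = attrs.get("content") or attrs.get("href")
            if value is not None:
                self._store(name, value)
            else:
                self._prop, self._prop_tag, self._text = name, tag, []

    def handle_data(self, data):
        if self._prop is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._prop is not None and tag == self._prop_tag:
            self._store(self._prop, "".join(self._text).strip())
            self._prop = self._prop_tag = None

    def _store(self, name, value):
        # Attach the value to the most recently opened item.
        self.items[-1]["properties"].setdefault(name, []).append(value)


def extract_microdata(html):
    """Return a list of {"type": ..., "properties": {...}} dicts."""
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical markup, NOT copied from the Compendium pages.
SAMPLE = """
<div itemscope itemtype="http://schema.org/Drug">
  <h1 itemprop="name">Example drug</h1>
  <a itemprop="url" href="http://example.org/drug/1">link</a>
  <meta itemprop="nonProprietaryName" content="examplamine">
</div>
"""

if __name__ == "__main__":
    for item in extract_microdata(SAMPLE):
        print(item["type"], item["properties"])
```

To try it on a real page, one would pass the downloaded HTML of one of the example pages to extract_microdata instead of SAMPLE; a dedicated microdata library would be the safer choice for anything beyond a quick look.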
Received on Friday, 6 July 2012 01:51:28 UTC