Re: Medical Microdata Compendium (Open Biomedical Datasets with schema.org annotation) -- was: Re: New proposal: health & medical extensions to schema.org

Awesome start, Matthias. Glad to help with fixing the product label 
mappings (using LinkedSPLs http://tinyurl.com/d46azay). Re #4 below, I 
just mapped all product labels and active moieties in LinkedSPLs to 
RxNorm PURLs within BioPortal's SPARQL endpoint; maybe that will be 
helpful. I will make time to look into this mid-next week. Do you have a 
repository where your code is located? Otherwise, please keep me in the loop.

-Rich




On 07/04/2012 05:21 PM, Matthias Samwald wrote:
> Dear all,
> I published a first prototype of the "Medical Microdata Compendium", a 
> collection of open medical and pharmacological datasets with markup 
> conforming to the recently updated schema.org and the microdata 
> format. The *long-term goal* of this project is to provide structured 
> medical and pharmacological information to search engines to enable 
> better decision making by doctors and patients. The far more humble 
> *short-term goal* is to research how microdata can be used for 
> retrieving and querying biomedical information, and to come up with 
> interesting demonstrations and use-cases.
> The data can be viewed here:
> http://samwald.info/medical_microdata/
> At the moment this is a flat list of web pages, with each page 
> describing a formulated pharmaceutical or a substance. The data were 
> derived from the DailyMed and DrugBank datasets from the LODD 
> collection <http://www.w3.org/wiki/HCLSIG/LODD/Data>.
> Example of a DrugBank resource:
> http://samwald.info/medical_microdata/drugbank_resource_drugs_DB00175.html
> Example of a DailyMed resource:
> http://samwald.info/medical_microdata/dailymed_resource_drugs_3580.html
> You can extract the structured data from these pages with a variety of 
> tools. For example, you can use the Sindice inspector:
> http://inspector.sindice.com/inspect?url=http%3A%2F%2Fsamwald.info%2Fmedical_microdata%2Fdrugbank_resource_drugs_DB00175.html
> At the moment I am evaluating how different search engines can cope 
> with the data. For example, the microdata can already be used by 
> Google Custom Search Engines. Other 'semantic' search engines such as 
> http://sindice.com/ or the medical search engine developed by the 
> http://khresmoi.eu/ project should also be evaluated.
> *If you are interested in joining the effort to evaluate how semantic 
> markup can be used to improve medical information search and decision 
> making, please send me an e-mail!* I would like to see this work 
> published as a journal paper, and could use some co-authors. I 
> appreciate any feedback or ideas!
> Regarding the Medical Microdata Compendium, there are several issues 
> that still need to be taken care of:
> 1) The DailyMed resources are still riddled with character encoding 
> issues -- this is a problem of the LODD data source and will be 
> remedied by switching to a newer version of this dataset, Richard's 
> 'Linked Structured Product Labels'.
> 2) Only a fraction of the properties of the source datasets have been 
> mapped, namely those where a close fit between a property in the 
> source dataset and schema.org could be found. This means that a lot of 
> useful data is not captured. I will look into using the proposed 
> schema.org extension mechanism 
> <http://schema.org/docs/extension.html> to see if it could help to 
> capture these additional properties and types.
> 3) More datasets need to be converted, such as ClinicalTrials.gov (and 
> its linked data mirror http://linkedct.org/). This will also help to 
> better demonstrate interlinking of different datasets (e.g., from 
> disease to drug to ongoing clinical trials in the area).
> 4) The generation of http://schema.org/MedicalCode entities needs to 
> be fixed. Also, we need to check how we can align with controlled 
> vocabularies that already have URIs (e.g., BioPortal 
> <http://bioportal.bioontology.org/> taxonomies).
> 5) General clean-up, code formatting and improvement of web design
> Cheers,
> Matthias Samwald
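
For anyone who wants to try the extraction step mentioned above without an online inspector, here is a rough sketch in Python using only the standard library. It is not how the compendium or the Sindice inspector works; the class name MicrodataSketch and the sample HTML are made up for illustration, it assumes well-formed markup, and it flattens nested items instead of following the full microdata rules.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not change the depth counter.
VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}


class MicrodataSketch(HTMLParser):
    """Collects itemtype URLs and a flat list of (itemprop, value) pairs.

    Simplification (assumption): a value is taken from a content/href/src
    attribute when present, otherwise from the element's text.
    """

    def __init__(self):
        super().__init__()
        self.types = []   # itemtype URLs in document order
        self.props = []   # (itemprop, value) pairs in document order
        self._open = []   # text-valued properties still being read
        self._depth = 0   # nesting depth of non-void elements

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "itemtype" in a:
            self.types.append(a["itemtype"])
        prop = a.get("itemprop")
        attr_value = a.get("content") or a.get("href") or a.get("src")
        if prop and attr_value:
            self.props.append((prop, attr_value))
            prop = None
        if tag in VOID_TAGS:
            return
        self._depth += 1
        # Properties that open a nested item are skipped in this sketch.
        if prop and "itemscope" not in a:
            self._open.append((self._depth, prop, []))

    def handle_data(self, data):
        for _, _, chunks in self._open:
            chunks.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._open and self._open[-1][0] == self._depth:
            _, prop, chunks = self._open.pop()
            # Collapse runs of whitespace in the collected text.
            self.props.append((prop, " ".join("".join(chunks).split())))
        self._depth -= 1


# Hypothetical sample markup, not copied from the compendium pages.
sample = """
<div itemscope itemtype="http://schema.org/Drug">
  <h1 itemprop="name">Example Drug</h1>
  <a itemprop="url" href="http://example.org/drug">link</a>
  <meta itemprop="alternateName" content="example drug sodium">
</div>
"""

parser = MicrodataSketch()
parser.feed(sample)
parser.close()
print(parser.types)
print(parser.props)
```

To run it against one of the real pages, the HTML would have to be fetched first (for example with urllib.request) and passed to feed(); real-world markup with unclosed tags would likely need a more forgiving parser.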


-- 
Richard Boyce, PhD
Assistant Professor of Biomedical Informatics
Faculty, Geriatric Pharmaceutical Outcomes and Gero-Informatics Research and Training Program
Scholar, Comparative Effectiveness Research Program
University of Pittsburgh
rdb20@pitt.edu
412-648-9219 (W), 206-371-6186 (C)
Twitter: @bhaapgh

Received on Friday, 6 July 2012 01:51:28 UTC