Re: RDF Schema / LODD mapping -- Re: New proposal: health & medical extensions to schema.org

Dear all,

I have finished a first conversion of some key datasets from the "Linked Open Drug Data" collection to schema.org with medical extensions. So far, I have converted the datasets from Drugbank [1] and Dailymed [2]. I can work on mapping other datasets such as RxNorm, DBpedia and ClinicalTrials.gov as well, if this pilot leads to promising results.

The RDF of the conversion is available at
http://samwald.info/res/medical-schema-org/pharmaceutical-information-according-to-schema-org.ttl

Beware that this file is quite large (33 MB). I have published it uncompressed so that it is more transparent to web crawlers.

How this was done
To create this file, I extracted the RDF triples from the RDFa file provided by Aaron (outcome available at [3]). I had to fix a minor bug to make that happen correctly (there was whitespace in some of the labels and URIs). Then I manually created a mapping file between the schema.org extensions and the entities and properties used in the Drugbank and Dailymed datasets. This mapping is based on RDF Schema and Simple SPARQL Rules, and is available at [4] -- please have a look. I loaded all these files together with the LODD datasets into a triplestore with RDFS reasoning and executed the SPARQL Rules, yielding the final pharmaceutical-information-according-to-schema-org.ttl file.
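
To illustrate the general idea of mapping by RDFS reasoning, here is a minimal sketch in plain Python. It is not the actual mapping file at [4] or the triplestore setup described above: the `lodd:` and `ex:` terms are hypothetical placeholders, `schema:Drug` and `schema:name` are assumed target terms, and only two RDFS entailment rules (rdfs7 for subPropertyOf, rdfs9 for subClassOf) are applied. The final filter merely stands in for what a SPARQL rule might select.

```python
# All prefixed names below are hypothetical placeholders for illustration.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"

# Mapping triples: source terms declared as specialisations of schema.org terms.
mapping = {
    ("lodd:Drug", SUBCLASS, "schema:Drug"),
    ("lodd:genericName", SUBPROP, "schema:name"),
}

# Source data as it might appear in one of the original datasets.
data = {
    ("ex:drug1", RDF_TYPE, "lodd:Drug"),
    ("ex:drug1", "lodd:genericName", "Example Drug"),
}


def rdfs_closure(triples):
    """Apply RDFS rules rdfs7 and rdfs9 until no new triples appear."""
    inferred = set(triples)
    while True:
        sub_props = {(s, o) for s, p, o in inferred if p == SUBPROP}
        sub_classes = {(s, o) for s, p, o in inferred if p == SUBCLASS}
        new = set()
        for s, p, o in inferred:
            # rdfs7: a statement with a sub-property also holds for the super-property.
            for sub, sup in sub_props:
                if p == sub:
                    new.add((s, sup, o))
            # rdfs9: an instance of a subclass is also an instance of the superclass.
            if p == RDF_TYPE:
                for sub, sup in sub_classes:
                    if o == sub:
                        new.add((s, RDF_TYPE, sup))
        if new <= inferred:
            return inferred
        inferred |= new


def schema_org_view(triples):
    """Keep only triples expressed with schema.org terms."""
    return {
        (s, p, o)
        for s, p, o in triples
        if p.startswith("schema:") or (p == RDF_TYPE and o.startswith("schema:"))
    }


result = schema_org_view(rdfs_closure(mapping | data))
```

In this toy case `result` holds only the schema.org-typed statements about `ex:drug1`; a real conversion would of course rely on a triplestore and the full mapping rather than this loop.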

Where to go from here
It would be great to evaluate how Google and other search engines (such as Khresmoi [5]) can use structured information based on schema.org to improve access to medical / pharmaceutical information. To do this, we could set up web sites based on these datasets with embedded Microdata (or RDFa lite?) statements. Then we could compare the usability of schema.org-aware search engines with standard search engines (e.g., a normal Google Custom Search Engine). I think this could provide a very impressive example of what schema.org markup enables (and probably a nice scientific article).

@ Aaron: What do you suggest as the next steps for setting up such a test scenario? Are there any prototypical search tools from Google on the horizon that we could use?
@ Aaron: If you want to get some more detailed feedback from me about the schema.org extensions and some modelling choices, we should probably get in contact via Skype.
@ All: Do you have any suggestions for automatically publishing RDF datasets as HTML-with-Microdata or HTML-with-RDFa? Or do we need to write a script from scratch?
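
If a script does have to be written from scratch, one possible starting point is sketched below. This is only an assumption about what such a script could look like, not an existing tool: the function name `to_microdata` is hypothetical, `http://schema.org/Drug` is assumed as the item type, and the property values are placeholders. It renders one entity as an HTML fragment using the Microdata attributes `itemscope`, `itemtype` and `itemprop`.

```python
from html import escape


def to_microdata(item_type, properties):
    """Render one entity as an HTML fragment with Microdata attributes.

    item_type  -- URL of the type (assumed here to be a schema.org type)
    properties -- list of (property name, literal value) pairs
    """
    lines = [f'<div itemscope itemtype="{escape(item_type, quote=True)}">']
    for prop, value in properties:
        # Escape both the attribute value and the text content.
        lines.append(
            f'  <span itemprop="{escape(prop, quote=True)}">{escape(value)}</span>'
        )
    lines.append("</div>")
    return "\n".join(lines)


# Placeholder data; a real script would read these values from the RDF dataset.
fragment = to_microdata(
    "http://schema.org/Drug",
    [
        ("name", "Example Drug"),
        ("description", "Placeholder text <for illustration>"),
    ],
)
print(fragment)
```

Nested items (for example a drug linked to its manufacturer) would need a recursive variant, which this sketch leaves out.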

[1] http://drugbank.ca/ 
[2] http://dailymed.nlm.nih.gov/dailymed/
[3] http://samwald.info/res/medical-schema-org/schema_org_rdfa.ttl
[4] http://samwald.info/res/medical-schema-org/schema_org_2_LODD_mapping.ttl
[5] http://khresmoi.eu/

Best,
Matthias




From: Matthias Samwald 
Sent: Monday, May 21, 2012 1:35 PM
To: Aaron Brown 
Cc: Dan Brickley ; public-semweb-lifesci@w3.org 
Subject: RDF Schema / LODD mapping -- Re: New proposal: health & medical extensions to schema.org


Dear Aaron,

I think it might be an interesting exercise to publish some of the "Linked Open Drug Data" [1] datasets as microdata that adheres to the proposed extensions. These datasets were published in RDF format by members of the W3C Health Care and Life Science Interest Group. Mapping these datasets to your proposed schema.org extensions would be much easier if we had an RDF Schema of those extensions (which is available for the official schema.org via [2] and [3]). Could you make an RDF schema of your extensions available?

[1] http://www.w3.org/wiki/HCLSIG/LODD/Data
[2] http://schema.org/docs/schemaorg.owl
[3] http://schema.rdfs.org/all.ttl

Cheers,
Matthias Samwald



From: Michel Dumontier 
Sent: Wednesday, May 16, 2012 4:05 PM
To: w3c semweb hcls 
Cc: Aaron Brown ; Dan Brickley 
Subject: New proposal: health & medical extensions to schema.org


Hi all, 


Aaron Brown (@google) and others have been working on a health/medical extension to schema.org: http://schemaorg-medicalext.appspot.com/. It's also linked on the W3C wiki at
http://www.w3.org/wiki/WebSchemas/MedicalHealthProposal, along with other proposals (http://www.w3.org/wiki/WebSchemas). Have a look at the medical/health proposal and tell us what you think. I'd love to hear from those who are active in creating or consuming web page content (SciDisc, atags, Mark Wilkinson's Personal Health Lens, etc.).


Reserve Friday, June 1 @ 11am (Terminology task force slot) for a special meeting to discuss the proposal; we'll craft some feedback for the public mailing list at public-vocabs@w3.org.


Cheers!


m.


---------- Forwarded message ----------
From: Dan Brickley <danbri@danbri.org>
Date: Tue, May 15, 2012 at 10:49 AM
Subject: Fwd: New proposal: health & medical extensions to schema.org
To: eric@w3.org, team-hcls-chairs@w3.org, Aaron Brown <abbrown@google.com>
Cc: ivan@w3.org


Eric, HCLS folk, Ivan,

I want to introduce you to Aaron Brown, and pass along his msg below
introducing some work on health/medical markup for use in the public
Web, part of the schema.org project which is a collaboration amongst
several search engines to improve structured data usage within HTML.

Aaron has been busy with a pretty substantial medical/health
vocabulary, and yesterday circulated a first public version for
feedback/comments. I wanted to ask your advice on how best we might
connect this with the various activities of the HCLS W3C group. The
message below is public (see
http://lists.w3.org/Archives/Public/public-vocabs/2012May/0057.html
), so we could just pass it along to the public HCLS list
http://lists.w3.org/Archives/Public/public-semweb-lifesci/ but if
you've any thoughts on how best to interact with HCLS that would be
really useful. The emphasis with the vocabulary Aaron's working on is
on in-page HTML markup rather than full/deep ontology engineering,
though there are obviously points of connection to such activities.
I'll leave Aaron to discuss the details (see his note below or ask in
this thread).

Thanks for any advice,

cheers,

Dan


ps. for a bit more background -
The public-vocabs@w3.org list is the main feedback/discussion forum
for the schema.org initiative. Within W3C it is the 'Web Schemas'
taskforce of the Semantic Web group, which I chair. I also btw have
an @google affiliation for my schema.org work, though I don't formally
represent Google at W3C. Basically the Web Schemas group serves as a
liaison point between schema.org as an external entity, the W3C
community, and other groups producing metadata vocabularies. More
details c/o http://www.w3.org/wiki/WebSchemas ...


---------- Forwarded message ----------
From: Aaron Brown <abbrown@google.com>
Date: 14 May 2012 22:56
Subject: New proposal: health & medical extensions to schema.org
To: public-vocabs@w3.org


Hi all,

As I’ve alluded to before on this list
(http://lists.w3.org/Archives/Public/public-vocabs/2012Feb/0053.html),
over the past 6 months, a few of us at Google and other institutions
have been working on a set of schema.org extensions to cover the
health and medical domain. After several internal iterations and a lot
of feedback from initial reviewers (including the US NCBI; physicians
at Harvard, Stanford, and Duke; the major search engines; and a few
health web sites), we think we have a solid draft and would like to
open it for public feedback as a step toward incorporating it into
schema.org.

The proposed health/medical schema can be found at
http://schemaorg-medicalext.appspot.com/ which includes an
introduction as well as a snapshot of the type hierarchy and several
markup examples. It's also linked on the W3C wiki at
http://www.w3.org/wiki/WebSchemas/MedicalHealthProposal. As you'll see
this is a substantial piece of work, so we’d welcome feedback and
detailed review comments on the specifics (please follow up to this
email).

For those interested in more background on the approach: our goal is
to create schema that webmasters and content publishers can use to
mark up health and medical content on the web, with a particular focus
on markup that will help patients, physicians, and generally
health-interested consumers find relevant health information via
search. The scope of coverage for the schema is broad, and is intended
to cover both consumer- and professionally-targeted health and medical
web content (of course, any particular piece of online health/medical
content is likely to use only a subset of the schema). We’ve worked
with physicians, consumer web sites, and government health
organizations to get input into the key topics and properties to model
and to refine the schema structure and type/property documentation.

Note that it is explicitly not our goal to replace the many very good
and comprehensive medical ontologies, meta-thesauri, or controlled
vocabularies that have been created over the years; our focus has been
instead on creating complementary, lightweight markup that surfaces
the existence of and relationships between entities in health/medical
web pages. When other ontologies and/or controlled vocabularies are
available, our proposed schema can link to and take advantage of them,
e.g. via the code property of MedicalEntity. It is also not an initial
goal to support automated reasoning, medical records coding, or
genomic tagging, as these would require substantially more detailed
(and hence high barrier-to-entry) modeling and markup; they could be
considered for future extensions.

We look forward to your feedback!

Thanks,

Aaron Brown (Google)

--
Aaron Brown | Senior Product Manager | Google, Inc. | New York, NY

Received on Thursday, 31 May 2012 14:28:24 UTC