
Re: From strings to things: ClinicalTrials.gov

From: Brendan Kelleher <brendan.kelleher@karmadata.com>
Date: Sat, 16 Feb 2013 12:56:55 -0500
Message-ID: <CACv-H8-q9c5O5wwbWSjat_jcM59dta-1ZN=HB3oEXEMxdyV9wA@mail.gmail.com>
To: Kerstin Forsberg <kerstin.l.forsberg@gmail.com>
Cc: Oktie Hassanzadeh <oktie@cs.toronto.edu>, public-semweb-lifesci@w3.org, em@zepheira.com, cdsouthan@hotmail.com, Sean Power <sean.power@karmadata.com>
Kerstin, thanks for initiating the conversation.

Our philosophy has been to code fields to identifiers with inherent meaning
and to leverage existing taxonomies whenever possible. Thus we code
organizations to root URLs, drugs to INN or USAN names, diseases to MeSH,
etc. The more that's done at the source, the better.
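
A minimal Python sketch of this kind of field coding follows. The lookup
tables and function names (DRUG_SYNONYMS, MESH_HEADINGS, org_to_root,
drug_to_name, disease_to_mesh) are hypothetical stand-ins; a real pipeline
would query the published INN/USAN lists and the MeSH vocabulary rather
than small hand-made dictionaries.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical, hand-made lookup tables for illustration only; a real
# pipeline would draw on the published INN/USAN lists and MeSH.
DRUG_SYNONYMS = {"lipitor": "atorvastatin", "atorvastatin": "atorvastatin"}
MESH_HEADINGS = {
    "heart attack": "Myocardial Infarction",
    "myocardial infarction": "Myocardial Infarction",
}


def org_to_root(url: str) -> str:
    """Reduce an organization's web address to its root host name."""
    if "://" not in url:
        url = "http://" + url  # urlparse only finds the host when a scheme is present
    host = urlparse(url).hostname or ""  # .hostname is already lower-cased
    return host[4:] if host.startswith("www.") else host


def drug_to_name(text: str) -> Optional[str]:
    """Map a free-text drug string to a nonproprietary name, or None if unknown."""
    return DRUG_SYNONYMS.get(text.strip().lower())


def disease_to_mesh(text: str) -> Optional[str]:
    """Map a free-text condition string to a MeSH heading, or None if unknown."""
    return MESH_HEADINGS.get(text.strip().lower())


print(org_to_root("http://www.gsk.com/en-gb/"))  # gsk.com
print(drug_to_name(" Lipitor "))                 # atorvastatin
print(disease_to_mesh("Heart Attack"))           # Myocardial Infarction
```

Returning None for unmatched strings keeps unresolved values visible for
curation instead of silently minting new, "internal" identifiers.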


On Sat, Feb 16, 2013 at 11:36 AM, Kerstin Forsberg <
kerstin.l.forsberg@gmail.com> wrote:

> Dear Oktie,
>
> Yes, and I'm also pointing colleagues to this great dataset part of LODD (
> http://linkedct.org).
>
> Two reflections:
> 1) My understanding is that colleagues are more comfortable going
> directly to the source and using the XML download (
> http://clinicaltrials.gov/ct2/resources/download )
> 2) I meant 5-star linked data in terms of linking outwards to existing
> identifiers instead of "internal" URIs like
> http://linkedct.org/resource/intervention/a0e0900a02a9fa5501b51b95c281e3f9/ for Atorvastatin (Intervention).
>
> Looks like you do a good job with your See also links (e.g.
> http://dbpedia.org/resource/Atorvastatin and
> http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB01076). However,
> my understanding is that some types of things are quite challenging to
> link; see for example Drug Identification Links: Connecting Up,
> http://www.citeulike.org/user/cdsouthan/article/10423875
>
> Kerstin
>
>
>
> 2013/2/16 Oktie Hassanzadeh <oktie@cs.toronto.edu>
>
>> Dear Kerstin,
>>
>> Have you ever looked at http://linkedct.org ?
>>
>> LinkedCT uses a complex process to turn ClinicalTrials.gov into
>> high-quality 5-star Linked Data. And yes, it does provide HTTP URIs for all
>> the "things" on ClinicalTrials.gov, and it provides HTML or RDF, a SPARQL
>> endpoint, etc.
>>
>> Please take a look at http://linkedct.org , http://linkedct.org/stats/ ,
>> and http://linkedct.org/faq/ , and the following articles for any
>> questions you might have.
>>
>> Oktie Hassanzadeh, Soheil Hassas Yeganeh, Renée J. Miller: Linking
>> Semistructured Data on the Web <http://webdb2011.rutgers.edu/papers/Paper%2027/paper27-camera-ready.pdf>.
>> WebDB 2011
>> Oktie Hassanzadeh, Anastasios Kementsietsidis, Lipyeow Lim, Renée J.
>> Miller, Min Wang: LinkedCT: A Linked Data Space for Clinical Trials <http://arxiv.org/abs/0908.0567>.
>> CoRR abs/0908.0567 2009
>>
>>
>> Cheers,
>> Oktie
>>
>> ========================
>> Oktie Hassanzadeh
>> oktie@cs.toronto.edu
>> http://www.cs.toronto.edu/~oktie
>>
>>
>> On Sat, Feb 16, 2013 at 7:58 AM, Kerstin Forsberg <
>> kerstin.l.forsberg@gmail.com> wrote:
>>
>>> Hi,
>>> a couple of tweets, blog post comments 1) and email exchanges during the
>>> week on moving ClinicalTrials.gov "from strings to things" made me think
>>> this could be a topic for discussion at the upcoming CSHALS. As I won't be
>>> able to attend in person, I'm using this email list to hear your thoughts.
>>>
>>> Background:
>>> We see many nice examples of curated/standardized feeds of CT.gov data,
>>> such as http://linkedct.org,
>>> http://www.patientslikeme.com/clinical_trials and
>>> http://www.clinicalcollections.org/trials/ etc. Most of them do a good
>>> job in turning “strings into things” and a few of them apply the Linked
>>> Data principles. However, I don’t think any of them use http-based URIs to
>>> identify things such as sponsor organization, clinical sites, clinical
>>> investigators, geography, disease, drug, and time.
>>>
>>> I argue that we as a community caring for clinical trials data should
>>> push back to FDA and NLM to get an official, standardized, linked data
>>> interface directly to the CT.gov at source. And yes, also for FDA and NLM
>>> to push back to pharma companies to provide standardized data about our
>>> trials with URIs to identify things instead of all these text strings. It
>>> would also help if pharma company websites such as
>>> http://www.gsk-clinicalstudyregister.com/ and
>>> http://www.astrazenecaclinicaltrials.com/ did the same.
>>>
>>> Given the current movement for clinical trial data transparency 2), I
>>> think the timing may be good. But it is potentially challenging for FDA,
>>> NLM, and the pharma companies alike. They (we) will all look for practical
>>> advice on what URIs to use for things such as drugs and organizations.
>>>
>>> Thoughts?
>>> Kerstin
>>>
>>>
>>> 1)
>>> http://blog.karmadata.com/2013/02/11/loading-clinical-trials-data-in-ten-minutes-flat/comment-page-1/#comment-20
>>> 2)
>>> http://www.placebocontrol.com/2013/02/our-new-glass-house-gsks-commitment-to.html
>>>
>>
>>
>


-- 
Brendan Kelleher
director of information
karmadata
brendan.kelleher@karmadata.com
617.807.0032
Received on Saturday, 16 February 2013 21:44:45 GMT
