RE: From strings to things: ClinicalTrials.gov

Alan,

You are absolutely right. In this world of excessive digital data, the human readability and intuitive nature of expressions and links are critical. For this reason, for chemical structures we developed Chem-BLAST (http://xpdb.nist.gov/chemblast/pdb.pl ), where the links are on images of structural 'roots' rather than InChI or other forms of ID such as PubChem IDs. Of course, chemical data are easier than text-based data.
More recently, we have been working on such an approach for text-based data. The first step in enabling it is to establish a list of highly reused short terms ('roots') for communication and links. For this reason we built a resource of 'roots' for biomedical data (http://xpdb.nist.gov/bioroot/bioroot.pl  and  http://xpdb.nist.gov/bioroot/bioblocks.pl  ). This work is not complete yet, but we believe that we are making progress.

Thanks

T N Bhat

-----Original Message-----
From: Alan Ruttenberg [mailto:alanruttenberg@gmail.com] 
Sent: Sunday, February 17, 2013 1:49 PM
To: Oktie Hassanzadeh
Cc: Kerstin Forsberg; public-semweb-lifesci@w3.org; em@zepheira.com; cdsouthan@hotmail.com; brendan.kelleher@karmadata.com
Subject: Re: From strings to things: ClinicalTrials.gov

Oktie,

One thing I think would be helpful is attending more to using human readable labels for terms. For example, if we browse directly at linkedct.org we see a lot of long strings of numbers. But for most of these there is a reasonable label. For example, under outcomes, the first element is printed as 92f8444723382d2b6f2c06f69f3fe6f8, but if we browse to http://linkedct.org/resource/outcome/92f8444723382d2b6f2c06f69f3fe6f8/

we see the property measure "Graft vs tumor effect as measured by CT scan at days 30, 60, and 100 following transplant", which in this case is a reasonable label. Similarly, on this page we see provenances shown as http://clinicaltrials.gov/show/NCT00003553?displayxml=true, whereas "ClinicalTrials.gov record for the study: Peripheral Stem Cell Transplant in Treating Patients With Metastatic Kidney Cancer" is much more inviting. That would link to the same place, but give the viewer a reason to hit the link.
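
A minimal sketch of the label fallback described above, in Python. It is only an illustration: the function name display_label, the LABEL_PROPERTIES tuple, and the property names "label" and "title" are hypothetical, and nothing here reflects how linkedct.org actually stores or renders its data. Only the "measure" property and the example values come from the outcome page mentioned above.

```python
from typing import Mapping, Sequence

# Hypothetical property names to try, in order of preference.
# Only "measure" is taken from the outcome example in this thread.
LABEL_PROPERTIES: Sequence[str] = ("label", "measure", "title")


def display_label(resource_id: str, properties: Mapping[str, str]) -> str:
    """Return a human-readable label, falling back to the raw identifier."""
    for name in LABEL_PROPERTIES:
        value = properties.get(name, "").strip()
        if value:
            return value
    # No usable label found: show the identifier rather than nothing.
    return resource_id


outcome_id = "92f8444723382d2b6f2c06f69f3fe6f8"
outcome = {
    "measure": "Graft vs tumor effect as measured by CT scan "
               "at days 30, 60, and 100 following transplant",
}
print(display_label(outcome_id, outcome))  # prints the measure text
print(display_label(outcome_id, {}))       # prints the hash identifier
```

The link target stays the same in both cases; only the text shown to the viewer changes.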

One thing that's happened when I've tried to engage clinical colleagues with linkedct is that it is hard for them to get past this (and frankly for me too).

Contact me off list if you want to understand the issue with the licensing you've chosen.

hth,
Alan

On Sat, Feb 16, 2013 at 6:47 PM, Oktie Hassanzadeh <oktie@cs.toronto.edu> wrote:
> Dear Kerstin,
>
> LinkedCT provides many external links including the seeAlso links you 
> have pointed out, so the data is clearly 5-star Linked Data.
>
> Regarding the type of the links, there were long discussions at some 
> point on this same list I believe on whether or not we should use 
> sameAs to link to other resources, and the conclusion was that it's 
> safer to use seeAlso since stating that an intervention on LinkedCT is 
> the same as a drug on DBpedia for example, may be inaccurate.
>
> Regarding the quality and the quantity of the external links, we 
> clearly can do better (and that's what we are planning to do), but 
> existing links have already proven useful in a couple of use cases 
> that take advantage of the links to PubMed, DrugBank, and DBpedia. One 
> example is the LinkedSPLs work led by Rich Boyce:
>
> Dynamic enhancement of drug product labels to support drug safety, 
> efficacy, and effectiveness. R.D. Boyce et al. Journal of Biomedical 
> Semantics 4(1), 5, BioMed Central Ltd, 2013
>
> Cheers,
> Oktie
>
>
>
> On Sat, Feb 16, 2013 at 11:36 AM, Kerstin Forsberg 
> <kerstin.l.forsberg@gmail.com> wrote:
>>
>> Dear Oktie,
>>
>> Yes, and I'm also pointing colleagues to this great dataset, part of 
>> LODD (http://linkedct.org).
>>
>> Two reflections:
>> 1) My understanding is that colleagues are more comfortable with 
>> going directly to the source and using the XML download 
>> (http://clinicaltrials.gov/ct2/resources/download )
>> 2) I meant 5-star linked data in terms of linking outwards to 
>> existing identifiers instead of "internal" URIs like 
>> http://linkedct.org/resource/intervention/a0e0900a02a9fa5501b51b95c281e3f9/
>> for Atorvastatin (Intervention).
>>
>> Looks like you do a good job with your See also links (e.g.
>> http://dbpedia.org/resource/Atorvastatin and
>> http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB01076).
>> However, my understanding is that some of the types of things are 
>> quite challenging, see for example Drug Identification Links: 
>> Connecting Up,
>> http://www.citeulike.org/user/cdsouthan/article/10423875
>>
>> Kerstin
>>
>>
>>
>> 2013/2/16 Oktie Hassanzadeh <oktie@cs.toronto.edu>
>>>
>>> Dear Kerstin,
>>>
>>> Have you ever looked at http://linkedct.org ?
>>>
>>> LinkedCT uses a complex process to turn ClinicalTrials.gov into 
>>> high-quality 5-star Linked Data. And yes it does provide HTTP URIs 
>>> for all the "things" on ClinicalTrials.gov, provides HTML or RDF, 
>>> SPARQL endpoint, etc.
>>>
>>> Please take a look at http://linkedct.org , 
>>> http://linkedct.org/stats/ , and http://linkedct.org/faq/ , and the 
>>> following articles for any questions you might have.
>>>
>>> Oktie Hassanzadeh, Soheil Hassas Yeganeh, Renée J. Miller: Linking 
>>> Semistructured Data on the Web. WebDB 2011
>>>
>>> Oktie Hassanzadeh, Anastasios Kementsietsidis, Lipyeow Lim, Renée J. 
>>> Miller, Min Wang: LinkedCT: A Linked Data Space for Clinical Trials. 
>>> CoRR abs/0908.0567 2009
>>>
>>>
>>> Cheers,
>>> Oktie
>>>
>>> ========================
>>> Oktie Hassanzadeh
>>> oktie@cs.toronto.edu
>>> http://www.cs.toronto.edu/~oktie
>>>
>>>
>>> On Sat, Feb 16, 2013 at 7:58 AM, Kerstin Forsberg 
>>> <kerstin.l.forsberg@gmail.com> wrote:
>>>>
>>>> Hi,
>>>> a couple of tweets, blog post comments 1) and email exchanges 
>>>> during the week on moving ClinicalTrials.gov "from strings to 
>>>> things" made me think this could be a topic for discussion at the 
>>>> upcoming CSHALS. As I won't be able to be there in person, I'm 
>>>> using this email list to hear your thoughts.
>>>>
>>>> Background:
>>>> We see many nice examples of curated/standardized feeds of CT.gov 
>>>> data, such as http://linkedct.org, 
>>>> http://www.patientslikeme.com/clinical_trials
>>>> and http://www.clinicalcollections.org/trials/ etc. Most of them 
>>>> do a good job in turning “strings into things” and a few of them 
>>>> apply the Linked Data principles. However, I don’t think any of 
>>>> them use http-based URIs to identify things such as sponsor 
>>>> organization, clinical sites, clinical investigators, geography, 
>>>> disease, drug, and time.
>>>>
>>>> I argue that we, as a community caring for clinical trials data, 
>>>> should push FDA and NLM to provide an official, standardized, 
>>>> linked data interface directly to CT.gov at the source. And yes, 
>>>> FDA and NLM should in turn push pharma companies to provide 
>>>> standardized data about our trials with URIs to identify things 
>>>> instead of all these text strings. It would also help if pharma 
>>>> company websites such as http://www.gsk-clinicalstudyregister.com/ 
>>>> and http://www.astrazenecaclinicaltrials.com/ did the same.
>>>>
>>>> Given the current movement for clinical trial data transparency 2) 
>>>> I think the timing may be good, but it is potentially challenging 
>>>> for FDA, NLM, and the pharma companies alike. They (we) will all 
>>>> look for practical advice on what URIs to use for things such as 
>>>> drugs and organizations.
>>>>
>>>> Thoughts?
>>>> Kerstin
>>>>
>>>>
>>>> 1)
>>>> http://blog.karmadata.com/2013/02/11/loading-clinical-trials-data-in-ten-minutes-flat/comment-page-1/#comment-20
>>>> 2)
>>>> http://www.placebocontrol.com/2013/02/our-new-glass-house-gsks-commitment-to.html
>>>
>>>
>>
>

Received on Tuesday, 19 February 2013 16:07:01 UTC