Re: BioRDF Announcement

2008/7/23 Olivier Bodenreider <olivier@nlm.nih.gov>:
> Peter Ansell wrote:
>>
>> 2008/7/22 Olivier Bodenreider <olivier@nlm.nih.gov>:
>>
>>>
>>> [...]
>
>>> Regarding the UMLS Metathesaurus, there are various kinds of restrictions
>>> listed in the license agreement
>>> (http://wwwcf.nlm.nih.gov/umlslicense/snomed/license.cfm), which is why
>>> most UMLS-based services (e.g., Knowledge Source Server GUI and API,
>>> MetaMap, etc.) require authentication. There have been discussions for a
>>> while here at NLM about providing a subset of the UMLS that could be
>>> freely distributed. Currently, such source vocabularies (with "source
>>> restriction level = 0") can be easily extracted from the Metathesaurus
>>> using MetamorphoSys. As you mention, SNOMED CT, while freely available in
>>> the member countries of the IHTSDO, cannot be made publicly available.
>>> I have plans to work on an RDF version of MeSH that could be made
>>> publicly available. EricN has encouraged me to do it for quite some time
>>> now, but I still haven't done it.
>>>
>>> Even though SNOMED CT cannot be made available as, say, an RDF endpoint,
>>> I think it is still useful to consider (non-dereferenceable) URIs based
>>> on SNOMED CT concept identifiers for annotation purposes in Semantic Web
>>> applications.
>>
>> How is it legal to utilise identifiers based on SNOMED for work done
>> outside the US? The license agreement seems to restrict this, as you
>> could never create the non-dereferenceable identifiers, or search on
>> them, without them being derived from what seems to be a heavily
>> restricted data set.
>>
>
> I am not a lawyer, but as I understand it, the license agreement prevents
> anyone other than the IHTSDO from making SNOMED CT available "to the
> world" in any form. It does not, however, as pointed out by John Madden,
> prevent a user from an IHTSDO member country from mapping a dataset to
> SNOMED CT (i.e., enriching this dataset with SNOMED CT identifiers) and
> making this dataset available outside IHTSDO member countries. Depending
> on the use case, a SNOMED CT license might or might not be needed to fully
> exploit the dataset outside IHTSDO member countries.
> I agree, however, that it would be good to have the IHTSDO confirm the
> universal legality of SNOMED CT-based URIs. It would be even better if the
> IHTSDO were to create, maintain and promote such SNOMED CT-based URIs. What
> this group (HCLS) could contribute is a series of use cases justifying the
> involvement of the IHTSDO.

A series of use cases to convince them that it would be worthwhile
definitely sounds good. Even so, it would be really nice to be able to
load the relevant metadata (if only rdfs:label and rdf:type) into a
database so that it can be exploited automatically in SPARQL queries.
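
As a rough illustration of why even rdfs:label and rdf:type alone help,
here is a minimal Python sketch that evaluates a SPARQL-style basic graph
pattern over an in-memory set of triples. It is not a SPARQL engine, and
query() and match_triple() are hypothetical helpers written for this
example. The concept URI, the http://example.org/ namespaces and the label
are placeholders, not real SNOMED CT content.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical namespaces, for illustration only.
CONCEPT = "http://example.org/snomedct/"
EX = "http://example.org/"


def is_var(term: str) -> bool:
    """Terms starting with '?' are query variables, as in SPARQL."""
    return term.startswith("?")


def match_triple(pattern: Triple, triple: Triple,
                 binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None  # variable already bound to a different term
            new[p] = t
        elif p != t:
            return None  # constant term does not match
    return new


def query(patterns: List[Triple],
          triples: Iterable[Triple]) -> List[Dict[str, str]]:
    """Evaluate a conjunction of triple patterns (a basic graph pattern)."""
    data = list(triples)
    bindings: List[Dict[str, str]] = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in data:
                m = match_triple(pattern, t, b)
                if m is not None:
                    next_bindings.append(m)
        bindings = next_bindings
    return bindings


triples = {
    # Local data annotated with a placeholder concept URI.
    (EX + "record/1", EX + "annotatedWith", CONCEPT + "0000001"),
    # The minimal imported metadata: a type and a label for that concept.
    (CONCEPT + "0000001", RDF_TYPE, EX + "Concept"),
    (CONCEPT + "0000001", RDFS_LABEL, "Example concept"),
}

results = query(
    [("?rec", EX + "annotatedWith", "?c"),
     ("?c", RDF_TYPE, EX + "Concept"),
     ("?c", RDFS_LABEL, "?label")],
    triples,
)
for r in results:
    print(r["?rec"], r["?label"])
```

Without the two metadata triples the same query would return nothing, even
though the annotation itself is still present; the label and type are what
make the annotated records findable and readable.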

>> On the subject of MeSH, is what Bio2RDF have done to it illegal in any
>> way, given that it is intended for universal redistribution? [1]
>>
>> Cheers,
>>
>> Peter
>> [1] http://bio2rdf.org/download/
>>
>
> The English version of MeSH used in Bio2RDF is a "level 0 source", which
> means that there are no specific restrictions attached to it (unlike SNOMED
> CT, for example). MeSH is also publicly available outside the UMLS, provided
> users agree to the following terms and conditions of use:
> http://www.nlm.nih.gov/mesh/termscon.html, which I assume Bio2RDF did.

It is reassuring to know that it is one of the least restricted sources.

> Beyond legality, one major issue for me is *authority*. While Bio2RDF did an
> important and generally excellent job in converting various resources to
> RDF, it is unclear to me 1) how long such an effort is sustainable and 2)
> how reflective it is of the semantics of the original resource.
> A quick example of 2): Entry terms in MeSH are generally not equivalent to
> synonyms, but are labeled as such after conversion to RDF in Bio2RDF.
> My point here is that, as much as possible, the originator of a resource
> should take responsibility for its conversion to and sustained availability
> in RDF. Again, this is NOT a criticism of Bio2RDF, but rather my view of the
> information sources.

It would be nice to have more accurate semantics in Bio2RDF, but the
catch-22 so far has been that not everyone has published in a coherent
manner using standard URIs for the different databases. Hence providers
either haven't wanted to publish their information themselves because
they were unsure about the naming convention, or they haven't seen the
value in everyone publishing RDF that uses URI coreference to denote
relationships. Once they see the value of URI coreference, I think
providers will naturally want to publish their datasets themselves and
provide more coherent semantics for people who want to focus on that
aspect, as opposed to a vision of linked databases that provide access
but not necessarily semantic meaning (as in the synonym example you
gave).

Having said that, I think enough databases have been converted so far
that we could really do with some providers getting involved, by
providing the RDF themselves using standard URIs (not necessarily
http://bio2rdf.org/, as any standard address/format will work if it is
simple to use and implemented across the board), and/or by suggesting,
via our mailing list or here, which of the currently available
descriptions are misleading, like the synonym example, and what could be
done to change them.
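
To make the point about a shared naming convention concrete, here is a
minimal Python sketch in which two independent providers mint URIs from
(namespace, identifier) pairs with the same rule. The pattern is loosely
modelled on the http://bio2rdf.org/namespace:identifier style, but the
exact convention, the mint_uri() helper, the predicates and the
identifiers are all assumptions made up for illustration. Because both
providers apply the same rule, their triples meet on a shared URI
(coreference) with no mapping step.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Assumed convention: <base><namespace>:<identifier>, namespace lower-cased.
BASE = "http://bio2rdf.org/"


def mint_uri(namespace: str, identifier: str, base: str = BASE) -> str:
    """Build a URI from a database namespace and a local identifier."""
    ns = namespace.strip().lower()
    local = identifier.strip()
    if not ns or not local:
        raise ValueError("namespace and identifier must be non-empty")
    return f"{base}{ns}:{local}"


# Two providers publish independently but follow the same rule.
# Predicates and identifiers are placeholders, not real records.
provider_a: Set[Triple] = {
    (mint_uri("geneid", "1234"), "http://example.org/a/symbol", "GENE1"),
}
provider_b: Set[Triple] = {
    (mint_uri("GeneID", " 1234 "), "http://example.org/b/associatedWith",
     mint_uri("mesh", "D000000")),
}

# With no blank nodes involved, merging the two graphs is a set union.
merged = provider_a | provider_b

# Subjects described by both providers: possible only because the URIs coincide.
shared = {s for s, _, _ in provider_a} & {s for s, _, _ in provider_b}
print(sorted(shared))
```

The design choice that matters is normalising case and whitespace in one
agreed place: if each provider formatted identifiers slightly differently,
the URIs would not coincide and the merged data would simply contain two
unrelated subjects.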

Thanks for your suggestions!

Cheers,

Peter

Received on Tuesday, 22 July 2008 21:56:17 UTC