URI thoughts

I often like to put together proposals as "strawmen", in order to  
provoke discussion while not needing to push a personal agenda  
through...

Based on the URI discussion we are having, and Alan's request to  
distinguish data records (Entrez, ENSEMBL, Uniprot) clearly from the  
"biological or chemical things" they are about, I have listed a few  
possible best practices to consider and the reasons why:

1) Dereferencing: Dereferencing a URI for a data record returns all
the "authority managed" information about it (locally curated data) in
the form of an RDF graph. Outside annotations would not be included
unless the authority provided an open annotation service. This is what
you get back when you query sources such as NCBI or EBI.

2) Versioning: A few useful pieces of metadata for mutable,
URI-referenced RDF graphs (as returned by dereferencing) are: which
version is current, when it was assigned or created (date and time,
UTC), and a reference to the sorted list of all earlier versions. This
would allow a precise rollback to any version, for example to re-run
an analysis on the information as it stood at an earlier time.
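
As a rough sketch of how that versioning metadata could support a
rollback (Python used only for illustration; the RecordVersion fields,
the example.org URIs and the dates are all hypothetical, not part of
any existing service):

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RecordVersion:
    version: str       # hypothetical version label
    created: datetime  # when this version was assigned, in UTC
    graph_uri: str     # hypothetical URI of that version's RDF graph


# Made-up history for one data record, for illustration only.
HISTORY = [
    RecordVersion("1", datetime(2006, 1, 10, tzinfo=timezone.utc),
                  "http://example.org/record/2932/v1"),
    RecordVersion("2", datetime(2006, 4, 2, tzinfo=timezone.utc),
                  "http://example.org/record/2932/v2"),
    RecordVersion("3", datetime(2006, 6, 20, tzinfo=timezone.utc),
                  "http://example.org/record/2932/v3"),
]


def version_at(history, when):
    """Return the version that was current at `when`, or None if the
    record did not exist yet."""
    ordered = sorted(history, key=lambda v: v.created)
    times = [v.created for v in ordered]
    idx = bisect_right(times, when)  # number of versions created <= when
    return ordered[idx - 1] if idx else None


def current_version(history):
    """Return the most recently created version."""
    return max(history, key=lambda v: v.created)
```

A re-analysis "as of" some date would then fetch the graph at
version_at(HISTORY, that_date).graph_uri rather than the current one.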

3) Signifiers: Life science data records about bio or chem entities
(genes, SNPs, proteins, chemicals, agents, diseases, pathways,
anatomical parts) should always reference a community-agreed,
conceptualized bio/chem-entity, i.e., what scientists commonly and
collectively have in mind when they hear "human GSK3 beta". These
could have ontologies layered on them when such ontologies become
available. These entities represent the 'signifiers or signs' for the
'signified or real-world objects' such as "Hu GSK3b" or "Mus MAP12"
(for the curious, see http://en.wikipedia.org/wiki/Sign_(semiotics),
btw the full RDF graph around an entity would be equivalent to Peirce's
'interpretant'). They would exist as non-data objects, more like
scientific placeholders, but can use rdfs:seeAlso to point to real data
records of them. Data records by themselves WOULD NOT be of this
special meta-class. If this sounds fuzzy to you, consider what it took
to align most of the gene synonym names to one agreed symbol;
sociologically this is no different.

4) Covering Mapping: Propose an initial set of properties to support
the above model. As a starter, define an equivalent of rdfs:isDefinedBy
for life science that would specifically map an instance graph of the
data record to the singular conceptualized bio/chem-entity, using
something along the lines of hcls:isDefinedAs:

<http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&list_uids=2932>
<hcls:isDefinedAs>
<http://purl.org/hcls/bioentity/hu_gsk3b>

In line with what Chimezie proposed, rdfs:seeAlso could be used to  
declare the inverse relation for a select set of data records; not sure  
if any new relation is needed here.

In the absence of any formal ontology that could cover all life
sciences data records (e.g., genes), a relational instance model might
be more practical and appealing. A transitive rule could be proposed
stating that all data records referencing the same bio/chem-entity
would be viewed as "bio/chem entity" equivalent, regardless of which
ontology or RDF schema was used to define each of them:
(?data1 hcls:isDefinedAs ?ent) AND (?data2 hcls:isDefinedAs ?ent) ->  
(?data1 hcls:sameEntityAs ?data2 )

This is an example of what I had suggested as a "Covering", since there
is no explicit need to use ontologies to map data records to common
class-based concepts. owl:sameAs could be used here, but the
'sameEntityAs' relation could have a more selective meaning for this
community in terms of data records and 'things'. I leave it open for
discussion...
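
To make the sameEntityAs rule above concrete, here is a minimal sketch
in plain Python (no RDF library) that applies it to a handful of
triples. The Entrez Gene and bioentity URIs are the ones from the
example above; the example.org records and the second entity are made
up for illustration, and skipping the trivial self-pairs is my own
assumption, not something the rule itself requires:

```python
from collections import defaultdict
from itertools import permutations

IS_DEFINED_AS = "hcls:isDefinedAs"
SAME_ENTITY_AS = "hcls:sameEntityAs"

ENTITY = "http://purl.org/hcls/bioentity/hu_gsk3b"
ENTREZ_GENE = ("http://www.ncbi.nlm.nih.gov/entrez/query.fcgi"
               "?db=gene&cmd=Retrieve&list_uids=2932")
# Hypothetical records from other sources, for illustration only.
OTHER_RECORD = "http://example.org/protein-db/record-for-gsk3b"
UNRELATED_RECORD = "http://example.org/gene-db/some-other-record"

TRIPLES = [
    (ENTREZ_GENE, IS_DEFINED_AS, ENTITY),
    (OTHER_RECORD, IS_DEFINED_AS, ENTITY),
    (UNRELATED_RECORD, IS_DEFINED_AS,
     "http://example.org/bioentity/something_else"),
]


def infer_same_entity(triples):
    """Apply: (?d1 isDefinedAs ?e) AND (?d2 isDefinedAs ?e)
    -> (?d1 sameEntityAs ?d2), omitting the case ?d1 == ?d2."""
    # Group data records by the bio/chem-entity they are defined as.
    records_by_entity = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == IS_DEFINED_AS:
            records_by_entity[obj].add(subject)

    # Every ordered pair of distinct records sharing an entity is related.
    inferred = set()
    for records in records_by_entity.values():
        for d1, d2 in permutations(sorted(records), 2):
            inferred.add((d1, SAME_ENTITY_AS, d2))
    return inferred


if __name__ == "__main__":
    for triple in sorted(infer_same_entity(TRIPLES)):
        print(triple)
```

Note that nothing here looks at the class or schema of the records
themselves; the grouping is driven purely by the shared entity URI,
which is the point of the covering.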

I'd be interested to hear how important and practical the points raised  
here are. The main objective I have is to try and get our common  
discussion to focus on some basic, agreeable points that we can work  
together on over the next (hopefully) few weeks.

cheers,
Eric


Eric Neumann, PhD
co-chair, W3C Healthcare and Life Sciences,
and Senior Director Product Strategy
Teranode Corporation
83 South King Street, Suite 800
Seattle, WA 98104
+1 (781)856-9132
www.teranode.com 
    

Received on Tuesday, 20 June 2006 13:00:25 UTC