Re: URI thoughts from Jack Park on 2006-06-20 (public-semweb-lifesci@w3.org from June 2006)

From: Jack Park <jack.park@sri.com>
Date: Tue, 20 Jun 2006 06:41:41 -0700
To: w3c semweb hcls <public-semweb-lifesci@w3.org>
Message-ID: <4497FB15.8000805@sri.com>

I think that these valuable ideas can be supplemented by an additional 
set of properties (here, I am making a perhaps false assumption that 
Eric's "strawperson" proposal doesn't already anticipate subject 
identity properties). Appealing to Peircian notions for subject identity 
is, I believe, a good move. Along those lines, the Ph.D. thesis of Jens 
Eric Mai (see [1]) would be of great value. My thoughts are animated by 
notions of subject identity as the subject maps community uses that 
term; properties that help to identify a specific subject, perhaps along 
the lines of the reasoning support required in differential diagnosis, 
are necessary when one doesn't happen to know a specific sign or symbol.

If one *knows* one is looking at "Hu GSK3b", the problem is simple. But, 
when one is comparing two ontologies, where "Hu GSK3b" is not 
specifically identified, an appeal to the property sets that describe 
the entities becomes the basis for comparison. Perhaps this is what item 
4 below, Covering Mapping, proposes, but maybe not. rdfs:isDefinedBy 
passes subject identity off to another entity. If that entity is well 
represented by a subject identity property (SIP) set, then, if 
rdfs:isDefinedBy has an inverse relation, say rdfs:defines, it becomes 
possible to start from the SIPs and maintain the graph.
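
A minimal sketch of the property-set comparison described above, in Python. Everything named here is an assumption for illustration: the entity identifiers, the (property, value) pairs, the Jaccard overlap measure, and the 0.5 threshold are hypothetical, and "defines" is the hypothetical inverse of rdfs:isDefinedBy mentioned above, not an existing RDFS term.

```python
from collections import defaultdict

# Hypothetical SIP sets: each entity is described by (property, value) pairs.
# Neither ontology names "Hu GSK3b" directly.
ontology_a = {
    "a:Protein_17": {("organism", "Homo sapiens"),
                     ("activity", "kinase"),
                     ("gene_symbol", "GSK3B")},
    "a:Protein_42": {("organism", "Mus musculus"),
                     ("activity", "kinase")},
}
ontology_b = {
    "b:Entity_X": {("organism", "Homo sapiens"),
                   ("activity", "kinase"),
                   ("gene_symbol", "GSK3B")},
}


def jaccard(a, b):
    """Overlap of two property sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_matches(left, right, threshold=0.5):
    """Pair entities whose property sets overlap at or above the threshold."""
    matches = []
    for l_id, l_props in left.items():
        for r_id, r_props in right.items():
            score = jaccard(l_props, r_props)
            if score >= threshold:
                matches.append((l_id, r_id, score))
    # Best-scoring candidates first.
    return sorted(matches, key=lambda m: -m[2])


def invert(defined_by):
    """Build the hypothetical inverse ('defines') of an isDefinedBy mapping."""
    defines = defaultdict(set)
    for subject, definer in defined_by.items():
        defines[definer].add(subject)
    return dict(defines)


matches = candidate_matches(ontology_a, ontology_b)
```

With the inverse mapping in hand, one can start from an entity identified by its SIPs and walk back to everything it defines, rather than only forward from a record to its definer.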

My half EURO for the day...
Jack
[1] http://www.ischool.washington.edu/mai/pubs.html

Eric Neumann wrote:
>
>
> I often like to put together proposals as "strawmen", in order to 
> provoke discussion while not needing to push a personal agenda through...
>
> Based on the URI discussion we are having, and Alan's request to 
> distinguish data records (Entrez, ENSEMBL, Uniprot) clearly from the 
> "biological or chemical things" they are about, I have listed a few 
> possible best practices to consider and the reasons why:
>
> 1) Dereferencing: The dereferencing of a URI to a data record results 
> in the return of all the "authority managed" information about it 
> (locally curated data) in the form of a RDF graph. Outside annotations 
> would not be included unless the authority provided an open annotative 
> service. This is what you get back when you query sources such as NCBI 
> or EBI.
>
> 2) Versioning: A few useful pieces of metadata for changeable 
> (mutable) URI-referenced RDF graphs (dereferenced) are which version is 
> current, when it was assigned or created (date and time, UTC), and a 
> reference to the sorted list of all earlier versions. This would allow 
> precise rolling back to any version for performing a re-analysis of 
> info from an earlier time.
>
> 3) Signifiers: Life science data records of bio or chem entities 
> (genes, snps, protein, chemicals, agents, diseases, pathways, 
> anatomical parts) should always reference a community-agreed 
> conceptualized bio/chem-entity, i.e., what scientists commonly and 
> collectively have in mind when hearing "human GSK3 beta". These could 
> have ontologies layered on them when they become 
> available. These entities represent the 'signifiers or signs' for the 
> 'signified or real-world objects' such as "Hu GSK3b" or "Mus MAP12" 
> (for the curious, see http://en.wikipedia.org/wiki/Sign_(semiotics), 
> btw the full RDF graph around an entity would be equivalent to 
> Peirce's 'interpretant'). They would exist as non-data objects, more 
> like scientific placeholders, but can use rdfs:seeAlso to point to 
> real data records of them. Data records by themselves WOULD NOT be of 
> this special meta-class. If this sounds fuzzy to you, consider what it 
> took to align most of the gene synonym names to one agreed symbol; 
> sociologically this is no different.
>
> 4) Covering Mapping: Propose an initial set of properties to support 
> the above model. As a starter, define an equivalent of 
> rdfs:isDefinedBy for life science that would specifically map an 
> instance graph of the data record to the singular conceptualized 
> bio/chem-entity, using something on the order of hcls:isDefinedAs:
>
> <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&list_uids=2932>  
> <hcls:isDefinedAs>  <http://purl.org/hcls/bioentity/hu_gsk3b>
>
> In line with what Chimezie proposed, rdfs:seeAlso could be used to 
> declare the inverse relation for a select set of data records; not 
> sure if any new relation is needed here.
>
> In the absence of any formal ontology that could cover all life 
> sciences data records (e.g., Genes), a relational instance model might 
> be more practical and appealing. A transitive rule could be proposed 
> that states all data records referencing the same bio/chem-entity 
> would be viewed as "bio/chem entity" equivalent, regardless of what 
> ontology/rdfschema were used to define each of them:
> (?data1 hcls:isDefinedAs ?ent) AND (?data2 hcls:isDefinedAs ?ent) -> 
> (?data1 hcls:sameEntityAs ?data2 )
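
A minimal sketch of how the rule above could be applied to a set of triples, in Python. It is only an illustration under stated assumptions: triples are plain (subject, predicate, object) string tuples, the second record URI is a hypothetical placeholder, and the reflexive pairs that the rule as written would also produce (?data1 = ?data2) are left out.

```python
from collections import defaultdict
from itertools import combinations

IS_DEFINED_AS = "hcls:isDefinedAs"
SAME_ENTITY_AS = "hcls:sameEntityAs"


def infer_same_entity(triples):
    """Return sameEntityAs triples for records that share a bio/chem-entity."""
    # Group data records by the entity they are defined as.
    by_entity = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == IS_DEFINED_AS:
            by_entity[obj].add(subject)

    inferred = set()
    for records in by_entity.values():
        # Every distinct pair of records on the same entity is related,
        # in both directions.
        for a, b in combinations(sorted(records), 2):
            inferred.add((a, SAME_ENTITY_AS, b))
            inferred.add((b, SAME_ENTITY_AS, a))
    return inferred


entrez = ("http://www.ncbi.nlm.nih.gov/entrez/query.fcgi"
          "?db=gene&cmd=Retrieve&list_uids=2932")
other = "http://example.org/record/other-gsk3b"  # hypothetical second record
entity = "http://purl.org/hcls/bioentity/hu_gsk3b"

triples = [
    (entrez, IS_DEFINED_AS, entity),
    (other, IS_DEFINED_AS, entity),
]
inferred = infer_same_entity(triples)
```

Grouping by entity first avoids comparing every pair of records in the whole data set; only records that already point at the same entity are paired.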
>
> This is an example of what I had suggested as a "Covering", since 
> there is no explicit need to use ontologies to map data records to 
> common class-based concepts. owl:sameAs could be used here, but the 
> 'sameEntityAs' relation could have more selective meaning for this 
> community in terms of data records and 'things'. I leave it open for 
> discussion...
>
> I'd be interested to hear how important and practical the points 
> raised here are. The main objective I have is to try and get our 
> common discussion to focus on some basic, agreeable points that we can 
> work together on over the next (hopefully) few weeks.
>
> cheers,
> Eric
>
>
> Eric Neumann, PhD
> co-chair, W3C Healthcare and Life Sciences,
> and Senior Director Product Strategy
> Teranode Corporation
> 83 South King Street, Suite 800
> Seattle, WA 98104
> +1 (781)856-9132
> www.teranode.com 
>
>
Received on Tuesday, 20 June 2006 13:41:57 UTC