RE: URI thoughts from Eric Neumann on 2006-06-21 (public-semweb-lifesci@w3.org from June 2006)

From: Eric Neumann <eneumann@teranode.com>
Date: Wed, 21 Jun 2006 09:32:14 -0400
To: "Chimezie Ogbuji" <ogbujic@bio.ri.ccf.org>, "Xiaoshu Wang" <wangxiao@musc.edu>
cc: "w3c semweb hcls" <public-semweb-lifesci@w3.org>
Message-ID: <8f68c7ad4a3a103621f28bdea687dda4@teranode.com>
Xiaoshu,

Hmmm... I see a possible reason why SW is often hard to understand from 
a data provider point of view....

An authority is the organization that has published annotated data, such 
as NCBI or SwissProt. For the foreseeable future, they will be 
responsible for curating and managing some subset of the RDF graph for 
data records. Dereferencing, in my opinion, should always defer to the 
authority site responsible for the base data.

If you or someone else adds a new property to their base graph, the basic 
RDF model will allow merging, but that will not prevent 
inconsistencies (or "opinions") from being added. We will eventually 
have to address these issues of provenance, and there are several 
possible ways forward here. But by dereferencing an Entrez Gene data 
record, I should get back only that part of the graph that NCBI Entrez 
is responsible for. Outside references and annotations could be 
discovered/aggregated by specific services, such as a DAS-like 
(distributed annotation server) model.
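
A minimal sketch of that split, assuming each statement carries a tag
naming who asserted it (plain RDF triples do not; named graphs would be
one way to get this). The source labels, the ex:comment predicate, and
the label value are hypothetical:

```python
# Statements are (subject, predicate, object, source). The source tag is an
# assumption made for this sketch; it stands in for real provenance tracking.
# The URI is the Entrez Gene example from the quoted message, joined onto one line.
GENE = ("http://www.ncbi.nlm.nih.gov/entrez/query.fcgi"
        "?db=gene&cmd=Retrieve&list_uids=2932")

merged = [
    (GENE, "rdfs:label", "GSK3B", "ncbi"),                        # authority-curated
    (GENE, "ex:comment", "possible drug target", "third-party"),  # outside "opinion"
]


def dereference(uri, statements, authority):
    """Return only the triples about `uri` that `authority` asserted."""
    return [(s, p, o) for s, p, o, src in statements
            if s == uri and src == authority]


def outside_annotations(uri, statements, authority):
    """Return what an aggregating (DAS-like) service might collect instead."""
    return [(s, p, o, src) for s, p, o, src in statements
            if s == uri and src != authority]


print(dereference(GENE, merged, "ncbi"))
print(outside_annotations(GENE, merged, "ncbi"))
```

The merged list still holds everything; only the dereferencing view is
restricted to the authority's own statements.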

I don't believe authority responsibility can ever be disregarded for 
data record URIs...

To Chimezie's point, I also do not think all URIs need to be 
dereferenceable, but certainly data record URIs should always be 
dereferenceable.

In addition, the practice of using rdfs:isDefinedBy to link data 
records to accepted conceptual bio/chem entities will work as long as 
rdfs:isDefinedBy does not get used in other ways that would obscure the 
special usage being proposed here. We would need to agree on the 
intended practice and meaning throughout the community so that 
predicates do what others would expect them to do: "data records of 
entities like 'genes' always refer to unique concepts of bio/chem 
entities, even if polymorphisms exist".
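
One way to check that convention, sketched under the assumption that
triples are plain tuples and that every data record should resolve to
exactly one concept; the record and concept names below are hypothetical:

```python
from collections import defaultdict

IS_DEFINED_BY = "rdfs:isDefinedBy"


def ambiguous_records(triples):
    """Return {record: sorted concepts} for records linked to more than one concept."""
    targets = defaultdict(set)
    for s, p, o in triples:
        if p == IS_DEFINED_BY:
            targets[s].add(o)
    return {rec: sorted(c) for rec, c in targets.items() if len(c) > 1}


triples = [
    ("ex:record/1", IS_DEFINED_BY, "ex:bioentity/hu_gsk3b"),
    ("ex:record/2", IS_DEFINED_BY, "ex:bioentity/hu_gsk3b"),  # two records, one concept: fine
    ("ex:record/3", IS_DEFINED_BY, "ex:bioentity/a"),
    ("ex:record/3", IS_DEFINED_BY, "ex:bioentity/b"),         # one record, two concepts
]

print(ambiguous_records(triples))
```

Several records pointing at the same concept is the expected case; only a
record pointing at more than one concept is flagged.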

Eric

--- Xiaoshu Wang <wangxiao@musc.edu> wrote:

 >
 > > 1) Dereferencing: The dereferencing of a URI to a data record
 > > results in the return of all the "authority managed"
 > > information about it (locally curated data) in the form of a
 > > RDF graph. Outside annotations would not be included unless
 > > the authority provided an open annotative service. This is
 > > what you get back when you query sources such as NCBI or EBI.
 >
 > I am not sure what the "authority" here means.  RDF itself is monotonic and
 > open.  Hence, anyone can say anything about anything.  In the eyes of RDF,
 > there is only the problem of model consistency and an RDF engine can not
 > consider one assertion is "more" correct than others.
 >
 > > 2) Versioning: A few useful pieces of metadata for changeable
 > > (mutable) URI-referenced RDF graphs (dereferenced) is what
 > > version is current, when it was assigned or created (date and
 > > time, UTC), and a reference to the sorted list of all earlier
 > > versions. This would allow precise rolling back to any
 > > version for performing a re-analysis of info from an earlier time.
 >
 > I think Dublin Core's relation element and associated element refinement
 > like dc:replaces and dc:isReplacedBy etc., would handle this adequately.
 >
 > > 3) Signifiers: Life science data records of bio or chem
 > > entities (genes, snps, protein, chemicals, agents, diseases,
 > > pathways, anatomical parts) should always reference a
 > > community agreed upon conceptualized bio/chem-entity, i.e.,
 > > to what the scientist in his or her mind commonly and
 > > collectively regard when hearing "human GSK3 beta". These
 > > could have ontologies layered on them when they become
 > > available. These entities represent the 'signifiers or signs'
 > > for the 'signified or real-world objects' such as "Hu GSK3b"
 > > or " Mus MAP12"
 > > (for the curious, see http://en.wikipedia.org/wiki/Sign_(semiotics),
 > > btw the full RDF graph around an entity would be equivalent
 > > to Peirce's 'interpretant'). They would exist as non-data
 > > objects, more like scientific placeholders, but can use
 > > rdfs:seeAlso to point to real data records of them. Data
 > > records by themselves WOULD NOT be of this special
 > > meta-class. If this sounds fuzzy to you, consider what it
 > > took to align most of the gene synonym names to one agreed
 > > symbol; sociologically this is no different.
 >
 > I can't agree more.  We should not mixup the data/description about a
 > resource with the resource itself.  This is the reason why I have strongly
 > opposed the idea of using wiki URI to represent biological entities.
 > Information and non-information resource are disjoint.  Mixing them up will
 > break the foundation of web and of course the logic of an RDF engine.
 >
 > > 4)  Covering Mapping: Propose an initial set of properties to
 > > support the above model. As a starter, define an equivalent
 > > of rdfs:isDefinedBy for life science that would specifically
 > > map an instance graph of the data record to the singular
 > > conceptualized bio/chem-entity, using something on the order
 > > of  hcls:isDefinedAs :
 > >
 > > <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&list_uids=2932>
 > > <hcls:isDefinedAs>
 > > <http://purl.org/hcls/bioentity/hu_gsk3b>
 > >
 > > In line with what Chimezie proposed, rdfs:seeAlso could be
 > > used to declare the inverse relation for a select set of data
 > > records; not sure if any new relation is needed here.
 >
 > I think such sets of vocabulary is needed.  But rdfs:seeAlso etc. is refined
 > to be an AnnotationProperty in OWL so it can not be extended anymore.  Some
 > simple property like hcls:nchientry will just do in my opinion.  As a
 > start, I think such kind of property should be very coarse grained.  Because
 > the more general, the more sharable.
 >
 > Xiaoshu
 >
 >
 >

Eric Neumann, PhD
co-chair, W3C Healthcare and Life Sciences,
and Senior Director Product Strategy
Teranode Corporation
83 South King Street, Suite 800
Seattle, WA 98104
+1 (781)856-9132
www.teranode.com
Received on Wednesday, 21 June 2006 13:32:53 UTC