- From: Eric Neumann <eneumann@teranode.com>
- Date: Wed, 21 Jun 2006 09:32:14 -0400
- To: "Chimezie Ogbuji" <ogbujic@bio.ri.ccf.org>, "Xiaoshu Wang" <wangxiao@musc.edu>
- cc: "w3c semweb hcls" <public-semweb-lifesci@w3.org>
Xiaoshu,

Hmmm... I see a possible reason why SW is often hard to understand
from a data provider's point of view...

Authority is the organization that has published annotated data, such
as NCBI or SwissProt. For the foreseeable future, they will be
responsible for curating and managing some subset of the RDF graph for
data records. Dereferencing, in my opinion, should always defer to the
authority site responsible for the base data.

If you or someone else adds a new property to their base graph, the
basic RDF model will allow merging, but that will not prevent
inconsistencies (or "opinions") from being added. We will eventually
have to address these issues of provenance, and there are several
possible ways forward here. But by dereferencing an Entrez Gene data
record, I should get back only that part of the graph that NCBI Entrez
is responsible for. Outside references and annotations could be
discovered/aggregated by specific services, such as a DAS-like
(distributed annotation server) model. I don't believe authority
responsibility can ever be disregarded for data record URIs...

To Chimezie's point, I also do not think all URIs need to be
dereferenceable, but certainly data record URIs should always be
dereferenceable.

In addition, the practice of using rdfs:isDefinedBy to link data
records to accepted concept bio/chem entities will work as long as
rdfs:isDefinedBy does not get used in other ways that would obscure the
special usage being proposed here. We would need to agree on the
intended practice and meaning throughout the community so that
predicates do what others would expect them to do:

  "data records of entities like 'genes' always refer to unique
  concepts of bio/chem entities, even if polymorphisms exist"

Eric

--- Xiaoshu Wang <wangxiao@musc.edu> wrote:

> > 1) Dereferencing: The dereferencing of a URI to a data record
> > results in the return of all the "authority managed"
> > information about it (locally curated data) in the form of an
> > RDF graph. Outside annotations would not be included unless
> > the authority provided an open annotative service. This is
> > what you get back when you query sources such as NCBI or EBI.
>
> I am not sure what the "authority" here means. RDF itself is
> monotonic and open. Hence, anyone can say anything about anything.
> In the eyes of RDF, there is only the problem of model consistency,
> and an RDF engine cannot consider one assertion "more" correct than
> others.
>
> > 2) Versioning: A few useful pieces of metadata for changeable
> > (mutable) URI-referenced RDF graphs (dereferenced) are what
> > version is current, when it was assigned or created (date and
> > time, UTC), and a reference to the sorted list of all earlier
> > versions. This would allow precise rolling back to any
> > version for performing a re-analysis of info from an earlier time.
>
> I think Dublin Core's relation element and associated element
> refinements like dc:replaces and dc:isReplacedBy etc. would handle
> this adequately.
>
> > 3) Signifiers: Life science data records of bio or chem
> > entities (genes, snps, protein, chemicals, agents, diseases,
> > pathways, anatomical parts) should always reference a
> > community agreed upon conceptualized bio/chem-entity, i.e.,
> > what scientists commonly and collectively have in mind
> > when hearing "human GSK3 beta". These
> > could have ontologies layered on them when they become
> > available. These entities represent the 'signifiers or signs'
> > for the 'signified or real-world objects' such as "Hu GSK3b"
> > or "Mus MAP12"
> > (for the curious, see http://en.wikipedia.org/wiki/Sign_(semiotics),
> > btw the full RDF graph around an entity would be equivalent
> > to Peirce's 'interpretant'). They would exist as non-data
> > objects, more like scientific placeholders, but can use
> > rdfs:seeAlso to point to real data records of them. Data
> > records by themselves WOULD NOT be of this special
> > meta-class. If this sounds fuzzy to you, consider what it
> > took to align most of the gene synonym names to one agreed
> > symbol; sociologically this is no different.
>
> I can't agree more. We should not mix up the data/description about a
> resource with the resource itself. This is the reason why I have
> strongly opposed the idea of using wiki URIs to represent biological
> entities. Information and non-information resources are disjoint.
> Mixing them up will break the foundation of the web and of course the
> logic of an RDF engine.
>
> > 4) Covering Mapping: Propose an initial set of properties to
> > support the above model. As a starter, define an equivalent
> > of rdfs:isDefinedBy for life science that would specifically
> > map an instance graph of the data record to the singular
> > conceptualized bio/chem-entity, using something on the order
> > of hcls:isDefinedAs :
> >
> > <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&list_uids=2932>
> > <hcls:isDefinedAs>
> > <http://purl.org/hcls/bioentity/hu_gsk3b>
> >
> > In line with what Chimezie proposed, rdfs:seeAlso could be
> > used to declare the inverse relation for a select set of data
> > records; not sure if any new relation is needed here.
>
> I think such a set of vocabulary is needed. But rdfs:seeAlso etc. is
> defined to be an AnnotationProperty in OWL, so it cannot be extended
> anymore. Some simple property like hcls:nchientry will just do in my
> opinion. As a start, I think such a property should be very
> coarse grained, because the more general, the more sharable.
>
> Xiaoshu

Eric Neumann, PhD
co-chair, W3C Healthcare and Life Sciences,
and Senior Director Product Strategy
Teranode Corporation
83 South King Street, Suite 800
Seattle, WA 98104
+1 (781)856-9132
www.teranode.com
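
P.S. To make the dereferencing point concrete, here is a minimal sketch, not a proposal for an actual implementation. It assumes every statement in a merged graph carries a provenance tag; the `Statement` type, the `asserted_by` field, the `ex:` predicates, and the "ncbi" / "third-party" source names are all hypothetical and only for illustration. Dereferencing a data record URI then returns just the statements its authority is responsible for, while outside annotations are left to a separate, DAS-like lookup.

```python
from typing import List, NamedTuple


class Statement(NamedTuple):
    """One RDF-style triple plus a hypothetical provenance tag."""
    subject: str
    predicate: str
    obj: str
    asserted_by: str  # assumption: each statement records who asserted it


# URIs taken from the hcls:isDefinedAs example quoted in this thread.
GENE_RECORD = ("http://www.ncbi.nlm.nih.gov/entrez/query.fcgi"
               "?db=gene&cmd=Retrieve&list_uids=2932")
CONCEPT = "http://purl.org/hcls/bioentity/hu_gsk3b"

# A merged graph: RDF lets anyone add statements about the record.
MERGED_GRAPH = [
    Statement(GENE_RECORD, "hcls:isDefinedAs", CONCEPT, "ncbi"),
    Statement(GENE_RECORD, "ex:label", "human GSK3 beta", "ncbi"),
    Statement(GENE_RECORD, "ex:comment", "an outside opinion", "third-party"),
]


def dereference(uri: str, graph: List[Statement], authority: str) -> List[Statement]:
    """Return only the authority-managed statements about `uri`."""
    return [s for s in graph if s.subject == uri and s.asserted_by == authority]


def outside_annotations(uri: str, graph: List[Statement], authority: str) -> List[Statement]:
    """Return statements about `uri` made by anyone other than the authority."""
    return [s for s in graph if s.subject == uri and s.asserted_by != authority]


if __name__ == "__main__":
    for s in dereference(GENE_RECORD, MERGED_GRAPH, "ncbi"):
        print(s.predicate, s.obj)
```

Merging stays open in this sketch; the authority filter is applied only at dereference time, so nothing prevents an aggregation service from exposing the remaining statements separately.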
Received on Wednesday, 21 June 2006 13:32:53 UTC