- From: Peter Ansell <ansell.peter@gmail.com>
- Date: Sun, 22 Mar 2009 07:42:28 +1000
- To: Michel_Dumontier <Michel_Dumontier@carleton.ca>
- Cc: W3C HCLSIG hcls <public-semweb-lifesci@w3.org>
2009/3/22 Michel_Dumontier <Michel_Dumontier@carleton.ca>:
> Eric and friends,
>
> I’m very sympathetic to the simplifying assumption of not distinguishing
> between a record and the molecular entity it represents, but there are some
> important considerations. First, we need to be cautious in the
> transformation of recorded facts (as they appear in these database records)
> to class restrictions on biomolecules in logic-based (e.g. OWL) ontologies.
> Initially, we might say that a class of biomolecules share a particular
> molecular structure (or biopolymer sequence), but assertions of role,
> function, PTMs, and involvement in biological process (among others) are
> contextual or temporally qualified and as such it may not be appropriate to
> generalize to all instances. For example, some protein records list all of
> the _known_ PTMs .. hardly the basis to generalize that all instances will
> also have those PTMs at those positions at all (or any!) time. This is
> clearly a major knowledge representation challenge, in which we should
> engage in different approaches to improve our representation. Class-based
> representations are necessary as there is a need to refer to specific real
> world instances, whether they be collections of molecules in a test tube,
> electron micrographs that show individual macromolecular complexes or atomic
> force microscopes that manipulate them. In the meantime, we’ll probably
> continue to model database records as instances of their corresponding
> entity.

Class-based assertions are useful, but where databases are very large it
is hard to distinguish between records that only tentatively make a class
assertion and those that make it more fully. The general rule I have
followed is to provide the evidence statements together with the record,
and enable people to make the assertions based on their level of evidence.
Evidence is all you can state on a large scale, in my opinion, as the
biggest curated databases realistically cannot do much more than ensure
that a given thing has a publication behind it. That is science. If
someone wants to build a model that relies on their personal level of
surety about a given thing, it is more likely that they will recreate a
world for themselves and import the various entities they require by
vague reference, i.e., use a relatively nondescript property to tell
people where their evidence comes from, but not actually rely on the
semantics given to the publicly curated entity for their purposes. If we
can at least make it easy for people to let others know where to find
more information about the things they are using inside their novel data
sources, then we will have some success in bringing together publicly
available extra information about records. Asserting that we are going to
decide what the actual class is for an asserted record type, and using
two or more URIs to distinguish this at the world level, won't give
either URI more evidence or data clarity, in my opinion.

> There is no doubt that it is challenging to devise a consistent naming
> scheme – and nearly each member of the steering group has worked out some
> way to do this (e.g. [1][2]). If the sharednames group wants to recommend a
> consensual approach on the _syntax_ of any given name, with appropriate
> rationale, then it’s possible that more people will use it as a guiding
> principle. However, attempts to _control_ the naming process will result in
> an undoubtedly unreceptive audience. Will a registry of names prevent people
> from making similar or identical (literal) names? no. Establishing a
> self-registry of namespaces like bio2rdf [3] or lsrn.org is a more worthy
> goal. I, like several others, am interested to see how the committee will
> “make sure that its URIs … resolve to information that is useful”.
> I expect that this will be challenging to establish utility, particularly
> in the context of a term contained in an expressive ontology.

Useful is good. On both namespace naming and syntax, I have previously
put forward arguments for lsrn and bio2rdf as namespace congregation
points, and for namespace:identifier and namespace/identifier as
essentially the two alternatives for syntax. Particularly given the
ability to make up multiple namespaces based on any given dataset, there
needs to be a way of either formally telling people that they are equal,
with owl:sameAs, or informally telling them when there isn't a one-to-one
correspondence between them and a mapping isn't simple to do in the
general case.

> I applaud efforts to publish data in an open and linked manner. But
> somewhat disconcerting is that I’m (controversially) sure we’ll find
> ourselves in the awkward position that there will be too much meaningless
> linked data, in which we’ll have to filter useful, less useful, to
> identical, useless or worse, misguiding or erroneous. It’s not hard to
> imagine this happening. Applying the correct semantics to create meaningful
> relations is of fundamental importance for answering questions about our
> collective knowledge. Linking concepts or data with clearly defined semantic
> links (e.g. SKOS, RO, OWL) is indeed useful, and its utility goes beyond
> Linked Data. Eric’s appeal, that we should be careful to (meaningfully) link
> to third party über-URIs, resonates for the same reason that you may want
> to say something about an entity that other people won’t necessarily agree
> with. The truth is that we all have different perceptions of reality, and
> our knowledge about the world is in constant flux. We should be able to
> express our knowledge to our degree of satisfaction.
> In a competitive, distributed environment that is the web, people will
> choose terms and ontologies that best agree with their perception and with
> their requirements. As a nascent scientific community, so early in the game
> of designing accurate, expressive and meaningful ontologies, we should
> encourage new ideas and ensure competition among them.

It would be nice to have more than just dbxref as a relation, but in many
cases the database owners do not provide more semantics that would be
applicable over the whole dataset, so inevitably there will have to be
some way of filtering based on the other fields that have been chosen as
characteristic of a particular class/thing in the context of one's novel
ontology. This segregation inevitably leads to new namespaces (read:
URIs) being created for different classes of knowledge, but hopefully
with linked data they can still be interrelated, if only so that someone
evaluating an advanced ontology can trace the evidence for a particular
part of it and see what other people have used the same evidence to do in
the past.

Cheers,

Peter
Received on Saturday, 21 March 2009 21:43:09 UTC