- From: Alan Ruttenberg <alanruttenberg@gmail.com>
- Date: Sun, 30 Jul 2006 22:07:30 -0400
- To: "Mark Wilkinson" <markw@illuminae.com>
- Cc: Alan Ruttenberg <alanruttenberg@gmail.com>, public-semweb-lifesci@w3.org, noah_mendelsohn@us.ibm.com, "Sean Martin" <sjmm@us.ibm.com>, "Henry S. Thompson" <ht@inf.ed.ac.uk>, "Phillip Lord" <phillip.lord@newcastle.ac.uk>, www-tag@w3.org, "Dan Connolly" <connolly@w3.org>
Excellent response! I 95% heartedly agree (all but the "I stand by LSIDs" part :)

I will note, however, that whenever there are versions of something, there tends to be some concept of the thing that they are versions of. So even though there are versions of the sequence, there ought still to be some thing which represents what all the versions are versions of.

Back to your point: is there anyone out there who has minted LSIDs for genes and for the sequences distinctly, and related them? Do the gene LSIDs ever get versions? Do the sequence LSIDs ever not have versions? When there are different authorities for the genes and sequences, what are the relations that people use to relate them? Let's put these examples on the table. If anyone has done this in the context of NCBI databases in particular, I think it would be helpful to share the specifics of how these IDs were used and conceptualized.

My experience has been that there is routine confusion of the sort that you describe throughout the life sciences community, and that this bleeds into the discussion of identifiers (as it just did, though I have to admit I was baiting for exactly this discussion :) I frequently see genes, transcripts, DNA and mRNA and their sequences, proteins, protein sequences, and peptides all confusedly identified by overlapping identifiers. I don't see how any identifier scheme, in itself, LSIDs included, currently fixes this problem. It is this problem that I personally want to see progress on.

LSID's contract seems to have more to do with persistence, mutability, cacheability, and discoverability of byte sequences - not with whether the identifiers and their relations make ontological sense. While I understand that in some contexts the issues around data management are central, they aren't in all contexts.
Because I think that optimizing the data management issues, while in some ways elegantly handled by the LSID protocol, isn't central to the issue of representation in the life sciences, and because I don't see LSID addressing the representation issues, I worry that imposing the use of the LSID protocol puts a burden on all, for the benefit of relatively few. And for those relatively few who are going to go out of their way to have internal copies of data and the like, I don't see why a custom system that circumvents HTTP for efficiency reasons is too much of a burden.

How do you see things otherwise?

-Alan

(Being deliberately provocative here - my assigned role in this debate :)

On Jul 30, 2006, at 9:06 PM, Mark Wilkinson wrote:

> On Sun, 30 Jul 2006 16:46:21 -0700, Alan Ruttenberg
> <alanruttenberg@gmail.com> wrote:
>
> I may be speaking out-of-turn here, and should probably let Sean
> answer this one since he may have (no doubt) thought-through it
> more deeply than I have; however I think you may be mixing up
> several different entities here (as so often happens in a URL
> world ;-) )
>
> In the case you cite above you are likely talking about a "gene",
> not a "sequence". A "gene" will have its own LSID, and it is (even
> by the strict genetic definition) a conceptual entity defined by
> complementation. A "gene" and its "sequence" are not the same
> thing! So... I don't see a problem. When you need to refer to the
> gene in the abstract, you can refer to the gene's LSID. When you
> need to talk about a concrete sequence, you refer to *its* LSID.
> The metadata of the gene will (in a sensible world) include triples
> that describe its possible sequences, and these will have versions.
>
> Genes have many many many properties, so we cannot munge them all
> into "sequence". Certainly, this is how we are modelling our data
> locally...
>
> I stand by LSID's :-)
>
> Mark
>
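
For readers who want the gene/sequence distinction discussed above in concrete terms, here is a minimal sketch, not anyone's actual implementation. It assumes the usual LSID layout of urn:lsid:authority:namespace:object with an optional trailing revision. The authorities (genes.example.org, seqs.example.org), the object ids, and the ex:hasSequence predicate are all hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LSID:
    """An LSID split into its parts; revision is None when unversioned."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

    @classmethod
    def parse(cls, urn: str) -> "LSID":
        parts = urn.split(":")
        # Expect urn:lsid:authority:namespace:object[:revision]
        if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
            raise ValueError(f"not an LSID: {urn!r}")
        revision = parts[5] if len(parts) == 6 else None
        return cls(parts[2], parts[3], parts[4], revision)

    def unversioned(self) -> "LSID":
        """The 'thing that all the versions are versions of'."""
        return LSID(self.authority, self.namespace, self.object_id)

    def __str__(self) -> str:
        base = f"urn:lsid:{self.authority}:{self.namespace}:{self.object_id}"
        return f"{base}:{self.revision}" if self.revision else base


# Hypothetical predicate relating a gene to one of its concrete sequences.
HAS_SEQUENCE = "ex:hasSequence"

# Hypothetical identifiers: an unversioned gene and two versions of a sequence,
# minted by two different (made-up) authorities.
gene = LSID.parse("urn:lsid:genes.example.org:gene:G0001")
seq_v1 = LSID.parse("urn:lsid:seqs.example.org:sequence:S0042:1")
seq_v2 = LSID.parse("urn:lsid:seqs.example.org:sequence:S0042:2")

Triple = Tuple[LSID, str, LSID]
triples: List[Triple] = [
    (gene, HAS_SEQUENCE, seq_v1),
    (gene, HAS_SEQUENCE, seq_v2),
]


def sequences_of(subject: LSID, graph: List[Triple]) -> List[LSID]:
    """All sequence LSIDs that the gene's metadata links it to."""
    return [o for s, p, o in graph if s == subject and p == HAS_SEQUENCE]


for seq in sequences_of(gene, triples):
    print(f"{gene} {HAS_SEQUENCE} {seq}")
```

In this sketch the gene identifier carries no revision while each sequence identifier does, and unversioned() recovers a single identifier for the sequence independent of version. Whether real deployments make either choice is exactly the open question raised in this thread.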
Received on Monday, 31 July 2006 02:07:54 UTC