- From: Renato Golin <renato@ebi.ac.uk>
- Date: Fri, 27 Jul 2007 22:34:17 +0100 (BST)
- To: "Tim Berners-Lee" <timbl@w3.org>
- Cc: "Alan Ruttenberg" <alanruttenberg@gmail.com>, "Chris Bizer" <chris@bizer.de>, "SW-forum Web" <semantic-web@w3.org>, "Linking Open Data" <linking-open-data@simile.mit.edu>, "Jonathan A Rees" <jar@mumble.net>
Hi Tim,

there is one big problem with your suggestion...

> So, how does this relate to the Science commons? I think the life
> sciences folks should not hold their breath until there is a unique
> identifier for each protein, and a unique concept for what a "protein"
> is exactly. They should serve up the actual records about these
> things as documents, with known provenance and features and
> failings.

For decades bioinformaticians (or their counterparts at the time) have been doing exactly what you're proposing, and we're now in the same state the web was in a few years ago. I believe the phrase "Perl is the duct tape of bioinformatics" will remind you of a very nasty phase of the web...

Everyone has their own ontology, "unique identifiers", libraries, file formats, etc. Even within the same institute there are several different views (formats, identifiers, libs, etc.) of the same data. Newcomers quite often rewrite the code and core libraries from scratch because "it was not good enough".

> So a protein may get Ids in uniprot and in the Gene
> Ontology, where the mapping isn't 100% crystal clear. And then
> mapping files can be provided where the mappings exist.

Even UniProt has difficulty keeping track of "unique identifiers" and formats, because every scientist thinks his own way is *much* better and, because the field is so vague, no one disagrees... There is also a plethora of cross-references and ontologies in UniProt, but every other database is completely different and uses a completely different set of ontologies...

> This allows
> each data source to change if necessary, as new understandings
> arise. The system must not be so rigidly connected that nothing can
> grow. The service of the data should be maintained by the
> organization which maintains the data, after an initial period when
> people externally show them how it is done. (like biordf and bio2rdf).

Seriously, how many people do you know who can do it? Unfortunately I know only a few, and even they are not actively doing what they know because of company policies or institute bureaucracy.

Scientists are prouder of their formats and unique identifiers than of their relevance to the community, and they will neither open them up nor relinquish control over them. Science today has little science in it...

Interesting reading:
http://www.slideshare.net/dullhunk/the-seven-deadly-sins-of-bioinformatics/

> Once these identifiers cease to be the one and only central ID, then
> they can be minted pragmatically.

Exactly, that's the point! Decentralization is the key, but it's quite hard to convince institutes to stop doing things in the monolithic and selfish way they do now.

Nevertheless, I do believe it will happen, and I believe the semantic web will make a great, if not the greatest, contribution to this achievement.

cheers,
--renato
Received on Friday, 27 July 2007 21:34:26 UTC