Re: URI aliases and owl:sameAs was: Terminology Question concerning Web Architecture and Linked Data from Renato Golin on 2007-07-27 (semantic-web@w3.org from July 2007)

From: Renato Golin <renato@ebi.ac.uk>
Date: Fri, 27 Jul 2007 22:34:17 +0100 (BST)
To: "Tim Berners-Lee" <timbl@w3.org>
Cc: "Alan Ruttenberg" <alanruttenberg@gmail.com>, "Chris Bizer" <chris@bizer.de>, "SW-forum Web" <semantic-web@w3.org>, "Linking Open Data" <linking-open-data@simile.mit.edu>, "Jonathan A Rees" <jar@mumble.net>
Message-ID: <55619.81.151.155.49.1185572057.squirrel@webmail.ebi.ac.uk>

Hi Tim,

There is one big problem with your suggestion...

> So, how does this relate to the Science commons?  I think the life
> sciences folks should not hold their breath until there is a unique
> identifier for each protein, and a unique concept for what a "protein" is
> exactly. They should serve up the actual records about these
> things as documents, with known provenance and features and
> failings.

For decades bioinformaticians (or their counterparts at the time) have been
doing exactly what you're proposing, and we're now in the same state the
web was in a few years ago. I believe the phrase "Perl is the duct tape of
bioinformatics" will remind you of a very nasty phase of the web...

Everyone has their own ontology, "unique identifiers", libraries, file
formats, etc. Even within the same institute there are several different
views (formats, identifiers, libraries, etc.) of the same data. Newcomers
quite often rewrite the code and core libraries from scratch because "it
was not good enough".


> So a protein may get Ids in uniprot and in the Gene
> Ontology, where the mapping isn't 100% crystal clear. And then
> mapping files can be provided where the mappings exist.

Even UniProt has difficulty keeping track of "unique identifiers" and
formats, because every scientist thinks their own way is *much* better
and, because the field is so vague, no one disagrees...

There is also a plethora of cross-references and ontologies in UniProt, but
every other database is completely different and uses a completely
different set of ontologies...


> This allows
> each data source to change if necessary, as new understandings
> arise.  The system must not be so rigidly connected that nothing can
> grow.  The service of the data should be maintained by the
> organization which maintains the data, after an initial period when
> people externally show them how it is done. (like biordf and bio2rdf).

Seriously, how many people do you know who can do it? Unfortunately I
know only a few, and even they are not actively doing what they know how
to do, because of company policies or institute bureaucracy.

Scientists are prouder of their formats and unique identifiers than of
their relevance to the community, and they will neither open them up nor
relinquish control over them.

Science today has little science in it... Interesting reading:

http://www.slideshare.net/dullhunk/the-seven-deadly-sins-of-bioinformatics/


> Once these identifiers cease to be the one and only central ID, then
> they can be minted pragmatically.

Exactly, that's the point! Decentralization is the key, but it's quite hard
to convince institutes to stop doing things in the monolithic, selfish way
they do now.

Nevertheless, I do believe it will happen, and I believe the semantic web
will make a great, if not the greatest, contribution to achieving it.

cheers,
--renato

Received on Friday, 27 July 2007 21:34:26 UTC