- From: Hugh Glaser <hg@ecs.soton.ac.uk>
- Date: Thu, 27 Nov 2008 14:04:17 +0000
- To: Georgi Kobilarov <georgi.kobilarov@gmx.de>, John Graybeal <graybeal@mbari.org>, Richard Cyganiak <richard@cyganiak.de>
- CC: "public-lod@w3.org" <public-lod@w3.org>, Semantic Web <semantic-web@w3.org>
On 27/11/2008 13:43, "Georgi Kobilarov" <georgi.kobilarov@gmx.de> wrote:

> Hi John,
>
>> Do you think the argument is mostly settled, or would you agree that
>> duplicating a massive set of URIs for 'local technical simplification'
>> is a bad practice? (In which case, is the question just a matter of
>> scale?)
>
> In my opinion it's not only about technical simplification. It's a
> question of process.
> When I want to publish a new dataset, only re-using existing URIs would
> mean that I need to find the correct URIs in the first place. That's
> difficult: some concepts in my dataset might not be available in others,
> some concepts might be available in more than one other. Having to
> decide too early restrains me from publishing my data.

And it causes a management problem if you get it wrong, or if the others
change the meaning, however subtly.

>
> By minting my own URIs, I can split the publishing problem into two
> separate tasks: 1. publish data and 2. interlink data. And the
> interlinking task could then even be done by someone else...

Oh yes please!

I think it is not just about managing large numbers of URIs across the SW
because you minted a new *one*; we need to mint large numbers of URIs for
our own data, because until you can do the complex co-reference work on the
KB, you don't know whether the strings you are working with refer to the
same thing.

E.g. your input text says David Williams is at the University of Newcastle.
It can be some time before you are able to assert confidently that this is
the University of Newcastle in Australia. In the meantime you mint a new
URI. Later you decide it is co-referent with another.

We take what I think might be called a hybrid approach.
We do mint all the URIs, but we have a sub-system that re-processes the RDF
so that the ones that have been identified as co-referent can be collapsed
onto a single URI, either one of our own or one from somewhere else.
It leaves the old ones around, in a "deprecated" state, for any future
reference.

We call this process "canonisation" :-)

Best
Hugh

>
> Best,
> Georgi
>
> --
> Georgi Kobilarov
> Freie Universität Berlin
> www.georgikobilarov.com
>
>
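
To make the hybrid approach described above a little more concrete (mint a URI first, collapse co-referent URIs later, keep the old ones as deprecated), here is a minimal Python sketch. It is not the actual sub-system mentioned in the message: the `Canoniser` class, its method names, the `worksAt` predicate and all the example.org / example.com URIs are assumptions made up for illustration, and triples are modelled as plain string tuples rather than with any RDF library.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings


class Canoniser:
    """Hypothetical sketch: collapse co-referent URIs onto one canonical URI."""

    def __init__(self) -> None:
        # Maps each known URI to its parent; a canonical URI is its own parent.
        self._parent: Dict[str, str] = {}

    def _find(self, uri: str) -> str:
        """Return the canonical URI for `uri`, registering it if unseen."""
        self._parent.setdefault(uri, uri)
        root = uri
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every URI on the path straight at the root.
        while self._parent[uri] != root:
            self._parent[uri], uri = root, self._parent[uri]
        return root

    def assert_coreferent(self, canonical: str, other: str) -> None:
        """Record that `other` denotes the same thing as `canonical`."""
        root_c = self._find(canonical)
        root_o = self._find(other)
        if root_c != root_o:
            self._parent[root_o] = root_c

    def canonical(self, term: str) -> str:
        """Canonical form of a term; unknown terms (e.g. literals) pass through."""
        return self._find(term) if term in self._parent else term

    def deprecated(self) -> Dict[str, str]:
        """Old URIs that are kept around, mapped to their canonical replacement."""
        return {u: self._find(u) for u in list(self._parent) if self._find(u) != u}

    def rewrite(self, triples: Iterable[Triple]) -> List[Triple]:
        """Re-process triples so subjects and objects use canonical URIs."""
        return [(self.canonical(s), p, self.canonical(o)) for s, p, o in triples]


# Hypothetical URIs: one minted locally, one published elsewhere.
LOCAL = "http://example.org/id/org-1"
EXTERNAL = "http://other.example.com/id/University_of_Newcastle_Australia"
PERSON = "http://example.org/id/person-1"

canoniser = Canoniser()
data = [(PERSON, "worksAt", LOCAL)]                # published before interlinking
canoniser.assert_coreferent(EXTERNAL, LOCAL)       # decided some time later
canonised = canoniser.rewrite(data)
```

In this sketch the first argument to `assert_coreferent` is the preferred URI, so it can be either a local one or one from another dataset; `deprecated()` keeps the mapping from each collapsed URI to its replacement, so references to the old URI can still be resolved.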
Received on Thursday, 27 November 2008 14:06:21 UTC