Re: Dataset vocabularies vs. interchange vocabularies (was: Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase)

On 27/11/2008 13:43, "Georgi Kobilarov" <georgi.kobilarov@gmx.de> wrote:

> Hi John,
>
>> Do you think the argument is mostly settled, or would you agree that
>> duplicating a massive set of URIs for 'local technical simplification'
>> is a bad practice? (In which case, is the question just a matter of
>> scale?)
>
> In my opinion it's not only about technical simplification. It's a
> question of process.
> When I want to publish a new dataset, only re-using existing URIs would
> mean that I need to find the correct URIs in the first place. That's
> difficult: some concepts in my dataset might not be available in others,
> some concepts might be available in more than one other. Having to
> decide too early restrains me from publishing my data.
And causes a management problem if you get it wrong, or if the others change
the meaning, however subtly.
>
> By minting my own URIs, I can split the publishing problem into two
> separate tasks: 1. publish data and 2. interlink data. And the
> interlinking task could then even be done by someone else...
Oh yes please!

I think it is not just about managing large numbers of URIs across the SW
because you minted a new *one*; we need to mint large numbers of URIs for
our own data, because until you have done the complex co-reference work on
the KB, you don't know whether the strings you are working with refer to the
same thing.
E.g. your input text says David Williams is at the University of Newcastle. It
can be some time before you are able to assert confidently that this
is the University of Newcastle in Australia. In the meantime you mint a new
URI. Later you decide it is co-referent with another.

We take what I think might be called a hybrid approach.
We do mint all the URIs; but we have a sub-system that re-processes the RDF
so that the ones that have been identified as co-referent can be collapsed
onto a single URI, either of our own or somewhere else.
It leaves the old ones around, in a "deprecated" state, for any future
reference.
We call this process "canonisation" :-)
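
To make the idea concrete, here is a rough sketch of that collapsing step in
Python. It is only an illustration, not our actual sub-system: the function
names, the example.org URIs, the "ex:" predicates and the idea of a
"preferred" set are all made up for the example, and triples are treated as
plain tuples of strings.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def build_canonical_map(bundles: Iterable[Set[str]],
                        preferred: Set[str]) -> Dict[str, str]:
    """Map every URI in a co-reference bundle to one canonical URI.

    `bundles` and `preferred` are hypothetical inputs: each bundle is a set
    of URIs judged co-referent; `preferred` marks URIs (ours or someone
    else's) that should win when present.
    """
    mapping: Dict[str, str] = {}
    for bundle in bundles:
        # Prefer a marked URI; otherwise pick deterministically by sort order.
        candidates = sorted(bundle & preferred) or sorted(bundle)
        canonical = candidates[0]
        for uri in bundle:
            mapping[uri] = canonical
    return mapping


def canonise(triples: Iterable[Triple],
             mapping: Dict[str, str]) -> Tuple[List[Triple], Dict[str, str]]:
    """Rewrite subjects and objects onto canonical URIs.

    Returns the rewritten triples (duplicates dropped, order kept) and a
    table of deprecated URI -> canonical URI, so the old URIs stay around
    for future reference. Objects that are not in the mapping (literals,
    other URIs) pass through unchanged.
    """
    rewritten: List[Triple] = []
    seen: Set[Triple] = set()
    for s, p, o in triples:
        t = (mapping.get(s, s), p, mapping.get(o, o))
        if t not in seen:
            seen.add(t)
            rewritten.append(t)
    deprecated = {old: new for old, new in mapping.items() if old != new}
    return rewritten, deprecated


# Two URIs minted at different times for "University of Newcastle",
# later judged to be co-referent.
triples = [
    ("http://example.org/id/person-1", "ex:name", "David Williams"),
    ("http://example.org/id/person-1", "ex:affiliation",
     "http://example.org/id/org-17"),
    ("http://example.org/id/person-2", "ex:affiliation",
     "http://example.org/id/org-42"),
]
bundles = [{"http://example.org/id/org-17", "http://example.org/id/org-42"}]
mapping = build_canonical_map(bundles,
                              preferred={"http://example.org/id/org-42"})
rewritten, deprecated = canonise(triples, mapping)
```

After this runs, both affiliation triples point at org-42, and the
`deprecated` table records that org-17 now resolves to org-42.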

Best
Hugh

>
> Best,
> Georgi
>
> --
> Georgi Kobilarov
> Freie Universität Berlin
> www.georgikobilarov.com
>
>

Received on Thursday, 27 November 2008 14:06:21 UTC