Re: LD and Redundancy from Ross Singer on 2011-03-23 (public-lld@w3.org from March 2011)

From: Ross Singer <ross.singer@talis.com>
Date: Wed, 23 Mar 2011 16:36:32 -0400
To: Owen Stephens <owen@ostephens.com>
Cc: public-lld <public-lld@w3.org>
Message-ID: <AANLkTikTwgdhs8KVEh7uP8i6sQ2m-9mvBAunmOES1Pbz@mail.gmail.com>
I think we're going to have to assume there will be lots of duplication of
resources describing the same thing with different identifiers (although
hopefully interrelated ones), for several reasons:

1) A centralized repository will never be able to keep up with everything:
there will always be nodes with resources described before they are added to
the repository, and some may never be added. These could also spring up
independently in multiple places.
2) We should not expect universal, 100% agreement on how things are
defined/described. We don't have this now, and we certainly can't expect that
to change.
3) There are lots of non-authoritative resources (subject headings, people,
class numbers, etc.).
4) A centralized repository would have to rely quite heavily on discovery:
    - there's a huge danger of GIGO (garbage in, garbage out) here, since there
are plenty of typos in the historical record
    - there are plenty of chances for searches to fail

Couple this with the fact that (almost) everybody is going to have to
duplicate all of the data for local indexing purposes anyway...
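As a rough illustration of how "interrelated" duplicates might be reconciled in such a local index: if the links between identifiers are published as owl:sameAs statements, they can be folded into equivalence clusters. This is a minimal, hypothetical sketch, assuming the links have already been harvested as (subject, object) URI pairs; the function name cluster_same_as and all of the URIs are made up for the example.

```python
# Hypothetical sketch: cluster identifiers linked by owl:sameAs pairs.
# Assumes links were already harvested as (subject, object) URI pairs;
# every URI below is a made-up placeholder.

def cluster_same_as(pairs):
    """Group URIs into equivalence sets using union-find with path halving."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)  # register unseen URIs as their own root
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving keeps trees shallow
            uri = parent[uri]
        return uri

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    for subject, obj in pairs:
        union(subject, obj)

    # Collect every URI under the root of its cluster.
    clusters = {}
    for uri in parent:
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())


links = [
    ("http://library-a.example/person/1", "http://hub.example/id/42"),
    ("http://library-b.example/auth/xyz", "http://hub.example/id/42"),
    ("http://library-c.example/subject/7", "http://library-d.example/s/7"),
]

for cluster in cluster_same_as(links):
    print(sorted(cluster))
```

Union-find keeps each merge cheap, so clusters can be updated incrementally as newly crawled links arrive. It treats every link as full equivalence, so links weaker than owl:sameAs would need separate handling.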

-Ross.

On Wed, Mar 23, 2011 at 3:37 PM, Owen Stephens <owen@ostephens.com> wrote:

> I tend to agree with Joachim: we will see more data publication, and at
> least in this phase we will see plenty of institutions coining their own URIs.
> However, I also believe that the web tends towards less duplication (not
> anything close to no duplication, just less duplication than we would
> have otherwise).
>
> We are already seeing established URIs being used where they exist
> (e.g. for LCSH), and I guess we can expect to see more of this.
>
> That said, I think aggregations are a good thing (and inevitable), and the
> more identifiers are shared, and the more people make sameAs and similar
> statements, the easier aggregation will become.
>
> In terms of what we should be doing now, I'd say:
>
> - Encourage re-use of URIs (ideally this would be baked into record creation
> in libraries, but that's a whole other ball game)
> - Encourage sameAs statements where new URIs have been coined (where
> appropriate)
> - Start looking at how existing linked data representations of bibliographic
> data can be crawled and aggregated, and see what works and what doesn't
>
> I'm sure there are other things, but those are the ones that spring to mind
> first.
>
> The work of the JISC 'RDTF' (Resource Discovery Task Force) in the UK is
> looking at the strategy of 'publish' and 'aggregate'. Although this doesn't
> dictate the use of Linked Data or RDF, many of the projects falling into this
> area are adopting that approach, so hopefully we will see a good exploration
> of some of the issues from this area soon. See http://rdtf.mimas.ac.uk/ for
> more information on this.
>
> Owen
>
>
> Owen Stephens
> Owen Stephens Consulting
> Web: http://www.ostephens.com
> Email: owen@ostephens.com
> Telephone: 0121 288 6936
>
> On 23 Mar 2011, at 17:16, stu wrote:
>
> On Thu, Mar 24, 2011 at 1:18 AM, Neubert Joachim <J.Neubert@zbw.eu> wrote:
>
>> I'm not sure that a centralized model for building clusters (like VIAF) or
>> a pre-declared central hub ("everybody maps to
>> WorldCat/OpenLibrary/whatever") could work.
>
> A centralized model is essential if global bibliography is to be an
> important part of the Web. Sure, there are workarounds involving declared
> or inferred equivalence. These all require additional effort from systems
> and people, effort that will rarely be expended, with the result that link
> potency will (continue to) be diluted to insignificance.
>
> Is it important enough for the global library community to expend the
> resources to consolidate meaningful global bibliography?  Can the political
> impediments be overcome?
>
> I continue to believe that OCLC is the only likely candidate with a chance
> to make this happen, and it appears that the business cases are too weak,
> and constituent demand too feeble for that to happen in the current
> environment.
>
> I just Googled the book closest to hand, and on the first page, Wikipedia
> was number one, and there were two Amazon links in the top ten.  No library
> link of any sort appeared on the page.
>
> Linked data isn't going to change this without a centralized identifier
> infrastructure.
>
> stu
Received on Wednesday, 23 March 2011 20:37:05 UTC