Re: LD and Redundancy from Stuart Weibel on 2011-03-24 (public-lld@w3.org from March 2011)

From: Stuart Weibel <stuart.weibel@gmail.com>
Date: Thu, 24 Mar 2011 09:24:34 +0900
To: Karen Coyle <kcoyle@kcoyle.net>
Cc: Ross Singer <ross.singer@talis.com>, public-lld <public-lld@w3.org>
Message-Id: <76E21DD5-53F0-482C-8747-5124621C50CF@gmail.com>
It's a fundamental issue of primary importance to libraries.  It should be discussed in the report.

Sent from my iPhone

On Mar 24, 2011, at 9:10 AM, Karen Coyle <kcoyle@kcoyle.net> wrote:

> Following up on this, we seem to agree that there will be redundancy of data and of identifiers. Is this a particular LLD issue that should be included in the group's report, or is this a general SemWeb issue that we can assume will be addressed in the normal course of things? At the moment there is a brief mention of this in the issues area of the report, but we're unsure what to say about it.
> 
> Perhaps we can resolve this on tomorrow's call.
> 
> Thanks, all,
> kc
> 
> Quoting Ross Singer <ross.singer@talis.com>:
> 
>> I think we're going to have to assume there will be lots of duplication of
>> resources describing the same thing with different identifiers (although,
>> hopefully interrelated) for a couple of reasons:
>> 
>> 1) A centralized repository will never be able to keep up with everything -
>> there will always be nodes with resources described before they are added
>> to the repository, and some may never be added.  These could also spring up
>> in multiple places independently.
>> 2) We should not expect universal, 100% agreement on how things are
>> defined/described.  We don't have this now, and we certainly can't expect
>> that to change.
>> 3) There are lots of non-authoritative resources (subject headings, people,
>> class numbers, etc.)
>> 4) A centralized repository would have to rely quite heavily on discovery
>>    - there's a huge danger of GIGO here (there are plenty of typos in the
>> historical record)
>>    - plenty of chances of failed searches
>> 
>> Couple this with the fact that (most) everybody is going to have to
>> duplicate all of the data for local indexing purposes, anyway...
>> 
>> -Ross.
>> 
>> On Wed, Mar 23, 2011 at 3:37 PM, Owen Stephens <owen@ostephens.com> wrote:
>> 
>>> I tend to agree with Joachim - we will see more data publication and at
>>> least in this phase will see plenty of institutions coining their own URIs.
>>> However, I also believe that the web tends towards less duplication (this
>>> isn't anything close to no duplication, just less duplication than we would
>>> have otherwise).
>>> 
>>> We are already seeing that established URIs will be used where they exist
>>> (e.g. for LCSH) - and I guess we can expect to see more of these.
>>> 
>>> That said, I think aggregations are a good thing (and inevitable) - and the
>>> more identifiers are shared, and the more people make sameAs and similar
>>> statements, the easier aggregation will become.
>>> 
>>> In terms of what we should be doing now, I'd say:
>>> 
>>> - Encourage re-use of URIs (ideally this would be baked into record creation
>>>   in libraries, but that's a whole other ball game)
>>> - Encourage sameAs statements where new URIs have been coined (and
>>>   where appropriate)
>>> - Start looking at how existing linked data representations of bibliographic
>>>   data can be crawled and aggregated, and see what works and what doesn't
>>> 
>>> I'm sure there is other stuff, but those are the ones that spring to mind
>>> first.
>>> 
>>> The JISC 'RDTF' (Resource Discovery Task Force) in the UK is looking at
>>> the strategy of 'publish' and 'aggregate'. Although this doesn't dictate
>>> the use of Linked Data or RDF, many of the projects falling into this
>>> area are adopting that approach, so hopefully we will see a good exploration
>>> of some of the issues from this area soon. See http://rdtf.mimas.ac.uk/ for
>>> more information on this.
>>> 
>>> Owen
>>> 
>>> 
>>> Owen Stephens
>>> Owen Stephens Consulting
>>> Web: http://www.ostephens.com
>>> Email: owen@ostephens.com
>>> Telephone: 0121 288 6936
>>> 
>>> On 23 Mar 2011, at 17:16, stu wrote:
>>> 
>>> *On Thu, Mar 24, 2011 at 1:18 AM, Neubert Joachim <J.Neubert@zbw.eu> wrote:
>>> 
>>> I'm not sure that a centralized model for building clusters (like VIAF) or
>>> a pre-declared central hub ("everybody maps to
>>> WorldCat/OpenLibrary/whatever") could work.*
>>> 
>>> A centralized model is essential if global bibliography is to be an
>>> important part of the Web.  Sure, there are work-arounds involving declared
>>> or inferred equivalence.  These all require additional work on the part of
>>> systems and people, which will rarely be expended, with the result that link
>>> potency will (continue to) be diluted to insignificance.
>>> 
>>> Is it important enough for the global library community to expend the
>>> resources to consolidate meaningful global bibliography?  Can the political
>>> impediments be overcome?
>>> 
>>> I continue to believe that OCLC is the only likely candidate with a chance
>>> to make this happen, and it appears that the business cases are too weak,
>>> and constituent demand too feeble for that to happen in the current
>>> environment.
>>> 
>>> I just Googled the book closest to hand, and on the first page, Wikipedia
>>> was number one, and there were two Amazon links in the top ten.  No library
>>> link of any sort appeared on the page.
>>> 
>>> Linked data isn't going to change this without a centralized identifier
>>> infrastructure.
>>> 
>>> stu
> 
> -- 
> Karen Coyle
> kcoyle@kcoyle.net http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet
> 
>
Received on Thursday, 24 March 2011 00:25:43 UTC
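
The aggregation step discussed in this thread (treating URIs that are linked by sameAs statements as identifiers for one resource) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not something proposed on the list: the function name cluster_same_as and the example.org URIs are hypothetical, and sameAs is treated as a plain symmetric, transitive relation, with no attention to trust or to conflicting descriptions.

```python
from collections import defaultdict


def cluster_same_as(pairs):
    """Group URIs into clusters connected by sameAs statements.

    `pairs` is an iterable of (uri_a, uri_b) tuples, each standing for one
    sameAs statement. Uses a simple union-find (disjoint-set) structure.
    """
    parent = {}

    def find(uri):
        # Register unseen URIs as their own cluster root.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = defaultdict(set)
    for uri in parent:
        clusters[find(uri)].add(uri)

    # Sort for a stable, readable result.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Hypothetical statements published by three independent institutions.
statements = [
    ("http://a.example.org/book/1", "http://b.example.org/item/x"),
    ("http://b.example.org/item/x", "http://c.example.org/work/9"),
    ("http://a.example.org/book/2", "http://d.example.org/item/y"),
]

for cluster in cluster_same_as(statements):
    print(cluster)
```

Each printed cluster is a set of locally coined URIs that an aggregator could treat as one resource. The more sameAs statements publishers make, the fewer separate clusters remain for the same thing.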