Re: AW: [Spam-Wahrscheinlichkeit=45]Re: LD and Redundancy from Karen Coyle on 2011-03-24 (public-lld@w3.org from March 2011)

From: Karen Coyle <kcoyle@kcoyle.net>
Date: Thu, 24 Mar 2011 04:02:29 -0700
To: "Haffner, Alexander" <A.Haffner@dnb.de>
Cc: Ross Singer <ross.singer@talis.com>, public-lld <public-lld@w3.org>
Message-ID: <20110324040229.184764ov37d2br9h@kcoyle.net>
Quoting "Haffner, Alexander" <A.Haffner@dnb.de>:

> I also agree with the point that we unfortunately have to deal with
> redundancies.
>
> But for the report, IMO, we should strictly differentiate between
> authority data and bibliographic data. I reckon we can suggest
> centralizing authority data, at least at the national level. I
> assume that reducing bibliographic redundancy requires consolidated
> authority data and, of course, the consistent alignment of
> authorities in bibliographic entries.

I think this has come up before in our discussions of authority files  
-- that authority files are, in a sense, vocabularies (although they  
drift over into "record" space depending on the kinds of properties  
they include). The nature of authority files is that the authoritative  
form identifies a single resource. Bibliographic records are more  
complex and there isn't a clear identifier for the resource (as anyone  
who has tried de-duping bib records knows all too well).

I think it's an excellent idea to speak of these separately in the  
recommendations because the nature of the problem is so different,  
and, as Alex says, the authority data will 1) be easier to merge and
2) then facilitate the merging of bibliographic data.
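
To make that two-step idea concrete, here is a minimal sketch in Python.
It is an illustration only, not anything the group has agreed on. It
assumes authority equivalences arrive as simple pairs of URIs (the kind
of thing sameAs statements would give us), and that a bib record is
just an id, a creator URI, and a title. All URIs, field names, and
function names below are made up for the example, and matching on a
normalized title is a deliberately naive stand-in for real de-duping.

```python
from collections import defaultdict


def find(parent, uri):
    """Return the representative URI of the cluster that `uri` belongs to."""
    parent.setdefault(uri, uri)  # an unseen URI starts as its own cluster
    while parent[uri] != uri:
        parent[uri] = parent[parent[uri]]  # shorten the path as we walk up
        uri = parent[uri]
    return uri


def merge_authorities(same_as_pairs):
    """Step 1: cluster authority URIs that are declared equivalent."""
    parent = {}
    for a, b in same_as_pairs:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            # Keep the lexicographically smaller URI as representative,
            # so the result does not depend on the order of the pairs.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep
    return parent


def cluster_bib_records(records, parent):
    """Step 2: group bib records by (merged creator, normalized title)."""
    clusters = defaultdict(list)
    for rec in records:
        key = (find(parent, rec["creator"]), rec["title"].strip().lower())
        clusters[key].append(rec["id"])
    return dict(clusters)


# Hypothetical data: two authority URIs for the same person, three records.
same_as = [("http://example.org/auth/a1", "http://example.net/names/x9")]
records = [
    {"id": "bib1", "creator": "http://example.org/auth/a1", "title": "Moby Dick"},
    {"id": "bib2", "creator": "http://example.net/names/x9", "title": "moby dick "},
    {"id": "bib3", "creator": "http://example.org/auth/b2", "title": "Moby Dick"},
]

parent = merge_authorities(same_as)
clusters = cluster_bib_records(records, parent)
```

With this data, bib1 and bib2 end up in one cluster only because their
creator URIs were merged first; bib3 stays separate. Without step 1,
all three records would look distinct.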

I'll put a note in the recommendations section so we don't forget this.

kc

>
> Without raising the FRBR discussion again, I think one reason for
> redundancy in bibliographic data is the lack of ways to link items
> to trustworthy bibliographic records. Everyone is creating their own
> new bibliographic records, but this is also caused by the harvesting
> approach in current library environments and is probably not going
> to change soon.
>
> And a last point: for identifying and reducing redundancies it's
> helpful to have standardized ontologies for the library community -
> maybe not only RDA, but definitely not more than a handful...
>
> alex
>
>
>> -----Original Message-----
>> From: public-lld-request@w3.org [mailto:public-lld-request@w3.org]
>> On Behalf Of Karen Coyle
>> Sent: Thursday, 24 March 2011 01:11
>> To: Ross Singer
>> Cc: public-lld
>> Subject: [Spam-Wahrscheinlichkeit=45]Re: LD and Redundancy
>>
>> Following up to this, we seem to agree that there will be redundancy
>> of data and of identifiers. Is this a particular LLD issue that should
>> be included in the group's report, or is this a general SemWeb issue
>> that we can assume will be addressed in the normal course of things?
>> At the moment there is a brief mention of this in the issues area of
>> the report, but we're unsure what to say about it.
>>
>> Perhaps we can resolve this on tomorrow's call.
>>
>> Thanks, all,
>> kc
>>
>> Quoting Ross Singer <ross.singer@talis.com>:
>>
>> > I think we're going to have to assume there will be lots of duplication of
>> > resources describing the same thing with different identifiers (although,
>> > hopefully interrelated) for a couple of reasons:
>> >
>> > 1) A centralized repository will never be able to keep up with
>> > everything - there will always be nodes with resources described
>> > prior to being added to the repository; possibly never added.  These
>> > could also spring up in multiple places independently
>> > 2) We should not expect universal, 100% agreement on how things are
>> > defined/described.  We don't have this now, we certainly can't expect this
>> > to change.
>> > 3) There are lots of non-authoritative resources (subject headings,
>> > people, class numbers, etc.)
>> > 4) A centralized repository would have to rely quite heavily on discovery
>> >     - there's a huge danger of GIGO here (there are plenty of typos in the
>> > historical record)
>> >     - plenty of chances of failed searches
>> >
>> > Couple this with the fact that (most) everybody is going to have to
>> > duplicate all of the data for local indexing purposes, anyway...
>> >
>> > -Ross.
>> >
>> > On Wed, Mar 23, 2011 at 3:37 PM, Owen Stephens <owen@ostephens.com> wrote:
>> >
>> >> I tend to agree with Joachim - we will see more data publication and at
>> >> least in this phase will see plenty of institutions coining their own
>> >> URIs. However, I also believe that the web tends towards less
>> >> duplication (this isn't anything close to no duplication, just less
>> >> duplication than we would have otherwise).
>> >>
>> >> We are already seeing that established URIs will be used where they exist
>> >> (e.g. for LCSH) - and I guess we can expect to see more of these.
>> >>
>> >> That said, I think aggregations are a good thing (and inevitable) - and
>> >> the more identifiers are shared, and the more people make sameAs and
>> >> similar statements, the easier aggregation will become.
>> >>
>> >> In terms of what we should be doing now? I'd say:
>> >>
>> >> - Encourage re-use of URIs (ideally this would be baked into record
>> >>   creation in libraries, but that's a whole other ball game)
>> >> - Encourage sameAs statements where new URIs have been coined (and
>> >>   where appropriate)
>> >> - Start looking at how existing linked data representations of
>> >>   bibliographic data can be crawled and aggregated, and see what works
>> >>   and what doesn't
>> >>
>> >> I'm sure there is other stuff, but those are the ones that spring to mind
>> >> first
>> >>
>> >> The JISC 'RDTF' (Resource Discovery Task Force) in the UK is looking
>> >> at the strategy of 'publish' and 'aggregate' - although this doesn't
>> >> dictate the use of Linked Data or RDF, many of the projects falling
>> >> into this area are adopting that approach, so hopefully we will see a
>> >> good exploration of some of the issues from this area soon. See
>> >> http://rdtf.mimas.ac.uk/ for more information on this.
>> >>
>> >> Owen
>> >>
>> >>
>> >> Owen Stephens
>> >> Owen Stephens Consulting
>> >> Web: http://www.ostephens.com
>> >> Email: owen@ostephens.com
>> >> Telephone: 0121 288 6936
>> >>
>> >> On 23 Mar 2011, at 17:16, stu wrote:
>> >>
>> >> *On Thu, Mar 24, 2011 at 1:18 AM, Neubert Joachim <J.Neubert@zbw.eu>
>> >> wrote:
>> >>
>> >> I'm not sure that a centralized model for building clusters (like VIAF)
>> >> or a pre-declared central hub ("everybody maps to
>> >> WorldCat/OpenLibrary/whatever") could work.*
>> >>
>> >> A centralized model is essential if global bibliography is to be an
>> >> important part of the Web.  Sure, there are work-arounds involving
>> >> declared or inferred equivalence.  These all require additional work on
>> >> the part of systems and people, which will rarely be expended, with the
>> >> result that link potency will (continue to) be diluted to
>> >> insignificance.
>> >>
>> >> Is it important enough for the global library community to expend the
>> >> resources to consolidate meaningful global bibliography?  Can the
>> >> political impediments be overcome?
>> >>
>> >> I continue to believe that OCLC is the only likely candidate with a
>> >> chance to make this happen, and it appears that the business cases are
>> >> too weak, and constituent demand too feeble for that to happen in the
>> >> current environment.
>> >>
>> >> I just Googled the book closest to hand, and on the first page,
>> >> Wikipedia was number one, and there were two Amazon links in the top
>> >> ten.  No library link of any sort appeared on the page.
>> >>
>> >> Linked data isn't going to change this without a centralized identifier
>> >> infrastructure.
>> >>
>> >> stu
>> >>
>> --
>> Karen Coyle
>> kcoyle@kcoyle.net http://kcoyle.net
>> ph: 1-510-540-7596
>> m: 1-510-435-8234
>> skype: kcoylenet
>>
>

-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Received on Thursday, 24 March 2011 11:03:11 UTC