AW: Re: LD and Redundancy from Adrian Pohl on 2011-03-24 (public-lld@w3.org from March 2011)

From: Adrian Pohl <pohl@hbz-nrw.de>
Date: Thu, 24 Mar 2011 10:32:45 +0100
To: "public-lld" <public-lld@w3.org>
Cc: <culturegraph@lists.d-nb.de>
Message-Id: <4D8B1DCD020000140003E516@agrippa.hbz-nrw.de>
Hello,

The German National Library and the hbz have been working for some time
now on a service[1] to tackle the problem of redundancy or, as it is
also called, co-reference[2].

Basically, the service groups identifiers and descriptions for the same
and similar entities (records and bibliographic resources) and provides
trustworthy URIs to which individual libraries can link the URIs from
their own namespaces. Thus, the approach is to establish some kind of
master URIs that anybody can link to. As a resolving service, it will
return possible URIs to link to when someone submits an identifier
(ISBN, OCLC number, German National Library number etc.) or a record. If
a bibliographic entity isn't described yet, it will create a new URI. The
service is based on a NoSQL database for indexing and matching
purposes. The information about co-occurrence of identifiers will be
provided as Linked Open Data. Also, the underlying software is Open
Source.[3]
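
To make the resolving idea concrete, here is a minimal sketch in Python.
It is not the culturegraph code and does not use a NoSQL store: the
class name CoReferenceResolver, the example.org namespace, the
(scheme, value) identifier tuples and the rule for choosing a URI when
two clusters merge are all assumptions made for illustration only.

```python
from itertools import count


class CoReferenceResolver:
    """Toy in-memory resolver: one master URI per cluster of identifiers."""

    def __init__(self, base_uri="http://example.org/resource/"):
        self.base_uri = base_uri      # hypothetical namespace, not a real one
        self._parent = {}             # union-find forest over identifiers
        self._uri = {}                # cluster root -> master URI
        self._next_id = count(1)

    def _find(self, ident):
        # Walk up to the cluster root, then compress the path behind us.
        root = ident
        while self._parent[root] != root:
            root = self._parent[root]
        while self._parent[ident] != root:
            self._parent[ident], ident = root, self._parent[ident]
        return root

    def resolve(self, *identifiers):
        """Return the master URI for identifiers that co-occur in one record.

        Identifiers already known pull the others into their cluster.
        If none of them is known yet, a new URI is minted.
        """
        if not identifiers:
            raise ValueError("at least one identifier is required")
        roots = []
        for ident in identifiers:
            self._parent.setdefault(ident, ident)
            roots.append(self._find(ident))
        # Collect the URIs of every cluster this record touches.
        known = sorted(self._uri.pop(r) for r in set(roots) if r in self._uri)
        # Merge all touched clusters into the first one.
        target = roots[0]
        for r in roots[1:]:
            self._parent[r] = target
        # Assumption: on a merge, simply keep the first URI in sort order.
        uri = known[0] if known else f"{self.base_uri}{next(self._next_id)}"
        self._uri[target] = uri
        return uri


# Made-up identifiers, for illustration only.
resolver = CoReferenceResolver()
first = resolver.resolve(("isbn", "9783161484100"), ("oclc", "123"))
again = resolver.resolve(("oclc", "123"))   # same cluster, same URI
```

A real service would also have to cope with fuzzy matching of records
and would probably keep the URIs dropped in a merge alive as redirects;
the sketch leaves both out.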

For now, the project is focused on data from the German library
networks, because in this first step we can concentrate on some helpful
identifiers which aren't in use internationally. Also, it is possible to
match records from institutions which haven't published Linked Data yet:
we are cooperating with the German Working Group of Library Networks and
get data from all six library networks to do the matching. Nonetheless,
the service isn't intended to remain limited to German data in the
future...

More information about this project (in German only) can be found at
[4]. A prototype will be ready in the coming weeks.

Adrian

[1] See the announcement at
http://www.hbz-nrw.de/dokumentencenter/presse/pm/culturegraph_en and
the about text at http://www.culturegraph.org/website/about_en.htm

[2] For references about co-reference in the Semantic Web see
http://www.bibsonomy.org/user/acka47/co-reference.

[3] See http://culturegraph.sourceforge.net/

[4] 
https://wiki1.hbz-nrw.de/display/SEM/Resolving-+und+Lookup-Dienst+fuer+bibliothekarische+Identifier+in+culturegraph.org



>>> "Haffner, Alexander" <A.Haffner@dnb.de> wrote on Thursday, 24 March 2011 at 08:16:
> I also agree to the point that we unfortunately have to deal with
> redundancies.
>
> But for the report IMO we should strictly differentiate between authority
> data and bibliographic data. I reckon we can suggest a centralization of
> authority data at least on national level. I assume a downsizing of
> bibliographic redundancies needs consolidated authority data and of course
> the consequent alignment of authorities in bibliographic entries.
>
> Without raising the FRBR discussion again, I think one of the redundancy
> reasons in bibliographic data is the lack of possibilities to link items with
> trustworthy bibliographic records. Everyone is creating own new bibliographic
> records - but this is also caused by the harvesting approach in current
> library environments and probably not changeable soon.
>
> And a last point: for the identification and decrease of redundancies it's
> helpful to have standardized ontologies for the library community - maybe not
> only RDA but definitely not more than a handful...
> 
> alex
> 
> 
>> -----Original Message-----
>> From: public-lld-request@w3.org [mailto:public-lld-request@w3.org] On Behalf Of
>> Karen Coyle
>> Sent: Thursday, 24 March 2011 01:11
>> To: Ross Singer
>> Cc: public-lld
>> Subject: [Spam probability=45]Re: LD and Redundancy
>> 
>> Following up to this, we seem to agree that there will be redundancy
>> of data and of identifiers. Is this a particular LLD issue that should
>> be included in the group's report, or is this a general SemWeb issue
>> that we can assume will be addressed in the normal course of things?
>> At the moment there is a brief mention of this in the issues area of
>> the report, but we're unsure what to say about it.
>> 
>> Perhaps we can resolve this on tomorrow's call.
>> 
>> Thanks, all,
>> kc
>> 
>> Quoting Ross Singer <ross.singer@talis.com>:
>> 
>> > I think we're going to have to assume there will be lots of duplication of
>> > resources describing the same thing with different identifiers (although,
>> > hopefully interrelated) for a couple of reasons:
>> >
>> > 1) A centralized repository will never be able to keep up with everything -
>> > there will always be nodes with resources described prior to being added to
>> > the repository; possibly never added.  These could also spring up in
>> > multiple places independently
>> > 2) We should not expect universal, 100% agreement on how things are
>> > defined/described.  We don't have this now, we certainly can't expect this
>> > to change.
>> > 3) There are lots of non-authoritative resources (subject headings, people,
>> > class numbers, etc.)
>> > 4) A centralized repository would have to rely quite heavily on discovery
>> >     - there's a huge danger of GIGO here (there are plenty of typos in the
>> > historical record)
>> >     - plenty of chances of failed searches
>> >
>> > Couple this with the fact that (most) everybody is going to have to
>> > duplicate all of the data for local indexing purposes, anyway...
>> >
>> > -Ross.
>> >
>> > On Wed, Mar 23, 2011 at 3:37 PM, Owen Stephens <owen@ostephens.com> wrote:
>> >
>> >> I tend to agree with Joachim - we will see more data publication and at
>> >> least in this phase will see plenty of institutions coining their own URIs.
>> >> However, I also believe that the web tends towards less duplication (this
>> >> isn't anything close to no duplication, just less duplication than we would
>> >> have otherwise).
>> >>
>> >> We are already seeing that established URIs will be used where they exist
>> >> (e.g. for LCSH) - and I guess we can expect to see more of these.
>> >>
>> >> That said, I think aggregations are a good thing (and inevitable) - and the
>> >> more identifiers are shared, and the more people make sameas and similar
>> >> statements, the easier aggregation will become.
>> >>
>> >> In terms of what we should be doing now? I'd say:
>> >>
>> >> - Encourage re-use of URIs (ideally this would be baked into record creation
>> >>   in libraries, but that's a whole other ball game)
>> >> - Encourage sameas statements where new URIs have been coined (and
>> >>   appropriate)
>> >> - Start looking at how existing linked data representations of bibliographic
>> >>   data can be crawled and aggregated and see what works and what doesn't
>> >>
>> >> I'm sure there is other stuff, but those are the ones that spring to mind
>> >> first
>> >>
>> >> The work of the JISC 'RDTF' (Resource Discovery Task Force) in the UK is
>> >> looking at the strategy of 'publish' and 'aggregate' - although this doesn't
>> >> dictate the use of Linked Data or RDF, many of the projects falling into this
>> >> area are adopting that approach, so hopefully we will see a good exploration
>> >> of some of the issues from this area soon. See http://rdtf.mimas.ac.uk/ for
>> >> more information on this.
>> >>
>> >> Owen
>> >>
>> >>
>> >> Owen Stephens
>> >> Owen Stephens Consulting
>> >> Web: http://www.ostephens.com 
>> >> Email: owen@ostephens.com 
>> >> Telephone: 0121 288 6936
>> >>
>> >> On 23 Mar 2011, at 17:16, stu wrote:
>> >>
>> >> *On Thu, Mar 24, 2011 at 1:18 AM, Neubert Joachim <J.Neubert@zbw.eu> wrote:
>> >>
>> >> I'm not sure that a centralized model for building clusters (like VIAF) or
>> >> a pre-declared central hub ("everybody maps to
>> >> WorldCat/OpenLibrary/whatever") could work.*
>> >>
>> >> A centralized model is essential if global bibliography is to be an
>> >> important part of the Web.  Sure, there are work-arounds involving declared
>> >> or inferred equivalence.  These all require additional work on the part of
>> >> systems and people, which will rarely be expended, with the result that link
>> >> potency will (continue to) be diluted to insignificance.
>> >>
>> >> Is it important enough for the global library community to expend the
>> >> resources to consolidate meaningful global bibliography?  Can the political
>> >> impediments be overcome?
>> >>
>> >> I continue to believe that OCLC is the only likely candidate with a chance
>> >> to make this happen, and it appears that the business cases are too weak,
>> >> and constituent demand too feeble for that to happen in the current
>> >> environment.
>> >>
>> >> I just Googled the book closest to hand, and on the first page, Wikipedia
>> >> was number one, and there were two Amazon links in the top ten.  No library
>> >> link of any sort appeared on the page.
>> >>
>> >> Linked data isn't going to change this without a centralized identifier
>> >> infrastructure.
>> >>
>> >> stu
>> >>
>> >>
>> >>
>> >>>
>> >>
>> >>
>> >
>> 
>> 
>> 
>> --
>> Karen Coyle
>> kcoyle@kcoyle.net http://kcoyle.net 
>> ph: 1-510-540-7596
>> m: 1-510-435-8234
>> skype: kcoylenet
>>
Received on Thursday, 24 March 2011 09:33:43 UTC