
RE: Brainstorming: Key Issues

From: Ford, Kevin <kefo@loc.gov>
Date: Wed, 23 Feb 2011 16:40:57 -0500
To: Ross Singer <ross.singer@talis.com>, Karen Coyle <kcoyle@kcoyle.net>
CC: "public-xg-lld@w3.org" <public-xg-lld@w3.org>
Message-ID: <1D525027B29706438707F336D75A279F167D92E957@LCXCLMB03.LCDS.LOC.GOV>
Action item....


1. Clear Purpose and Objective with LLD

What's the case for LLD?  What problem does *Library* LD help to solve?  How will it be of benefit to libraries (in addition to and/or improvements on current practices, current standards, current costs, etc)?  How will LLD help libraries better serve their users?

This group's charter notes that "the mission of the Library Linked Data incubator group is to help increase global interoperability of library data on the Web ..." [1].   This is fine, but what is the benefit of this activity to libraries, which will be asked to pour resources into this activity?  To ask this in a more challenging way:  So what?  How are libraries losing out now?  What's broke that needs fixing?

Will LLD only benefit data consumers?  Who are these data consumers?  How will it benefit my mother, or yours?  Will it benefit data creators (catalogers, for example)?  Will it benefit those managing library technology?

I feel that whatever follows in this document should further illuminate the answers to these types of questions.

FWIW, it's fine that end users (my mother, e.g.) benefit only indirectly.  Perhaps even catalogers benefit indirectly.  I'm not suggesting there are right or wrong answers here, but that we identify how and to whom LLD will be of benefit (though, generically, it absolutely must be of benefit to libraries).

It might be that these questions are closely related to point 2.


2. Attention to Education and Outreach

With 1, general education will be a must, as Karen rightly noted.  Good answers to 1 will make this easier by providing focus to education efforts.  It might also help to convince decision makers to direct resources toward LLD projects.


3. Open Data

Lots of talk about LLD, but little about Library LOD.  The data needs to be freed from restrictions and, as Ross suggested, bulk downloads should preferably be provided.  To echo Ross's sentiment, should the BL ping VIAF or ID tens of millions of times for information?

The inability to share/include/use resources/data with minimal restrictions, from an array of sources, will negatively impact interoperability, the same "interoperability" from the mission statement.


4. Data Modeling and Legacy Data

I think these are two sides of the same coin and should be treated simultaneously.  Data modeling questions should be asked in light of current and future practices.  Is a new data model needed for LLD?  If so, how far must it depart from current models (and why)?  If a new data model is to be recommended, what is the scope and purpose of this data model?  Only for LLD purposes (which is the scope of the current charter)?  Or meant to replace the current model completely (which, while not mutually exclusive, is technically beyond the current charter)?  What impact, if any, does a new data model have on legacy data?

There's been talk about data modeling and a general understanding that something different is needed.  It is beyond the scope of this group to recommend a detailed solution, but this group should be able to talk about how current data models are insufficient to the task and make general recommendations in light of those reservations.  It should be clear how proposed models will (positively) impact the audience members identified in 1.


5. Technology systems

Is the current LD technology stack suitable?  If libraries are to begin sharing the very information crucial to bibliographic description (a mere link to a subject heading versus a string's presence in the data), in no small part by relying on data from external sources, do specific technological requirements need to be defined to support look-up services, not only of known resources but of not-yet-matched strings?  SPARQL endpoints have not been widely implemented in existing LLD implementations.
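
To make the look-up requirement concrete, here is a minimal sketch, assuming a library already holds a bulk copy of an authority file as (URI, label) pairs.  The class name, the normalization rules and the example.org URIs are all hypothetical; none of this reflects how VIAF, ID or any existing service actually works.  It only illustrates the two cases named above: resolving a known resource by its URI, and trying to match a string that has not yet been linked.

```python
import unicodedata


def normalize(label: str) -> str:
    """Fold case, drop accents, trim trailing punctuation, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_marks.casefold().rstrip(" .,;").split())


class LocalAuthorityIndex:
    """Hypothetical in-memory index over a bulk-downloaded authority file."""

    def __init__(self, records):
        # records: iterable of (uri, label) pairs
        self.by_uri = {}
        self.by_label = {}
        for uri, label in records:
            self.by_uri[uri] = label
            self.by_label.setdefault(normalize(label), []).append(uri)

    def label_for(self, uri):
        """Known resource: return its label, or None if the URI is unknown."""
        return self.by_uri.get(uri)

    def match(self, heading):
        """Unmatched string: return every candidate URI (possibly none)."""
        return self.by_label.get(normalize(heading), [])


# Illustrative data only; the URIs below are placeholders.
index = LocalAuthorityIndex([
    ("http://example.org/auth/1", "Twain, Mark, 1835-1910"),
    ("http://example.org/auth/2", "Brontë, Charlotte"),
])
candidates = index.match("bronte,  charlotte.")
```

A real service would need fuzzier matching and some way to report ambiguous or missing headings, which is exactly the kind of requirement the question above is asking about.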

What technological needs will be required, if any, given the potential scope of change that could accompany a new data model?  Perhaps for LLD, very little is needed beyond the current technology stack.  That would make any new data model an auxiliary model to the current one, no?


Kevin

[1] http://www.w3.org/2005/Incubator/lld/


________________________________________
From: public-xg-lld-request@w3.org [public-xg-lld-request@w3.org] On Behalf Of Ross Singer [ross.singer@talis.com]
Sent: Tuesday, February 22, 2011 06:19
To: Karen Coyle
Cc: public-xg-lld@w3.org
Subject: Re: Brainstorming: Key Issues

Here are some other ideas, some related to Karen's:

1) Where to start?  To convert a dataset of any significant size, we'll need name authorities, subject thesauri, controlled vocabulary terms, etc.  If everyone does this in isolation, minting their own URIs, etc., how is this any better than silos of MARC records?  How do institutions the size of University of Michigan or Stanford get access to datasets such as VIAF so they don't have to do millions of requests every time they remodel their data?  How do they know which dataset to look in for a particular value?  What about all of the data that won't be found in centralized datasets (local subject headings, subject headings based on authorities with floating terms, names not in the NAF, etc.)?

2) How do we keep the original data and linked data in sync?  If changes happen to the linked data representation, how do we funnel that back into the original representation?  Do we even want to?

3) The richer the data, the more complicated the dependencies: how do we prevent rats' nests of possible licensing issues (Karen raised this, as well)?  Similarly, this web also creates an n+1 problem: there's always the potential of new URIs being introduced with each graph; how much is enough?  How will a library know?

4) How do we deal with incorrect data that we don't own/manage?

5) As the graph around a particular resource improves in quality, how do these changes propagate around to the various copies of the data?  How do libraries deal with the changes (not only regarding conflicts, but how to keep up with changes in the data model, with regard to indexing, etc.)?

6) Piggybacking on Karen's "chicken or the egg" problem, who will be first to take the plunge?  What is the benefit for them to do so?  In the absence of standards, will their experience have any influence on how standards are created (that is, will they go through the work only to have to later retool everything)?

-Ross.

On Thu, Feb 17, 2011 at 12:26 PM, Karen Coyle <kcoyle@kcoyle.net> wrote:
This is my kick-off for brainstorming and key issues. I'd suggest that
for the first go-round we not worry about structure or levels of
granularity but just throw out ideas. I'll do my best to keep track
and we can then come back and have a more coordinated discussion.

Karen's list:

1) Community agreement and leadership
 There are many in the community who are either not interested in
LLD, don't know about LLD, or who are actually opposed to LLD. At the
moment, there are no centers of leadership to facilitate such a major
change to library thinking about its data (although IFLA is probably
the most active).

2) Funding
 It is still quite difficult to convince potential funders that this
is an important area to be working in. This is the "chicken/egg"
problem, that without something to show funders, you can't get funding.

3) Legacy data
 The library world has an enormous cache of data that is somewhat
standardized but uses an antiquated concept of data and data modeling.
Transformation of this data will take coordination (since libraries
share data and systems for data creation). But before it can be
transformed it needs to be analyzed and there must be a plan for
converting it to linked data. (There is a need for library systems to
be part of this change, and that is very complex.)

4) Openness and rights issues
 While linked data can be used in an enterprise system, the value
for libraries is to encourage open use of bibliographic data.
Institutions that "own" bibliographic data may be under constraints,
legal or otherwise, that do not allow them to let their data be used
openly. We need to overcome this outdated concept of data ownership.

5) Standards
 Libraries need to take advantage of the economies of scale that
data sharing affords. This means that libraries will need to apply
standards to their data for use within libraries and library systems.

You can comment on these and/or post your own. Don't think about it
too hard -- let's get as many issues on the table as we can! (I did 5
- you can do any number you wish.)

kc

--
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Received on Wednesday, 23 February 2011 21:43:16 GMT
