Re: [open-bibliography] Call for Use Cases: Library Linked Data

Jodi,

Here's another response to the call for use cases.

Many thanks for your assistance.

--Jim
----------------------------------------------
Jim Pitman
Director, Bibliographic Knowledge Network Project
http://www.bibkn.org/

Professor of Statistics and Mathematics
University of California
367 Evans Hall # 3860
Berkeley, CA 94720-3860

ph: 510-642-9970  fax: 510-642-7892
e-mail: pitman@stat.berkeley.edu
URL: http://www.stat.berkeley.edu/users/pitman
----------------------------------------------

=== name ===

Community Information Service

=== Owner ===

Jim Pitman
http://www.stat.berkeley.edu/~pitman/

=== Background and Current Practice ===

Academic organizations of varying sizes (research groups, university departments,
universities, university consortia, subject-specific communities such as scholarly societies and special interest groups)
have a strong interest in maintaining awareness and quality of information in their domain, and in openly publishing this
information to the broader academic community and to the general public.
A significant component of this information is bibliographic metadata available from library resources, 
especially information about books and articles published in a particular field, or associated with a particular
institution.
Current practice varies greatly.  Many publishers and scholarly societies offer subscription-based abstracting and indexing (A&I) services which are paid
for by libraries. Typical license agreements limit these services to "individual" use.
This inhibits creative selection, remixing and republication of bibliographic metadata by interested individuals and organizations.
Google Scholar provides another such service, but again its terms of use inhibit selective harvesting and reuse of the data.

Most university departments and universities are unable to extract from their library catalogs a list of all publications
by their own faculty. Even if they could, they would typically not be allowed to publish it without renegotiating license agreements with
bibliographic metadata suppliers.
A typical subject-specific interest group may be able to extract subject-specific bibliographic metadata from a variety of sources.
But again, there is a high barrier to cross before the group can obtain clear rights to republish or remix such material.
Essentially, the group has to acquire some legal identity, capable of making licensing agreements, before it can do so legally.
Then the group has to find a business model capable of supporting some individual whose job it is to manage such agreements.
This organizational overhead is unnecessary in a universe of linked data. 

=== Goal ===

Make library catalog and other publisher-generated bibliographic metadata freely available to community data curators, so that it can easily be filtered by
author/affiliation/subject/... The aim is to allow large numbers of small and medium-sized academic communities to extract the data of particular interest to them,
with minimal technical and legal overhead, and to openly republish that data in ways they find worthwhile, for example by selecting, ranking or
classifying the data and providing simple searches and faceted displays over bibliographic collections of special interest to the community.

How to use linked data technology to achieve this goal: provide the data with an open license which allows its reuse for such purposes,
and support the APIs, data standards and client software to lower the barrier to participation in information curation and sharing.
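
For example (a rough sketch only: the endpoint URL and the field names "author", "affiliation", "title" and "year" are hypothetical, not an agreed schema), a departmental curator with access to such openly licensed records might pull a faculty publication list along these lines:

# Sketch: fetch openly licensed bibliographic records (assumed to be served as
# a JSON list) and keep those with at least one author at a given affiliation.
# The URL and field names are illustrative assumptions.
import json
import urllib.request

OPEN_DATA_URL = "https://example.org/open-bib/records.json"  # hypothetical endpoint

def records_for_affiliation(url, affiliation):
    with urllib.request.urlopen(url) as response:
        records = json.load(response)  # a list of record dicts
    return [
        r for r in records
        if any(affiliation in a.get("affiliation", "")
               for a in r.get("author", []))
    ]

if __name__ == "__main__":
    for r in records_for_affiliation(OPEN_DATA_URL, "Statistics, UC Berkeley"):
        print(r.get("year"), r.get("title"))

With data licensed and exposed this simply, the legal overhead described above disappears and only the filtering itself remains.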

=== Target Audience ===

Scholars as service providers: all those who edit, curate and arrange scholarly information for the purpose of making it openly
accessible to a wide audience.
Indirectly, the general public, which may find subject-specific resources curated by scholars more informative
than generic search services or Wikipedia.
Computer programs, inasmuch as these may be used for tasks of filtering, deduplication, selection, ... to save the time of expert curators.

=== Use Case Scenario ===

The curator of a community information service selects data from input sources to determine what books, articles, photographs, videos, ...
were published recently that would be of interest to the community.
The curator has input data available in such a way that they can easily control what is piped through to their information service.
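
As a sketch (the field names "year" and "subject" are assumptions for illustration), the pipe might be a short chain of filters that the curator enables, reorders or drops:

# Sketch of a curator-controlled pipeline: each stage is a small filter over
# incoming records, so the curator decides what flows through to the service.
# Field names (year, subject) are illustrative assumptions.
from datetime import date

def recent(records, since_year):
    return (r for r in records if r.get("year", 0) >= since_year)

def in_subjects(records, wanted):
    return (r for r in records if set(r.get("subject", [])) & wanted)

def select_for_community(records):
    selected = recent(records, since_year=date.today().year - 1)
    selected = in_subjects(selected, wanted={"probability", "statistics"})
    return list(selected)

Because each stage is independent, adding a new selection rule (say, by author identifier) does not disturb the others.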

=== Application of linked data for the given use case ===

Make it easy for data providers (publishers, libraries, other aggregators) to provide linked data, with a suitable API and client software
for community data curators to use.
Curators should expect that bibliographic records come equipped with identifiers for all entities
(editions, people, subjects, journals, publishers, ...) and that this information is easily loaded into some
community-managed CMS to allow remixing with whatever ranking/selection/faceting/... the community service may wish to provide.
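
A rough sketch of the kind of remixing this enables, assuming records already carry entity identifiers (the record layout and URIs below are illustrative, not a prescribed schema):

# Sketch: count records per identified journal or subject, the raw material for
# a faceted display in a community-managed CMS. Record layout and URIs are
# illustrative assumptions.
from collections import Counter

def facet_counts(records, facet_field):
    counts = Counter()
    for r in records:
        values = r.get(facet_field, [])
        if isinstance(values, str):
            values = [values]
        counts.update(values)
    return counts

records = [
    {"title": "A note on random trees",
     "journal_id": "http://example.org/journal/1",
     "subject_id": ["http://example.org/subject/60C05"]},
    {"title": "Exchangeable coalescents",
     "journal_id": "http://example.org/journal/1",
     "subject_id": ["http://example.org/subject/60J80"]},
]

print(facet_counts(records, "journal_id"))
print(facet_counts(records, "subject_id"))

Because the facets are identifiers rather than free-text strings, counts derived from different sources can be merged without the usual deduplication effort.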

=== Existing Work (optional) ===

Most A&I services maintain some data ingest systems for these purposes. But they are usually proprietary, and not readily available for use by smaller agents with 
interests in biblio data curation.  These mostly rely on converting raw publisher data into proprietary biblio formats for internal use, and licensing 
data to libraries in degraded formats for use by supplicant scholars.  These services add no value to the universe of linked data, but rather compete with it.
Some examples of software systems for open display of community curated bibliographic collections are
BibSonomy, BibServer, BibApp, Open Scholar. All of these systems would benefit from easy
availability of comprehensive linked library and publisher data via  API.
An example of a typical community website which would benefit greatly from integration with linked data is the Probability Web. 
See especially the lists of Books, People, and the link to the Probability Abstract Service, all of which could be
recreated to both import and export linked data.
There are more advanced services in other fields, especially RePEc (laudably open, but with large amounts of data whose license status is indeterminate)
and SSRN (free but not open to reuse). Such large community services are typically built with an architecture that is difficult to replicate. 
What is needed is a simple and easily replicable architecture for community data curation services of various sizes to develop and interoperate.
BKNpeople and VIVO are starts in this direction at the level of identifying people and their interests. Integration of
such systems with the ORCID initiative will be important. See also the BKN Project.

=== Related Vocabularies ===

BIBO, CiTO, ... 

=== Problems and Limitations ===

Reasons why this scenario is or may be difficult to achieve:

Social/Economic/Legal
-- vested interests in A&I services
-- lack of suitably licensed metadata
-- commercial publishers, universities and conservative scholarly societies  refusing to release their metadata with an open license

Technical obstacles:
Lack of convergence towards a simple widely adopted standard for exchange of bibliographic metadata suitable for the community
information service use case.
The necessary data fields are little more than traditional bibtex fields, plus some conventions for handling entity identifiers and links.
BibJSON is an attempt at an adequate lightweight data exchange standard, compatible with linked data principles
and influenced by the success of BibTeX and RePEc's Academic Metadata Format (a minimal record is sketched at the end of this section).
This standard is easily managed and understood by typical community data service managers, even without advanced software tools.
Providing and managing/adapting/maintaining good UIs for non-technical curators to manage BibJSON or similar record structures is the biggest technical challenge.
Also, supporting the necessary CMS over which these UIs can operate.
Needlebase shows promise of providing an adequate UI over a graphical datastore.
This is proprietary software, but it should be configurable to import and export linked data. Such systems for managing simple editorial
workflows over linked data are greatly needed.
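
For illustration only (the key names below follow the spirit of BibJSON but are not quoted from the specification, and all identifiers are placeholders), such a record is little more than a BibTeX entry with identifiers and links attached:

# Illustrative record: familiar BibTeX-style fields plus explicit identifiers
# and links. Key names and URIs are assumptions, not a normative BibJSON example.
record = {
    "type": "article",
    "title": "An example article on exchangeable coalescents",
    "author": [
        {"name": "A. Author",
         "id": "http://example.org/person/a-author"}  # placeholder person URI
    ],
    "journal": {
        "name": "Example Journal of Probability",
        "id": "http://example.org/journal/example",   # placeholder journal URI
    },
    "year": "2010",
    "identifier": [{"type": "doi", "id": "10.xxxx/example"}],  # placeholder DOI
    "link": [{"url": "http://example.org/article/123"}],
}

A curator can read and correct a record like this by hand, which is exactly what makes a lightweight standard manageable without advanced software tools.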

=== Related Use Cases and Unanticipated Uses ===

If simple and easily affordable editorial systems are developed for managing collections of biblio data, it is hard to anticipate
which agents will emerge to provide the best services on various scales. Communities nest and overlap with each other. They 
compete for the attention of their members. If communities export their enhancements as linked data, this data may be consumed again by larger aggregators, 
especially Google and other big players, in ways which should greatly improve current means of search and discovery of academic information.

=== References ===

Academic Metadata Format  http://amf.openlib.org/doc/ebisu.html
arXiv  http://arxiv.org/
BibServer  http://bibserver.berkeley.edu/cgi-bin/bibs7?source=http://www.stat.berkeley.edu/users/pitman/bibserver.bib
BibApp  http://www.bibapp.org/
BibJSON  http://www.bibkn.org/bibjson/index.html
BibTeX  http://en.wikipedia.org/wiki/BibTeX
BibSonomy  http://www.bibsonomy.org/
BIBO  http://bibliontology.com/
BKNpeople  http://people.bibkn.org/
BKN Project  http://www.bibkn.org/
CiTO, the Citation Typing Ontology, by David Shotton  http://dx.doi.org/10.1186/2041-1480-1-S1-S6
Google Scholar  http://scholar.google.com/
Needlebase  http://www.needlebase.com/
Open Scholar  http://scholar.harvard.edu/
ORCID  http://www.orcid.org/
Probability Abstract Service  http://pas.imstat.org/
RePEc  http://repec.org/
SSRN  http://www.ssrn.com/
The Probability Web  http://www.mathcs.carleton.edu/probweb/probweb.html
VIVO  http://www.vivoweb.org/
