Re: [open-bibliography] Call for Use Cases: Library Linked Data from Jodi Schneider on 2010-10-17 (public-lld@w3.org from October 2010)

From: Jodi Schneider <jodi.schneider@deri.org>
Date: Sun, 17 Oct 2010 20:31:08 +0100
To: Jim Pitman <pitman@stat.Berkeley.EDU>
Cc: public-lld@w3.org, open-bibliography@lists.okfn.org
Message-Id: <63957687-030A-4187-B748-D37A601B62E1@deri.org>
Thanks, Jim. Interesting use case. It's now on the LLD wiki at
http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_Community_Information_Service

Best,
Jodi

On 17 Oct 2010, at 19:34, Jim Pitman wrote:

> Jodi,
> 
> Here's another response to the call for use cases.
> 
> many thanks for your assistance
> 
> --Jim
> ----------------------------------------------
> Jim Pitman
> Director, Bibliographic Knowledge Network Project
> http://www.bibkn.org/
> 
> Professor of Statistics and Mathematics
> University of California
> 367 Evans Hall # 3860
> Berkeley, CA 94720-3860
> 
> ph: 510-642-9970  fax: 510-642-7892
> e-mail: pitman@stat.berkeley.edu
> URL: http://www.stat.berkeley.edu/users/pitman
> ----------------------------------------------
> 
> === name ===
> 
> Community Information Service
> 
> === Owner ===
> 
> Jim Pitman
> http://www.stat.berkeley.edu/~pitman/
> 
> === Background and Current Practice ===
> 
> Academic organizations of varying sizes (research groups, university departments,
> universities, university consortia, and subject-specific communities such as scholarly societies and special interest groups)
> have a strong interest in keeping track of information in their domain, maintaining its quality, and openly publishing this
> information to the broader academic community and to the general public.
> A significant component of this information is bibliographic metadata available from library resources, 
> especially information about books and articles published in a particular field, or associated with a particular
> institution.
> Current practice varies greatly. Many publishers and scholarly societies offer subscription-based abstracting and indexing (A&I) services, which are paid
> for by libraries. Typical license agreements limit these services to "individual" use.
> This inhibits creative selection, remixing and republication of bibliographic metadata by interested individuals and organizations.
> Another service is provided by Google Scholar, but again, selective harvesting and reuse of the data is inhibited by its terms of use.
> 
> Most university departments and universities are unable to extract from their university library catalogs a list of all publications
> by their own faculty. Even if they could, they are typically not allowed to publish it without renegotiating license agreements with
> bibliographic metadata suppliers.
> A typical subject-specific interest group may be able to extract subject-specific bibliographic metadata from a variety of sources.
> But again, there is a high barrier to cross before the group can obtain clear rights to republish or remix such material.
> Essentially, the group has to acquire some legal identity, capable of making licensing agreements, before it can do so legally.
> Then the group has to find a business model capable of supporting some individual whose job it is to manage such agreements.
> This organizational overhead is unnecessary in a universe of linked data. 
> 
> === Goal ===
> 
> Make library catalog and other publisher-generated bibliographic metadata freely available to community data curators, in a form that is easily filtered by
> author/affiliation/subject/..., so that large numbers of small to medium-sized academic communities can easily extract the data of particular interest to them,
> with minimal technical and legal overhead, and openly republish that data in ways they find worthwhile: for example, by selecting, ranking or
> classifying the data, and by providing simple searches and faceted displays over bibliographic collections of special interest to the community.
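A minimal sketch of the filtering and faceting described above, assuming records are plain Python dicts with list-valued `author`, `affiliation` and `subject` fields. The field names and sample records are hypothetical placeholders, not taken from any particular standard or dataset.

```python
from collections import Counter

# Hypothetical sample records; field names are illustrative assumptions.
RECORDS = [
    {"title": "Paper A", "author": ["Author One"], "affiliation": ["University X"],
     "subject": ["probability"], "year": 2006},
    {"title": "Paper B", "author": ["Author Two"], "affiliation": ["University Y"],
     "subject": ["probability", "statistics"], "year": 2009},
    {"title": "Paper C", "author": ["Author Three"], "affiliation": ["University X"],
     "subject": ["statistics"], "year": 2010},
]

def select(records, field, value):
    """Keep records whose list-valued `field` contains `value`."""
    return [r for r in records if value in r.get(field, [])]

def facet_counts(records, field):
    """Count how many records carry each value of a list-valued `field`."""
    counts = Counter()
    for r in records:
        counts.update(r.get(field, []))
    return counts
```

For instance, `facet_counts(select(RECORDS, "affiliation", "University X"), "subject")` gives the subject facet for one institution's publications, which is the kind of view a departmental page might display.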
> 
> How to use linked data technology to achieve this goal: provide the data under an open license that allows its reuse for such purposes,
> and support the APIs, data standards and client software needed to lower the barrier to participation in information curation and sharing.
> 
> === Target Audience ===
> 
> Scholars as service providers: all those who edit, curate and arrange scholarly information for the purpose of making it openly
> accessible to a wide audience.
> Indirectly, the general public, who may find subject-specific resources curated by scholars more informative
> than generic search services or Wikipedia.
> Computer programs, inasmuch as these may be used for tasks such as filtering, deduplication, selection, ... to save the time of expert curators.
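As an illustration of routine work a program could take over from curators, here is a minimal deduplication sketch. It treats two records as duplicates when they share a DOI or, when there is no DOI, the same normalized title and year. The record layout and matching rule are assumptions; note that a record with a DOI in one source and none in another will not be matched, and real deduplication usually needs fuzzier comparison.

```python
import re

def dedup_key(record):
    """Key identifying a work: the DOI if present, else normalized title plus year."""
    doi = record.get("doi")
    if doi:
        return ("doi", doi.strip().lower())
    # Lowercase, drop punctuation, collapse runs of whitespace.
    title = re.sub(r"[^\w\s]", "", record.get("title", "").lower())
    title = " ".join(title.split())
    return ("title-year", title, record.get("year"))

def deduplicate(records):
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for r in records:
        key = dedup_key(r)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique
```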
> 
> === Use Case Scenario ===
> 
> The curator of a community information service selects data from input sources to determine which books, articles, photographs, videos, ....
> published recently would be of interest to the community.
> The curator receives input data in a form that lets them easily control what is piped through to their information service.
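The curator's control over what is "piped through" could be modelled as a chain of filter stages applied to incoming records. A minimal sketch, assuming each record carries a hypothetical `date` field in ISO 8601 `YYYY-MM-DD` form and a `type` field; the stage names and the time window are illustrative choices, not part of the use case.

```python
from datetime import date, timedelta

def published_since(cutoff):
    """Stage: keep records whose 'date' (YYYY-MM-DD) is on or after `cutoff`."""
    return lambda recs: (r for r in recs if date.fromisoformat(r["date"]) >= cutoff)

def of_types(*types):
    """Stage: keep records whose 'type' is one of `types`."""
    wanted = set(types)
    return lambda recs: (r for r in recs if r.get("type") in wanted)

def pipe(records, *stages):
    """Run records through each stage in order and return the survivors as a list."""
    for stage in stages:
        records = stage(records)
    return list(records)
```

A curator wanting books and articles from the last 90 days might call `pipe(incoming, published_since(date(2010, 10, 17) - timedelta(days=90)), of_types("book", "article"))`. Keeping each stage a small function makes it easy to add, remove or reorder filters without touching the others.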
> 
> === Application of linked data for the given use case ===
> 
> Make it easy for data providers (publishers, libraries, other aggregators) to provide linked data, with suitable APIs and client software
> for community data curators to use.
> Curators should expect that bibliographic records come equipped with identifiers for all entities
> (editions, people, subjects, journals, publishers, .... ) and that this information is easily loaded into some
> community managed CMS to allow remixing with whatever ranking/selection/faceting/... the community service may wish to provide.
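To make that expectation concrete, here is a minimal sketch of a record in which entities carry their own identifiers, plus a check that lists entities still lacking one before the record is loaded into a CMS. The field names and `http://example.org/...` URIs are hypothetical placeholders, not drawn from any real dataset or standard.

```python
# Hypothetical record: entity fields carry an "id" where one is known.
RECORD = {
    "id": "http://example.org/edition/1",
    "title": "An example article",
    "author": [
        {"id": "http://example.org/person/42", "name": "Author One"},
        {"name": "Author Two"},  # no identifier yet
    ],
    "journal": {"id": "http://example.org/journal/7", "name": "Example Journal"},
    "subject": [{"id": "http://example.org/subject/probability", "name": "Probability"}],
}

# Fields assumed to hold entities (a single dict or a list of dicts).
ENTITY_FIELDS = ("author", "journal", "subject", "publisher")

def missing_identifiers(record):
    """Return (field, name) pairs for entities in `record` that lack an 'id'."""
    missing = []
    for field in ENTITY_FIELDS:
        value = record.get(field)
        if value is None:
            entities = []
        elif isinstance(value, list):
            entities = value
        else:
            entities = [value]
        for entity in entities:
            if not entity.get("id"):
                missing.append((field, entity.get("name")))
    return missing
```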
> 
> === Existing Work (optional) ===
> 
> Most A&I services maintain data ingest systems for these purposes, but these are usually proprietary and not readily available for use by smaller agents with
> interests in biblio data curation. They mostly rely on converting raw publisher data into proprietary biblio formats for internal use, and licensing
> data to libraries in degraded formats for use by supplicant scholars. These services add no value to the universe of linked data, but rather compete with it.
> Some examples of software systems for open display of community-curated bibliographic collections are
> BibSonomy, BibServer, BibApp and Open Scholar. All of these systems would benefit from easy
> availability of comprehensive linked library and publisher data via an API.
> An example of a typical community website which would benefit greatly from integration with linked data is the Probability Web. 
> See especially the lists of Books, People, and the link to the Probability Abstract Service, all of which could be
> recreated to both import and export linked data.
> There are more advanced services in other fields, especially RePEc (laudably open, but with large amounts of data whose license status is indeterminate)
> and SSRN (free but not open to reuse). Such large community services are typically built with an architecture that is difficult to replicate. 
> What is needed is a simple and easily replicable architecture for community data curation services of various sizes to develop and interoperate.
> BKNpeople and VIVO are first steps in this direction at the level of identifying people and their interests. Integration of
> such systems with the ORCID initiative will be important. See also the BKN Project.
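Person identifiers are most useful when they can be checked mechanically. As one example, ORCID iDs, in the form the registry later adopted, are 16 characters grouped as four blocks of four, where the last character is a check character computed with the ISO 7064 MOD 11-2 algorithm. A minimal validation sketch (the function names are hypothetical):

```python
import re

# Four hyphen-separated blocks; the final character may be the digit-or-X check character.
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")

def orcid_check_char(base_digits):
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid):
    """True if `orcid` has the hyphenated 16-character form and a correct check character."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    chars = orcid.replace("-", "")
    return orcid_check_char(chars[:15]) == chars[15]
```

For instance, `is_valid_orcid("0000-0002-1825-0097")` returns `True`, while changing the last digit to 8 makes it return `False`.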
> 
> === Related Vocabularies ===
> 
> BIBO, CiTO, ... 
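For the export side, a minimal sketch that serializes a simple record as N-Triples using BIBO and Dublin Core terms, written without an RDF library so it stays self-contained. The record layout and URIs are hypothetical; `dcterms:title`, `dcterms:creator`, `bibo:doi` and the class `bibo:AcademicArticle` are real vocabulary terms, but which properties a community should use is a modelling choice this use case does not specify.

```python
BIBO = "http://purl.org/ontology/bibo/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def literal(text):
    """Quote a string as an N-Triples literal, escaping backslash, quote, and line breaks."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def to_ntriples(record):
    """Serialize a simple article record (hypothetical layout) as N-Triples lines."""
    s = f"<{record['uri']}>"
    lines = [f"{s} <{RDF_TYPE}> <{BIBO}AcademicArticle> ."]
    lines.append(f"{s} <{DCTERMS}title> {literal(record['title'])} .")
    # Authors are referenced by URI so other datasets can link to the same person.
    for author_uri in record.get("authors", []):
        lines.append(f"{s} <{DCTERMS}creator> <{author_uri}> .")
    if record.get("doi"):
        lines.append(f"{s} <{BIBO}doi> {literal(record['doi'])} .")
    return "\n".join(lines) + "\n"
```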
> 
> === Problems and Limitations ===
> 
> Reasons why this scenario is or may be difficult to achieve:
> 
> Social/Economic/Legal
> -- vested interests in A&I services
> -- lack of suitably licensed metadata
> -- commercial publishers, universities and conservative scholarly societies refusing to release their metadata with an open license
> 
> Technical obstacles:
> Lack of convergence towards a simple widely adopted standard for exchange of bibliographic metadata suitable for the community
> information service use case.
> The necessary data fields are little more than traditional BibTeX fields, plus some conventions for handling entity identifiers and links.
> BibJSON is an attempt at an adequate lightweight data exchange standard, compatible with linked data principles,
> and influenced by the success of BibTeX and RePEc's Academic Metadata Format.
> This standard is easily managed and understood by typical community data service managers, even without advanced software tools.
> Providing, adapting and maintaining good UIs that let non-technical curators manage BibJSON or similar record structures is the biggest technical challenge,
> along with supporting the necessary CMS over which these UIs can operate.
> Needlebase shows promise of providing an adequate UI over a graphical datastore.
> This is proprietary software, but it should be configurable to import and export linked data. Such systems for managing simple editorial
> workflows over linked data are greatly needed.
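To illustrate the point that the necessary fields are little more than BibTeX fields plus conventions for identifiers and links, here is a minimal sketch mapping a BibTeX entry (already parsed into a dict) to a JSON-ready record. The input keys `ENTRYTYPE` and `ID` and the output layout (authors as objects, an `identifier` list, a `link` list) are assumptions loosely modelled on BibJSON's conventions; check the BibJSON documentation before relying on exact key names.

```python
def bibtex_to_bibjson(entry):
    """Map a parsed BibTeX entry (dict of fields) to a BibJSON-style dict (assumed layout)."""
    record = {
        "type": entry.get("ENTRYTYPE", "misc"),
        "id": entry.get("ID"),
        "title": entry.get("title"),
        # BibTeX separates names with " and "; each name becomes an object
        # that can later be given its own identifier. (Braced names containing
        # " and " are not handled in this simplified sketch.)
        "author": [{"name": n.strip()} for n in entry.get("author", "").split(" and ") if n.strip()],
        "year": entry.get("year"),
    }
    if "journal" in entry:
        record["journal"] = {"name": entry["journal"]}
    if "doi" in entry:
        record["identifier"] = [{"type": "doi", "id": entry["doi"]}]
    if "url" in entry:
        record["link"] = [{"url": entry["url"]}]
    # Drop empty fields so the serialized record stays minimal.
    return {k: v for k, v in record.items() if v not in (None, [], "")}
```

The result can be written out with `json.dumps(..., indent=2)` for exchange between community services.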
> 
> === Related Use Cases and Unanticipated Uses ===
> 
> If simple and easily affordable editorial systems are developed for managing collections of biblio data, it is hard to anticipate
> which agents will emerge to provide the best services on various scales. Communities nest and overlap with each other. They 
> compete for the attention of their members. If communities export their enhancements as linked data, this data may be consumed again by larger aggregators, 
> especially Google and other big players, in ways which should greatly improve current means of search and discovery of academic information.
> 
> === References ===
> 
> Academic Metadata Format http://amf.openlib.org/doc/ebisu.html
> arXiv http://arxiv.org/
> BibServer http://bibserver.berkeley.edu/cgi-bin/bibs7?source=http://www.stat.berkeley.edu/users/pitman/bibserver.bib
> BibApp http://www.bibapp.org/
> BibJSON http://www.bibkn.org/bibjson/index.html
> BibTeX http://en.wikipedia.org/wiki/BibTeX
> BibSonomy http://www.bibsonomy.org/
> BIBO http://bibliontology.com/
> BKNpeople http://people.bibkn.org/
> BKN Project http://www.bibkn.org/
> CiTO, the Citation Typing Ontology, by David Shotton. http://dx.doi.org/10.1186/2041-1480-1-S1-S6
> Google Scholar http://scholar.google.com/
> Needlebase http://www.needlebase.com/
> Open Scholar http://scholar.harvard.edu/
> ORCID http://www.orcid.org/
> Probability Abstract Service http://pas.imstat.org/
> RePEc http://repec.org/
> SSRN http://www.ssrn.com/
> The Probability Web http://www.mathcs.carleton.edu/probweb/probweb.html
> VIVO http://www.vivoweb.org/
>
Received on Sunday, 17 October 2010 19:31:49 UTC