Comments on the draft report from romain.wenz@bnf.fr on 2011-07-22 (public-lld@w3.org from July 2011)

From: <romain.wenz@bnf.fr>
Date: Fri, 22 Jul 2011 15:20:56 +0200
To: public-lld@w3.org
Message-ID: <OF0D1445D3.CB5A7972-ONC12578D5.00481F84-C12578D5.00495439@LocalDomain>

Hello,
With colleagues, we have been reviewing the draft report at
http://www.w3.org/2005/Incubator/lld/wiki/DraftReportWithTransclusion

Please find enclosed some comments, section by section, and suggestions.

All best,
Romain Wenz

Département de l'Information Bibliographique et Numérique
Bibliothèque nationale de France
Quai François Mauriac
75706 Paris cedex 13
33 (0)1 53 79 37 39
----------------------------------------------------

3. Benefits

3.2. Benefits of the Linked Data approach
Comment: Libraries produce reliable data, especially vocabularies and
authority data. If they publish them as linked data, and provided they use
shared ontologies, they can help structure the Web of Data with data that
can be trusted and vocabularies that anyone can link to.

Suggestion: The Web needs to be structured with reliable and clean data,
and libraries can provide them.

3.2.3. Benefits to Librarians, archivists and curators
Comment: Among the very positive aspects of "linked data" for libraries,
there is the possibility to act at different levels, with various
benefits.

Suggestion: Every approach can offer specific benefits, from internal
re-use of data and identifiers to links or services to the end-user.

3.2.4. Benefits to Developers
Comment: The general benefit is to get rid of specific library formats,
which are not really interoperable (e.g. various MARCs). This is very
important, so as to break barriers between libraries and between library
data and other types of data. But the transition from library-specific
data to LD won't be straightforward.

Suggestion: It will be possible to work step by step, with Web protocols.
Suggestion: A section that could be added as "3.2.5.":
"Benefits to service providers, software vendors and external developers:
These developers will work with other important players: service
providers, software vendors and external developers.
The consequences are:
- Research and development could be enhanced through these players.
They could also work with research laboratories.
- Libraries will still work with external vendors.
- A new market emerges for industry, developers and service
providers, which can increase their financial benefits. For instance,
using interoperable RDF formats enables other actors to re-use structured
data provided by libraries."

5. Relevant technologies
Comment: We are talking about building structure in Web content, so that
data from the Web can be used by machines, the way it would be in
databases.

Suggestion: Building a "Linked Data" infrastructure does not mean
creating yet another silo.

5.5 Microformats, Microdata and RDFa
Comment: Linked Data can go one step further from the work that has been
done, for instance for OAI sets.

Suggestion: RDFa can be a step towards using existing information by
distilling it into a Web structure.

6. Implementation challenges and barriers to adoption
The whole section is clumsy because it makes no distinction between various
situations. Projects can be more or less advanced: as the "use case"
section shows, libraries can be very innovative.
http://www.w3.org/2005/Incubator/lld/wiki/UseCaseReport

6.1 Designed for stability, the library ecosystem resists change
Comment: The library ecosystem has been changing since Zenodotus. Semantic
Web techniques are different from traditional computer services, and
budgets are not on a comparable basis. Furthermore, library data today
are digital, so there is no need to plan retrospective
conversion of printed catalogues: the data are already
structured in digital formats.
The historical depth of libraries and library data is a very
important asset for the Semantic Web, for which the notion of
trust is essential. Libraries improve the quality of their data by
constant revisions.

Suggestion: Even if designed for stability, the library ecosystem moved
early to computer systems and keeps adapting to technological changes.

6.1.2 Library Data is shareable among libraries, but not yet with the
wider world
Comment: Librarians often work with the archival community.
For instance, the EAD (Encoded Archival Description) XML DTD was jointly
created by librarians and archivists in order to encode descriptions of
archival collections.

Suggestion: Through cooperation with archives and museums, libraries
already share data and standards with a "wider world". Moving to Linked
Data is a natural continuation.

6.1.3 Libraries are understaffed in the technology area
This part is overstated and rude to libraries that actually recruit and
work in the technology area.
Suggestion: It is not just a matter of recruiting "IT people", but of
training librarians so that they are aware of and effective with Web
technologies, and making sure computing departments and librarians work
together. This is what libraries do.

6.2 Libraries do not adapt well to technological change
Comment: Libraries will need to manage the legacy of MARC format-based
data for a long period of time even if they manage to shift to LD
strategies and tools for their current practices.
This means that before enjoying all the benefits of LD (listed in the
scope document), libraries will need to maintain parallel systems, which
means an increase of costs and efforts in software and format development
and in data management.
In the short term, library developers will still have to deal with these
formats, which continue to be revised.

Suggestion: When convincing examples are shown, libraries adapt very well
to technological changes.

6.2.2 Library standardization process is cumbersome
Comment: But possible!
Libraries are used to transforming their formats, mapping them to other
formats, and making them evolve when they work on new projects, new
technologies, and new types of documents.

Suggestion: It takes time for the formats to fit the need, but it is
part of the libraries' culture.

6.2.4 Library standards are limited to the library data
Comment: Library data are not only bibliographic data. Library
catalogues also contain authority records with many pieces of information
about persons, families, corporate bodies, works, and subjects. Authority
data provide named entities and may provide permanent identifiers for
these entities (such as ARK identifiers in BnF catalogues).

Suggestion: With reliable identifiers, authority data are also key
elements for the Semantic Web.

6.3 ROI is difficult to calculate
Comment: Benefits are as difficult as costs to estimate precisely, but some
can and must be underlined. Pooling the creation of data reduces
redundancies, increases staff efficiency, and allows librarians to focus
on other tasks like research on collections or conservation.
Linking the data of a library to cooperative metadata produced by reliable
institutions adds value to its data. Opening library linked data may
create economic value for a country, by allowing commercial reuse of
that data (Open data). Opening library data increases user traffic
and the visibility of collections (through reuse, SEO, etc.), and thus the
possibilities of ROI.
Using richer, more flexible, more relevant data improves accessibility
and services to users: in public institutions, public utility is a ROI
in itself. Helping researchers is another one.

Suggestion: It is difficult to calculate ROI precisely, but it is easy to
see financial benefits (re-use, links, cuts of redundant tasks).

6.3.3 Vocabulary changes in library data are costly
Comment: With an Authority File providing permanent identifiers and links,
it is relatively easy to update any field linked with it. All changes in
authority records can be automatically transferred into related
bibliographic records.

Suggestion: Moving to linked data implies relying on authority files and
identifiers.
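
To illustrate the comment above, here is a minimal sketch in Python of how
a change in an authority record reaches every linked bibliographic record.
It is only an illustration under stated assumptions: the identifiers, the
dictionary layout and the function names are made up for this example and
do not describe any actual catalogue system.

```python
# Authority file: permanent identifier -> preferred label.
# The identifiers are hypothetical, made up for this illustration.
authority_file = {
    "ark:/99999/author-0001": "Hugo, Victor (1802-1885)",
}

# Bibliographic records store the identifier, not a copy of the label.
bibliographic_records = [
    {"title": "Les Misérables", "author_id": "ark:/99999/author-0001"},
    {"title": "Notre-Dame de Paris", "author_id": "ark:/99999/author-0001"},
]


def display(record):
    """Resolve the author label through the authority file at display time."""
    label = authority_file[record["author_id"]]
    return f"{label}: {record['title']}"


def update_label(identifier, new_label):
    """Change the preferred label once, in the authority record only."""
    authority_file[identifier] = new_label


# A single update in the authority file...
update_label("ark:/99999/author-0001", "Hugo, Victor, 1802-1885")

# ...is seen by every bibliographic record linked to that identifier.
for record in bibliographic_records:
    print(display(record))
```

Because the records hold only the identifier, no bibliographic record has
to be edited when the label changes.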

6.4.1- Some data cannot be published openly
Comment: In some countries, there is a distinction between "public
information" and "information that can be processed by machines". In that
case, the massive use in computer programs of information that is
available to individuals needs to be justified and declared.

Suggestion: There can be national specificities. They have to be clearly
stated by the publishers.

6.4.2- Rights ownership can be unmanageably complex
Comment: Copied and extracted records are one thing. There is also a
question about the "linked data itself". The need to cite also means, for
the provider, being able to report on the use. In some countries
(including France) the use of taxpayers' money has to be justified.
You have to account for the money, and the only way to do it is to have
metrics. This implies knowing who is using the data, even for free.

Suggestion: Thank you for giving feedback and citing us if you use our data!

7. Recommendations

7.1.1 Identify sets of data as possible candidates for early exposure as
LD
Comment: Structured data rely on the use of identifiers. Publishing early
authority files and controlled vocabularies as linked data will make
easier further publication of bibliographic records as linked data, by
allowing links to them as a backbone for bibliographical information.
Suggestion: Authority files can be a basket for the "low hanging fruits"
from other libraries.

7.1.2. For each set of data, determine ROI of current practices, and costs
and ROI of exposing as LD
Comment: Determining costs and ROI of exposing sets of data will help in
choosing which value vocabularies and datasets could have priority.
Therefore, determining ROI has to be done globally.
Suggestion: Not necessarily "for each set of data".

7.1.3. Consider migration strategies
Comment: Using Semantic Web technologies inside the library "catalogue"
seems very promising, because it will allow a much more flexible and
interoperable use of data: modelling, linking, merging, querying, removing
redundancies, integrating external data from various formats and
publishing in various formats, etc.
This is obviously a great aim for libraries, but it is much more difficult
than only publishing data as linked data. It must not be an obstacle: it
may be better for a library to first publish some sets of data as linked
data than to try from the beginning to migrate its entire catalogue.
Therefore, the migration of data does not need to cover all possible data.
It can be only the useful part. This is obviously the case when commercial
services use RDFa for SEO, with the subset of products which people will
be looking for. In fact, merely putting data into RDF is not
useful if there are no links.

Suggestion: Libraries can "pick and choose" what is relevant and migrate
it.
Suggestion: Using RDF inside the systems themselves is another question
that has to be advocated.

7.2.2: Identify Linked Data literacy needed for different staff roles in
the library
Comment: In fact, when converting current datasets to
RDF, we see that cataloguing still has to address the creation of links,
mainly for reconciliation and alignment of concepts (for instance: "do
those two books tell the same story?"). There, the data obviously still
needs to be curated by humans.
But by re-using links and data produced by others, we can expect the
cataloguing work to be:
- more centralized;
- more about creating links (less about writing dates, names or page
numbers...).

Suggestion: These evolutions have to be clear on the business side.

7.4 Identify and Link.
7.4.1 "Create URIs for the items in library datasets"
Comment: Providing identifiers is the only way to make links. In big
libraries permanent identifiers are already being used (e.g. ARK
identifiers for all resources at the BnF).

Suggestion: This is the basis.


Received on Friday, 22 July 2011 13:21:17 UTC