Comments on the draft report

From: <romain.wenz@bnf.fr>
Date: Fri, 22 Jul 2011 15:20:56 +0200
To: public-lld@w3.org
Message-ID: <OF0D1445D3.CB5A7972-ONC12578D5.00481F84-C12578D5.00495439@LocalDomain>
Hello,
With colleagues, we have been reviewing the draft report at 
http://www.w3.org/2005/Incubator/lld/wiki/DraftReportWithTransclusion 

Please find below some comments and suggestions, section by section.

All best,
Romain Wenz

Département de l'Information Bibliographique et Numérique
Bibliothèque nationale de France
Quai François Mauriac
75706 Paris cedex 13
33 (0)1 53 79 37 39
----------------------------------------------------

3. Benefits

3.2. Benefits of the Linked Data approach
Comment: Libraries produce reliable data, especially vocabularies and 
authority data. If they open these data as linked data using shared 
ontologies, they can help structure the Web of Data with data that can be 
trusted and with vocabularies that anyone can link to.

Suggestion: The Web needs to be structured with reliable, clean data, and 
libraries can provide it. 

3.2.3. Benefits to Librarians, archivists and curators 
Comment: One of the very positive aspects of "linked data" for libraries 
is the possibility of acting at different levels, with various benefits.

Suggestion: Each approach can offer specific benefits, from internal 
re-use of data and identifiers to links or services for end users.

3.2.4. Benefits to Developers 
Comment: The general benefit is getting rid of library-specific formats, 
which are not really interoperable (e.g. the various MARC formats). This 
is very important for breaking down barriers between libraries, and 
between library data and other types of data. But the transition from 
library-specific data to LD won't be straightforward. 

Suggestion: It will be possible to work step by step, using Web protocols.
Suggestion: A section that could be added as "3.2.5.": 
"Benefits to service providers, software vendors and external developers: 
Library developers will work with other important players: service 
providers, software vendors and external developers. 
The consequences are:
-       Research and development could be enhanced through these players, 
who could also work with research laboratories. 
-       Libraries will still work with external vendors.
-       A new market emerges for industry, developers and service 
providers, which can increase their revenue. For instance, using 
interoperable RDF formats enables other actors to re-use structured data 
provided by libraries."

5. Relevant technologies
Comment: We are talking about building structure into Web content, so 
that data on the Web can be used by machines the way data in a database 
can.

Suggestion: Building a "Linked Data" infrastructure does not mean 
creating yet another silo.

5.5 Microformats, Microdata and RDFa
Comment: Linked Data can go one step further than the work that has 
already been done, for instance on OAI sets.

Suggestion: RDFa can be a first step towards using existing information, 
by distilling it into a Web structure.
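As an illustration of this "distilling" step, here is a minimal Python sketch (an assumption, not a BnF or W3C tool) that wraps the fields of an existing record in HTML carrying RDFa 1.1 attributes (`prefix`, `about`, `property`, `rel`) with DCMI Metadata Terms. The record, its field names and the example.org URIs are hypothetical placeholders.

```python
from html import escape

# Hypothetical record: field names and example.org URIs are placeholders.
record = {
    "uri": "http://example.org/record/1",
    "title": "Les Misérables",
    "creator_uri": "http://example.org/authority/person/1",
    "creator_label": "Victor Hugo",
}


def to_rdfa(rec: dict) -> str:
    """Render existing record fields as HTML annotated with RDFa 1.1."""
    about = escape(rec["uri"], quote=True)
    title = escape(rec["title"])
    creator = escape(rec["creator_uri"], quote=True)
    label = escape(rec["creator_label"])
    return (
        f'<div prefix="dcterms: http://purl.org/dc/terms/" about="{about}">\n'
        f'  <span property="dcterms:title">{title}</span>\n'
        # rel + href links the record to the authority URI (a resource, not a string)
        f'  <a rel="dcterms:creator" href="{creator}">{label}</a>\n'
        "</div>"
    )


print(to_rdfa(record))
```

The page still reads the same for a human visitor; the attributes only add a machine-readable layer on top of information the library already displays.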

6. Implementation challenges and barriers to adoption
The whole section is clumsy because it does not distinguish between 
different situations. Some projects are more advanced than others: as the 
"use case" section shows, libraries can be very innovative. 
http://www.w3.org/2005/Incubator/lld/wiki/UseCaseReport 

6.1 Designed for stability, the library ecosystem resists change
Comment: The library ecosystem has been changing ever since Zenodotus. 
Semantic Web techniques are different from traditional computer services, 
and their budgets are not comparable. Furthermore, library data today are 
already digital, structured in digital formats, so there is no need to 
plan a retrospective conversion of printed catalogues.
The historical depth of libraries and of library data is a very important 
asset for the Semantic Web, in which trust is essential. Libraries 
constantly revise their data to improve its quality. 

Suggestion: Although designed for stability, the library ecosystem moved 
early to computer systems and keeps adapting to technological change. 

6.1.2 Library Data is shareable among libraries, but not yet with the 
wider world
Comment: Librarians often work with the archival community, for instance. 
The EAD (Encoded Archival Description) XML DTD was jointly created by 
librarians and archivists to encode descriptions of archival collections. 

Suggestion: Through cooperation with archives and museums, libraries 
already share data and standards with a "wider world". Moving to Linked 
Data is a natural continuation.

6.1.3 Libraries are understaffed in the technology area
This part is overstated and unfair to libraries that do recruit and work 
in the technology area.

Suggestion: It is not just a matter of recruiting "IT people", but of 
training librarians so that they are aware of and proficient in Web 
technologies, and of making sure that IT departments and librarians work 
together. This is what libraries do. 

6.2 Libraries do not adapt well to technological change
Comment: Libraries will need to manage the legacy of MARC-based data for a 
long time, even if they manage to shift their current practices to LD 
strategies and tools. 
This means that, before enjoying all the benefits of LD (listed in the 
scope document), libraries will need to maintain parallel systems, which 
increases costs and effort in software and format development and in data 
management.
In the short term, library developers will still have to deal with these 
formats, which are still being revised.

Suggestion: When shown convincing examples, libraries adapt very well to 
technological change.

6.2.2 Library standardization process is cumbersome
Comment: But possible!
Libraries are used to transforming their formats, mapping them to other 
formats, and evolving them when they work on new projects, new 
technologies and new types of documents. 

Suggestion: It takes time to make formats fit the need, but it is part of 
libraries' culture.

6.2.4 Library standards are limited to the library data
Comment: Library data are not only bibliographic data. Library catalogues 
also contain authority records with extensive information about persons, 
families, corporate bodies, works and subjects. Authority data provide 
named entities and may provide permanent identifiers for these entities 
(such as ARK identifiers in BnF catalogues). 

Suggestion: With reliable identifiers, authority data are also key 
elements of the Semantic Web.

6.3 ROI is difficult to calculate
Comment: Benefits are as difficult to estimate precisely as costs, but 
some can and must be highlighted. Pooling the creation of data reduces 
redundancy, increases staff efficiency, and allows librarians to focus on 
other tasks such as research on collections or conservation.
Linking a library's data to cooperative metadata produced by reliable 
institutions adds value to its data. Opening library linked data may 
create economic value for a country by allowing commercial reuse of that 
data (Open Data). Opening library data increases user traffic and the 
visibility of collections (through reuse, SEO, etc.), and thus the 
potential return on investment.
Using richer, more flexible, more relevant data improves accessibility and 
services to users: in public institutions, public utility is a return on 
investment in itself. Helping researchers is another.

Suggestion: It is difficult to calculate ROI precisely, but it is easy to 
see financial benefits (re-use, links, elimination of redundant tasks).

6.3.3 Vocabulary changes in library data are costly
Comment: With an authority file providing permanent identifiers and links, 
it is relatively easy to update any field linked to it: all changes in 
authority records can be automatically propagated to the related 
bibliographic records.
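A minimal Python sketch of why identifiers make such updates cheap, assuming bibliographic records store a link (the authority's identifier) rather than a copy of the heading. The `ark:/99999/...` identifiers, the data model and the `display` helper are hypothetical; a real catalogue would propagate changes inside its own system or at export time.

```python
from dataclasses import dataclass


@dataclass
class Authority:
    identifier: str  # permanent identifier (hypothetical value below)
    heading: str     # preferred form of the name


@dataclass
class BibRecord:
    title: str
    creator_id: str  # a link to the authority record, not a copy of its heading


authorities = {"ark:/99999/a1": Authority("ark:/99999/a1", "Hugo, Victor")}
records = [
    BibRecord("Les Misérables", "ark:/99999/a1"),
    BibRecord("Notre-Dame de Paris", "ark:/99999/a1"),
]


def display(rec: BibRecord) -> str:
    """Resolve the link when the record is shown or exported."""
    return f"{rec.title} / {authorities[rec.creator_id].heading}"


# A single edit to the authority record...
authorities["ark:/99999/a1"].heading = "Hugo, Victor (1802-1885)"

# ...is picked up by every bibliographic record that links to it.
for rec in records:
    print(display(rec))
```

Because the bibliographic records hold only the identifier, there is nothing to rewrite in them when the heading changes; this is the property that Linked Data URIs carry over to the Web.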

Suggestion: Moving to linked data means relying on authority files and 
identifiers.

6.4.1 Some data cannot be published openly
Comment: In some countries, there is a distinction between "public 
information" and "information that can be processed by machines". In such 
cases, making information that is available to individuals available for 
bulk use by computer programs has to be justified and declared. 

Suggestion: There may be national specificities; publishers have to state 
them clearly.

6.4.2 Rights ownership can be unmanageably complex
Comment: Copied and extracted records are one thing. There is also the 
question of the "linked data itself". The need for attribution also means 
that the provider must be able to report on how the data are used. In 
some countries (including France), the use of taxpayers' money has to be 
justified. Institutions have to account for that money, and the only way 
to do so is with metrics. This implies knowing who is using the data, even 
when it is free. 

Suggestion: Thank you for your feedback, and please cite us if you use our 
data! 

7. Recommendations

7.1.1 Identify sets of data as possible candidates for early exposure as 
LD
Comment: Structured data rely on the use of identifiers. Publishing 
authority files and controlled vocabularies as linked data early on will 
make later publication of bibliographic records as linked data easier, 
since those records can link to them as a backbone for bibliographic 
information.

Suggestion: Authority files can be a basket for the "low-hanging fruit" 
from other libraries.

7.1.2. For each set of data, determine ROI of current practices, and costs 
and ROI of exposing as LD 
Comment: Determining the costs and ROI of exposing sets of data will help 
choose which value vocabularies and datasets should have priority. 
Therefore, ROI has to be determined globally.

Suggestion: Not necessarily "for each set of data".

7.1.3. Consider migration strategies
Comment: Using Semantic Web technologies inside the library "catalogue" 
seems very promising, because it will allow a much more flexible and 
interoperable use of data: modelling, linking, merging, querying, removing 
redundancies, integrating external data from various formats, publishing 
in various formats, etc.
This is obviously a great aim for libraries, but it is much more difficult 
than simply publishing data as linked data. It must not become an 
obstacle: it may be better for a library to first publish some sets of 
data as linked data than to try to migrate its entire catalogue from the 
start.
Therefore, the migration of data does not need to cover all possible data; 
it can cover only the useful part. This is clearly what happens when 
commercial services use RDFa for SEO, covering only the subset of products 
that people will be looking for. In fact, simply putting data into RDF is 
not useful if there are no links. 

Suggestion: Libraries can "pick and choose" what is relevant and migrate 
it. 
Suggestion: Using RDF inside the systems themselves is a separate question, 
which also has to be argued for.
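A minimal sketch of "pick and choose" migration in Python, assuming a hypothetical in-memory catalogue extract and example.org base URIs. Only records accepted by a `keep` rule are serialized as N-Triples; the rule used here (keep records that carry at least one link to an authority URI) is an illustrative choice reflecting the remark that RDF without links is of little use. Property URIs come from DCMI Metadata Terms.

```python
# Hypothetical base URI; a real library would mint its own persistent URIs.
BIB_BASE = "http://example.org/bib/"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical catalogue extract; only some records carry a link.
catalogue = [
    {"id": "b1", "title": "Notre-Dame de Paris", "creator_uri": "http://example.org/auth/a1"},
    {"id": "b2", "title": "Untitled pamphlet", "creator_uri": None},
]


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal (escapes backslash, quote, LF, CR)."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(records, keep):
    """Serialize only the records accepted by `keep`."""
    lines = []
    for rec in records:
        if not keep(rec):
            continue  # "pick and choose": leave this record for a later phase
        subject = f"<{BIB_BASE}{rec['id']}>"
        lines.append(f"{subject} <{DCTERMS}title> {literal(rec['title'])} .")
        lines.append(f"{subject} <{DCTERMS}creator> <{rec['creator_uri']}> .")
    return lines


# Illustrative rule: migrate only records that link to an authority URI.
triples = to_ntriples(catalogue, keep=lambda r: r["creator_uri"] is not None)
print("\n".join(triples))
```

The selection rule is the part a library would tune: it can start from the records that already point to published authority data and widen the subset as more links become available.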

7.2.2: Identify Linked Data literacy needed for different staff roles in 
the library 
Comment: In fact, when converting current datasets to RDF, we see that 
cataloguing still has to handle the creation of links, mainly for the 
reconciliation and alignment of concepts (for instance: "do these two 
books tell the same story?"). Here, the data clearly still need to be 
curated by humans.
But by re-using links and data produced by others, we can expect 
cataloguing work to be:
-       more centralized;
-       more about creating links (and less about writing dates, names or 
page numbers...).

Suggestion: These changes have to be made clear on the business side.

7.4 Identify and Link.
7.4.1 "Create URIs for the items in library datasets"
Comment: Providing identifiers is the only way to make links. Large 
libraries already use permanent identifiers (e.g. ARK identifiers for all 
resources at the BnF). 

Suggestion: This is the foundation.
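A small Python sketch of handling ARK identifiers, assuming the classic `ark:/NAAN/Name[Qualifier]` form. 12148 is the NAAN that appears in BnF ARKs, but the name parts below are made up, and the resolver host `http://example.org` is hypothetical. The regular expression is a simplification for illustration, not a full implementation of the ARK specification.

```python
import re

# Simplified classic ARK syntax: "ark:/" NAAN "/" Name [Qualifier].
# Restricting the NAAN to five digits is an assumption made for this sketch.
ARK_RE = re.compile(r"^ark:/(?P<naan>\d{5})/(?P<name>[^/?#]+)(?P<qualifier>[^?#]*)$")


def parse_ark(ark: str) -> dict:
    """Split an ARK into NAAN, name and (possibly empty) qualifier."""
    match = ARK_RE.match(ark)
    if match is None:
        raise ValueError(f"not an ARK: {ark!r}")
    return match.groupdict()


def to_http_uri(resolver: str, ark: str) -> str:
    """Make a dereferenceable HTTP URI by appending the ARK to a resolver."""
    parse_ark(ark)  # reject malformed identifiers before publishing links
    return resolver.rstrip("/") + "/" + ark


example = "ark:/12148/cbEXAMPLE1"  # made-up name part under the BnF NAAN
print(parse_ark(example))
print(to_http_uri("http://example.org/", example))
```

Keeping the ARK itself stable and only varying the resolver prefix is what lets the same identifier serve both internal re-use and public links.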



Received on Friday, 22 July 2011 13:21:17 GMT
