Re: Comments on the draft report

Dear Romain,

Thank you for reviewing our draft report. Your comments are really useful.
I've just added a reference to your mail to our list of reviews [1],
so that we will be able to process your feedback when updating the
report.

Best regards,

Emma

[1] http://www.w3.org/2005/Incubator/lld/wiki/DraftReportReviewerAssignments

On Fri, Jul 22, 2011 at 3:20 PM,  <romain.wenz@bnf.fr> wrote:
>
> Hello,
> With colleagues, we have been reviewing the draft report at
> http://www.w3.org/2005/Incubator/lld/wiki/DraftReportWithTransclusion
>
> Please find enclosed some comments, section by section, and suggestions.
>
> All best,
> Romain Wenz
>
> Département de l'Information Bibliographique et Numérique
> Bibliothèque nationale de France
> Quai François Mauriac
> 75706 Paris cedex 13
> 33 (0)1 53 79 37 39
> ----------------------------------------------------
>
> 3. Benefits
>
> 3.2. Benefits of the Linked Data approach
> Comment: Libraries produce reliable data, especially vocabularies and
> authority data. If they open them as linked data, and provided that they use
> shared ontologies, they can help structure the Web of Data with data that
> can be trusted and with vocabularies that anyone can link to.
>
> Suggestion  The Web needs to be structured with reliable and clean data, and
> libraries can provide them.
>
> 3.2.3. Benefits to Librarians, archivists and curators
> Comment: Among the very positive aspects of “linked data” for libraries,
> there is the possibility to act at different levels, with various benefits.
>
> Suggestion  Every approach can offer specific benefits, from internal re-use
> of data and identifiers to links or services to the end-user.
>
> 3.2.4. Benefits to Developers
> Comment: The general benefit is to get rid of specific library formats,
> which are not really interoperable (e.g. various MARCs). This is very
> important, so as to break barriers between libraries and between library
> data and other types of data. But the transition from library-specific data
> to LD won't be straightforward.
>
> Suggestion  It will be possible to work step by step, with Web protocols.
> Suggestion  A section that could be added as “3.2.5.”:
> “Benefits to service providers, software vendors and external developers:
> These developers will work with other important players: service providers,
> software vendors and external developers.
> The consequences are:
> -        Research and development could be enhanced through these players.
> They could also work with research laboratories.
> -        Libraries will still work with external vendors.
> -        A new market emerges for industry players, developers and service
> providers, which can increase their financial benefits. For instance, using
> interoperable RDF formats enables other actors to re-use structured data
> provided by libraries.”
>
> 5. Relevant technologies
> Comment: We are talking about building structure in Web content, so that
> data from the Web can be used by machines, the way it would be in databases.
>
> Suggestion  Building a « Linked data » infrastructure does not imply
> creating yet another silo.
>
> 5.5 Microformats, Microdata and RDFa
> Comment: Linked Data can go one step further than the work that has
> already been done, for instance on OAI sets.
>
> Suggestion  RDFa can be a step towards using existing information by
> distilling it into a Web structure.
>
> 6. Implementation challenges and barriers to adoption
> The whole section is clumsy because it makes no distinction between various
> situations. We can find more or less advanced projects: as the “use case”
> section shows, libraries can be very innovative.
> http://www.w3.org/2005/Incubator/lld/wiki/UseCaseReport
>
> 6.1 Designed for stability, the library ecosystem resists change
> Comment: The library ecosystem has been changing since Zenodotus. Semantic
> Web techniques are different from traditional computer services, and budgets
> are not on a comparable basis. Furthermore, library data today are digital
> data, and it is not necessary to plan a retrospective conversion of printed
> catalogues. Data are already digital data, structured with digital formats.
> The historical depth of libraries and library data is a very important
> asset in the frame of the semantic web, for which the notion of trust is
> essential. Libraries improve the quality of their data by constant
> revisions.
>
> Suggestion  Even if designed for stability, the library ecosystem moved
> early to computer systems and keeps adapting to technological changes.
>
> 6.1.2 Library Data is shareable among libraries, but not yet with the wider
> world
> Comment: Librarians often work with other communities, such as the archival
> community. For instance, the EAD (Encoded Archival Description) XML DTD was
> jointly created by librarians and archivists in order to encode descriptions
> of archival collections.
>
> Suggestion  Through cooperation with Archives and Museums, libraries already
> share data and standards with a “Wider world”. Moving to Linked Data is a
> natural continuation.
>
> 6.1.3 Libraries are understaffed in the technology area
> This part is too strong and rude to libraries that actually recruit and work
> in the technology area.
> Suggestion  It is not just a matter of recruiting “IT people”, but of
> training librarians so that they are aware of and proficient in Web
> technologies, and making sure Computing departments and librarians work
> together. This is what libraries do.
>
> 6.2 Libraries do not adapt well to technological change
> Comment: Libraries will need to manage the legacy of MARC format-based data
> for a long period of time even if they manage to shift to LD strategies and
> tools for their current practices.
> This means that before enjoying all the benefits of LD (listed in the scope
> document), libraries will need to maintain parallel systems, which means an
> increase of costs and efforts in software and format development and in data
> management.
> In the short term library developers will still have to deal with these
> formats, which continue to be updated.
>
> Suggestion  When convincing examples are shown, Libraries adapt very well to
> technological changes.
>
> 6.2.2 Library standardization process is cumbersome
> Comment: But possible!
> Libraries are used to transforming their formats, mapping them to other
> formats, and making them evolve when they work on new projects, new
> technologies, and new types of documents.
>
> Suggestion  It takes time for the formats to fit the need, but it is
> part of the libraries’ culture.
>
> 6.2.4 Library standards are limited to the library data
> Comment: Library data are not only bibliographic data. Library catalogues
> also contain authority records with many pieces of information about
> persons, families, corporate bodies, works, and subjects. Authority data
> provide named entities and may provide permanent identifiers for these
> entities (such as ARK identifiers in BnF catalogues).
>
> Suggestion  With reliable identifiers, Authority data are also key elements
> for the semantic web.
>
> 6.3 ROI is difficult to calculate
> Comment: Benefits are as difficult as costs to estimate precisely, but some
> can and must be underlined. Sharing the creation of data reduces
> redundancies, increases staff efficiency, and allows librarians to focus on
> other tasks like research on collections or conservation.
> Linking the data of a library to cooperative metadata produced by reliable
> institutions adds value to its data. Opening library linked data may create
> economic value for a country, by allowing commercial reuse of that data
> (Open data). Opening library data increases user traffic and the
> visibility of collections (through reuse, SEO, etc.), and thus the
> potential for ROI.
> Using richer, more flexible, more relevant data improves the accessibility
> and the services to users: in public institutions, public utility is an ROI
> in itself. Helping researchers is another one.
>
> Suggestion  It is difficult to calculate ROI precisely, but it is easy to
> see financial benefits (re-use, links, cuts of redundant tasks).
>
> 6.3.3 Vocabulary changes in library data are costly
> Comment: With an Authority File providing permanent identifiers and links,
> it is relatively easy to update any field linked with it. All changes in
> authority records can be automatically transferred into related
> bibliographic records.
>
> Suggestion  Moving to linked data implies relying on authority files and
> identifiers.
>
> 6.4.1- Some data cannot be published openly
> Comment: In some countries, there is a distinction between “public
> information” and “information that can be processed by machines”. In that
> case, large-scale use in computer programs of information that is available
> to individuals needs to be justified and declared.
>
> Suggestion  There can be national specificities. They have to be clearly
> stated by the publishers.
>
> 6.4.2- Rights ownership can be unmanageably complex
> Comment: Copied and extracted records are one thing. There is also a
> question about the “linked data itself”. The need to be cited also means, for
> the provider, being able to report on the use. In some countries
> (including France) the use of the tax-payer’s money has to be justified. You
> have to account for the money: the only way to do it is to have metrics. This
> implies knowing who is using the data, even for free.
>
> Suggestion  Thanks for giving feedback and citing us if you use our data!
>
> 7. Recommendations
>
> 7.1.1 Identify sets of data as possible candidates for early exposure as LD
> Comment: Structured data rely on the use of identifiers. Publishing
> authority files and controlled vocabularies early as linked data will make
> the later publication of bibliographic records as linked data easier, by
> allowing links to them as a backbone for bibliographic information.
> Suggestion  Authority files can be a basket for the "low-hanging fruit"
> from other libraries.
>
> 7.1.2. For each set of data, determine ROI of current practices, and costs
> and ROI of exposing as LD
> Comment: Determining costs and ROI of exposing sets of data will help in
> choosing which value vocabularies and datasets could have priority.
> Therefore, determining ROI has to be done globally.
> Suggestion  Not necessarily “for each set of data”.
>
> 7.1.3. Consider migration strategies
> Comment: Using Semantic Web technologies inside the library “catalogue”
> seems very promising, because it will allow a much more flexible and
> interoperable use of data: modelling, linking, merging, querying, removing
> redundancies, integrating external data from various formats and publishing
> as various formats, etc.
> This is obviously a great aim for libraries, but it is much more difficult
> than only publishing data as linked data. It must not be an obstacle: it may
> be better for a library to first publish some sets of data as linked data
> than to try from the beginning to migrate its entire catalogue.
> Therefore, the migration of data does not need to cover all possible data.
> It can be only the useful part. This is obviously the case when commercial
> services use RDFa for SEO, with the subset of products which people will be
> looking for. In fact, when we are just putting data into RDF, it is not
> useful if there are no links.
>
> Suggestion  Libraries can “pick and choose” what is relevant and migrate it.
> Suggestion  Using RDF inside the systems themselves is a separate question,
> and one that has to be advocated for.
>
> 7.2.2: Identify Linked Data literacy needed for different staff roles in the
> library
> Comment: In fact, when converting the current datasets for use in RDF,
> we see that cataloguing still has to address the creation of links, mainly
> for reconciliation and alignment of concepts (for instance: “do those two
> books tell the same story?”). There, the data obviously still need to be
> curated by humans.
> But by re-using links and data produced by others, we can expect the
> cataloguing work to be:
> -        more centralized;
> -        more about creating links (less about writing dates, names or page
> numbers…).
>
> Suggestion  These evolutions have to be clear on the business side.
>
> 7.4 Identify and Link.
> 7.4.1 “Create URIs for the items in library datasets”
> Comment: Providing identifiers is the only way to make links. In big
> libraries, permanent identifiers are already in use (e.g. ARK identifiers
> for all resources at the BnF).
>
> Suggestion  This is the basis.
>

Received on Tuesday, 2 August 2011 17:03:13 UTC