Re: Comments on the draft report from Emmanuelle Bermes on 2011-08-02 (public-lld@w3.org from August 2011)

From: Emmanuelle Bermes <manue@figoblog.org>
Date: Tue, 2 Aug 2011 19:02:35 +0200
To: romain.wenz@bnf.fr
Cc: public-lld@w3.org
Message-ID: <CAODLZ4ig7++_f11NrbgQ2C8LOXoQbt9jx-uQP9HjkLMPmJ4f5g@mail.gmail.com>
Dear Romain,

Thank you for reviewing our draft report. Your comments are really useful.
I've just added the reference of your mail to our list of reviews [1],
so that we will be able to process your feedback when updating the
report.

Best regards,

Emma

[1] http://www.w3.org/2005/Incubator/lld/wiki/DraftReportReviewerAssignments

On Fri, Jul 22, 2011 at 3:20 PM,  <romain.wenz@bnf.fr> wrote:
>
> Hello,
> My colleagues and I have been reviewing the draft report at
> http://www.w3.org/2005/Incubator/lld/wiki/DraftReportWithTransclusion
>
> Please find below our comments and suggestions, section by section.
>
> All best,
> Romain Wenz
>
> Département de l'Information Bibliographique et Numérique
> Bibliothèque nationale de France
> Quai François Mauriac
> 75706 Paris cedex 13
> 33 (0)1 53 79 37 39
> ----------------------------------------------------
>
> 3. Benefits
>
> 3.2. Benefits of the Linked Data approach
> Comment: Libraries produce reliable data, especially vocabularies and
> authority data. If they open these as linked data, and provided they use
> shared ontologies, they can help structure the Web of Data with data that
> can be trusted and with vocabularies that anyone can link to.
>
> Suggestion: The Web needs to be structured with reliable and clean data, and
> libraries can provide it.
>
> 3.2.3. Benefits to Librarians, archivists and curators
> Comment: One of the very positive aspects of “linked data” for libraries is
> the possibility of acting at different levels, each with its own benefits.
>
> Suggestion: Every approach can offer specific benefits, from internal re-use
> of data and identifiers to links or services to the end-user.
>
> 3.2.4. Benefits to Developers
> Comment: The general benefit is to get rid of specific library formats,
> which are not really interoperable (e.g. the various flavours of MARC). This
> is very important for breaking down barriers between libraries and between
> library data and other types of data. But the transition from
> library-specific data to LD won't be straightforward.
>
> Suggestion: It will be possible to work step by step, with Web protocols.
> Suggestion: A section that could be added as “3.2.5.”:
> “Benefits to service providers, software vendors and external developers:
> These developers will work with other important players: service providers,
> software vendors and external developers.
> The consequences are:
> - Research and development could be enhanced through these players.
>   They could also work with research laboratories.
> - Libraries will still work with external vendors.
> - A new market emerges for industry, developers and service providers,
>   which can increase their financial benefits. For instance, using
>   interoperable RDF formats enables other actors to re-use structured data
>   provided by libraries.”
>
> 5. Relevant technologies
> Comment: We are talking about building structure in Web content, so that
> data from the Web can be used by machines, the way it would be in databases.
>
> Suggestion: Building a “Linked Data” infrastructure does not mean creating
> yet another silo.
>
> 5.5 Microformats, Microdata and RDFa
> Comment: Linked Data can go one step further than the work that has already
> been done, for instance on OAI sets.
>
> Suggestion: RDFa can be a step towards re-using existing information by
> distilling it into a Web structure.
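
A minimal sketch of the "distilling" idea in the suggestion above, assuming a
simplified reading of RDFa in which each element carrying a `property`
attribute yields one property/value pair. The sample markup, the `dc:` prefix
and the example.org URL are illustrative assumptions, and this is not a
conforming RDFa processor; it only uses Python's standard `html.parser`.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect (property, text) pairs from elements with a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current = None  # property name awaiting its text content, if any

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop:
            self._current = prop

    def handle_data(self, data):
        # Attach the first non-blank text after a `property` start tag to it.
        if self._current and data.strip():
            self.pairs.append((self._current, data.strip()))
            self._current = None


# Hypothetical page fragment: an existing HTML description annotated with RDFa.
SAMPLE = """
<div about="http://example.org/book/1">
  <span property="dc:title">Notre-Dame de Paris</span>
  <span property="dc:creator">Victor Hugo</span>
</div>
"""


def extract_pairs(html):
    """Return the property/value pairs found in an HTML string."""
    collector = PropertyCollector()
    collector.feed(html)
    return collector.pairs
```

Calling `extract_pairs(SAMPLE)` returns the title and creator pairs in
document order. A real deployment would rely on a full RDFa parser that also
handles prefixes, subjects and nested resources.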
>
> 6. Implementation challenges and barriers to adoption
> The whole section is clumsy because it does not distinguish between
> different situations. Projects are more or less advanced: as the “use case”
> section shows, libraries can be very innovative.
> http://www.w3.org/2005/Incubator/lld/wiki/UseCaseReport
>
> 6.1 Designed for stability, the library ecosystem resists change
> Comment: The library ecosystem has been changing since Zenodotus. Semantic
> Web techniques are different from traditional computer services, and budgets
> are not on a comparable basis. Furthermore, library data today are already
> digital and structured in digital formats, so there is no need to plan
> retrospective conversion of printed catalogues.
> The historical depth of libraries and of library data is a very important
> asset in the context of the semantic web, for which the notion of trust is
> essential. Libraries improve the quality of their data through constant
> revision.
>
> Suggestion: Even if designed for stability, the library ecosystem moved
> early to computer systems and keeps adapting to technological change.
>
> 6.1.2 Library Data is shareable among libraries, but not yet with the wider
> world
> Comment: Librarians often work with the archival community. For instance,
> the EAD (Encoded Archival Description) XML DTD was jointly created by
> librarians and archivists in order to encode descriptions of archival
> collections.
>
> Suggestion: Through cooperation with archives and museums, libraries already
> share data and standards with a “wider world”. Moving to Linked Data is a
> natural continuation.
>
> 6.1.3 Libraries are understaffed in the technology area
> This part is too strong and rude to libraries that actually recruit and work
> in the technology area.
> Suggestion: It is not just a matter of recruiting “IT people”, but of
> training librarians so that they are aware of and proficient in Web
> technologies, and of making sure computing departments and librarians work
> together. This is what libraries do.
>
> 6.2 Libraries do not adapt well to technological change
> Comment: Libraries will need to manage the legacy of MARC format-based data
> for a long period of time, even if they manage to shift to LD strategies and
> tools for their current practices.
> This means that before enjoying all the benefits of LD (listed in the scope
> document), libraries will need to maintain parallel systems, which means
> increased costs and effort in software and format development and in data
> management.
> In the short term, library developers will still have to deal with these
> formats, which continue to be revised.
>
> Suggestion: When convincing examples are shown, libraries adapt very well to
> technological change.
>
> 6.2.2 Library standardization process is cumbersome
> Comment: But possible!
> Libraries are used to transforming their formats, mapping them to other
> formats, and making them evolve when they work on new projects, new
> technologies, and new types of documents.
>
> Suggestion: It takes time to make the formats fit the need, but it is
> part of the libraries’ culture.
>
> 6.2.4 Library standards are limited to the library data
> Comment: Library data are not only bibliographic data. Library catalogues
> also contain authority records with many pieces of information about
> persons, families, corporate bodies, works, and subjects. Authority data
> provide named entities and may provide permanent identifiers for these
> entities (such as ARK identifiers in BnF catalogues).
>
> Suggestion: With reliable identifiers, authority data are also key elements
> for the semantic web.
>
> 6.3 ROI is difficult to calculate
> Comment: Benefits are as difficult as costs to estimate precisely, but some
> can and must be highlighted. Sharing the creation of data reduces
> redundancy, increases staff efficiency, and allows librarians to focus on
> other tasks such as research on collections or conservation.
> Linking a library's data to cooperative metadata produced by reliable
> institutions adds value to its data. Opening library linked data may create
> economic value for a country by allowing commercial reuse of that data
> (Open data). Opening library data increases user traffic and the
> visibility of collections (through reuse, SEO, etc.), and thus their
> potential ROI.
> Using richer, more flexible, more relevant data improves accessibility
> and services to users: in public institutions, public benefit is an ROI
> in itself. Helping researchers is another.
>
> Suggestion: It is difficult to calculate ROI precisely, but it is easy to
> see financial benefits (re-use, links, elimination of redundant tasks).
>
> 6.3.3 Vocabulary changes in library data are costly
> Comment: With an Authority File providing permanent identifiers and links,
> it is relatively easy to update any field linked with it. All changes in
> authority records can be automatically transferred into related
> bibliographic records.
>
> Suggestion: Moving to linked data means relying on authority files and
> identifiers.
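
A minimal sketch of the propagation described in the comment above, assuming
bibliographic records store only the identifier of an authority record and
resolve its label when displayed. The in-memory dictionaries, field names and
the `ark:/99999/...` identifier are hypothetical and do not reflect any actual
catalogue schema.

```python
# Hypothetical authority file: one record per permanent identifier.
authority_file = {
    "ark:/99999/auth001": {"preferred_label": "Hugo, Victor"},
}

# Hypothetical bibliographic records: they hold the identifier, not the label.
bibliographic_records = [
    {"title": "Notre-Dame de Paris", "creator_id": "ark:/99999/auth001"},
    {"title": "Les Contemplations", "creator_id": "ark:/99999/auth001"},
]


def display(record, authorities):
    """Resolve the creator label through the identifier at display time."""
    label = authorities[record["creator_id"]]["preferred_label"]
    return f"{record['title']} / {label}"


def update_label(authorities, identifier, new_label):
    """Change the label once, in the authority record only."""
    authorities[identifier]["preferred_label"] = new_label
```

After a single call such as
`update_label(authority_file, "ark:/99999/auth001", "Hugo, Victor (1802-1885)")`,
every record linked to that identifier displays the new label, without any
edit to the bibliographic records themselves.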
>
> 6.4.1 Some data cannot be published openly
> Comment: In some countries, there is a distinction between “public
> information” and “information that can be processed by machines”. In that
> case, information that is available to individuals must be justified
> and declared before it can be used at scale in computer programs.
>
> Suggestion: There can be national specificities. They have to be clearly
> stated by the publishers.
>
> 6.4.2 Rights ownership can be unmanageably complex
> Comment: Copied and extracted records are one thing. There is also a
> question about the “linked data itself”. The need for citation also means,
> for the provider, being able to report on use. In some countries
> (including France) the use of taxpayers’ money has to be justified. You
> have to account for the money, and the only way to do so is to have metrics.
> This implies knowing who is using the data, even when it is free.
>
> Suggestion: Thank you for giving feedback and citing us if you use our data!
>
> 7. Recommendations
>
> 7.1.1 Identify sets of data as possible candidates for early exposure as LD
> Comment: Structured data rely on the use of identifiers. Publishing
> authority files and controlled vocabularies early as linked data will make
> it easier to publish bibliographic records as linked data later, by allowing
> links to them as a backbone for bibliographic information.
> Suggestion: Authority files can be a basket for the "low-hanging fruit"
> from other libraries.
>
> 7.1.2. For each set of data, determine ROI of current practices, and costs
> and ROI of exposing as LD
> Comment: Determining the costs and ROI of exposing sets of data will help in
> choosing which value vocabularies and datasets should have priority.
> Therefore, determining ROI has to be done globally.
> Suggestion: Not necessarily “for each set of data”.
>
> 7.1.3. Consider migration strategies
> Comment: Using Semantic Web technologies inside the library “catalogue”
> seems very promising, because it will allow a much more flexible and
> interoperable use of data: modelling, linking, merging, querying, removing
> redundancies, integrating external data from various formats and publishing
> in various formats, etc.
> This is obviously a great aim for libraries, but it is much more difficult
> than only publishing data as linked data. It must not be an obstacle: it may
> be better for a library to publish some sets of data as linked data first
> than to try from the beginning to migrate its entire catalogue.
> Therefore, the migration of data does not need to cover all possible data.
> It can cover only the useful part. This is obviously the case when
> commercial services use RDFa for SEO, with the subset of products which
> people will be looking for. Indeed, merely putting data into RDF is not
> useful if there are no links.
>
> Suggestion: Libraries can “pick and choose” what is relevant and migrate it.
> Suggestion: Using RDF inside the systems themselves is another question,
> which has to be advocated separately.
>
> 7.2.2: Identify Linked Data literacy needed for different staff roles in the
> library
> Comment: In fact, when converting current datasets for use in RDF, we see
> that cataloguing still has to address the creation of links, mainly for
> reconciliation and alignment of concepts (for instance: “do those two
> books tell the same story?”). There, the data obviously still need to be
> curated by humans.
> But by re-using links and data produced by others, we can expect the
> cataloguing work to be:
> - more centralized;
> - more about creating links (less about writing dates, names or page
>   numbers…).
>
> Suggestion: These changes have to be made clear on the business side.
>
> 7.4 Identify and Link.
> 7.4.1 “Create URIs for the items in library datasets”
> Comment: Providing identifiers is the only way to make links. Big libraries
> already use permanent identifiers (e.g. ARK identifiers for all resources
> at the BnF).
>
> Suggestion: This is the basis.
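
A minimal sketch of turning a permanent identifier into a linkable URI,
assuming the general ARK form `ark:/NAAN/name`. The resolver base
`https://example.org/`, the NAAN `99999` and the name `auth001` are
placeholders chosen for illustration, not actual BnF values.

```python
from urllib.parse import quote

# Hypothetical resolver base; a real library would use its own domain.
BASE = "https://example.org/"


def ark_to_uri(naan, name, base=BASE):
    """Build an HTTP URI from the two parts of an ARK identifier."""
    ark = f"ark:/{naan}/{name}"
    # Keep ':' and '/' readable; percent-encode anything unsafe in the name.
    return base + quote(ark, safe=":/")
```

For example, `ark_to_uri("99999", "auth001")` gives
`https://example.org/ark:/99999/auth001`, a stable address that other
datasets can link to.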
Received on Tuesday, 2 August 2011 17:03:13 UTC