- From: <romain.wenz@bnf.fr>
- Date: Fri, 22 Jul 2011 15:20:56 +0200
- To: public-lld@w3.org
- Message-ID: <OF0D1445D3.CB5A7972-ONC12578D5.00481F84-C12578D5.00495439@LocalDomain>
Hello,

With colleagues, we have been reviewing the draft report at
http://www.w3.org/2005/Incubator/lld/wiki/DraftReportWithTransclusion

Please find enclosed some comments, section by section, and suggestions.

All best,

Romain Wenz
Département de l'Information Bibliographique et Numérique
Bibliothèque nationale de France
Quai François Mauriac
75706 Paris cedex 13
33 (0)1 53 79 37 39

----------------------------------------------------

3. Benefits

3.2. Benefits of the Linked Data approach
Comment: Libraries produce reliable data, especially vocabularies and authority data. If they open them as linked data, and provided they use shared ontologies, they can help structure the Web of Data with data that can be trusted and with vocabularies that anyone can link to.
Suggestion: The Web needs to be structured with reliable and clean data, and libraries can provide them.

3.2.3. Benefits to Librarians, archivists and curators
Comment: One of the very positive aspects of "linked data" for libraries is the possibility to act at different levels, with various benefits.
Suggestion: Every approach can offer specific benefits, from internal re-use of data and identifiers to links or services for the end-user.

3.2.4. Benefits to Developers
Comment: The general benefit is to get rid of library-specific formats, which are not really interoperable (e.g. the various MARCs). This is very important for breaking down barriers between libraries, and between library data and other types of data. But the transition from library-specific data to LD won't be straightforward.
Suggestion: It will be possible to work step by step, with Web protocols.
Suggestion: A section that could be added as "3.2.5.":
"Benefits to service providers, software vendors and external developers: These developers will work with other important players: service providers, software vendors and external developers. The consequences are:
- Research and development could be enhanced through these players.
They could also work with research laboratories.
- Libraries will still work with external vendors.
- A new market emerges for industry, developers and service providers, which can increase their financial benefits. For instance, using interoperable RDF formats enables other actors to re-use structured data provided by libraries."

5. Relevant technologies
Comment: We are talking about building structure in Web content, so that data from the Web can be used by machines the way it would be in databases.
Suggestion: Building a "Linked Data" infrastructure does not mean creating yet another silo.

5.5 Microformats, Microdata and RDFa
Comment: Linked Data can go one step further than the work that has already been done, for instance for OAI sets.
Suggestion: RDFa can be a step towards using existing information by distilling it into a Web structure.

6. Implementation challenges and barriers to adoption
The whole section is clumsy because it does not distinguish between different situations. Projects are more or less advanced: as the "use case" section shows, libraries can be very innovative.
http://www.w3.org/2005/Incubator/lld/wiki/UseCaseReport

6.1 Designed for stability, the library ecosystem resists change
Comment: The library ecosystem has been changing since Zenodotus. Semantic Web techniques are different from traditional computer services, and their budgets are not comparable. Furthermore, library data today are digital, and it is not necessary to plan a retrospective conversion of printed catalogues: the data are already digital, structured in digital formats. The historical depth of libraries and of library data is a very important asset for the Semantic Web, for which the notion of trust is essential. Libraries improve the quality of their data through constant revision.
Suggestion: Even if designed for stability, the library ecosystem moved early to computer systems and keeps adapting to technological changes.

6.1.2 Library Data is shareable among libraries, but not yet with the wider world
Comment: Librarians often work with other communities, for instance the archival community. The EAD (Encoded Archival Description) XML DTD was jointly created by librarians and archivists in order to encode descriptions of archival collections.
Suggestion: Through cooperation with Archives and Museums, libraries already share data and standards with a "wider world". Moving to Linked Data is a natural continuation.

6.1.3 Libraries are understaffed in the technology area
This part is too strong, and rude to libraries that actually recruit and work in the technology area.
Suggestion: It is not just a matter of recruiting "IT people", but of training librarians so that they are aware of and efficient with Web technologies, and of making sure computing departments and librarians work together. This is what libraries do.

6.2 Libraries do not adapt well to technological change
Comment: Libraries will need to manage the legacy of MARC format-based data for a long period of time, even if they manage to shift to LD strategies and tools for their current practices. This means that before enjoying all the benefits of LD (listed in the scope document), libraries will need to maintain parallel systems, which increases costs and effort in software and format development and in data management. In the short term, library developers will still have to deal with these formats, which continue to be updated.
Suggestion: When convincing examples are shown, libraries adapt very well to technological changes.

6.2.2 Library standardization process is cumbersome
Comment: But possible! Libraries are used to transforming their formats, mapping them to other formats, and making them evolve when they work on new projects, new technologies, and new types of documents.
Suggestion: It takes time for the formats to fit the need, but it is part of the libraries' culture.

6.2.4 Library standards are limited to the library data
Comment: Library data are not only bibliographic data. Library catalogues also contain authority records with many pieces of information about persons, families, corporate bodies, works, and subjects. Authority data provide named entities and may provide permanent identifiers for these entities (such as ARK identifiers in BnF catalogues).
Suggestion: With reliable identifiers, authority data are also key elements for the Semantic Web.

6.3 ROI is difficult to calculate
Comment: Benefits are as difficult as costs to estimate precisely, but some can and must be underlined:
- Pooling the creation of data reduces redundancies, increases staff efficiency, and allows librarians to focus on other tasks such as research on collections or conservation.
- Linking the data of a library to cooperative metadata produced by reliable institutions adds value to its data.
- Opening library linked data may create economic value for a country, by allowing commercial re-use of that data (Open Data).
- Opening library data increases user traffic and the visibility of collections (through re-use, SEO, etc.), and thus the possibilities of their ROI.
- Using richer, more flexible, more relevant data improves accessibility and services to users: in public institutions, public utility is a ROI in itself. Helping researchers is another one.
Suggestion: It is difficult to calculate ROI precisely, but it is easy to see financial benefits (re-use, links, fewer redundant tasks).

6.3.3 Vocabulary changes in library data are costly
Comment: With an Authority File providing permanent identifiers and links, it is relatively easy to update any field linked to it. All changes in authority records can be automatically transferred into related bibliographic records.
Suggestion: Moving to linked data implies relying on authority files and identifiers.
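
To illustrate the point made under 6.3.3, here is a minimal sketch in Python. It is only an illustration, not a description of any actual catalogue system: the identifier "auth:0001", the field names and the two records are hypothetical. It assumes that bibliographic records store the identifier of an authority record rather than a copy of its label, so that one change in the authority file is visible from every linked record.

```python
# Hypothetical authority file: permanent identifier -> authority record.
# The identifier scheme and field names are invented for this sketch.
authority_file = {
    "auth:0001": {"preferred_label": "Hugo, Victor"},
}

# Hypothetical bibliographic records that link to the authority record
# by identifier instead of storing the name as a text string.
bibliographic_records = [
    {"title": "Les Misérables", "creator": "auth:0001"},
    {"title": "Notre-Dame de Paris", "creator": "auth:0001"},
]


def creator_label(record, authorities):
    """Resolve the creator's label through the authority identifier."""
    return authorities[record["creator"]]["preferred_label"]


def creator_labels(records, authorities):
    """Return the resolved creator label for every record, in order."""
    return [creator_label(record, authorities) for record in records]


before = creator_labels(bibliographic_records, authority_file)

# A single correction in the authority record...
authority_file["auth:0001"]["preferred_label"] = "Hugo, Victor (1802-1885)"

# ...is reflected in every linked bibliographic record, without editing them.
after = creator_labels(bibliographic_records, authority_file)
```

If the records stored the name as a plain string instead, the same correction would have to be repeated in each record, which is the cost that section 6.3.3 of the draft describes.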

6.4.1 Some data cannot be published openly
Comment: In some countries, there is a distinction between "public information" and "information that can be processed by machines". In that case, information that is available to individuals needs to be justified and declared before massive use in computer programs.
Suggestion: There can be national specificities. They have to be clearly stated by the publishers.

6.4.2 Rights ownership can be unmanageably complex
Comment: Copied and extracted records are one thing. There is also a question about the "linked data itself". The need to quote the source also means, for the provider, being able to report on the use. In some countries (including France), the use of the taxpayer's money has to be justified. You have to account for the money, and the only way to do it is to have metrics. This implies knowing who is using the data, even when it is free.
Suggestion: Thank you for sending feedback and quoting the source if you use our data!

7. Recommendations

7.1.1 Identify sets of data as possible candidates for early exposure as LD
Comment: Structured data rely on the use of identifiers. Publishing authority files and controlled vocabularies early as linked data will make it easier to publish bibliographic records as linked data later, because links to them can serve as a backbone for bibliographic information.
Suggestion: Authority files can be a basket for the "low-hanging fruit" from other libraries.

7.1.2. For each set of data, determine ROI of current practices, and costs and ROI of exposing as LD
Comment: Determining the costs and ROI of exposing sets of data will help to choose which value vocabularies and datasets could have priority. Therefore, determining ROI has to be done globally.
Suggestion: Not necessarily "for each set of data".

7.1.3. Consider migration strategies
Comment: Using Semantic Web technologies inside the library "catalogue"
seems very promising, because it will allow a much more flexible and interoperable use of data: modelling, linking, merging, querying, removing redundancies, integrating external data from various formats, publishing in various formats, etc. This is obviously a great aim for libraries, but it is much more difficult than only publishing data as linked data. It must not be an obstacle: it may be better for a library to first publish some sets of data as linked data than to try to migrate its entire catalogue from the beginning. Therefore, the migration does not need to cover all possible data. It can cover only the useful part. This is obviously the case when commercial services use RDFa for SEO, with the subset of products that people will be looking for. In fact, just putting data into RDF is not useful if there are no links.
Suggestion: Libraries can "pick and choose" what is relevant and migrate it.
Suggestion: Using RDF inside the systems themselves is another question, which has to be advocated.

7.2.2: Identify Linked Data literacy needed for different staff roles in the library
Comment: In fact, when preparing the current datasets for use in RDF, we see that cataloguing still has to address the creation of links, mainly for the reconciliation and alignment of concepts (for instance: "do those two books tell the same story?"). There, the data obviously still need to be curated by humans. But by re-using links and data produced by others, we can expect the cataloguing work to be:
- more centralized;
- more about creating links (and less about writing dates, names or page numbers).
Suggestion: These evolutions have to be clear on the business side.

7.4 Identify and Link.
7.4.1 "Create URIs for the items in library datasets"
Comment: Providing identifiers is the only way to make links. In big libraries, permanent identifiers are already being used (e.g. ARK identifiers for all resources at the BnF).
Suggestion: This is the basis.
Received on Friday, 22 July 2011 13:21:17 UTC