- From: Viljanen Kim <kim.viljanen@aalto.fi>
- Date: Mon, 13 Jun 2011 11:19:42 +0000
- To: "public-lld@w3.org" <public-lld@w3.org>
- CC: "juha.hakala@helsinki.fi" <juha.hakala@helsinki.fi>, "laila.heinemann@helsinki.fi" <laila.heinemann@helsinki.fi>
Hello,

I asked the National Library of Finland for comments on the LLD Final Report (Draft). Below are the comments I received. I think they hit the spot extremely well, and the issues they raise should absolutely be addressed in the Final Report.

Regards,
Kim

Kim Viljanen
Semantic Computing Research Group SeCo, Dep. of Media Technology, Aalto University
email: kim.viljanen@aalto.fi
snail: P.O. Box 15500, FI00076 Aalto, Finland
visit: Room 2541, Otaniementie 17, Espoo, Finland
mob: +358 40 5414654
web: http://www.seco.tkk.fi/u/kimvilja/
---------------------------------------------------------------------------------------------
Comments begin:

1. Juha Hakala, National Library of Finland:

Library professionals have been involved in the writing process, but they do not seem to be a representative sample of the community as a whole. The draft contains statements and views with which the majority of library professionals may not agree. If the aim is to foster cooperation between the library community and the linked open data community, it would be a good idea to review the text once more and remove some of the more controversial (and sweeping) statements, such as "libraries are ill-adapted to continual technological change".

In their present form the recommendations look like a mixed bag. It might have been better to write the text as a strategy, specifying the current situation, the target, and the concrete tasks that must be done to get the job done in the library sector. Another problem that makes the document difficult to grasp is that there are far too many challenges and recommendations; it is hard to see what is really essential and what is not (to see the forest for the trees). If you are not able to drop things, at least try to prioritize. And if the authors cannot do that, stop for a while and consider why this is the case.

It is not reasonable to assume that libraries would migrate their data overnight from MARC to, say, RDFa.
But it is definitely possible to specify a conversion from MARC21 to RDFa and other open data formats. This should not be costly, unlike changing the native format (and what would we gain from that?). There are already library systems which store data not in MARC format but in XML; this allows for easy RDFa implementation once the mapping of data elements has been specified. As a highly structured format, MARC21 is well suited to conversions.

The report makes the point that numerous libraries have experimented with converting MARC records to open data. Please make it clear that, since there are neither guidelines nor tools for this conversion, such experiments can be time-consuming, and nobody has data that they think is "right". Co-operation between libraries (content expertise) and the open data community (tools expertise) is vitally important to move on from this non-optimal situation.

The draft blames the MARC format for many of the current problems. But it does not point out that without a common metadata format there would be many different local formats, and the first task would be to convert that data into the common format and only then into linked open data (or perhaps directly into linked open data). Museums have only recently developed an exchange format (LIDO), and the archives' EAD is not that old either. The draft does not mention these formats, not even in passing, nor their usefulness to these communities from the open data point of view.

On the other hand, the draft does not point out one obvious problem with MARC: it has been used for decades, and over time cataloguing practices have changed. Moreover, MARC has been used in various ways in different parts of the world (national MARC variants) and in different library sectors. Thus one conversion to open data will not be sufficient; an adaptable tool that can be modified locally is needed. And merging MARC metadata from different sources will still be difficult (just ask OCLC).
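As an illustration only of what an adaptable, locally modifiable conversion could look like, here is a minimal Python sketch (not a proposed standard): a declarative table maps selected MARC21 tag/subfield pairs to Dublin Core terms, and a local variant can extend or override that table without touching the conversion code. For brevity it emits N-Triples rather than RDFa. The simplified record structure, the property choices, and the example.org record URI are illustrative assumptions, not an agreed crosswalk.

```python
# Minimal sketch: convert selected MARC21 fields to RDF (N-Triples)
# through a declarative mapping table that can be adjusted locally.
# The record structure, property choices and URIs are illustrative
# assumptions, not an official MARC21-to-RDF crosswalk.

DCTERMS = "http://purl.org/dc/terms/"

# Default mapping: (MARC tag, subfield code) -> RDF property URI.
DEFAULT_MAPPING = {
    ("020", "a"): DCTERMS + "identifier",  # ISBN
    ("100", "a"): DCTERMS + "creator",     # main entry, personal name
    ("245", "a"): DCTERMS + "title",       # title
    ("260", "b"): DCTERMS + "publisher",   # name of publisher
}


def escape_literal(value):
    """Escape a string so it can be used inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def marc_to_ntriples(record_uri, fields, mapping=DEFAULT_MAPPING):
    """Turn [(tag, [(code, value), ...]), ...] into N-Triples lines.

    Tag/subfield pairs missing from the mapping are skipped, so a
    local variant only has to supply its own mapping dictionary.
    """
    lines = []
    for tag, subfields in fields:
        for code, value in subfields:
            prop = mapping.get((tag, code))
            if prop is None:
                continue
            literal = escape_literal(value.strip())
            lines.append('<%s> <%s> "%s" .' % (record_uri, prop, literal))
    return lines


if __name__ == "__main__":
    record = [
        ("100", [("a", "Austen, Jane"), ("d", "1775-1817")]),
        ("245", [("a", "Pride and prejudice")]),
        ("650", [("a", "Courtship")]),
    ]
    # Hypothetical local rule: also map topical subject headings (650 $a).
    local_mapping = dict(DEFAULT_MAPPING)
    local_mapping[("650", "a")] = DCTERMS + "subject"
    for line in marc_to_ntriples("http://example.org/record/1", record, local_mapping):
        print(line)
```

In this sketch a national variant would only swap in a different mapping dictionary; real records would also need handling of ISBD punctuation and field indicators, which is omitted here.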
However, despite these difficulties, the library community is still more co-ordinated than any other community out there. If linked open data is hard for us, it will be even harder for the others.

Since the document is being prepared within W3C, it does not consider non-W3C options for publishing open data and their impact on libraries. But on this side of the fence we must, for instance, consider schema.org carefully. Since the whole thing is new, there is no shared view on it yet, but we simply cannot ignore Google and Microsoft. Stu Weibel summarizes some aspects well in his blog:
http://weibel-lines.typepad.com/weibelines/2011/06/uncommon-cause.html

Makx Dekkers provides more food for thought and links here:

> I’ve been reading various reactions (two interesting ones are David Wood at http://prototypo.blogspot.com/2011/06/schemaorg-and-semantic-web.html and Manu Sporny at http://manu.sporny.org/2011/false-choice/ in contrast to Mike Bergman’s post at http://www.mkbergman.com/962/structured-web-gets-massive-boost/).
>
> Thinking further, it seems to me that what might happen is that:
>
> · Organizations (mostly commercial?) that are selling products and want to compete with others in getting their stuff ranked high and presented well in Google and Bing will follow the schema.org spec and forget about Semantic Web the way W3C defines it – after all, people whose first priority is to get good rankings in general searches are likely not the people who are the most interested in linking data in an open and interoperable way;
>
> · Organizations (mostly public sector, academia, research?) that provide services, co-operate with others and require organisation of quality information, sharing resources and interoperability will still be looking for the more general solutions offered by Semantic Web and Linked Data.
>
> What I am wondering about is whether GooBing will, at some stage in the future, consider to offer special arrangements for the public sector (e.g. harvest Linked Data) – I cannot imagine they are not interested in providing good access to public sector and scientific information. Or would the public sector start helping schema.org to improve their “Type hierarchy” (http://www.schema.org/docs/full.html)? Even then, schema.org does not seem to support cross-collection linking – it’s entirely focused on how search engines can harvest information.

About identifiers and cool URIs

Libraries have a long tradition of separating identifiers (such as the ISBN) from location (shelf number, signum). It is possible to organize collections in such a way that a book's shelf location never changes (after all, it is people who move books around), but most often this is not practical. And if the same book is available in many locations at the same time, which one is the unique identifier? And if we have to change the location, how do we warn users about the change, given that they may otherwise get a different book from the one that used to be there?

From the identification/location point of view, digital documents on the web are no different from printed books: it is a good idea to separate identifiers from Uniform Resource Locators (and there is a reason why they were named locators; that is what they are). Confusing things by talking about URIs when only URLs are meant is not helpful.

Perceived time scale is one key difference here: the chapter "Preserve Linked Data vocabularies" contains the following sentence: "Linked Data will only remain usable twenty years from now if its URIs persist and remain resolvable to documentation of their meaning". From the W3C point of view 20 years may be a very long time, but for a national library it is not; this sentence alone shows how different our world view is from the one prevailing in W3C. For us, data has to remain usable for centuries, and during that time every document will be migrated over and over again.
Managing this complexity is not possible with blunt tools such as cool URIs, which are assigned with no control whatsoever. There are already hundreds of versions of the W3C home page; which one of them is identified by http://www.w3.org/? How should a user know that a cool URI (as long as the Internet Archive is alive) for, for instance, the December 1996 version of the home page is http://web.archive.org/web/19961227091242/http://www19.w3.org/? Due to web archives and other harvesting activities, any resource can have a number of URLs of varying levels of coolness. Exactly how cool they are is never apparent from the URI itself, and for most documents there may not be a cool URI at all.

These and other issues, not discussed here due to lack of time and space, explain why persistent identifiers (PIDs) rule in some areas: libraries prefer URNs, publishers DOIs, and the scientific data community Handles (to give just a few examples). The conclusion as regards the draft is that it is counterproductive to give the impression that cool URIs must be used with open data; PIDs can be, and usually are, expressed as HTTP URIs within documents, so they match the requirements of open data. Each occurrence of "cool URI" in the text can be replaced by the more neutral "HTTP URI", and somewhere the point should be made that the traditional identifiers libraries use can be incorporated into open data.

Using http://dbpedia.org/resource/Jane_Austen as an example is not a good idea. Libraries will not take identifiers for authors from DBpedia, although this solution may seem sufficient to lay people. Libraries (and publishers, copyright societies, etc.) will soon adopt a new system called the International Standard Name Identifier (ISNI). In this system it is possible to establish PID-based links to metadata about the author, which can include a link to dbpedia.org. There are other data elements which do not have standard identifiers yet. But libraries should be allowed to decide for themselves how linking the data should be implemented.
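As an illustration only of how a PID can be expressed as an HTTP URI, here is a minimal Python sketch: the identifier is appended to the base URL of a resolver service. The resolver base URLs in the table are assumptions to be checked against each service's own documentation, and the URN and Handle values in the usage part are hypothetical. The second function rebuilds the Internet Archive URI pattern from the W3C home page example, showing how one resource accumulates a separate URL for each archived version.

```python
# Minimal sketch: express persistent identifiers (PIDs) as HTTP URIs via
# resolver services, and build Internet Archive URIs for archived versions.
# The resolver base URLs are assumptions; check each service's documentation.

RESOLVERS = {
    "urn": "http://urn.fi/",          # Finnish URN:NBN resolver (assumed base URL)
    "doi": "http://dx.doi.org/",      # DOI proxy server
    "hdl": "http://hdl.handle.net/",  # Handle System proxy server
}


def pid_to_http_uri(scheme, pid):
    """Return an HTTP URI under which the given PID can be resolved."""
    base = RESOLVERS.get(scheme.lower())
    if base is None:
        raise ValueError("no resolver configured for scheme %r" % scheme)
    return base + pid


def wayback_uri(timestamp, url):
    """Return the Internet Archive URI of one archived version of url.

    timestamp has the form YYYYMMDDhhmmss, as in
    http://web.archive.org/web/19961227091242/http://www19.w3.org/
    """
    if len(timestamp) != 14 or not timestamp.isdigit():
        raise ValueError("timestamp must be 14 digits (YYYYMMDDhhmmss)")
    return "http://web.archive.org/web/%s/%s" % (timestamp, url)


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    print(pid_to_http_uri("urn", "URN:NBN:fi-fe2011061312345"))
    print(pid_to_http_uri("hdl", "12345/678"))
    # The December 1996 W3C home page example.
    print(wayback_uri("19961227091242", "http://www19.w3.org/"))
```

In this sketch the PID itself never changes if a resolver moves; only the table entry does, which keeps the identifier separate from the locator.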
Linking the data is probably one of the most difficult aspects of the open data publishing process, since few organisations have built anything sustainable in this area.

Juha Hakala

--
Juha Hakala
Senior advisor, standardisation and IT
The National Library of Finland
P.O.Box 15 (Unioninkatu 36, room 503), FIN-00014 Helsinki University
Email juha.hakala@helsinki.fi, tel +358 50 382 7678
-----------------------------------

2. Laila Heinemann, National Library of Finland, writes:

From a library point of view the paper is already a bit outdated. It seems to assume that librarians haven't heard anything about Linked Data yet and that they are, by default, very suspicious of it. Of course, this may be true in some parts of the world, but it cannot be applied generally. The tone is very patronizing, and as such is not a basis for a constructive discussion.

As to library systems architecture:

- Library Management Systems are currently in a transition phase, where cloud computing and linked data are among the key issues. Both are included in the visions of the major vendors, but the development and implementation phase of the next-generation systems worldwide is likely to be very long (and very costly).
- However, library systems are not only about storing and searching for information; they also need to cover other functions, such as acquisitions, circulation, access control and patron information. Therefore linked data is only one part of a much more complex system architecture, and the expression 'niche system' seems especially ill-suited.

As to library cataloguing and data formats:

- The MARC format is in transition as well and is likely to be developed to better suit linking data (or to be killed altogether...), see http://www.loc.gov/marc/transition/

The remark made on clause 6.5.3 about the lack of common terminology is a very good one! All too often the discussion is about apples and oranges, when the aim is actually shared.
Laila Heinemann
tietojärjestelmäasiantuntija / Information Systems Specialist
Kansalliskirjasto / The National Library of Finland
Kirjastoverkkopalvelut / Library Network Services
PL/POB 26 (Teollisuuskatu 23), 00014 Helsingin yliopisto (Finland)
tel + 358 9 191 44339
e-mail laila.heinemann@helsinki.fi
www.kansalliskirjasto.fi / www.nationallibrary.fi
Received on Monday, 13 June 2011 11:20:12 UTC