- From: Karen Coyle <kcoyle@kcoyle.net>
- Date: Mon, 13 Jun 2011 08:02:36 -0700
- To: Viljanen Kim <kim.viljanen@aalto.fi>
- Cc: "public-lld@w3.org" <public-lld@w3.org>, "juha.hakala@helsinki.fi" <juha.hakala@helsinki.fi>, "laila.heinemann@helsinki.fi" <laila.heinemann@helsinki.fi>
Thank you Kim, and special thanks to Juha! This is exactly the kind of
review I was hoping we could get on the report -- a fresh view, and one
from a library perspective. I hope we can get more like this.

kc

Quoting Viljanen Kim <kim.viljanen@aalto.fi>:

> Hello,
>
> I asked for comments from the National Library of Finland on the LLD
> Final Report (Draft). Below are the comments I got. I think they hit
> the spot extremely well and I think that the issues should
> absolutely be addressed in the Final Report.
>
> Regards,
> Kim
>
> Kim Viljanen
> Semantic Computing Research Group SeCo, Dep. of Media Technology,
> Aalto University
> email: kim.viljanen@aalto.fi
> snail: P.O. Box 15500, FI00076 Aalto, Finland
> visit: Room 2541, Otaniementie 17, Espoo, Finland
> mob: +358 40 5414654
> web: http://www.seco.tkk.fi/u/kimvilja/
>
> ---------------------------------------------------------------------------------------------
>
> Comments begin:
>
> 1. Juha Hakala, National Library of Finland:
>
> Library professionals have been involved with the writing process, but
> it seems that these people are not a representative sample of the
> community as a whole. The draft contains statements and views which the
> majority of library professionals may not agree with. If the aim is to
> foster cooperation with the library community and the linked open data
> community, then it would be a good idea to review the text once more and
> remove some of the more controversial (and sweeping) statements, such as
> "libraries are ill-adapted to continual technological change".
>
> In their present form the recommendations look like a mixed bag. It
> might have been better to write the text as a strategy, specifying the
> current situation, the target and concrete tasks that must be done to
> get the job done in the library sector.
> Another problem which makes the
> document difficult to grasp is that there are far too many challenges
> and recommendations; it is hard to see what is really essential and what
> is not (to see the forest for the trees). If you are not able to drop
> things, try at least to prioritize. And if the authors cannot do that,
> stop for a while and consider why this is the case.
>
> It is not reasonable to assume that libraries would migrate their data
> overnight from MARC to, say, RDFa. But it is definitely possible to
> specify a conversion from MARC21 to RDFa and other open data formats.
> This should not be costly, unlike changing the native format (and what
> would we gain from that?). There are already library systems which do
> not store the data in MARC format but in XML; this allows for easy RDFa
> implementation once the mapping of data elements has been specified.
>
> As a highly structured format MARC21 is suitable for conversions. The
> report makes the point that numerous libraries have experimented with
> conversion of MARC records to open data. Please make it clear that since
> there are neither guidelines nor tools for this conversion, such
> experiments can be time consuming, and nobody has data that they think
> is "right". Co-operation between libraries (content expertise) and the
> open data community (tools expertise) is vitally important to move on
> from this non-optimal situation.
>
> The draft blames the MARC format for many of the current problems. But
> the draft does not point out that without a common metadata format,
> there would be a lot of different local formats, and the first task
> would be to convert this data into the common format, and then to open
> linked data (or perhaps directly into open linked data). Museums have
> only recently developed an exchange format (LIDO), and the archives' EAD
> is not that old either.
> The draft does not mention these formats, not
> even in passing, nor their usefulness to these communities from the open
> data point of view.
>
> On the other hand, the draft does not point out one obvious problem with
> MARC: it has been used for decades, and over time cataloguing practices
> have changed. Moreover, MARC has been used in various ways in different
> parts of the world (national MARC variants) or in different library
> sectors. Thus one conversion to open data will not be sufficient; an
> adaptable tool that can be modified locally is needed. And merging the
> MARC metadata from different sources will still be difficult (just ask
> OCLC). However, the library community is still more co-ordinated than
> any other community out there. If open linked data is hard for us,
> it will be even harder for the others.
>
> Since the document is being prepared within W3C, it does not consider
> non-W3C options for publishing open data and their impact on libraries.
> But on this side of the fence, we must consider for instance schema.org
> carefully. Since the whole thing is new, there is no shared view yet,
> but we just cannot ignore Google and Microsoft. Stu Weibel summarizes
> some aspects well in his blog:
>
> http://weibel-lines.typepad.com/weibelines/2011/06/uncommon-cause.html
>
> Makx Dekkers provides more food for thought and links here:
>
>> I’ve been reading various reactions (two interesting ones are David
>> Wood at
>> http://prototypo.blogspot.com/2011/06/schemaorg-and-semantic-web.html and
>> Manu Sporny at http://manu.sporny.org/2011/false-choice/ in
>> contrast to Mike Bergman’s post at
>> http://www.mkbergman.com/962/structured-web-gets-massive-boost/).
>>
>> Thinking further, it seems to me that what might happen is that:
>>
>> · Organizations (mostly commercial?)
>> that are selling
>> products and want to compete with others in getting their stuff
>> ranked high and presented well in Google and Bing will follow the
>> schema.org spec and forget about Semantic Web the way W3C defines
>> it – after all, people whose first priority is to get good rankings
>> in general searches are likely not the people who are the most
>> interested in linking data in an open and interoperable way;
>>
>> · Organizations (mostly public sector, academia, research?)
>> that provide services, co-operate with others and require
>> organisation of quality information, sharing resources and
>> interoperability will still be looking for the more general
>> solutions offered by Semantic Web and Linked Data.
>>
>> What I am wondering about is whether GooBing will, at some stage in
>> the future, consider offering special arrangements for the public
>> sector (e.g. harvest Linked Data) – I cannot imagine they are not
>> interested in providing good access to public sector and scientific
>> information. Or would the public sector start helping schema.org to
>> improve their “Type hierarchy”
>> (http://www.schema.org/docs/full.html)? Even then, schema.org does
>> not seem to support cross-collection linking – it’s entirely
>> focused on how search engines can harvest information.
>
>
> About identifiers and cool URIs
>
> Libraries have an old tradition of separating identifiers (such as ISBN)
> from location (shelf number, signum). It is possible to organize
> collections in such a way that a book's shelf location never changes
> (after all, it is people who move books around), but most often this is
> not practical. And if the same book is available in many locations at
> the same time, which one is the unique identifier? And if we have to
> change the location, how do we warn the user about the change, when he
> may get another book, not the one that used to be there?
>
> From the identification / location point of view digital documents on
> the web are not different from the printed book: it is a good idea to
> separate identifiers and Uniform Resource Locators (and there is a
> reason why they were named locators; that is what they are). Confusing
> things by talking about URIs when only URLs are meant is not helpful.
> Perceived time scale is one key difference here: in the "Preserve Linked
> Data vocabularies" chapter there is the following sentence:
>
> "Linked Data will only remain usable twenty years from now if its URIs
> persist and remain resolvable to documentation of their meaning".
>
> From the W3C point of view 20 years may be a very long time, but for a
> national library it is not; the whole sentence just tells libraries how
> different our world view is from the one reigning in W3C. For us, data
> has to remain usable for centuries, and during that time every document
> will be migrated over and over again. Managing this complexity is not
> possible with blunt tools such as cool URIs, which are assigned with no
> control whatsoever. Already there are hundreds of versions of the W3C
> homepage; which one of them is identified by http://www.w3.org/? How
> should a user know that a cool URI (as long as the Internet Archive is
> alive) for, for instance, the December 1996 version of the home page is
> http://web.archive.org/web/19961227091242/http://www19.w3.org/? Due to
> web archives and other harvesting activities any resource has a number
> of URLs, of varying levels of coolness. Exactly how cool they are is
> never apparent from the URI itself, and for most documents there may not
> be a cool URI at all.
>
> These and other issues not discussed here due to lack of time and space
> explain why PIDs rule in some areas: libraries prefer URNs,
> publishers DOIs and the scientific data community Handles (to give just
> a few examples).
> The conclusion as regards the draft is that it is
> counterproductive to give the impression that cool URIs must be used
> with open data; PIDs can be, and usually are, expressed as HTTP URIs
> within documents, so they match the requirements of open data. Each
> occurrence of "cool URI" in the text can be replaced by the more neutral
> "HTTP URI", and somewhere a point should be made that the traditional
> identifiers libraries use can be incorporated into open data.
>
> Using http://dbpedia.org/resource/Jane_Austen as an example is not a
> good idea. Libraries will not take identifiers for authors from DBpedia,
> although this solution may seem sufficient for lay people. Libraries
> (and publishers, copyright societies etc.) will soon adopt a new system
> called International Standard Name Identifier (ISNI). In this system, it
> is possible to establish PID-based links to metadata about the author,
> which can include a link to dbpedia.org.
>
> There are other data elements which do not have standard identifiers
> yet. But libraries should be allowed to decide themselves how linking
> the data should be implemented. This is probably one of the most
> difficult aspects of the open data publishing process, since few
> organisations have made anything sustainable in this area.
>
> Juha Hakala
>
> --
> Juha Hakala
> Senior advisor, standardisation and IT
>
> The National Library of Finland
> P.O.Box 15 (Unioninkatu 36, room 503), FIN-00014 Helsinki University
> Email juha.hakala@helsinki.fi, tel +358 50 382 7678
>
> -----------------------------------
>
> 2. Laila Heinemann, National Library of Finland writes:
>
> From a library point of view the paper is already a bit outdated. It
> seems to assume that librarians haven't heard anything about Linked Data
> yet and that they by default are very suspicious of it. Of course, this
> can be true in some parts of the world, but cannot be applied generally.
> The tone is very patronizing, and as such not a basis for a constructive
> discussion.
>
> As to library systems architecture:
> - Library Management Systems are currently in a transition phase, where
> cloud computing and linked data are among the key issues. Both are
> included in the visions of the major vendors, but the development and
> implementation phase of the next generation systems worldwide is likely
> to be very long (and very costly).
> - However, library systems are not only about storing and searching for
> information, they also need to cover other functions, such as
> acquisitions, circulation, access control and patron information.
> Therefore linked data is only a part of a much more complex system
> architecture, and the expression 'niche system' seems especially ill
> suited.
>
> As to library cataloguing and data formats:
> - The MARC format is in transition as well and is likely to be developed
> to better suit linking data (or to be killed altogether...), see
> http://www.loc.gov/marc/transition/
>
> The remark made on clause 6.5.3 about the lack of common terminology is
> a very good one! All too often the discussion is about apples and
> oranges, when the aim is actually shared.
>
> Laila Heinemann
> tietojärjestelmäasiantuntija / Information Systems Specialist
>
> Kansalliskirjasto / The National Library of Finland
> Kirjastoverkkopalvelut / Library Network Services
>
> PL/POB 26 (Teollisuuskatu 23), 00014 Helsingin yliopisto (Finland)
> tel + 358 9 191 44339
> e-mail laila.heinemann@helsinki.fi
>
> www.kansalliskirjasto.fi / www.nationallibrary.fi
>
>

--
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Received on Monday, 13 June 2011 15:03:08 UTC