Re: Comments on the Final Report (10th of June) from National Library of Finland from Karen Coyle on 2011-06-13 (public-lld@w3.org from June 2011)

From: Karen Coyle <kcoyle@kcoyle.net>
Date: Mon, 13 Jun 2011 08:02:36 -0700
To: Viljanen Kim <kim.viljanen@aalto.fi>
Cc: "public-lld@w3.org" <public-lld@w3.org>, "juha.hakala@helsinki.fi" <juha.hakala@helsinki.fi>, "laila.heinemann@helsinki.fi" <laila.heinemann@helsinki.fi>
Message-ID: <20110613080236.19724yz6t0l20fws@kcoyle.net>
Thank you Kim, and special thanks to Juha! This is exactly the kind of  
review I was hoping we could get on the report -- a fresh view, and  
one from a library perspective. I hope we can get more like this.

kc

Quoting Viljanen Kim <kim.viljanen@aalto.fi>:

> Hello,
>
> I asked for comments from the National Library of Finland on the LLD  
> Final Report (Draft). Below are the comments I got. I think they hit  
> the spot extremely well and I think that the issues should  
> absolutely be addressed in the Final Report.
>
> Regards,
> Kim
>
> Kim Viljanen
> Semantic Computing Research Group SeCo, Dep. of Media Technology,  
> Aalto University
> email: kim.viljanen@aalto.fi
> snail: P.O. Box 15500, FI00076 Aalto, Finland
> visit: Room 2541, Otaniementie 17, Espoo, Finland
> mob: +358 40 5414654
> web: http://www.seco.tkk.fi/u/kimvilja/
>
> ---------------------------------------------------------------------------------------------
>
> Comments begin:
>
> 1. Juha Hakala, National Library of Finland:
>
> Library professionals have been involved with the writing process, but
> it seems that these people are not a representative sample of the
> community as a whole. The draft contains statements and views which the
> majority of library professionals may not agree with. If the aim is to
> foster cooperation with the library community and the linked open data
> community, then it would be a good idea to review the text once more and
> remove some of the more controversial (and sweeping) statements, such as
> "libraries are ill-adapted to continual technological change".
>
> In their present form the recommendations look like a mixed bag. It
> might have been better to write the text as a strategy, specifying the
> current situation, the target and concrete tasks that must be done to
> get the job done in the library sector. Another problem which makes the
> document difficult to grasp is that there are far too many challenges
> and recommendations; it is hard to see what is really essential and what
> is not (to see the forest for the trees). If you are not able to drop
> things, try at least to prioritize. And if the authors cannot do that,
> stop for a while and consider why this is the case.
>
> It is not reasonable to assume that libraries would migrate their data
> overnight from MARC to, say, RDFa. But it is definitely possible to
> specify a conversion from MARC21 to RDFa and other open data formats.
> This should not be costly, unlike changing the native format (and what
> would we benefit from that). There are already library systems which do
> not store the data in MARC format but in XML; this allows for easy RDFa
> implementation once the mapping of data elements has been specified.
>
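As an illustration of the element mapping described above, here is a minimal Python sketch. It is not taken from the National Library's comments or from any existing conversion tool. The simplified record layout, the choice of MARC fields, the Dublin Core Terms properties, and the example.org subject URI are all assumptions made for the example, and the output is N-Triples (one of the "other open data formats") rather than RDFa.

```python
# Hypothetical mapping from (MARC tag, subfield code) to a Dublin Core
# Terms property. Which fields map to which properties is an assumption
# for illustration; a real mapping would have to be agreed and documented.
DCTERMS = "http://purl.org/dc/terms/"
MARC_TO_DCTERMS = {
    ("100", "a"): DCTERMS + "creator",
    ("245", "a"): DCTERMS + "title",
    ("260", "c"): DCTERMS + "issued",
}


def escape_literal(value):
    """Escape backslashes, double quotes and line breaks for N-Triples."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def marc_to_ntriples(subject_uri, fields):
    """Convert simplified MARC fields into a list of N-Triples lines.

    `fields` is a list of (tag, subfield_code, value) tuples. Fields with
    no entry in the mapping are skipped rather than guessed at.
    """
    lines = []
    for tag, code, value in fields:
        predicate = MARC_TO_DCTERMS.get((tag, code))
        if predicate is None:
            continue
        literal = escape_literal(value.strip())
        lines.append('<%s> <%s> "%s" .' % (subject_uri, predicate, literal))
    return lines


# A made-up record; the subject URI is a placeholder.
record = [
    ("100", "a", "Austen, Jane"),
    ("245", "a", "Pride and prejudice"),
    ("650", "a", "Social classes"),  # no mapping defined, so it is skipped
]

for line in marc_to_ntriples("http://example.org/record/1", record):
    print(line)
```

Keeping the mapping in a plain table separate from the conversion code is one way to let each library adjust it for its own MARC variant and cataloguing practice without touching the converter itself.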
> As a highly structured format MARC21 is suitable for conversions. The
> report makes the point that numerous libraries have experimented with
> converting MARC records to open data. Please make it clear that since
> there are neither guidelines for this conversion nor tools, such
> experiments can be time consuming, and nobody has data that they think
> is "right". Co-operation between libraries (content expertise) and the
> open data community (tools expertise) is vitally important to move on from
> this non-optimal situation.
>
> The draft blames the MARC format for many of the current problems. But
> the draft does not point out that without a common metadata format,
> there would be a lot of different local formats, and the first task
> would be to convert this data into the common format, and then to open
> linked data (or perhaps directly into open linked data). Museums have
> only recently developed an exchange format (LIDO), and the archives' EAD
> is not that old either. The draft does not mention these formats, not
> even in passing, and their usefulness to these communities from the open
> data point of view.
>
> On the other hand, the draft does not point out one obvious problem with
> MARC: it has been used for decades, and over time cataloguing practices
> have changed. Moreover, MARC has been used in various ways in different
> parts of the world (national MARC variants) or in different library
> sectors. Thus one conversion to open data will not be sufficient; an
> adaptable tool that can be modified locally is needed. And merging the
> MARC metadata from different sources will still be difficult (just ask
> OCLC). However, the library community is still more co-ordinated than
> any other community out there. If the open linked data is hard for us,
> it will be even harder for the others.
>
> Since the document is being prepared within W3C, it does not consider
> non-W3C options for publishing open data and their impact on libraries.
> But on this side of the fence, we must consider for instance schema.org
> carefully. Since the whole thing is new, there is no shared view yet,
> but we just cannot ignore Google and Microsoft. Stu Weibel summarizes
> some aspects well in his blog:
>
> http://weibel-lines.typepad.com/weibelines/2011/06/uncommon-cause.html
>
> Makx Dekkers provides more food for thought and links here:
>
>> I’ve been reading various reactions (two interesting ones are David  
>> Wood at  
>> http://prototypo.blogspot.com/2011/06/schemaorg-and-semantic-web.html and  
>> Manu Sporny at http://manu.sporny.org/2011/false-choice/ in  
>> contrast to Mike Bergman’s post at  
>> http://www.mkbergman.com/962/structured-web-gets-massive-boost/).
>>
>> Thinking further, it seems to me that what might happen is that:
>>
>> · Organizations (mostly commercial?) that are selling  
>> products and want to compete with others in getting their stuff  
>> ranked high and presented well in Google and Bing will follow the  
>> schema.org spec and forget about Semantic Web the way W3C defines  
>> it – after all, people whose first priority is to get good rankings  
>> in general searches are likely not the people who are the most  
>> interested in linking data in an open and interoperable way;
>>
>> · Organizations (mostly public sector, academia, research?)  
>> that provide services, co-operate with others and require  
>> organisation of quality information, sharing resources and  
>> interoperability will still be looking for the more general  
>> solutions offered by Semantic Web and Linked Data.
>>
>> What I am wondering about is whether GooBing will, at some stage in  
>> the future, consider offering special arrangements for the public  
>> sector (e.g. harvest Linked Data) – I cannot imagine they are not  
>> interested in providing good access to public sector and scientific  
>> information. Or would the public sector start helping schema.org to  
>> improve their “Type hierarchy”  
>> (http://www.schema.org/docs/full.html)? Even then, schema.org does  
>> not seem to support cross-collection linking – it’s entirely  
>> focused on how search engines can harvest information.
>
>
> About identifiers and cool URIs
>
> Libraries have an old tradition of separating identifiers (such as ISBN)
> from location (shelf number, signum). It is possible to organize
> collections in such a way that a book's shelf location never changes
> (after all, it is people who move books around), but most often this is
> not practical. And if the same book is available in many locations at
> the same time, which one is the unique identifier? And if we have to
> change the location, how do we warn the user about the change, when he
> may get another book, not the one that used to be there?
>
> From the identification / location point of view, digital documents on
> the web are no different from printed books: it is a good idea to
> separate identifiers and Uniform Resource Locators (and there is a
> reason why they were named locators; that is what they are). Confusing
> things by talking about URIs when only URLs are meant is not helpful.
> Perceived time scale is one key difference here: the "Preserve Linked
> Data vocabularies" chapter contains the following sentence:
>
> "Linked Data will only remain usable twenty years from now if its URIs
> persist and remain resolvable to documentation of their meaning".
>
> From the W3C point of view 20 years may be a very long time, but for a
> national library it is not; the whole sentence just tells libraries how
> different our world view is from the one reigning in W3C. For us, data
> has to remain usable for centuries, and during that time every document
> will be migrated over and over again. Managing this complexity is not
> possible with blunt tools such as cool URIs, which are assigned with no
> control whatsoever. Already there are hundreds of versions of the W3C
> homepage; which one of them is identified by http://www.w3.org/? How
> should a user know that a cool URI (as long as the Internet Archive is
> alive) for, say, the December 1996 version of the home page is
> http://web.archive.org/web/19961227091242/http://www19.w3.org/? Due to
> web archives and other harvesting activities any resource has a number
> of URLs, of varying levels of coolness. Exactly how cool they are is
> never apparent from the URI itself, and for most documents there may not
> be a cool URI at all.
>
> These and other issues not discussed here due to lack of time and space
> explain why PIDs rule in some areas: libraries prefer URNs,
> publishers DOIs and the scientific data community Handles (to give just a
> few examples). The conclusion as regards the draft is that it is
> counterproductive to give the impression that cool URIs must be used with
> open data; PIDs can be, and usually are, expressed as HTTP URIs within
> documents, so they match the requirements of open data. Each occurrence
> of "cool URI" in the text can be replaced by the more neutral "HTTP URI",
> and somewhere a point should be made that the traditional identifiers
> libraries use can be incorporated into open data.
>
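The point that PIDs can be expressed as HTTP URIs can be sketched in a few lines of Python. This is an illustration only, not part of the original comments. The resolver base addresses (doi.org for DOIs, hdl.handle.net for Handles, urn.fi for Finnish URNs) and the `doi:` / `hdl:` / `urn:` prefix convention are assumptions of the example, and the identifiers shown are made-up placeholders.

```python
# Resolver base URLs, assumed for this sketch. A production service would
# take these from configuration and percent-encode special characters.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
    "urn": "http://urn.fi/",
}


def pid_to_http_uri(pid):
    """Return an HTTP URI for a persistent identifier such as 'doi:10.1234/x'.

    The scheme prefix selects the resolver. For URNs the whole identifier
    is appended, because the 'urn:' prefix is part of the name itself.
    """
    scheme, sep, rest = pid.partition(":")
    scheme = scheme.lower()
    if not sep or not rest or scheme not in RESOLVERS:
        raise ValueError("unsupported identifier: %r" % pid)
    if scheme == "urn":
        return RESOLVERS[scheme] + pid
    return RESOLVERS[scheme] + rest


# Placeholder identifiers, not real registrations.
for pid in ("doi:10.1234/example", "hdl:12345/678", "urn:nbn:fi-example"):
    print(pid, "->", pid_to_http_uri(pid))
```

The identifier itself stays unchanged when a resolver address changes; only the table of base URLs needs updating, which is the separation of identifier and location argued for above.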
> Using http://dbpedia.org/resource/Jane_Austen as an example is not a
> good idea. Libraries will not take identifiers for authors from DBpedia,
> although this solution may seem sufficient for lay people. Libraries
> (and publishers, copyright societies etc.) will soon adopt a new system
> called International Standard Name Identifier (ISNI). In this system, it
> is possible to establish PID-based links to metadata about the author,
> which can include a link to dbpedia.org.
>
> There are other data elements which do not have standard identifiers
> yet. But libraries should be allowed to decide themselves how linking
> the data should be implemented. This is probably one of the most
> difficult aspects of the open data publishing process, since few
> organisations have made anything sustainable in this area.
>
> Juha Hakala
>
> --
>   Juha Hakala
>   Senior advisor, standardisation and IT
>
>   The National Library of Finland
>   P.O.Box 15 (Unioninkatu 36, room 503), FIN-00014 Helsinki University
>   Email juha.hakala@helsinki.fi, tel +358 50 382 7678
>
> -----------------------------------
>
> 2. Laila Heinemann, National Library of Finland writes:
>
> From a library point of view the paper is already a bit outdated. It
> seems to assume that librarians haven't heard anything about Linked Data
> yet and that they by default are very suspicious of it. Of course, this
> can be true in some parts of the world, but cannot be applied generally.
> The tone is very patronizing, and as such not a basis for a constructive
> discussion.
>
> As to library systems architecture:
> - Library Management Systems are currently in a transition phase, where
> cloud computing and linked data are among the key issues. Both are
> included in the visions of the major vendors, but the development and
> implementation phase of the next generation systems worldwide is likely
> to be very long (and very costly).
> - However, library systems are not only about storing and searching for
> information, they also need to cover other functions, such as
> acquisitions, circulation, access control and patron information.
> Therefore linked data is only a part of a much more complex system
> architecture, and the expression 'niche system' seems especially ill
> suited.
>
> As to library cataloguing and data formats
> - The MARC format is in transition as well and is likely to be developed
> to better suit linking data (or to be killed altogether...), see
> http://www.loc.gov/marc/transition/
>
> The remark made on clause 6.5.3 about the lack of common terminology is
> a very good one! All too often the discussion is about apples and
> oranges, when the aim is actually shared.
>
> Laila Heinemann
> tietojärjestelmäasiantuntija / Information Systems Specialist
>
> Kansalliskirjasto / The National Library of Finland
> Kirjastoverkkopalvelut / Library Network Services
>
> PL/POB 26 (Teollisuuskatu 23), 00014 Helsingin yliopisto (Finland)
> tel + 358 9 191 44339
> e-mail laila.heinemann@helsinki.fi
>
> www.kansalliskirjasto.fi / www.nationallibrary.fi
>
>
>



-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Received on Monday, 13 June 2011 15:03:08 UTC