Comments on the Final Report (10th of June) from National Library of Finland from Viljanen Kim on 2011-06-13 (public-lld@w3.org from June 2011)

From: Viljanen Kim <kim.viljanen@aalto.fi>
Date: Mon, 13 Jun 2011 11:19:42 +0000
To: "public-lld@w3.org" <public-lld@w3.org>
CC: "juha.hakala@helsinki.fi" <juha.hakala@helsinki.fi>, "laila.heinemann@helsinki.fi" <laila.heinemann@helsinki.fi>
Message-ID: <4D3A02236A400A47AB3BDEEEF80F24E554EF329C@EXMDB05.org.aalto.fi>

Hello,

I asked for comments from the National Library of Finland on the LLD Final Report (Draft). Below are the comments I received. I think they hit the spot extremely well, and the issues should absolutely be addressed in the Final Report.

Regards,
Kim

Kim Viljanen
Semantic Computing Research Group SeCo, Dep. of Media Technology, Aalto University
email: kim.viljanen@aalto.fi
snail: P.O. Box 15500, FI00076 Aalto, Finland
visit: Room 2541, Otaniementie 17, Espoo, Finland
mob: +358 40 5414654
web: http://www.seco.tkk.fi/u/kimvilja/

---------------------------------------------------------------------------------------------

Comments begin:

1. Juha Hakala, National Library of Finland:

Library professionals have been involved in the writing process, but
it seems that these people are not a representative sample of the
community as a whole. The draft contains statements and views which the
majority of library professionals may not agree with. If the aim is to
foster cooperation between the library community and the linked open
data community, then it would be a good idea to review the text once
more and remove some of the more controversial (and sweeping)
statements, such as "libraries are ill-adapted to continual
technological change".

In their present form the recommendations look like a mixed bag. It
might have been better to write the text as a strategy, specifying the
current situation, the target, and the concrete tasks needed to get the
job done in the library sector. Another problem which makes the
document difficult to grasp is that there are far too many challenges
and recommendations; it is hard to see what is really essential and what
is not (to see the forest for the trees). If you are not able to drop
things, at least try to prioritize. And if the authors cannot do that,
stop for a while and consider why this is the case.

It is not reasonable to assume that libraries would migrate their data
overnight from MARC to, say, RDFa. But it is definitely possible to
specify a conversion from MARC21 to RDFa and other open data formats.
This should not be costly, unlike changing the native format (and what
would we gain from that?). There are already library systems which store
their data not in MARC format but in XML; this allows for easy RDFa
implementation once the mapping of data elements has been specified.
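
To illustrate what "specifying the mapping of data elements" might look
like in practice, here is a minimal sketch. It assumes MARC records have
already been parsed into (tag, subfield code, value) tuples, and it uses a
hypothetical three-entry mapping to Dublin Core element properties; it
emits N-Triples rather than RDFa, but the same triples could be embedded
in HTML as RDFa.

```python
# Minimal sketch: turn already-parsed MARC21 subfields into RDF (N-Triples).
# The MAPPING table and the example subject URI are hypothetical; a real
# mapping would have to be agreed on and would be far larger.
DC = "http://purl.org/dc/elements/1.1/"

MAPPING = {
    ("245", "a"): DC + "title",       # title statement
    ("100", "a"): DC + "creator",     # main entry, personal name
    ("020", "a"): DC + "identifier",  # ISBN
}


def escape_literal(value: str) -> str:
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n"))


def marc_to_ntriples(subject_uri: str, fields: list) -> list:
    """Map (tag, subfield code, value) tuples to N-Triples lines.

    Elements without a mapping are skipped rather than guessed.
    """
    triples = []
    for tag, code, value in fields:
        prop = MAPPING.get((tag, code))
        if prop is None:
            continue
        triples.append(f'<{subject_uri}> <{prop}> "{escape_literal(value)}" .')
    return triples


if __name__ == "__main__":
    record = [
        ("245", "a", "Pride and prejudice"),
        ("100", "a", "Austen, Jane"),
        ("500", "a", "Local note"),  # no mapping: dropped
    ]
    for line in marc_to_ntriples("http://example.org/bib/1", record):
        print(line)
```

The mapping table is kept separate from the conversion loop so that the
part requiring library expertise (which field means what) can be reviewed
independently of the tooling.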

As a highly structured format, MARC21 is suitable for conversions. The
report makes the point that numerous libraries have experimented with
converting MARC records to open data. Please make it clear that since
there are neither guidelines nor tools for this conversion, such
experiments can be time-consuming, and nobody has data that they think
is "right". Co-operation between libraries (content expertise) and the
open data community (tools expertise) is vitally important to move on
from this non-optimal situation.

The draft blames the MARC format for many of the current problems. But
the draft does not point out that without a common metadata format,
there would be many different local formats, and the first task would
be to convert this data into the common format, and then into open
linked data (or perhaps directly into open linked data). Museums have
only recently developed an exchange format (LIDO), and the archives' EAD
is not that old either. The draft does not mention these formats, not
even in passing, nor their usefulness to these communities from the
open data point of view.

On the other hand, the draft does not point out one obvious problem with
MARC: it has been used for decades, and over time cataloguing practices
have changed. Moreover, MARC has been used in various ways in different
parts of the world (national MARC variants) and in different library
sectors. Thus a single conversion to open data will not be sufficient;
an adaptable tool that can be modified locally is needed. And merging
MARC metadata from different sources will still be difficult (just ask
OCLC). However, the library community is still more co-ordinated than
any other community out there. If open linked data is hard for us, it
will be even harder for the others.
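
One way to read "an adaptable tool that can be modified locally" is a
shared base mapping plus local overrides for a national MARC variant or
an older cataloguing practice. The sketch below assumes exactly that;
the base table, the override profile and the choice of Dublin Core
properties are all hypothetical examples, not an agreed standard.

```python
# Minimal sketch of a locally adaptable mapping. The base table and the
# local override below are hypothetical examples, not an agreed standard.
DC = "http://purl.org/dc/elements/1.1/"

BASE_MAPPING = {
    ("245", "a"): DC + "title",
    ("100", "a"): DC + "creator",
    ("020", "a"): DC + "identifier",
}


def build_mapping(base: dict, overrides: dict) -> dict:
    """Return a copy of base with local overrides applied.

    An override value of None removes the field from the mapping, e.g. when
    a national MARC variant or older practice uses the field differently.
    The base mapping itself is never modified.
    """
    mapping = dict(base)
    for key, prop in overrides.items():
        if prop is None:
            mapping.pop(key, None)
        else:
            mapping[key] = prop
    return mapping


# Hypothetical local profile: also map 700 $a (added entry, personal name),
# and drop 020 $a because this catalogue handles ISBNs in a separate step.
LOCAL_OVERRIDES = {
    ("700", "a"): DC + "contributor",
    ("020", "a"): None,
}

if __name__ == "__main__":
    local = build_mapping(BASE_MAPPING, LOCAL_OVERRIDES)
    for key in sorted(local):
        print(key, local[key])
```

Keeping the shared mapping unchanged and layering local differences on
top makes it easier to see where catalogues diverge, which is also what
makes later merging of data from different sources tractable.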

Since the document is being prepared within W3C, it does not consider
non-W3C options for publishing open data and their impact on libraries.
But on this side of the fence, we must carefully consider, for instance,
schema.org. Since the whole thing is new, there is no shared view yet,
but we simply cannot ignore Google and Microsoft. Stu Weibel summarizes
some aspects well in his blog:

http://weibel-lines.typepad.com/weibelines/2011/06/uncommon-cause.html

Makx Dekkers provides more food for thought and links here:

> I’ve been reading various reactions (two interesting ones are David Wood at http://prototypo.blogspot.com/2011/06/schemaorg-and-semantic-web.html and Manu Sporny at http://manu.sporny.org/2011/false-choice/ in contrast to Mike Bergman’s post at http://www.mkbergman.com/962/structured-web-gets-massive-boost/).
>
> Thinking further, it seems to me that what might happen is that:
>
> · Organizations (mostly commercial?) that are selling products and want to compete with others in getting their stuff ranked high and presented well in Google and Bing will follow the schema.org spec and forget about Semantic Web the way W3C defines it – after all, people whose first priority is to get good rankings in general searches are likely not the people who are the most interested in linking data in an open and interoperable way;
>
> · Organizations (mostly public sector, academia, research?) that provide services, co-operate with others and require organisation of quality information, sharing resources and interoperability will still be looking for the more general solutions offered by Semantic Web and Linked Data.
>
> What I am wondering about is whether GooBing will, at some stage in the future, consider to offer special arrangements for the public sector (e.g. harvest Linked Data) – I cannot imagine they are not interested in providing good access to public sector and scientific information. Or would the public sector start helping schema.org to improve their “Type hierarchy” (http://www.schema.org/docs/full.html)? Even then, schema.org does not seem to support cross-collection linking – it’s entirely focused on how search engines can harvest information.

About identifiers and cool URIs

Libraries have an old tradition of separating identifiers (such as ISBN)
from location (shelf number, signum). It is possible to organize
collections in such a way that a book's shelf location never changes
(after all, it is people who move books around), but most often this is
not practical. And if the same book is available in many locations at
the same time, which one is the unique identifier? And if we have to
change the location, how do we warn the user about the change, when he
may get another book, not the one that used to be there?

From the identification / location point of view, digital documents on
the web are no different from printed books: it is a good idea to
separate identifiers and Uniform Resource Locators (and there is a
reason why they were named locators; that is what they are). Confusing
things by talking about URIs when only URLs are meant is not helpful.
Perceived time scale is one key difference here: the chapter "Preserve
Linked Data vocabularies" contains the following sentence:

"Linked Data will only remain usable twenty years from now if its URIs
persist and remain resolvable to documentation of their meaning".

From the W3C point of view 20 years may be a very long time, but for a
national library it is not; this sentence alone shows libraries how
different our world view is from the one prevailing in W3C. For us, data
has to remain usable for centuries, and during that time every document
will be migrated over and over again. Managing this complexity is not
possible with blunt tools such as cool URIs, which are assigned with no
control whatsoever. There are already hundreds of versions of the W3C
home page; which one of them is identified by http://www.w3.org/? For
instance, the cool URI (cool as long as the Internet Archive is alive)
of the December 1996 version of the home page is
http://web.archive.org/web/19961227091242/http://www19.w3.org/ but how
should a user know that? Due to web archives and other harvesting
activities, any resource has a number of URLs, of varying levels of
coolness. Exactly how cool they are is never apparent from the URI
itself, and for most documents there may not be a cool URI at all.
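
The archive URL quoted above follows a simple pattern: a 14-digit
capture timestamp (YYYYMMDDhhmmss) followed by the original URL. The
sketch below reconstructs that URL from its parts, assuming only the
pattern visible in the example (it is not a documented archive API). It
shows why one resource ends up with many URLs: every capture yields a
different one, and nothing in the string says how long it will keep
resolving.

```python
from datetime import datetime

# Assumption: the URL pattern seen in the example above, i.e.
# <prefix><14-digit timestamp>/<original URL>.
ARCHIVE_PREFIX = "http://web.archive.org/web/"


def archived_url(original_url: str, captured: datetime) -> str:
    """Build the archive URL for one capture of original_url."""
    return ARCHIVE_PREFIX + captured.strftime("%Y%m%d%H%M%S") + "/" + original_url


if __name__ == "__main__":
    # Two captures of the same page give two different URLs.
    print(archived_url("http://www19.w3.org/", datetime(1996, 12, 27, 9, 12, 42)))
    print(archived_url("http://www19.w3.org/", datetime(1997, 1, 15, 0, 0, 0)))
```

For the first capture this reproduces
http://web.archive.org/web/19961227091242/http://www19.w3.org/ exactly.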

These and other issues, not discussed here due to lack of time and
space, explain why PIDs (persistent identifiers) rule in some areas:
libraries prefer URNs, publishers DOIs and the scientific data community
Handles (to give just a few examples). The conclusion as regards the
draft is that it is counterproductive to give the impression that cool
URIs must be used with open data; PIDs can be, and usually are,
expressed as HTTP URIs within documents, so they meet the requirements
of open data. Each occurrence of "cool URI" in the text can be replaced
by the more neutral "HTTP URI", and somewhere the point should be made
that the traditional identifiers libraries use can be incorporated into
open data.
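
As a rough illustration of how a PID can be expressed as an HTTP URI,
the sketch below prefixes the identifier with the address of its
resolver, leaving the PID itself unchanged inside the URI. The resolver
base addresses are assumptions that should be checked against current
practice, and the identifiers used in the usage note are placeholders.

```python
# Minimal sketch: expressing persistent identifiers as HTTP URIs via
# their resolvers. The resolver base addresses below are assumptions.
RESOLVERS = {
    "urn": "http://urn.fi/",            # assumed: Finnish URN:NBN resolver
    "doi": "https://doi.org/",          # DOI resolver
    "handle": "https://hdl.handle.net/",  # Handle System proxy
}


def pid_to_http_uri(scheme: str, identifier: str) -> str:
    """Prefix a PID with its resolver so it can be used in linked data.

    The PID is kept verbatim inside the URI, so it remains recoverable.
    """
    try:
        base = RESOLVERS[scheme.lower()]
    except KeyError:
        raise ValueError(f"no resolver configured for scheme {scheme!r}") from None
    return base + identifier


if __name__ == "__main__":
    print(pid_to_http_uri("doi", "10.1000/182"))
    print(pid_to_http_uri("urn", "URN:NBN:fi-fe123"))  # placeholder URN
```

For example, pid_to_http_uri("doi", "10.1000/182") returns
https://doi.org/10.1000/182: an ordinary HTTP URI that still carries the
DOI, so the identifier and its management regime are not lost.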

Using http://dbpedia.org/resource/Jane_Austen as an example is not a
good idea. Libraries will not take identifiers for authors from DBpedia,
although this solution may seem sufficient for lay people. Libraries
(and publishers, copyright societies etc.) will soon adopt a new system
called the International Standard Name Identifier (ISNI). In this
system it is possible to establish PID-based links to metadata about the
author, which can include a link to dbpedia.org.

There are other data elements which do not have standard identifiers
yet. But libraries should be allowed to decide for themselves how
linking the data should be implemented. This is probably one of the
most difficult aspects of the open data publishing process, since few
organisations have built anything sustainable in this area.

Juha Hakala

--
Juha Hakala
Senior advisor, standardisation and IT

The National Library of Finland
P.O.Box 15 (Unioninkatu 36, room 503), FIN-00014 Helsinki University
Email juha.hakala@helsinki.fi, tel +358 50 382 7678

-----------------------------------

2. Laila Heinemann, National Library of Finland writes:

From a library point of view the paper is already a bit outdated. It
seems to assume that librarians haven't heard anything about Linked Data
yet and that they are by default very suspicious of it. Of course, this
may be true in some parts of the world, but it cannot be applied
generally. The tone is very patronizing, and as such not a basis for a
constructive discussion.

As to library systems architecture:
- Library Management Systems are currently in a transition phase, where
cloud computing and linked data are among the key issues. Both are
included in the visions of the major vendors, but the development and
implementation phase of the next-generation systems worldwide is likely
to be very long (and very costly).
- However, library systems are not only about storing and searching for
information; they also need to cover other functions, such as
acquisitions, circulation, access control and patron information.
Therefore linked data is only one part of a much more complex system
architecture, and the expression 'niche system' seems especially
ill-suited.

As to library cataloguing and data formats:
- The MARC format is in transition as well and is likely to be developed
to better suit linking data (or to be killed altogether...), see
http://www.loc.gov/marc/transition/

The remark made on clause 6.5.3 about the lack of common terminology is
a very good one! All too often the discussion is about apples and
oranges, when the aim is actually shared.

Laila Heinemann
tietojärjestelmäasiantuntija / Information Systems Specialist

Kansalliskirjasto / The National Library of Finland
Kirjastoverkkopalvelut / Library Network Services

PL/POB 26 (Teollisuuskatu 23), 00014 Helsingin yliopisto (Finland)
tel + 358 9 191 44339
e-mail laila.heinemann@helsinki.fi

www.kansalliskirjasto.fi / www.nationallibrary.fi

Received on Monday, 13 June 2011 11:20:12 UTC