
Re: [pedantic-web] ANN: 20th Century Press Archives as ORE / Linked Data application - Technical Preview

From: Neubert Joachim <J.Neubert@zbw.eu>
Date: Tue, 29 Dec 2009 23:27:14 +0100
Message-ID: <3A59BB6451C972429019B12996F92DAD02B18B07@frodo.zbw-nett.zbw-kiel.de>
To: <oai-ore@googlegroups.com>, <pedantic-web@googlegroups.com>
Cc: <public-lod@w3.org>
Hi Ed,
 
It's great to see you exploring our data with your own tools (and thanks for releasing them - I will happily add them to my toolset!). 
 
Your type counts seem right (336+185+2+2+1=526) - everything is an ore:Aggregation (defined by an ore:ResourceMap) and, except for the top Aggregation, has another type attached, so that they can be handled differently for purposes like web page generation.
 
You spotted correctly that the homegrown types are not yet defined and need much more work. The skos:prefLabel is not intended to mean that PmPersonFolder is a specialisation of skos:Concept - it was used to indicate that this is a *unique* label (which could be used e.g. within a list of search results) - sorry for causing this misunderstanding. For our STW project (http://zbw.eu/stw) I started with a few extensions to SKOS and named the vocab file "skos-extensions" accordingly. Now I'm trapped, and hesitate to really add dcterms- or bibo-extensions to this file. However, I also don't want to add "dcterms-extensions", "bibo-extensions", etc. etc. files. Nor do all these rather trivial and highly custom extensions constitute a cohesive vocabulary of their own. 
 
Irrespective of this self-introduced hiccup, I was searching for some RDF types, and have not yet found anything that fits well for
 
- Folders: The physical archives consist of folders (which contain sheets of paper with affixed articles). Since folders are also in broad use in file archives, historical records etc., I am thinking about suggesting to the bibo guys that they introduce a bibo:Folder subclass of bibo:Collection. PmPersonFolder, PmCompanyFolder etc. could be smoothly derived from such a superclass. 
 
- Pages: I'm sure that somebody has coined an RDF type for pages, but I wasn't able to figure out where. I assume you had a similar problem at chroniclingamerica.loc.gov (even though a whole newspaper page is not exactly the same as (a part of) an article glued onto a sheet of paper). How did you solve it?
 
The dct:subject links to the DNB authority file are quite preliminary, and I will gladly substitute them when DNB publishes its LOD version of the authority files. 
 
Of course it's OK for you to experiment with the JPEGs as well (which have been on the web for years already). Without them, it's only half of the fun ;) We have not figured out, however, under which license conditions this data could be re-used. German law ("Urheberrecht") requires an OK by the original author or her legal heirs until 70 years after her death. This is almost impossible to obtain for most of the newspaper articles (which often were published without naming an author at all). So I'm not sure which kind of license could be granted to a third party, and the lack of legal certainty may even prohibit the replication of the JPEGs in a LOCKSS scenario. If, however, the data could be useful for demonstrating what can be achieved with ORE harvesting, I would be really happy.
 
Thanks again for all your comments, encouragement, and - most exciting - using the data. That's the idea of the tribe and the whole linked data community!
 
Cheers, Joachim


________________________________

From: oai-ore@googlegroups.com on behalf of Ed Summers
Sent: Mon 28.12.2009 19:29
To: pedantic-web@googlegroups.com
Cc: public-lod@w3.org; oai-ore@googlegroups.com
Subject: Re: [pedantic-web] ANN: 20th Century Press Archives as ORE / Linked Data application - Technical Preview



On Mon, Dec 28, 2009 at 8:07 AM, Neubert Joachim <J.Neubert@zbw.eu> wrote:
> Please feel invited to take a look at it - we would highly appreciate any
> feedback about our approach.

Thanks for announcing this Joachim. It is great to see more linked
data as rdfa getting out on the web. I'm particularly excited because
of your use of the oai-ore vocabulary to make historic newspaper
archives available, since we are doing something similar at the
Library of Congress [1].

You must've done something right because I just wrote a little naive
crawler [2] in a matter of minutes to pull down what looks like all
the rdfa you've put out there so far. It seems to have collected about
11,427 triples [3]. My rdfsum unix command line hack [4] came up with
these rdf:type counts:

   1533 <http://www.openarchives.org/ore/terms/AggregatedResource>
    526 <http://www.openarchives.org/ore/terms/ResourceMap>
    526 <http://www.openarchives.org/ore/terms/Aggregation>
    336 <http://zbw.eu/namespaces/skos-extensions/PmPage>
    185 <http://purl.org/ontology/bibo/Article>
      2 <http://zbw.eu/namespaces/skos-extensions/PmPersonFolder>
      2 <http://zbw.eu/namespaces/skos-extensions/PmCollection>

Does that sound about right for this initial release?
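
For readers who want to reproduce counts like the ones above without
rdfsum, here is a minimal sketch. It is not the rdfsum script itself.
It assumes the harvested triples are available as N-Triples lines (the
format of the dump at [3] is an assumption here), and the function
names count_types and format_counts are made up for this illustration.

```python
import re
from collections import Counter

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# Matches a simple N-Triples line whose predicate and object are IRIs.
# Literal objects are ignored on purpose: rdf:type objects are IRIs.
TRIPLE = re.compile(r"^\s*(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(<[^>]*>)\s*\.\s*$")


def count_types(lines):
    """Count the objects of rdf:type statements in N-Triples lines."""
    counts = Counter()
    for line in lines:
        match = TRIPLE.match(line)
        if match and match.group(2) == RDF_TYPE:
            counts[match.group(3)] += 1
    return counts


def format_counts(counts):
    """Render the counts roughly the way `sort | uniq -c | sort -rn` does."""
    return [f"{n:7d} {iri}" for iri, n in counts.most_common()]


if __name__ == "__main__":
    # Hypothetical sample data; the example.org subjects are placeholders.
    sample = [
        f"<http://example.org/a> {RDF_TYPE} <http://www.openarchives.org/ore/terms/Aggregation> .",
        f"<http://example.org/b> {RDF_TYPE} <http://www.openarchives.org/ore/terms/Aggregation> .",
        f"<http://example.org/b> {RDF_TYPE} <http://purl.org/ontology/bibo/Article> .",
        '<http://example.org/b> <http://purl.org/dc/terms/title> "A title" .',
    ]
    for row in format_counts(count_types(sample)):
        print(row)
```

A real N-Triples parser (or an RDF library) would be the safer choice
for anything beyond a quick sanity check, since the regular expression
above only covers the simple IRI-to-IRI case.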

I noticed that you have chosen to link to names in the German National
Authority file like:

  <http://zbw.eu/beta/pm20/person/00012>
      dct:subject <http://d-nb.info/gnd/118646419> .

I seem to remember hearing at SWIB09 [5] that the Deutsche National
Bibliothek was thinking about minting URIs for entries in the
authority file that follow Linked Data best practices (hash or 303,
etc). Were you planning on modifying these appropriately when those
URLs became available? Right now the d-nb URL returns 200 OK, and it
isn't a hash URI. Theoretically it would be pretty easy to layer in
some rdfa into the page at d-nb that describes:

  http://d-nb.info/gnd/118646419#person

But I realize this is somewhat out of your control. I guess it would
also be possible to create a partial PURL [6] for
http://d-nb.info/gnd/ that would redirect, since I think the new PURL
software supports 303.
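
To make the hash / 303 / 200 distinction above concrete, here is a
small sketch that labels a URI according to those patterns. The HTTP
status is passed in as an argument rather than fetched, so nothing here
claims to know what d-nb.info returns today. The function name
classify_uri and the label strings are invented for this example.

```python
from urllib.parse import urlsplit


def classify_uri(uri, status):
    """Label a URI by Linked Data publishing pattern.

    `status` is the HTTP status code observed for a GET on the URI
    (without following redirects); it is supplied by the caller.
    """
    if urlsplit(uri).fragment:
        # The fragment is never sent to the server, so the thing and
        # the document describing it are already distinct.
        return "hash URI"
    if status == 303:
        # "See Other": the server points to a document about the thing.
        return "303 redirect"
    if status == 200:
        # The URI directly identifies a retrievable document.
        return "plain document (200 OK)"
    return "other"


if __name__ == "__main__":
    print(classify_uri("http://d-nb.info/gnd/118646419", 200))
    print(classify_uri("http://d-nb.info/gnd/118646419#person", 200))
```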

I was also interested to see that you have published some SKOS
Extensions [7] that are used to type each ore:Aggregation as a
specialization of skos:Concept:

<http://zbw.eu/beta/pm20/person/00012>
    a ore:Aggregation,
      <http://zbw.eu/namespaces/skos-extensions/PmPersonFolder> ;
    skos:prefLabel "Abbe, Ernst; 1840-1905 (PM20 Personenarchiv)"@de,
      "Abbe, Ernst; 1840-1905 (PM20 Persons Archives)"@en .

It looks like the rdf that comes back for your skos extensions
vocabulary (nice hack with the rdf validator btw) doesn't define
PmPersonFolder--but perhaps I missed it? I'm guessing from the
skos:prefLabel assertion that the PmPersonFolder is a specialization
of skos:Concept?

Would it be OK for me to experiment with pulling down the aggregated
resource bitstreams (jpg, etc) and storing them on disk? It would just
be a single threaded little script. Part of the rationale behind the
ore use at LC [1] is to foster LOCKSS [8] scenarios where digital
objects are easier to meaningfully harvest.
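
A single-threaded harvest of that kind could be sketched roughly as
follows. This is only an illustration, not the script referred to
above: the names local_path and harvest, the "harvest" directory, and
the one-second delay are all assumptions, and the list of URLs would
have to come from the ore:aggregates statements in the crawled data.

```python
import time
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import urlopen


def local_path(url, root="harvest"):
    """Map a resource URL to a file path below `root`, keeping host and path."""
    parts = urlsplit(url)
    rel = parts.path.lstrip("/")
    if not rel or rel.endswith("/"):
        # Directory-like URLs need a file name of their own.
        rel += "index"
    return Path(root) / parts.netloc / rel


def harvest(urls, root="harvest", delay=1.0):
    """Fetch each URL in turn, one at a time, skipping files already on disk."""
    for url in urls:
        target = local_path(url, root)
        if target.exists():
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        with urlopen(url) as response:
            target.write_bytes(response.read())
        # Pause between requests to stay polite towards the server.
        time.sleep(delay)
```

Keeping the host and path in the local file name means the on-disk
copy can be mapped back to the aggregated resource URI later.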

Anyhow, I have rattled on enough for now I suppose -- I mainly wanted
to say how exciting it was to see your announcement, being from the
digital library tribe in the linked data community :-)

//Ed

[1] http://chroniclingamerica.loc.gov
[2] http://inkdroid.org/bzr/ptolemy/crawl.py
[3] http://inkdroid.org/data/pm20.txt
[4] http://inkdroid.org/bzr/bin/rdfsum
[5] http://www.swib09.de/
[6] http://purl.org
[7] http://zbw.eu/namespaces/skos-extensions/
[8] http://en.wikipedia.org/wiki/LOCKSS

Received on Tuesday, 29 December 2009 22:27:53 GMT
