Re: [pedantic-web] ANN: 20th Century Press Archives as ORE / Linked Data application - Technical Preview

From: Ed Summers <ehs@pobox.com>
Date: Mon, 28 Dec 2009 13:29:28 -0500
Message-ID: <f032cc060912281029t4c84776bu589eac7266f0440d@mail.gmail.com>
To: pedantic-web@googlegroups.com
Cc: public-lod@w3.org, oai-ore@googlegroups.com

On Mon, Dec 28, 2009 at 8:07 AM, Neubert Joachim <J.Neubert@zbw.eu> wrote:
> Please feel invited to take a look at it - we would highly appreciate any
> feedback about our approach.

Thanks for announcing this, Joachim. It is great to see more linked
data as rdfa getting out on the web. I'm particularly excited because
of your use of the oai-ore vocabulary to make historic newspaper
archives available, since we are doing something similar at the
Library of Congress [1].

You must've done something right because I just wrote a little naive
crawler [2] in a matter of minutes to pull down what looks like all
the rdfa you've put out there so far. It seems to have collected about
11,427 triples [3]. My rdfsum unix command line hack [4] came up with
these rdf:type counts:

   1533 <http://www.openarchives.org/ore/terms/AggregatedResource>
    526 <http://www.openarchives.org/ore/terms/ResourceMap>
    526 <http://www.openarchives.org/ore/terms/Aggregation>
    336 <http://zbw.eu/namespaces/skos-extensions/PmPage>
    185 <http://purl.org/ontology/bibo/Article>
      2 <http://zbw.eu/namespaces/skos-extensions/PmPersonFolder>
      2 <http://zbw.eu/namespaces/skos-extensions/PmCollection>

Does that sound about right for this initial release?
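
The rdfsum script itself is not reproduced here. What follows is only a
minimal Python sketch of the same kind of rdf:type tally, assuming the
crawled triples are stored one per line as N-Triples. The function names
count_types and format_counts and the example.org sample triples are
hypothetical, chosen for illustration.

```python
import re
from collections import Counter

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# One N-Triples statement whose object is a URI: subject predicate <object> .
TRIPLE = re.compile(r"^\s*(\S+)\s+(\S+)\s+(<[^>]*>)\s*\.\s*$")


def count_types(lines):
    """Tally the objects of rdf:type statements found in N-Triples lines."""
    counts = Counter()
    for line in lines:
        match = TRIPLE.match(line)
        if match and match.group(2) == RDF_TYPE:
            counts[match.group(3)] += 1
    return counts


def format_counts(counts):
    """Render the tally as right-aligned 'count <uri>' rows, largest first."""
    return [f"{n:7d} {uri}" for uri, n in counts.most_common()]


# Hypothetical sample data (assumption: not taken from the real crawl).
sample = [
    "<http://example.org/a> " + RDF_TYPE
    + " <http://www.openarchives.org/ore/terms/Aggregation> .",
    "<http://example.org/b> " + RDF_TYPE
    + " <http://www.openarchives.org/ore/terms/Aggregation> .",
    "<http://example.org/c> " + RDF_TYPE
    + " <http://purl.org/ontology/bibo/Article> .",
    '<http://example.org/c> <http://purl.org/dc/terms/title> "A title" .',
]

for row in format_counts(count_types(sample)):
    print(row)
```

A regular expression is enough here only because rdf:type objects are
URIs; a real N-Triples parser would be the safer choice for general data.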

I noticed that you have chosen to link to names in the German National
Authority file like:

  <http://zbw.eu/beta/pm20/person/00012> dct:subject <http://d-nb.info/gnd/118646419> .

I seem to remember hearing at SWIB09 [5] that the Deutsche National
Bibliothek was thinking about minting URIs for entries in the
authority file that follow Linked Data best practices (hash or 303,
etc). Are you planning to update these links once those URIs become
available? Right now the d-nb URL returns 200 OK, and it
isn't a hash URI. Theoretically it would be pretty easy to layer
some rdfa into the page at d-nb that describes:

  http://d-nb.info/gnd/118646419#person

But I realize this is somewhat out of your control. I guess it would
also be possible to create a partial PURL [6] for
http://d-nb.info/gnd/ that would redirect, since I think the new PURL
software supports 303.
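
The distinction drawn above (hash URI, 303 redirect, or plain 200 OK) can
be sketched roughly as follows. This is an illustration only: classify is
a hypothetical helper, and the HTTP status is passed in by the caller
(assumed to be the code seen without following redirects), so nothing
here depends on how d-nb.info actually responds.

```python
from urllib.parse import urlsplit


def classify(uri, status):
    """Label a URI by the Linked Data pattern it appears to follow.

    `status` is the HTTP status code observed when dereferencing the URI
    without following redirects (an assumption of this sketch).
    """
    if urlsplit(uri).fragment:
        # The fragment is never sent to the server, so the URI can name
        # a thing that is described by the document it is part of.
        return "hash"
    if status == 303:
        # "See Other": the server points to a document about the thing.
        return "303"
    if status == 200:
        # A direct 200 means the URI identifies the document itself.
        return "document"
    return "unknown"


print(classify("http://d-nb.info/gnd/118646419", 200))
print(classify("http://d-nb.info/gnd/118646419#person", 200))
```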

I was also interested to see that you have published some SKOS
Extensions [7] that are used to type each ore:Aggregation as a
specialization of skos:Concept:

<http://zbw.eu/beta/pm20/person/00012>
    a ore:Aggregation,
        <http://zbw.eu/namespaces/skos-extensions/PmPersonFolder> ;
    skos:prefLabel "Abbe, Ernst; 1840-1905 (PM20 Personenarchiv)"@de,
        "Abbe, Ernst; 1840-1905 (PM20 Persons Archives)"@en .

It looks like the rdf that comes back for your skos extensions
vocabulary (nice hack with the rdf validator btw) doesn't define
PmPersonFolder--but perhaps I missed it? I'm guessing from the
skos:prefLabel assertion that the PmPersonFolder is a specialization
of skos:Concept?

Would it be OK for me to experiment with pulling down the aggregated
resource bitstreams (jpg, etc) and storing them on disk? It would just
be a single threaded little script. Part of the rationale behind the
ore use at LC [1] is to foster LOCKSS [8] scenarios where digital
objects are easier to meaningfully harvest.
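
For concreteness, a single-threaded harvest of this kind might look
something like the sketch below. It assumes the aggregated resources are
linked with ore:aggregates and that the triples are available as
N-Triples lines. The function names, the "harvest" directory and the
one-second pause are all hypothetical choices, not anything taken from
the announcement.

```python
import os
import re
import time
import urllib.request
from urllib.parse import urlsplit

AGGREGATES = "<http://www.openarchives.org/ore/terms/aggregates>"

# One N-Triples statement whose object is a URI; group 3 is the bare URI.
TRIPLE = re.compile(r"^\s*(\S+)\s+(\S+)\s+<([^>]*)>\s*\.\s*$")


def aggregated_urls(lines):
    """Return ore:aggregates objects in first-seen order, without repeats."""
    seen, urls = set(), []
    for line in lines:
        match = TRIPLE.match(line)
        if match and match.group(2) == AGGREGATES:
            url = match.group(3)
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


def local_path(url, root="harvest"):
    """Map a URL to a file path under root, mirroring host and URL path."""
    parts = urlsplit(url)
    return os.path.join(root, parts.netloc, *parts.path.strip("/").split("/"))


def harvest(urls, root="harvest", delay=1.0):
    """Fetch each URL in turn, one request at a time, pausing in between."""
    for url in urls:
        path = local_path(url, root)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        urllib.request.urlretrieve(url, path)
        time.sleep(delay)  # be gentle with the server
```

Calling harvest(aggregated_urls(open("pm20.txt"))) would then walk the
list sequentially, where pm20.txt stands for a local copy of the crawled
triples.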

Anyhow, I have rattled on enough for now I suppose -- I mainly wanted
to say how exciting it was, as someone from the digital library tribe
in the linked data community, to see your announcement :-)

//Ed

[1] http://chroniclingamerica.loc.gov
[2] http://inkdroid.org/bzr/ptolemy/crawl.py
[3] http://inkdroid.org/data/pm20.txt
[4] http://inkdroid.org/bzr/bin/rdfsum
[5] http://www.swib09.de/
[6] http://purl.org
[7] http://zbw.eu/namespaces/skos-extensions/
[8] http://en.wikipedia.org/wiki/LOCKSS
Received on Monday, 28 December 2009 18:30:01 GMT
