Re: Chronicling America and Linked Data from Chris Bizer on 2009-05-26 (public-lod@w3.org from May 2009)

From: Chris Bizer <chris@bizer.de>
Date: Tue, 26 May 2009 18:39:34 +0200
To: "'Ed Summers'" <ehs@pobox.com>, <public-lod@w3.org>
Message-ID: <012101c9de20$8d0d0410$a7270c30$@de>

Hi Ed,

This sounds like a great new source of live Linked Data that is served
directly by the organization producing the data, rather than by university
projects or engaged individuals, as is still the case with many data
sources in the cloud.

Things are moving :-)

I'm looking forward to the first applications that mash up Chronicling
America data with DBpedia and Geonames.

It is also an important signal for the libraries and digital archives
community that you found OAI-ORE to be extremely useful.

Keep up the great work!

Cheers,

Chris


> -----Original Message-----
> From: public-lod-request@w3.org [mailto:public-lod-request@w3.org] On
> Behalf Of Ed Summers
> Sent: Tuesday, 26 May 2009 17:20
> To: public-lod@w3.org
> Subject: Chronicling America and Linked Data
> 
> There is a new pool of linked-data up at the Library of Congress in
> the Chronicling America application [1]. Chronicling America is the
> web view on data collected for the National Digital Newspaper Program
> (NDNP). NDNP is a 20-year joint project of the National Endowment for
> the Humanities and the Library of Congress to digitize and aggregate
> historic newspapers in the United States.
> 
> Right now there are close to a million digitized newspaper pages
> available, and information about 140,000 newspaper titles...all of
> which have individual web views, for example:
> 
>  Newspaper Title: San Francisco Call [2]
>  Issue: San Francisco Call, 1895-03-05 [3]
>  Page: San Francisco Call, 1895-03-05, page sequence 1 [4]
> 
> If you view source on them you should be able to auto-discover the
> application/rdf+xml representations that bundle up information about
> the newspaper titles, issues and pages. You can also browse around
> using a linked data viewer like uriburner [5].
> 
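The auto-discovery step described above could be sketched roughly as
follows. This is only an illustration using the Python standard library:
it assumes the pages advertise their RDF with a
<link rel="alternate" type="application/rdf+xml"> element, and the href
in SAMPLE_HTML is a made-up placeholder, not a confirmed Chronicling
America URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RdfLinkFinder(HTMLParser):
    """Collect href values of <link> elements that advertise RDF/XML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        is_rdf = attr.get("type") == "application/rdf+xml"
        if "alternate" in rels and is_rdf and attr.get("href"):
            # Resolve relative hrefs against the page URL.
            self.links.append(urljoin(self.base_url, attr["href"]))


def find_rdf_links(html, base_url):
    """Return absolute URLs of RDF/XML alternates declared in the HTML."""
    finder = RdfLinkFinder(base_url)
    finder.feed(html)
    return finder.links


# Hypothetical markup: the href below is an assumption for illustration.
SAMPLE_HTML = """<html><head>
<link rel="alternate" type="application/rdf+xml" href="/lccn/sn85066387.rdf" />
<link rel="stylesheet" type="text/css" href="/site.css">
</head><body></body></html>"""

PAGE_URL = "http://chroniclingamerica.loc.gov/lccn/sn85066387/"
rdf_links = find_rdf_links(SAMPLE_HTML, PAGE_URL)
```

A crawler would fetch each page, pass the body to find_rdf_links, and
then request the returned URLs.
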
> The implementation is a moving target, but you'll see we've cherry
> picked a few vocabularies: Dublin Core [6], Bibliographic Ontology
> [7], FOAF [8], and Object Reuse and Exchange (OAI-ORE) [9]. ORE in
> particular was extremely useful to us, since we wanted to enable the
> application's repository function, by exposing the digital objects
> (image files, ocr/xml files, pdfs) that make up the individual Page
> resources. For example:
> 
> <http://chroniclingamerica.loc.gov/lccn/sn84026749/1905-01-29/ed-1/seq-1#page>
>     ore:aggregates
> <http://chroniclingamerica.loc.gov/lccn/sn84026749/1905-01-29/ed-1/seq-1.jp2>,
> <http://chroniclingamerica.loc.gov/lccn/sn84026749/1905-01-29/ed-1/seq-1.pdf>,
> <http://chroniclingamerica.loc.gov/lccn/sn84026749/1905-01-29/ed-1/seq-1/ocr.txt>,
> <http://chroniclingamerica.loc.gov/lccn/sn84026749/1905-01-29/ed-1/seq-1/ocr.xml>,
> <http://chroniclingamerica.loc.gov/lccn/sn84026749/1905-01-29/ed-1/seq-1/thumbnail.jpg>
> .
> 
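A harvester could read the ore:aggregates links out of the RDF/XML along
these lines. This is a minimal sketch with the standard library only. It
assumes the simple flat serialization shown in SAMPLE_RDF (a hand-written
sample, not actual server output) and would miss other valid RDF/XML
forms; a full RDF parser would be more robust.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
ORE = "{http://www.openarchives.org/ore/terms/}"


def aggregated_resources(rdf_xml, subject):
    """Return the URIs that `subject` ore:aggregates in a flat RDF/XML doc."""
    root = ET.fromstring(rdf_xml)
    found = []
    for node in root.iter():
        # Accept rdf:Description or typed nodes, as long as rdf:about matches.
        if node.get(RDF + "about") != subject:
            continue
        for prop in node.findall(ORE + "aggregates"):
            target = prop.get(RDF + "resource")
            if target:
                found.append(target)
    return found


BASE = "http://chroniclingamerica.loc.gov/lccn/sn84026749/1905-01-29/ed-1/seq-1"

# Hand-written sample for illustration only.
SAMPLE_RDF = f"""<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ore="http://www.openarchives.org/ore/terms/">
  <rdf:Description rdf:about="{BASE}#page">
    <ore:aggregates rdf:resource="{BASE}.jp2"/>
    <ore:aggregates rdf:resource="{BASE}.pdf"/>
    <ore:aggregates rdf:resource="{BASE}/ocr.txt"/>
  </rdf:Description>
</rdf:RDF>"""

files = aggregated_resources(SAMPLE_RDF, BASE + "#page")
```

Each returned URI can then be downloaded as one of the digital objects
that make up the page.
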
> The idea is to enable the harvesting of these repository objects out
> of the Chronicling America webapp. The only links out we have so far
> are from Newspaper Titles to the geographic regions that they are
> "about", and languages. So for example:
> 
> <http://chroniclingamerica.loc.gov/lccn/sn85066387#title>
>     dcterms:coverage
> <http://dbpedia.org/resource/San_Francisco%2C_California>,
> <http://sws.geonames.org/5391959/> ;
>     dcterms:language <http://www.lingvoj.org/lang/en> .
> 
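To see which external datasets a title links to, a crawler could group
the object URIs by host, for example as below. The URIs are the ones
quoted above; the function name is made up for this sketch.

```python
from urllib.parse import urlsplit


def group_links_by_host(subject, objects):
    """Group object URIs by host, skipping those on the subject's own host."""
    home = urlsplit(subject).netloc
    external = {}
    for uri in objects:
        host = urlsplit(uri).netloc
        if host != home:
            external.setdefault(host, []).append(uri)
    return external


title = "http://chroniclingamerica.loc.gov/lccn/sn85066387#title"
links_out = group_links_by_host(title, [
    "http://dbpedia.org/resource/San_Francisco%2C_California",
    "http://sws.geonames.org/5391959/",
    "http://www.lingvoj.org/lang/en",
])
```
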
> Just these minimal links provide a huge amount of data enrichment to
> our original data. We also needed to create a handful of new
> vocabulary terms, which we made available as RDFa [10]. I would be
> interested in any feedback you have. Also, please feel free to fire up
> linked-data bots to crawl the space.
> 
> //Ed
> 
> [1] http://chroniclingamerica.loc.gov
> [2] http://chroniclingamerica.loc.gov/lccn/sn85066387/
> [3] http://chroniclingamerica.loc.gov/lccn/sn85066387/1895-03-05/ed-1/
> [4] http://chroniclingamerica.loc.gov/lccn/sn85066387/1895-03-05/ed-1/seq-1/
> [5] http://linkeddata.uriburner.com/about/html/http/chroniclingamerica.loc.gov/lccn/sn84026749%23title
> [6] http://dublincore.org/
> [7] http://bibliontology.com/
> [8] http://xmlns.com/foaf/spec/
> [9] http://www.openarchives.org/ore/1.0/vocabulary.html
> [10] http://chroniclingamerica.loc.gov/terms/
Received on Tuesday, 26 May 2009 16:39:21 UTC