- From: Ed Summers <ehs@pobox.com>
- Date: Tue, 26 May 2009 11:19:58 -0400
- To: "public-lod@w3.org" <public-lod@w3.org>
There is a new pool of linked data up at the Library of Congress in the Chronicling America application [1]. Chronicling America is the web view on data collected for the National Digital Newspaper Program (NDNP). NDNP is a 20-year joint project of the National Endowment for the Humanities and the Library of Congress to digitize and aggregate historic newspapers in the United States. Right now there are close to a million digitized newspaper pages available, and information about 140,000 newspaper titles, all of which have individual web views, for example:

  Newspaper Title: San Francisco Call [2]
  Issue: San Francisco Call, 1895-03-05 [3]
  Page: San Francisco Call, 1895-03-05, page sequence 1 [4]

If you view source on them you should be able to auto-discover the application/rdf+xml representations that bundle up information about the newspaper titles, issues and pages. You can also browse around using a linked data viewer like uriburner [5].

The implementation is a moving target, but you'll see we've cherry-picked a few vocabularies: Dublin Core [6], Bibliographic Ontology [7], FOAF [8], and Object Reuse and Exchange (OAI-ORE) [9]. ORE in particular was extremely useful to us, since we wanted to enable the application's repository function by exposing the digital objects (image files, ocr/xml files, pdfs) that make up the individual Page resources. For example:

  <http://chroniclingamerica.loc.gov/lccn/sn84026749/1905-01-29/ed-1/seq-1#page>
      ore:aggregates
          <http://chroniclingamerica.loc.gov/lccn/sn84026749/1905-01-29/ed-1/seq-1.jp2>,
          <http://chroniclingamerica.loc.gov/lccn/sn84026749/1905-01-29/ed-1/seq-1.pdf>,
          <http://chroniclingamerica.loc.gov/lccn/sn84026749/1905-01-29/ed-1/seq-1/ocr.txt>,
          <http://chroniclingamerica.loc.gov/lccn/sn84026749/1905-01-29/ed-1/seq-1/ocr.xml>,
          <http://chroniclingamerica.loc.gov/lccn/sn84026749/1905-01-29/ed-1/seq-1/thumbnail.jpg> .

The idea is to enable the harvesting of these repository objects out of the Chronicling America webapp.

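As a rough illustration of the auto-discovery and harvesting steps described above, here is a minimal Python sketch using only the standard library. It is not the application's own code. The function names (`discover_rdf_links`, `aggregated_resources`), the sample `<link>` element with its `.rdf` href, and the shape of the sample RDF/XML are all assumptions made for the example; the real markup and serialization may differ. It only handles `ore:aggregates` written with an `rdf:resource` attribute, so a real harvester would probably want a full RDF parser instead.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORE_NS = "http://www.openarchives.org/ore/terms/"


class RdfLinkFinder(HTMLParser):
    """Collect hrefs of <link> elements typed application/rdf+xml."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if (tag == "link"
                and attr.get("type") == "application/rdf+xml"
                and attr.get("href")):
            # Resolve relative hrefs against the page URL.
            self.links.append(urljoin(self.base_url, attr["href"]))


def discover_rdf_links(html, base_url):
    """Return absolute URLs of RDF/XML alternates advertised in the HTML."""
    finder = RdfLinkFinder(base_url)
    finder.feed(html)
    return finder.links


def aggregated_resources(rdf_xml, aggregation_uri):
    """Return URIs that aggregation_uri points at via ore:aggregates."""
    root = ET.fromstring(rdf_xml)
    found = []
    for node in root.iter():
        if node.get(f"{{{RDF_NS}}}about") != aggregation_uri:
            continue
        for child in node.findall(f"{{{ORE_NS}}}aggregates"):
            resource = child.get(f"{{{RDF_NS}}}resource")
            if resource:
                found.append(resource)
    return found


# Hypothetical sample data standing in for what a crawler would fetch.
PAGE = "http://chroniclingamerica.loc.gov/lccn/sn84026749/1905-01-29/ed-1/seq-1"
BASE = PAGE + "/"
SAMPLE_HTML = (
    '<html><head><link rel="alternate" type="application/rdf+xml" '
    'href="/lccn/sn84026749/1905-01-29/ed-1/seq-1.rdf" /></head></html>'
)
SAMPLE_RDF = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ore="{ORE_NS}">
  <rdf:Description rdf:about="{PAGE}#page">
    <ore:aggregates rdf:resource="{PAGE}.jp2"/>
    <ore:aggregates rdf:resource="{PAGE}.pdf"/>
    <ore:aggregates rdf:resource="{PAGE}/ocr.txt"/>
  </rdf:Description>
</rdf:RDF>"""

if __name__ == "__main__":
    print(discover_rdf_links(SAMPLE_HTML, BASE))
    for uri in aggregated_resources(SAMPLE_RDF, PAGE + "#page"):
        print(uri)
```

A crawler built along these lines would fetch each page, follow the discovered RDF link, and then download every aggregated URI; the network calls are left out here so the sketch stays self-contained.
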
The only links out we have so far are from Newspaper Titles to the geographic regions that they are "about", and languages. So for example:

  <http://chroniclingamerica.loc.gov/lccn/sn85066387#title>
      dcterms:coverage
          <http://dbpedia.org/resource/San_Francisco%2C_California>,
          <http://sws.geonames.org/5391959/> ;
      dcterms:language <http://www.lingvoj.org/lang/en> .

Just these minimal links provide a huge amount of data enrichment to our original data.

We also needed to create a handful of new vocabulary terms, which we made available as RDFa [10].

I would be interested in any feedback you have. Also, please feel free to fire up linked-data bots to crawl the space.

//Ed

[1] http://chroniclingamerica.loc.gov
[2] http://chroniclingamerica.loc.gov/lccn/sn85066387/
[3] http://chroniclingamerica.loc.gov/lccn/sn85066387/1895-03-05/ed-1/
[4] http://chroniclingamerica.loc.gov/lccn/sn85066387/1895-03-05/ed-1/seq-1/
[5] http://linkeddata.uriburner.com/about/html/http/chroniclingamerica.loc.gov/lccn/sn84026749%23title
[6] http://dublincore.org/
[7] http://bibliontology.com/
[8] http://xmlns.com/foaf/spec/
[9] http://www.openarchives.org/ore/1.0/vocabulary.html
[10] http://chroniclingamerica.loc.gov/terms/

Received on Tuesday, 26 May 2009 15:20:41 UTC