- From: Ed Summers <ehs@pobox.com>
- Date: Sun, 19 Sep 2010 06:12:55 -0400
- To: Antoine Isaac <aisaac@few.vu.nl>
- Cc: public-lld <public-lld@w3.org>
On Sat, Sep 18, 2010 at 10:42 AM, Antoine Isaac <aisaac@few.vu.nl> wrote:

> So the question is whether the current situation results rather from:
> - a conscious choice of ignoring part of the legacy data you had in the
>   original data sources, in the light of the requirements of your scenarios?
> - a too great effort needed to move legacy data to linked data, considering
>   the resources you had?
> - the lack of legacy data--you just converted all what you had?

That's a really good question, Antoine. I think it's easy to get lost (and demoralized) trying to figure out at what granularity to model existing data in RDF, especially when there's a lot of it.

In our case we had quite a bit of machine-readable data in METS XML [1] and MARC. The Chronicling America web application needed to model only a small fraction of this data in order to deliver content meaningfully on the web. For example, when we loaded our "batches" of content into Chronicling America, we didn't need to model (in the database) the intricacies of the MIX metadata (colorspaces, scanning systems, sampling frequencies, etc.); we just needed to know the image format, and the dimensions of the image, in order to render the page. And when we loaded newspaper title metadata we didn't need to model the entire MARC record; we just needed the title's name, where it was published, its geographic coverage, etc.

When we decided to use Linked Data, and in particular OAI-ORE, to make our digital resources harvestable, we didn't go back and exhaustively model everything we could have in order to make it available in RDF. We simply made the things we had already modeled in our relational database available, using pre-existing vocabularies wherever possible. This made the task of implementing Linked Data pretty easy, and it was coded up in a couple of days of work. In some cases it was also possible to link resources to non-RDF documents (like the METS and MARC XML).

We focused on the use case of making the titles, issues, pages, and their bit streams web harvestable. One early consumer of the data was another unit at the Library of Congress that wanted to periodically select and upload images of newspaper front pages to Flickr [2]. To do this they wanted a thumbnail, and they needed to know the dimensions of the original JPEG2000 file in order to construct a custom URL for a high-resolution image to upload to Flickr. So we added these things to the RDF representation of the Page (a rough sketch of that kind of page description follows the references below). If you are interested, I described this process in a bit more detail last year [3].

I guess this is a long way of saying our Linked Data was the result of "a conscious choice of ignoring part of the legacy data [we] had in the original data sources, in the light of the requirements of [our] scenarios". Letting what's easy, and actually useful to someone, drive what Linked Data gets published is a good way to get something working quickly and to enrich it over time.

//Ed

[1] http://www.loc.gov/ndnp/pdf/NDNP_201113TechNotes.pdf (73 pages of notes on the data)
[2] http://www.flickr.com/photos/library_of_congress/sets/72157619452486566/
[3] http://inkdroid.org/journal/2009/07/09/flickr-digital-curation-and-the-web/
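To make the Page description above a bit more concrete, here is a minimal sketch in Python using rdflib. The example.org URIs, the made-up LCCN and pixel dimensions, and the choice of the ORE, Dublin Core, and W3C EXIF vocabularies are all illustrative assumptions, not the exact terms or URL patterns Chronicling America uses; the point is just that a page can be exposed as an ORE aggregation whose aggregated files carry a format and dimensions.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

# Vocabularies: ORE for aggregations, W3C EXIF terms for image dimensions
ORE = Namespace("http://www.openarchives.org/ore/terms/")
EXIF = Namespace("http://www.w3.org/2003/12/exif/ns#")

g = Graph()
g.bind("ore", ORE)
g.bind("exif", EXIF)
g.bind("dcterms", DCTERMS)

# Hypothetical URIs for a newspaper page and two of its bit streams
page = URIRef("http://example.org/lccn/sn99999999/1910-01-01/ed-1/seq-1#page")
jp2 = URIRef("http://example.org/lccn/sn99999999/1910-01-01/ed-1/seq-1.jp2")
thumb = URIRef("http://example.org/lccn/sn99999999/1910-01-01/ed-1/seq-1/thumbnail.jpg")

# The page is an OAI-ORE aggregation of its files
g.add((page, RDF.type, ORE.Aggregation))
g.add((page, ORE.aggregates, jp2))
g.add((page, ORE.aggregates, thumb))

# Just enough description for a consumer like the Flickr workflow:
# the format and pixel dimensions of the JPEG2000 master, plus a thumbnail
g.add((jp2, DCTERMS["format"], Literal("image/jp2")))
g.add((jp2, EXIF.width, Literal(6000)))
g.add((jp2, EXIF.height, Literal(8000)))
g.add((thumb, DCTERMS["format"], Literal("image/jpeg")))

print(g.serialize(format="turtle"))

A harvester that only cares about thumbnails and dimensions (like the Flickr workflow above) can dereference the page, read the aggregation, and ignore the rest; the richer METS and MARC records stay available as linked, non-RDF documents.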
Received on Sunday, 19 September 2010 10:13:23 UTC