- From: Antoine Isaac <aisaac@few.vu.nl>
- Date: Sun, 19 Sep 2010 20:17:34 +0200
- CC: public-lld <public-lld@w3.org>
Hi Ed, Joachim,

Thanks a lot for your answers! That's really insightful.

In fact maybe you could link to your mails from your use cases, for the
sake of the XG's own use case process :-)

Cheers,

Antoine

> On Sat, Sep 18, 2010 at 10:42 AM, Antoine Isaac <aisaac@few.vu.nl> wrote:
>> So the question is whether the current situation results rather from:
>> - a conscious choice of ignoring part of the legacy data you had in the
>> original data sources, in the light of the requirements of your scenarios?
>> - a too great effort needed to move legacy data to linked data, considering
>> the resources you had?
>> - the lack of legacy data--you just converted all that you had?
>
> That's a really good question, Antoine. I think it's often easy to get
> lost (and demoralized) when trying to figure out what granularity to
> model existing data at in RDF... especially when there's a lot of it.
> In our case we had quite a bit of machine-readable data in METS XML
> [1] and MARC. The Chronicling America web application needed to model
> only a small fraction of this data in order to deliver content
> meaningfully on the web.
>
> For example, when we loaded our "batches" of content into Chronicling
> America, we didn't need to model (in the database) the intricacies of
> the MIX metadata (colorspaces, scanning systems, sampling frequencies,
> etc.) -- we just needed to know the image format, and that the image
> had particular dimensions, in order to render the page. And when we
> loaded newspaper title metadata we didn't need to model all of the
> MARC record; we just needed to model its name, where it was published,
> its geographic coverage, etc.
>
> When we decided to use Linked Data, and in particular OAI-ORE, to make
> digital resources harvestable, we didn't go back and exhaustively
> model all the things we could have in order to make them available in
> RDF. We simply made the things we had already modeled in our
> relational database available, using pre-existing vocabularies
> wherever possible. This made the task of implementing Linked Data
> pretty easy, and it was coded up in a couple of days of work. In some
> cases it was also possible to link resources to non-RDF documents
> (like the METS and MARC XML). We focused on the use case of making the
> titles, issues, pages, and their bit streams web-harvestable.
>
> One early consumer of the data was another unit at the Library of
> Congress that wanted to periodically select and upload images of
> newspaper front pages to Flickr [2]. In order to do this they wanted a
> thumbnail, and to know the dimensions of the original JPEG 2000 file
> in order to construct a custom URL for a high-resolution image to
> upload to Flickr. So we added these things to the RDF representation
> of the Page. If you are interested, I described this process a bit
> more last year [3].
>
> I guess this is a long way of saying our Linked Data was "a conscious
> choice of ignoring part of the legacy data [we] had in the original
> data sources, in the light of the requirements of [our] scenarios".
> Letting what's easy and actually useful to someone drive what Linked
> Data gets published is a good way to get something working quickly,
> and to enrich it over time.
>
> //Ed
>
> [1] http://www.loc.gov/ndnp/pdf/NDNP_201113TechNotes.pdf (73 pages of
> notes on the data)
> [2] http://www.flickr.com/photos/library_of_congress/sets/72157619452486566/
> [3] http://inkdroid.org/journal/2009/07/09/flickr-digital-curation-and-the-web/
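A minimal sketch of the "publish only what's already modeled" approach Ed describes, assuming Python with rdflib; the example.org URIs, the numeric dimensions, and the choice of ORE aggregation plus EXIF-style width/height properties are illustrative assumptions, not the actual Chronicling America implementation:

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF, XSD

    # Vocabularies reused rather than invented; property choices are illustrative.
    ORE = Namespace("http://www.openarchives.org/ore/terms/")
    EXIF = Namespace("http://www.w3.org/2003/12/exif/ns#")

    g = Graph()
    g.bind("ore", ORE)
    g.bind("exif", EXIF)

    # A newspaper page (as an aggregation), its JPEG 2000 master, and a thumbnail,
    # built only from fields already in the relational database -- nothing
    # re-modeled from the full METS/MARC records. URIs are hypothetical.
    page = URIRef("http://example.org/lccn/sn99999999/1910-01-01/ed-1/seq-1#page")
    jp2 = URIRef("http://example.org/lccn/sn99999999/1910-01-01/ed-1/seq-1.jp2")
    thumb = URIRef("http://example.org/lccn/sn99999999/1910-01-01/ed-1/seq-1/thumbnail.jpg")

    g.add((page, RDF.type, ORE.Aggregation))
    g.add((page, ORE.aggregates, jp2))
    g.add((page, ORE.aggregates, thumb))

    # Dimensions of the JPEG 2000 master, so a consumer (e.g. the Flickr upload
    # workflow) can construct a URL for a high-resolution image.
    g.add((jp2, EXIF.width, Literal(6500, datatype=XSD.integer)))
    g.add((jp2, EXIF.height, Literal(8500, datatype=XSD.integer)))

    print(g.serialize(format="turtle"))

Keeping each page's graph down to the handful of fields the web application already stores is what keeps the publication effort at the "couple of days" scale described above, while richer detail stays reachable through links to the METS and MARC XML.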
Received on Sunday, 19 September 2010 18:18:13 UTC