
Re: Question on press use cases

From: Antoine Isaac <aisaac@few.vu.nl>
Date: Sun, 19 Sep 2010 20:17:34 +0200
Message-ID: <4C9653BE.80903@few.vu.nl>
CC: public-lld <public-lld@w3.org>
Hi Ed, Joachim,

Thanks a lot for your answers!
That's really insightful. In fact maybe you could link to your mails from your use cases, for the sake of the XG's own use case process :-)

Cheers,

Antoine


> On Sat, Sep 18, 2010 at 10:42 AM, Antoine Isaac<aisaac@few.vu.nl>  wrote:
>> So the question is whether the current situation results rather from:
>> - a conscious choice of ignoring part of the legacy data you had in the
>> original data sources, in the light of the requirements of your scenarios?
>> - a too great effort needed to move legacy data to linked data, considering
>> the resources you had?
>> - the lack of legacy data--you just converted all that you had?
>
> That's a really good question, Antoine. I think it's often easy to get
> lost (and demoralized) when trying to figure out the granularity at
> which to model existing data in RDF... especially when there's a lot of it.
> In our case we had quite a bit of machine readable data in METS XML
> [1] and MARC. The Chronicling America web application needed to model
> only a small fraction of this data in order to deliver content
> meaningfully on the web.
>
> For example, when we loaded our "batches" of content into Chronicling
> America, we didn't need to model (in the database) the intricacies of
> the MIX metadata (colorspaces, scanning systems, sampling frequencies,
> etc) -- we just needed to know the image format, and that the image had
> particular dimensions, in order to render the page. And when we loaded
> newspaper title metadata we didn't need to model all of the MARC
> record, we just needed to model its name, where it was published, its
> geographic coverage, etc.
>
> When we decided to use Linked Data, and in particular OAI-ORE, to make
> our digital resources harvestable, we didn't go back and exhaustively
> model all the things we could have in order to make them available in
> RDF. We simply made the things we already had modeled in our
> relational database available, using pre-existing vocabularies
> wherever possible. This made the task of implementing Linked Data
> pretty easy, and it was coded up in a couple days of work. In some
> cases it was also possible to link resources to non-RDF documents
> (like the METS and MARC XML). We focused on the use case of making the
> titles, issues, pages, and their bit streams web harvestable.
>
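[Editor's note: the publishing step described above, exposing only what was already modeled in the relational database and reusing existing vocabularies, might look roughly like the following sketch. This is a hypothetical illustration, not the actual Chronicling America code; all URIs, titles, and file names below are made-up placeholders.]

```python
# Hypothetical sketch: render one newspaper page, already modeled in a
# relational database, as an OAI-ORE resource map in Turtle.
# The ore: and dcterms: vocabularies are real; every URI is a placeholder.

def page_resource_map(page_uri, title, files):
    """Build a minimal ORE resource map for a single page."""
    lines = [
        "@prefix ore: <http://www.openarchives.org/ore/terms/> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        # The resource map (the RDF document) describes the aggregation (the page).
        "<%s.rdf> a ore:ResourceMap ;" % page_uri,
        "    ore:describes <%s#page> ." % page_uri,
        "",
        "<%s#page> a ore:Aggregation ;" % page_uri,
        '    dcterms:title "%s" ;' % title,
    ]
    # Aggregate the bitstreams (image, PDF, OCR) already tracked in the database.
    aggregated = ", ".join("<%s>" % f for f in files)
    lines.append("    ore:aggregates %s ." % aggregated)
    return "\n".join(lines)

turtle = page_resource_map(
    "http://example.org/lccn/sn99999999/1900-01-01/ed-1/seq-1",
    "Page 1",
    ["http://example.org/files/seq-1.jp2",
     "http://example.org/files/seq-1.pdf"],
)
print(turtle)
```

The point of the sketch is the one Ed makes: the function only touches fields the database already has, so no extra modeling of the METS/MARC source is needed.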
> One early consumer of the data was another unit at the Library of
> Congress that wanted to periodically select and upload images of
> newspaper front pages to Flickr [2]. To do this they wanted a
> thumbnail, and to know the dimensions of the original JPEG2000 file so
> they could construct a custom URL for a high-resolution image to
> upload to Flickr. So we added these things to the RDF representation of the
> Page. If you are interested I described this process a bit more last
> year [3].
>
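[Editor's note: the Flickr scenario above can be sketched the same way: only the two extra facts the consumer needed (a thumbnail link and the original JPEG2000's pixel dimensions) are added to the page's RDF. The property choices here (foaf:thumbnail, exif:width/height) and all URIs are illustrative assumptions, not necessarily what was deployed.]

```python
# Hypothetical sketch: extend a page's RDF with the two facts the
# Flickr upload workflow asked for. Vocabulary choice is an assumption.

def page_extras(page_uri, thumb_uri, width, height):
    """Render the additional thumbnail and dimension triples as Turtle."""
    return "\n".join([
        "@prefix foaf: <http://xmlns.com/foaf/0.1/> .",
        "@prefix exif: <http://www.w3.org/2003/12/exif/ns#> .",
        "",
        "<%s#page> foaf:thumbnail <%s> ;" % (page_uri, thumb_uri),
        "    exif:width %d ;" % width,
        "    exif:height %d ." % height,
    ])

extras = page_extras(
    "http://example.org/lccn/sn99999999/1900-01-01/ed-1/seq-1",
    "http://example.org/files/seq-1-thumb.jpg",
    5680, 8256,
)
print(extras)
```

This mirrors the incremental pattern in the mail: the consumer's need drives which triples get added, rather than an up-front exhaustive model.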
> I guess this is a long way of saying our Linked Data was "a conscious
> choice of ignoring part of the legacy data [we] had in the original
> data sources, in the light of the requirements of [our] scenarios".
> Letting what's easy and actually useful to someone drive what Linked
> Data gets published is a good way to get something working quickly,
> and to enrich it over time.
>
> //Ed
>
> [1] http://www.loc.gov/ndnp/pdf/NDNP_201113TechNotes.pdf (73 pages of
> notes on the data)
> [2] http://www.flickr.com/photos/library_of_congress/sets/72157619452486566/
> [3] http://inkdroid.org/journal/2009/07/09/flickr-digital-curation-and-the-web/
>
Received on Sunday, 19 September 2010 18:18:13 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Sunday, 19 September 2010 18:18:14 GMT