Re: [Image] How to identify images in an RDF context from Richard Newman on 2005-02-10 (semantic-web@w3.org from February 2005)

From: Richard Newman <r.newman@reading.ac.uk>
Date: Thu, 10 Feb 2005 00:07:57 +0000
To: Karl Dubost <karl@w3.org>
Cc: semantic-web@w3.org
Message-Id: <7e82c9fece8935085f6b2f10e607e7c7@reading.ac.uk>

Karl,
   I've had a few thoughts on this (some of which have percolated into 
my iPhoto RDF exporter[1], but by no means all! I should do some more 
work on that...). I'll reply in-line.

[1] <http://www.holygoat.co.uk/applications/iphoto-rdf/iphoto-rdf>

On Feb 9, 2005, at 21:46, Karl Dubost wrote:
> a. Is the good strategy one RDF file for one image? or an EXIF file? 
> and an geo file? and a description file?

I'm not convinced that it matters. I'd expect it all to go into a 
store, really, but you could also separate 'factual' from user-centric 
data (EXIF vs. rating, for example).

> b. How do I identify the image ? a urn (which kind?) or an http uri?
> 	<http://example.org/photo/2005/02/06/foo.jpg> even if the image is 
> not online.

If the EXIF contains a precise date, that's possible, but you would 
need some way of mapping back to a file somewhere. I made up a scheme 
that uses iPhoto's unique IDs, but some URN based on a timestamp might 
be better. Giving it an http URI would only work if the user of the 
software has a prefix that they control. Tough one, that. See 
later answers for more.
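
As a rough sketch of the timestamp-URN idea (not what the iPhoto 
exporter does): the code below derives a stable urn:uuid: name from the 
capture time plus a camera label. The function name photo_urn, the 
camera_id parameter and the example.org namespace are all assumptions 
made for illustration; the camera label is only there because two 
cameras can easily produce the same timestamp.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical namespace for photo identifiers; any fixed UUID would do.
PHOTO_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")


def photo_urn(taken_at: datetime, camera_id: str) -> str:
    """Derive a stable urn:uuid: name from capture time and a camera label."""
    # Normalise to UTC so the same instant always yields the same name.
    stamp = taken_at.astimezone(timezone.utc).isoformat()
    # uuid5 is name-based and deterministic: same inputs, same UUID.
    return uuid.uuid5(PHOTO_NS, f"{camera_id}/{stamp}").urn


if __name__ == "__main__":
    shot = datetime(2005, 2, 6, 14, 30, 0, tzinfo=timezone.utc)
    print(photo_urn(shot, "camera-1"))
```

This needs no prefix the user controls, but it says nothing about where 
the file lives, so the mapping back to a file still has to be stored 
somewhere.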

> c. Images are between 1 and 3 Mo each, I keep them on DVDs or external 
> hard drive.
> 	Can I use the previous identifier?

I would hope so, though the mapping between URI and a file to load may 
be more complex than you would like! I'd prefer to see a more complex 
schema (see answer to next question).

> d. I have many versions of the same image.
> 	* Original Image (2000 x 3000 px)
> 	* Thumbnail	(75 x 75 px)
> 	* small version (400 x 600 px)
> 	* cropped version
> 	* published version in different context (different HTML pages, web 
> sites)
> 	* Sampling (different images or part of the images associated with 
> others)
> All these versions share one part or all parts of the information 
> which is about the image. How do I define the model to identify it and 
> gives information about it?

I took/take an FRBR-esque view of things: the photo is an abstract 
entity (probably a RAW image that existed between CCD and 
CompactFlash). It has a canonical representation (the JPEG you dragged 
off the CompactFlash), which has at least one location 
(file:///Users...), and a number of other representations (thumbnails, 
web exports, etc.). Furthermore, there are a number of other derived 
works which also have canonical representations, thumbnails, etc. --- 
these would be crops, colour-corrected versions, etc.

Then there would be format changes and so on, and there are also 
multiple copies (each JPEG version is linked to multiple instances by 
various properties, so you could have the original JPEG 
(file:///...orig.jpg), a backup burned to DVD, a copy on your Web site, 
etc.).

Each of these corresponds to one of FRBR's four layers (work, 
expression, manifestation, item), and has its own set of properties.
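
One possible reading of that layering, sketched as plain data classes. 
The class names follow FRBR's work / expression / manifestation / item 
levels; how photos, crops, thumbnails and copies are assigned to them 
here is an assumption for illustration, as are the field names and the 
file:///example/... paths.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """One concrete copy: a file on disk, a DVD backup, a Web copy."""
    location: str


@dataclass
class Manifestation:
    """A particular encoding, e.g. the full-size JPEG or a thumbnail."""
    media_type: str
    width: int
    height: int
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """A realisation of the photo: as shot, cropped, colour-corrected."""
    label: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The abstract photo itself."""
    title: str
    expressions: List[Expression] = field(default_factory=list)


def all_locations(work: Work) -> List[str]:
    """Collect every known copy of every version of a photo."""
    return [
        item.location
        for expression in work.expressions
        for manifestation in expression.manifestations
        for item in manifestation.items
    ]


if __name__ == "__main__":
    original = Manifestation("image/jpeg", 2000, 3000, [
        Item("file:///example/orig.jpg"),
        Item("http://example.org/photo/2005/02/06/foo.jpg"),
    ])
    thumb = Manifestation("image/jpeg", 75, 75,
                          [Item("file:///example/thumb.jpg")])
    photo = Work("foo", [Expression("as shot", [original, thumb])])
    print(all_locations(photo))
```

Properties such as a rating would hang off the Work, pixel dimensions 
off the Manifestation, and a DVD label off the Item.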

I'd highly recommend digging up some stuff on FRBR; it makes this kind 
of thing much clearer. Note that it still doesn't make it easier to 
identify the original picture. If you know the original import location 
(iPhoto does), though, you can use that as an inverse functional 
property (IFP) to identify the abstract entity, skilfully avoiding the 
problem of giving it a URI! E.g.

pic:originalImportLocation a owl:InverseFunctionalProperty .
_:b1 a pic:Photo;
       pic:originalImportLocation <file:///...> .
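
To show why an IFP is enough to identify the photo without a URI, here 
is a minimal sketch of merging ("smushing") blank nodes that share a 
value for it. Triples are plain tuples rather than a real RDF library's 
types; pic:rating and the file:///example/... path are made up for the 
example, and the single-pass merge ignores chains across several IFPs.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
IFP = "pic:originalImportLocation"


def smush(triples: List[Triple], ifp: str = IFP) -> Set[Triple]:
    """Merge subjects that share a value for an inverse functional property."""
    canonical: Dict[str, str] = {}  # IFP value -> first subject seen with it
    rename: Dict[str, str] = {}     # subject -> its representative
    for s, p, o in triples:
        if p == ifp:
            rename[s] = canonical.setdefault(o, s)
    # Rewrite subjects and objects so merged nodes use one identifier.
    return {(rename.get(s, s), p, rename.get(o, o)) for s, p, o in triples}


if __name__ == "__main__":
    data = [
        ("_:b1", IFP, "file:///example/import/1.jpg"),
        ("_:b1", "pic:rating", "5"),
        ("_:b2", IFP, "file:///example/import/1.jpg"),
        ("_:b2", "dc:title", "Party"),
    ]
    for triple in sorted(smush(data)):
        print(triple)
```

After merging, the rating and the title end up on the same node even 
though they arrived attached to different blank nodes.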

I've done a bit of work on modelling FRBR in RDFS and OWL, which I 
should also get round to finishing. It might be nice to cast it down to 
images as an exemplar.

> e. Is it better to have a large RDF file with information of all 
> images? Or a small individual RDF file for each image?

Whatever works best for communication. If you're sharing 10 pics, smush 
their RDF together. It'll probably hit a store before it's used, 
anyway.

> * RDF inside or outside the image.
>
> RDFPic recommends to put metadata inside the comment zone of JPEG. XMP 
> does it in the binary. In both case, I don't think it's always a good 
> idea, for privacy reason. Many softwares do not propose to wipe 
> metadata before publication on the Web. Problems will arise with cell 
> phones and GPS information.

There are file strippers available, but I quite understand the concern. 
I'd keep it separate.

> You may have information you don't want to publish on the Web. 
> Personal comments on the image, geo-localization of the image.
>
> Scenario: Someone is at a party at your place and likes very much your 
> painting or you computer. Cool. He takes a picture of it and send it 
> on his moblog, which displays the GPS information, then the latitude 
> and longitude with a comment "We are having so much fun at Peter's 
> place. He will write something about it on his weblog".
>
> Well no problems :))) Peter is leaving for holidays in Africa for one 
> month. Some people have noticed that. It's time for robbery !!! We 
> know the stuff inside, we know that Peter is not there. Let's go.

Good scenario. That's why I don't let it leave my machine ;)
Keeping things out of the file also avoids having to re-write files 
whenever the info changes, and means you don't have to read the file to 
get information about it (which is great for query servers). EXIF can 
stay, as that's supposed to be representative of the basic facts about 
the image, but RDF annotation should probably hang about elsewhere.

Interesting post, Karl, thanks.

-R
Received on Thursday, 10 February 2005 00:08:46 UTC