Brief description of IPTC Photo Metadata and some comments/possible requirements

Sorry about the late fulfillment of my task, I had a writer's block... 
and I am not really sure yet whether this document is what was expected, 
but I gave it a try and I'm waiting for your comments! I have had a look at 
the IPTC Standard - Photo Metadata 2008 [1] to draw up a set of 
requirements for a multimedia description ontology. And I also have a 
question for the list: do we consider cataloging information too, or only 
"pure content" description?

The document [1] is issued by the International Press Telecommunications 
Council and is the result of a larger collaboration; it "specifies 
metadata properties intended to be used primarily but not exclusively 
with photos". More specifically, "IPTC Photo Metadata provides data 
about photographs and the values can be processed by software. Each 
individual metadata entity is called a property and they are grouped 
into Administrative, Descriptive and Rights Related properties." These 
metadata could also be applied to describe multimedia documents, and some 
links with other vocabularies are made explicit: the properties are 
described in natural language and show possible mappings to the 
“G2-Standard” (see [2] for an example) and to the XMP [3] representation format.

As for the links with other vocabularies: for instance, the Title 
property aligns with the Dublin Core "Title" element, and the properties 
marked "(legacy)" are meant to be filled with keywords from various 
controlled vocabularies.
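
Just to illustrate what such an alignment looks like in practice, here is a 
minimal XMP sketch of the Title property serialized through the Dublin Core 
element it aligns with (the packet structure is only indicative, I am quoting 
the mapping from memory):

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="">
    <!-- IPTC "Title" expressed as dc:title -->
    <dc:title>
      <rdf:Alt>
        <rdf:li xml:lang="x-default">Two Heads of State shaking hands at the Summit</rdf:li>
      </rdf:Alt>
    </dc:title>
  </rdf:Description>
</rdf:RDF>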

This set of metadata is aimed primarily at journalists, which explains 
some of the modeling choices. It is, in my opinion, a good starting 
point (amongst others) for listing mandatory description/metadata items; 
nevertheless, it has a number of drawbacks as a generic 
image/multimedia description scheme:

- Ambiguous modeling decisions: the "Keywords" property is expected to 
take free-text values rather than actual keyword values as one would expect 
("Keywords to express the subject of the content. Keywords may be free text 
and don't have to be taken from a controlled vocabulary."), whereas the 
"Subject Code" property has to be filled with values from the controlled 
IPTC Subject NewsCodes vocabulary [4] (see the XMP sketch after this list).

- Redundant (and thus ambiguous) modeling decisions: the metadata set 
contains Title, Headline and Caption fields that all describe the content of 
the image, but that can (and should?) all be different: it is hard to tell 
them apart if you are not one of the expert users the specification is aimed 
at (also illustrated in the sketch after this list). In a generic multimedia 
annotation ontology, we could either align with all of these fields (and find 
a way to define their semantics precisely) or, more likely, with only a 
subset of them.

- Lack of relationships between the fields: there are some content 
description fields like Event, Location, Person, Object or Artwork 
Shown in the image, but one image, and even more so one multimedia 
document, often contains more than one event, person or location. Multiple 
events etc. can be specified with this description model, but to get 
satisfactory answers to precise queries, or to be able to disambiguate 
between different documents (particularly relevant in large homogeneous 
document collections), a formal relationship between the event, the persons 
and the location has to be expressed. For example, if a picture shows two 
Heads of State shaking hands at a Summit attended by other Heads of State, 
an explicit relationship has to be made between the persons who are shaking 
hands and the event “shaking hands”.
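
To make the first two points more concrete, here is a minimal XMP sketch of 
how the overlapping free-text fields and the controlled Subject Code could sit 
side by side in one packet (property names taken from the IPTC Core XMP 
mapping as far as I remember it; the NewsCode value is only an illustration):

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/"
         xmlns:Iptc4xmpCore="http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/">
  <rdf:Description rdf:about="">
    <!-- Title, Headline and Caption (dc:description) all describe the same content -->
    <dc:title><rdf:Alt><rdf:li xml:lang="x-default">Summit handshake</rdf:li></rdf:Alt></dc:title>
    <photoshop:Headline>Heads of State shake hands at Summit</photoshop:Headline>
    <dc:description><rdf:Alt><rdf:li xml:lang="x-default">Two Heads of State shaking
      hands at the Summit, other Heads of State attending.</rdf:li></rdf:Alt></dc:description>
    <!-- "Keywords" may be free text... -->
    <dc:subject><rdf:Bag><rdf:li>handshake</rdf:li><rdf:li>summit</rdf:li></rdf:Bag></dc:subject>
    <!-- ...whereas Subject Code has to come from the IPTC Subject NewsCodes [4]
         (the code below is made up for the example) -->
    <Iptc4xmpCore:SubjectCode><rdf:Bag><rdf:li>11000000</rdf:li></rdf:Bag></Iptc4xmpCore:SubjectCode>
  </rdf:Description>
</rdf:RDF>

Even in this tiny example, the boundary between Title, Headline and Caption is 
a matter of editorial convention rather than formal semantics.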

The StructuredAnnotation of MPEG-7 (see [5] and the example below) makes it 
possible to express such a relationship explicitly; more generally, I think 
that an annotation system based on graphs that make the relationships between 
the Who/What/When/Where/Why/How explicit would improve browsing and searching 
in multimedia document collections.

Example of StructuredAnnotation, taken from [5].

<StructuredAnnotation>
  <Who>
    <Name xml:lang="en">Zinedine Zidane</Name>
  </Who>
  <WhatAction>
    <Name xml:lang="en">Zinedine Zidane scoring against England.</Name>
  </WhatAction>
</StructuredAnnotation>

The NewsML ontology [6], combined with Named Graphs, could also enable 
this type of link. I think that the possibility of such graphs should be 
present in a multimedia annotation schema, to enable annotations that are as 
precise as possible; the relationships between the different metadata 
elements (person/event/location) could in some cases be derived automatically 
(from accompanying text, or from the context in the video stream or still 
image), so having the possibility to integrate this context in an annotation 
would bring added value, in my opinion. And I would be very interested to 
know what you think about this point!
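
As a rough sketch of what I mean, the handshake example above could be 
captured as a small RDF graph along the following lines (the ex: vocabulary 
and the URIs are invented for the illustration, they are not taken from the 
NewsML ontology); attached to the photo as a named graph, such a description 
could then also carry its own provenance:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/annotation#">
  <!-- one node for the event, explicitly linked to its participants and location -->
  <rdf:Description rdf:about="http://example.org/events/summit-handshake">
    <rdf:type rdf:resource="http://example.org/annotation#Event"/>
    <ex:action>shaking hands</ex:action>
    <ex:participant rdf:resource="http://example.org/people/headOfStateA"/>
    <ex:participant rdf:resource="http://example.org/people/headOfStateB"/>
    <ex:location rdf:resource="http://example.org/places/summitVenue"/>
    <ex:depictedIn rdf:resource="http://example.org/photos/12345"/>
  </rdf:Description>
</rdf:RDF>

The other Heads of State attending the Summit would simply not be linked to 
the "shaking hands" node, which is exactly the distinction a flat list of 
fields cannot make.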

[1] http://www.iptc.org/std/photometadata/2008/specification/IPTC-PhotoMetadata-2008_2.pdf

[2] http://www.newsml.org/pages/

[3] http://www.adobe.com/products/xmp/

[4] http://www.iptc.org/NewsCodes/

[5] http://www.w3.org/2005/Incubator/mmsem/XGR-mpeg7/

[6] http://homepages.cwi.nl/~troncy/research.html
