Summary and requirements from the MMSEM multimedia annotation interoperability report

Dear all,

as some of the participants in this working group did not take part in
the Multimedia Semantics XG [1] and may not be aware of the reports
produced there, I offered to summarize the use cases that are relevant
for the work of the Media Annotation Working Group and to extract some
relevant requirements from them.

First of all, the MMSEM multimedia interoperability report [2]
summarizes the work of the XG through its use cases (detailed in [3]),
identifies interoperability issues among these use cases, and demonstrates
how semantic technologies can help to overcome some of these issues.

The report covers the following use cases:

(1) The photo use case, which covers the extraction of semantics from
photos, their annotation, and cross-application compatibility of
annotation and organization tools and systems

(2) The music use case, which deals with the annotation of different
aspects of music on the Web, the interoperability of different
vocabularies and standards, and the aggregation of related information

(3) The news use case, which concerns the annotation of news items that
are mostly available on the Web as textual information illustrated by
images, videos or audio files. The use case explains different standards
and vocabularies for describing news content.

(4) The tagging use case, which tackles the problem of interoperability
and portability of tagging systems and personal tags. The use case
sketches a solution based on SKOS Core [4].
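To make the tag-portability idea concrete, here is a minimal sketch of my own (not taken from the report) of mapping free-form personal tags onto shared concepts in the spirit of SKOS Core: each concept has one preferred label and a set of alternative labels that users' idiosyncratic tags may employ. The concept URIs and labels are invented for illustration.

```python
# Illustrative sketch: a tiny SKOS-like concept store. A real system would
# use skos:prefLabel / skos:altLabel triples; plain dicts stand in here.
concepts = {
    "http://example.org/concepts/new-york-city": {
        "prefLabel": "New York City",
        "altLabels": {"nyc", "newyork", "new_york"},
    },
    "http://example.org/concepts/sunset": {
        "prefLabel": "Sunset",
        "altLabels": {"sundown", "sunsets"},
    },
}

def resolve_tag(tag):
    """Map a personal tag to a shared concept URI, or None if unknown."""
    t = tag.strip().lower()
    for uri, c in concepts.items():
        if t == c["prefLabel"].lower() or t in c["altLabels"]:
            return uri
    return None
```

Once two sites resolve tags against the same concept scheme, a photo tagged "nyc" on one site and "New York City" on another can be found by the same query.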

(5) The semantic media analysis use case which highlights challenges in
media analysis and which shows how to exploit different modalities of
multimedia content for analysis.

(6) The algorithm representation use case, which is about the
interoperability of existing multimedia analysis systems in terms of
descriptions of their inputs and outputs.


Of particular relevance to the work of the Media Annotation Working Group
are, I think, the photo, music, news and semantic media analysis use
cases. These use cases discuss the variety of content and description
schemes available on the Web.

For example, the photo use case is motivated by the need for a common
exchange format for photo annotations, to enable finding, sharing and
reusing photos beyond the borders of single sites and tools. The use case
discusses the pros and cons of EXIF, XMP, PhotoRDF, DIG35 and MPEG-7 for
use as a lingua franca among tools and sites. The conclusion of the use
case authors is that none of these standards is perfectly suited and that
a "limited and simple but at the same time comprehensive vocabulary in a
machine-readable, exchangeable, but not over complicated representation is
needed" [2].
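To illustrate the "lingua franca" idea, here is a hedged sketch (my own, not from the report) that maps a few well-known fields from different photo metadata standards onto one small common vocabulary. The common property names ("dateCreated", "captureDevice", etc.) are invented for illustration; the EXIF tag and XMP property names are the standard ones.

```python
# Per-standard mappings into a hypothetical common vocabulary.
COMMON_FROM_EXIF = {
    "DateTimeOriginal": "dateCreated",
    "Model": "captureDevice",
}
COMMON_FROM_XMP = {
    "dc:title": "title",
    "dc:creator": "creator",
    "xmp:CreateDate": "dateCreated",
}

def to_common(record, mapping):
    """Translate a tool-specific metadata record into the common
    vocabulary, dropping fields the common vocabulary does not cover."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}
```

For example, to_common({"Model": "D70", "ISOSpeedRatings": 200}, COMMON_FROM_EXIF) yields {"captureDevice": "D70"} — which also shows the trade-off the report's authors point to: a limited common vocabulary necessarily loses fields it does not cover.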

The music use case discusses the integration of different description
schemes for audio files and the integration of further related
information. The formats and vocabularies discussed include Ogg Vorbis,
ID3 and the Music Ontology. The authors of the use case also present
typical metadata fields used to describe music data.
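As a small sketch of my own (not from the report): the same descriptive fields appear under different names in ID3 frames and Vorbis comments, and a shared vocabulary lets tools merge annotations from both. The ID3 frame IDs (TIT2, TPE1, TALB) and Vorbis comment names are the standard ones; the merge policy is an arbitrary assumption for illustration.

```python
ID3_TO_COMMON = {"TIT2": "title", "TPE1": "artist", "TALB": "album"}
VORBIS_TO_COMMON = {"TITLE": "title", "ARTIST": "artist", "ALBUM": "album"}

def merge_descriptions(id3_frames, vorbis_comments):
    """Combine annotations from both formats into one record; where both
    provide a value, the ID3 value wins (an arbitrary policy)."""
    common = {VORBIS_TO_COMMON[k]: v for k, v in vorbis_comments.items()
              if k in VORBIS_TO_COMMON}
    common.update({ID3_TO_COMMON[k]: v for k, v in id3_frames.items()
                   if k in ID3_TO_COMMON})
    return common
```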

The semantic media analysis use case most notably highlights the
integration of information across different modalities, e.g. relating
persons mentioned in an audio track to their depiction in a video,
relating captions of images to objects in the image, or relating text
fragments to the objects in an image that illustrates the text. Making
such cross-modality links possible demands basic interoperability between
audio-, video- and image-related description schemes. Furthermore, the
use case highlights the necessity of linking low-level features to
high-level semantics, which is important for some retrieval scenarios.
Relating semantics across modalities may also support reasoning
mechanisms that infer further high-level concepts.
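A minimal sketch (my own illustration, not from the report) of what such a cross-modal link could look like as a data structure: fragments of different media, each with its own modality and locator, are related to the same high-level concept such as a person. The fragment locator syntax here loosely imitates media-fragment addressing and is an assumption; all URIs are invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    media_uri: str   # e.g. "http://example.org/interview.ogg"
    modality: str    # "audio", "video", "image" or "text"
    locator: str     # e.g. "t=12.5,14.0" or "xywh=10,10,80,120"

def depictions_of(concept_uri, links):
    """All fragments, across modalities, linked to one concept.
    `links` is a list of (concept_uri, Fragment) pairs."""
    return [f for c, f in links if c == concept_uri]
```

With such links in place, retrieving all occurrences of one person returns an audio interval, a video region, and a text span alike — exactly the kind of cross-modal query the use case motivates.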

The report contains many more details, so please have a look at [1] and
[3] for a more detailed introduction to the use cases.

Some basic requirements which I can extract from these use cases and more
general observations for our common media ontology is that

(1) The predominant media types (besides text) on the Web are images,
video and audio files.
(2) Semantics and annotations coming from authoring, organisation, and
sharing tools should be preserved as much as possible.
(3) Descriptions should be exchangeable between sites and tools.
(4) Linking between different media types, or fragments thereof, should
be possible. This connects the work of the Media Annotation Working Group
to the work of the Media Fragments Working Group.
(5) Although MPEG-7 is excluded from the focus of the group, it makes
sense for some use cases to keep descriptions of low-level semantics.
This can be accomplished with mechanisms like GRDDL [5], as proposed for
example by the ramm.x model [6].

Perhaps we can build upon these five points.

Best,

Tobias

[1] http://www.w3.org/2005/Incubator/mmsem/
[2] http://www.w3.org/2005/Incubator/mmsem/XGR-interoperability/
[3] http://www.w3.org/2005/Incubator/mmsem/wiki/
[4] http://www.w3.org/TR/swbp-skos-core-guide
[5] http://www.w3.org/2004/01/rdxh/spec
[6] http://sw.joanneum.at/rammx/


-- 
_________________________________________________
Dipl.-Inf. Univ. Tobias Bürger

STI Innsbruck
University of Innsbruck, Austria
http://www.sti-innsbruck.at/

tobias.buerger@sti2.at
__________________________________________________

Received on Monday, 15 September 2008 14:31:31 UTC