Re: [MM] First version of multimedia ontology requirements document from Raphaël Troncy on 2006-02-03 (public-swbp-wg@w3.org from February 2006)

From: Raphaël Troncy <Raphael.Troncy@cwi.nl>
Date: Fri, 03 Feb 2006 17:50:00 +0100
To: swbp <public-swbp-wg@w3.org>
CC: Antoine Isaac <Antoine.Isaac@kb.nl>, Jacco van Ossenbruggen <Jacco.van.Ossenbruggen@cwi.nl>
Message-ID: <43E389B8.CC9BF2FA@cwi.nl>

Hi all,

Following the thread Jacco has started, here is my own contribution on some
requirements for a Common Multimedia Ontology Framework. This contribution is
strongly influenced by an archiving point of view.

Regards.

    Raphaël Troncy

------------

Archive-Oriented Requirements for a Multimedia Ontology Framework
by Raphaël Troncy and Antoine Isaac

1) Introduction.

The following text provides a short list of requirements that originate from our
work at INA during our PhD theses. Some of them were previously described in
[Isaac and Troncy, 2004], where an Audio-Visual Description Core Ontology was
proposed. In [Troncy, 2004a; Troncy, 2004b], we proposed an Extensible
Audio-Visual Description Language that fulfils some of these requirements while
overcoming some limitations of current proposals such as MPEG-7.

2) Using Audio-Visual Documents for Various Purposes.

Applications that use audio-visual documents are interested in different aspects
of them. Each has its own viewpoint on this complex medium and is usually
concerned only with the pieces of information that correspond to its needs. For
instance:

    - Many tools aim at automatically indexing audio-visual content by
extracting low-level features from the signal. These features concern video
segmentation (into shots or sequences), speech transcription, and the detection
and recognition of camera motion, faces, text, etc. This family of applications
needs a common vocabulary to store and exchange the results of its algorithms.
The MPEG-7 standard defines such descriptors, but without giving them a formal
semantics. The common multimedia ontology framework should therefore provide
this missing semantics.

    - A TV (or radio) broadcaster may want to publish its program listings on
its web site, and is therefore interested in identifying and cataloguing its
programs. The channel would also like to know the audience breakdown and the
peak viewing times in order to adapt its advertising rates. Broadcasters have
recently adopted the TV Anytime (note: The TV Anytime Forum
(http://www.tv-anytime.org/) is an association of organizations which seeks to
develop specifications to provide value-added interactive services in the
context of digital TV broadcasting. The forum identified metadata as one of the
key technologies enabling its vision and has adopted MPEG-7 as the description
language.) format and its terminologies to exchange all these metadata. Again,
the lack of formal semantics for these metadata will prevent many possible uses.

    - A news agency may aim to deliver program information to newspapers. It
could receive the TV Anytime metadata and enrich them with the cast or the
recommended audience of the program, last-minute changes in the program
listings, etc. The ProgramGuideML (note: the ProgramGuideML initiative is
developed by the International Press Telecommunications Council (IPTC)
(http://www.programguideml.org) and aims to be the global XML standard for the
interchange of Radio/TV Program Information.) format is currently being
developed for this purpose.

    - Education and humanities research increasingly use audio-visual media.
Their needs concern the possibility to analyse how a document was produced (e.g.
the number, position and angle of the cameras, the sound recording) and to
select and describe some excerpts in depth according to domain theories,
focusing for example on action analysis (i.e. a praxeological viewpoint).

    - Finally, an institute like INA has to collect and describe an audio-visual
cultural heritage. It is interested in all the aspects given above, with a
strong emphasis on a documentary archive viewpoint. Here, a multimedia ontology
framework should make it possible to identify and classify each program and
collection, to describe the way it has been shot, produced and broadcast, and to
describe both its structure and its content. It should also be open enough to be
linked with any domain-specific ontology in order to describe the content of
each program precisely.
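The idea that each application reads only its own facet of a shared description
can be sketched with a toy in-memory triple store. This is a minimal
illustration under stated assumptions, not an RDF implementation: the `av:`
property names, the facet labels and identifiers such as `prog:42` are
hypothetical and do not come from MPEG-7, TV Anytime, ProgramGuideML or the INA
ontology.

```python
from collections import defaultdict

# Hypothetical shared vocabulary: each property is tagged with the facet
# (application viewpoint) that typically uses it. All names are invented.
FACETS = {
    "av:hasShot": "signal",          # automatic indexing tools
    "av:hasGenre": "catalogue",      # broadcasters and archives
    "av:broadcastOn": "catalogue",
    "av:hasCast": "programGuide",    # news agency enrichment
    "av:cameraAngle": "production",  # education and humanities research
}


class TripleStore:
    """A minimal in-memory store of (subject, property, value) statements."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, prop, value):
        # Only terms from the shared vocabulary are accepted, so every
        # application stores and exchanges its results with the same terms.
        if prop not in FACETS:
            raise ValueError(f"unknown property: {prop}")
        self.triples.add((subject, prop, value))

    def view(self, subject, facet):
        """Return the statements about `subject` that belong to one facet."""
        result = defaultdict(set)
        for s, p, v in self.triples:
            if s == subject and FACETS[p] == facet:
                result[p].add(v)
        # Sort keys and values so the output is deterministic.
        return {p: sorted(vs) for p, vs in sorted(result.items())}


store = TripleStore()
store.add("prog:42", "av:hasGenre", "news")
store.add("prog:42", "av:broadcastOn", "channel:1")
store.add("prog:42", "av:hasShot", "shot:1")
store.add("prog:42", "av:hasShot", "shot:2")
print(store.view("prog:42", "catalogue"))
# {'av:broadcastOn': ['channel:1'], 'av:hasGenre': ['news']}
```

An archive, whose viewpoint aggregates the others, would simply read every
facet of the same description.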

3) A Proposed Audio-Visual Description Core Ontology.

Despite this variety, all these specific applications share common concepts and
properties when describing an AV document. For instance, the concept of genre or
some production and broadcast properties are always necessary, either for
cataloguing and indexing the document, or for parameterizing an algorithm whose
goal is to automatically extract some features from the signal. We also observe
that the archive point of view is an aggregation of the usual description
facets. We have therefore formalized the practices of INA's documentalists, as
well as the terminology they use, in order to design an audio-visual description
core ontology [Isaac and Troncy, 2004], which could be a good starting point for
a Common Multimedia Ontology Framework. This ontology is also linked to the
DOLCE foundational ontology, which gives it a sound and consensual upper-level
justification.
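To illustrate what anchoring a core ontology in a foundational one can provide,
here is a minimal Python sketch, assuming a plain subclass hierarchy with no
further OWL constructs. The `av:` class names are hypothetical, and the
`upper:` names only mimic two DOLCE top-level categories; the alignment shown is
invented for illustration and is not the one of [Isaac and Troncy, 2004]. The
sketch computes inherited superclasses and flags any class that would fall
under two disjoint upper-level categories.

```python
# Hypothetical fragment of a core ontology: class -> direct superclasses.
# The "av:" names are invented; the "upper:" names only mimic two DOLCE
# top-level categories and are NOT the alignment of [Isaac and Troncy, 2004].
SUBCLASS_OF = {
    "av:NewsProgram": {"av:Program"},
    "av:Documentary": {"av:Program"},
    "av:Program": {"av:AudioVisualDocument"},
    "av:AudioVisualDocument": {"upper:Endurant"},
    "av:Broadcast": {"upper:Perdurant"},
}

# In DOLCE, endurants and perdurants are disjoint categories.
DISJOINT = [("upper:Endurant", "upper:Perdurant")]


def superclasses(cls):
    """Return all direct and inherited superclasses of `cls`."""
    seen = set()
    stack = list(SUBCLASS_OF.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(SUBCLASS_OF.get(parent, ()))
    return seen


def is_a(cls, ancestor):
    """True if `cls` is `ancestor` or one of its (transitive) subclasses."""
    return cls == ancestor or ancestor in superclasses(cls)


def violations():
    """List the classes that would inherit from two disjoint categories."""
    return sorted(
        cls
        for cls in SUBCLASS_OF
        for a, b in DISJOINT
        if is_a(cls, a) and is_a(cls, b)
    )


print(sorted(superclasses("av:NewsProgram")))
# ['av:AudioVisualDocument', 'av:Program', 'upper:Endurant']
print(is_a("av:Broadcast", "upper:Endurant"))  # False
print(violations())  # []
```

Adding `av:Broadcast` as a second parent of `av:Program` would make
`violations()` return `['av:Documentary', 'av:NewsProgram', 'av:Program']`, the
kind of modelling error that an upper-level grounding helps to catch.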

4) References.

[Isaac and Troncy, 2004]
Antoine Isaac and Raphaël Troncy. Designing and Using an Audio-Visual
Description Core Ontology. In Workshop on Core Ontologies in Ontology
Engineering, held in conjunction with the 14th International Conference on
Knowledge Engineering and Knowledge Management (EKAW'04), Whittlebury Hall,
Northamptonshire, UK, October 8, 2004.

[Troncy, 2004a]
Raphaël Troncy and Jean Carrive. A Reduced Yet Extensible Audio-Visual
Description Language: How to Escape From the MPEG-7 Bottleneck. In 4th ACM
Symposium on Document Engineering (DocEng'04), J. Y. Vion-Dury (editor), pages
87-89, Milwaukee, Wisconsin, USA, October 28-30, 2004.

[Troncy, 2004b]
Raphaël Troncy, Jean Carrive, Steffen Lalande, and Jean-Philippe Poli. A
Motivating Scenario for Designing an Extensible Audio-Visual Description
Language. In International Workshop on Multidisciplinary Image, Video, and
Audio Retrieval and Mining (CoRIMedia), Sherbrooke, Canada, October 25-26, 2004.

--
Raphaël Troncy
CWI (Centre for Mathematics and Computer Science),
Kruislaan 413, 1098 SJ Amsterdam, The Netherlands
e-mail: raphael.troncy@cwi.nl & raphael.troncy@gmail.com
Tel: +31 (0)20 - 592 4093
Fax: +31 (0)20 - 592 4312
Web: http://www.cwi.nl/ins2/
Received on Friday, 3 February 2006 16:50:23 UTC