RE: [MM] First version of multimedia ontology requirements document

Hi all,
 
One more requirement covering multimedia analysis issues. 
 
Regards,
Vassilis
 
 
Content-based analysis of multimedia requires methods that automatically segment video sequences and key frames into image areas corresponding to salient objects (e.g. cars, roads, people, fields), track these objects over time, and provide a flexible framework for object recognition, indexing, retrieval, and further analysis of their relative motion and interactions. This problem can be viewed as relating symbolic terms to visual information by exploiting syntactic and semantic structure, in a manner related to approaches in speech and language processing. More specifically, low-level multimedia features (e.g. MPEG-7 descriptors) must be assigned to semantic concepts, and visual processing algorithms must be assigned to object attributes, thus forming an a-priori knowledge base. Processing may then be performed by relating high-level symbolic representations to features extracted in the signal (image and temporal feature) domain. By basing such a representation on an ontology, one can capture both concrete and abstract relationships between salient visual properties. 
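As an illustration, here is a minimal sketch (in plain Python) of such an a-priori knowledge base relating semantic concepts to low-level feature constraints. The concept names, the hue ranges and the motion flag are all made up for the example; they are not MPEG-7 descriptors.

```python
# Hypothetical sketch of an a-priori knowledge base: semantic concepts mapped
# to simplified low-level feature constraints (a dominant-colour hue range and
# a motion flag). Concept names and ranges are illustrative, not from MPEG-7.
KNOWLEDGE_BASE = {
    "sky":   {"hue_range": (180, 260), "moving": False},
    "grass": {"hue_range": (70, 160),  "moving": False},
    "car":   {"hue_range": (0, 360),   "moving": True},
}

def candidate_concepts(hue, moving):
    """Return the concepts whose feature constraints match a segmented region."""
    return [concept for concept, f in KNOWLEDGE_BASE.items()
            if f["hue_range"][0] <= hue <= f["hue_range"][1]
            and f["moving"] == moving]
```

A real framework would of course use formal ontology axioms rather than a lookup table, but the principle is the same: the symbolic concept is selected by matching extracted signal features against stored constraints.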
 
The research on a low-level visual feature ontology must concentrate on modelling the concepts and properties that describe the visual features of objects, in particular the description of still images and videos in terms of low-level features and media structure descriptions. The ontology must closely follow the specification of the MPEG-7 Visual part, with appropriate adaptation of the complex data type representations. Sub-concepts include standard MPEG-7 features such as colour, shape, texture, motion, localization and basic descriptors. Additional features that are not part of MPEG-7 Visual must also be modelled and included in the visual feature ontology, following a requirements collection process for low-level processing algorithms.
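A hedged sketch of how such sub-concepts might be modelled as a small hierarchy; the class and field names below are illustrative simplifications, deliberately much simpler than the standard's complex datatypes (MPEG-7 does express motion activity on a 1-5 intensity scale, but the colour representation here is a placeholder):

```python
# Illustrative sketch of a visual-feature concept hierarchy mirroring the
# MPEG-7 Visual sub-concepts named above (colour, motion, ...); the class and
# field names are simplifications, not the standard's datatypes.
from dataclasses import dataclass

@dataclass
class VisualFeature:
    name: str

@dataclass
class DominantColour(VisualFeature):
    rgb: tuple  # stand-in for MPEG-7's richer dominant-colour representation

@dataclass
class MotionActivity(VisualFeature):
    intensity: int  # MPEG-7 expresses motion activity on a 1-5 scale

feature = DominantColour(name="dominant_colour", rgb=(34, 120, 200))
```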
 
 
-----Original Message-----
From: public-swbp-wg-request@w3.org [mailto:public-swbp-wg-request@w3.org] On Behalf Of Christian Halaschek-Wiener
Sent: Saturday, February 04, 2006 6:12 PM
To: Raphaël Troncy; Jacco van Ossenbruggen
Cc: swbp
Subject: Re: [MM] First version of multimedia ontology requirements document
 
Hi all,
  It seems that Raphael and Jacco have touched on almost all of the points we would have made. That said, I will provide two additional points:
 
Additional Requirements for a Common Multimedia Ontology Framework
 
1) A common multimedia ontology framework should agree upon the terminology for linking multimedia objects to resources defined in Semantic Web representation languages. Further, this agreed-upon terminology should take into account past approaches so that as much existing data as possible can be repurposed with minimal syntactic and semantic transformations. For example, to the best of our knowledge, a majority of current approaches provide this linking via foaf:depicts, etc. (Note that this point has probably already been pointed out.)
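A minimal sketch of such a link, using plain Python tuples as RDF-style triples. The FOAF namespace is the real one; the media and resource URIs are made up for the example:

```python
# Sketch of linking a multimedia object to a Semantic Web resource via
# foaf:depicts, using tuples as RDF-style triples. The FOAF namespace is
# real; the example.org URIs are illustrative.
FOAF_DEPICTS = "http://xmlns.com/foaf/0.1/depicts"

graph = [
    ("http://example.org/media/photo42.jpg", FOAF_DEPICTS,
     "http://example.org/people#AlanTuring"),
]

def depicted_resources(media_uri, triples):
    """Return the resources a multimedia object depicts, per foaf:depicts."""
    return [o for s, p, o in triples
            if s == media_uri and p == FOAF_DEPICTS]
```

The point of agreeing on one property is exactly that a query like this works uniformly over data from different producers.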
 
2) A common multimedia ontology framework should provide an agreed-upon way to localize sub-regions of multimedia objects (e.g., sub-regions of images). Again, this terminology should take into account past approaches. To the best of our knowledge, this has been accomplished using bounding box coordinates and/or SVG snippets describing such regions.
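A sketch of the two localization styles just mentioned: a bounding box in pixel coordinates, and its rendering as an SVG snippet. The key names and values are illustrative, not an agreed vocabulary:

```python
# Illustrative region localization: a bounding box in pixel coordinates and
# its rendering as an SVG <rect> snippet. Key names and URIs are made up.
region = {
    "source": "http://example.org/media/photo42.jpg",
    "bbox": {"x": 10, "y": 20, "w": 100, "h": 80},
}

def bbox_to_svg(bbox):
    """Render a bounding box as the equivalent SVG <rect> element."""
    return ('<rect x="{x}" y="{y}" width="{w}" height="{h}"/>'
            .format(**bbox))
```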
 
 
Cheers,
Chris
 
 
 
-- 
Christian Halaschek-Wiener
PhD Student, Dept. of Computer Science
GRA, MINDSWAP Research Group,
University of Maryland, College Park
Web page: http://www.mindswap.org/~chris
 
 
On Feb 3, 2006, at 11:50 AM, Raphaël Troncy wrote:



 
Hi all,
 
Following the thread Jacco has begun, I bring my own contribution about some
requirements for a Common Multimedia Ontology Framework. This contribution is
strongly influenced by an archiving point of view.
 
Regards.
 
    Raphaël Troncy
 
------------
 
Archive-Oriented Requirements for a Multimedia Ontology Framework
by Raphaël Troncy and Antoine Isaac
 
1) Introduction.
 
The following text provides a short list of requirements that originate from
our PhD work at INA. They have partly been described previously in
[Isaac and Troncy, 2004], where an Audio-Visual Description Core Ontology has
been proposed. In [Troncy, 2004a; Troncy, 2004b], we have proposed an Extensible
Audio-Visual Description Language that fulfills some of these requirements
while overcoming limitations of the current proposal.
 
2) Using Audio-Visual Documents for Various Purposes.
 
The applications that use audio-visual documents are interested in different
aspects. They have their own viewpoints on this complex medium and are usually
concerned only with the selected pieces of information corresponding to their
needs. For instance:
 
    - Many tools aim at automatically indexing audio-visual content by
extracting low-level features from the signal. These features concern video
segmentation (into shots or sequences), speech transcription, and the detection
and recognition of camera motion, faces, text, etc. This family of applications
needs a common vocabulary to store and exchange the results of its algorithms.
The MPEG-7 standard defines such descriptors, but without giving them a formal
semantics. Therefore, the common multimedia ontology framework should provide
this missing semantics.
 
    - A TV (or radio) broadcaster may want to publish its program listings on
its web site. It is therefore interested in identifying and cataloguing its
programs. The channel would also like to know the details of its audience and
the peak viewing times in order to adapt its advertisement rates. Broadcasters
have recently adopted the TV Anytime format (note: The TV Anytime Forum
(http://www.tv-anytime.org/) is an association of organizations that seeks to
develop specifications to provide value-added interactive services in the
context of digital TV broadcasting. The forum identified metadata as one of the
key technologies enabling its vision and has adopted MPEG-7 as the
description language.) and its terminologies to exchange all this
metadata. Again, the lack of formal semantics of this metadata will prevent
many possible uses.
 
     - A news agency may aim at delivering program information to newspapers. It
could receive the TV Anytime metadata and enrich them with the cast or the
recommended audience of a program, the last-minute changes in the program
listings, etc. The ProgramGuideML format (note: the ProgramGuideML initiative is
developed by the International Press Telecommunications Council (IPTC)
(http://www.programguideml.org) and aims to be the global XML standard for the
interchange of radio/TV program information.) is currently being developed for
this purpose.
 
    - Education and humanities research make increasing use of audio-visual
media. Their needs concern the possibility of analysing a document's production
(e.g. the number, position and angle of the cameras, the sound recording) and of
selecting and describing some excerpts in depth according to domain theories,
focusing for example on action analysis (i.e. a praxeological viewpoint).
 
    - Finally, an institute like INA has to collect and describe an audio-visual
cultural heritage. It is interested in all the aspects given above, with a
strong emphasis on a documentary archive viewpoint. A multimedia ontology
framework should here make it possible to identify and classify each program and
collection, to describe the way it has been shot, produced and broadcast, and to
describe both its structure and its content. It should also be open enough to be
linked with any domain-specific ontology in order to describe precisely the
content of each program.
 
3) A Proposed Audio-Visual Description Core Ontology.
 
Despite this variety, all these specific applications share common concepts and
properties when describing an AV document. For instance, the concept of genre or
some production and broadcast properties are always necessary, either for
cataloguing and indexing the document or for parameterizing an algorithm whose
goal is to automatically extract some features from the signal. We also observe
that the archive point of view is an aggregation of the usual description
facets. We have therefore formalized the practices of INA's documentalists, as
well as the terminology they use, in order to design an audio-visual
description core ontology [Isaac and Troncy, 2004] which could be a good
starting point for a Common Multimedia Ontology Framework. This ontology is also
linked to the DOLCE foundational ontology, which gives it a sound and consensual
upper-level justification.
 
4) References.
 
[Isaac and Troncy, 2004]
Antoine Isaac and Raphaël Troncy. Designing and Using an Audio-Visual
Description Core Ontology. In Workshop on Core Ontologies in Ontology
Engineering held in conjunction with the 14th International Conference on
Knowledge Engineering and Knowledge Management (EKAW'04), Whittlebury Hall,
Northamptonshire, UK, October 8th.
 
[Troncy, 2004a]
Raphaël Troncy and Jean Carrive. A Reduced Yet Extensible Audio-Visual
Description Language: How to Escape From the MPEG-7 Bottleneck. In 4th ACM
Symposium on Document Engineering (DocEng'04), J. Y. Vion-Dury (editor), pages
87-89, Milwaukee, Wisconsin, USA, October 28-30.
 
[Troncy, 2004b]
Raphaël Troncy, Jean Carrive, Steffen Lalande, and Jean-Philippe Poli. A
Motivating Scenario for Designing an Extensible Audio-Visual Description
Language. In The International Workshop on Multidisciplinary Image, Video, and
Audio Retrieval and Mining (CoRIMedia), Sherbrooke, Canada, October 25-26.
 
--
Raphaël Troncy
CWI (Centre for Mathematics and Computer Science),
Kruislaan 413, 1098 SJ Amsterdam, The Netherlands
e-mail: raphael.troncy@cwi.nl & raphael.troncy@gmail.com
Tel: +31 (0)20 - 592 4093
Fax: +31 (0)20 - 592 4312
Web: http://www.cwi.nl/ins2/
 
 
 
 

Received on Tuesday, 14 February 2006 13:02:09 UTC