Re: [MM] First version of multimedia ontology requirements document

Hi all,
   Seems that Raphael and Jacco have touched almost all of the points  
we would have made. That said, I will provide two additional points:

Additional Requirements for a Common Multimedia Ontology Framework

1) A common multimedia ontology framework should agree upon the  
terminology for linking multimedia objects to resources defined in  
Semantic Web representation languages. Further, this agreed-upon  
terminology should take into account past approaches so that as  
much existing data as possible can be repurposed with minimal  
syntactic and semantic transformations. For example, to the best of  
our knowledge, a majority of approaches currently provide this linking  
via foaf:depicts, etc. (Note that this point has probably already  
been made.)
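
To illustrate the kind of linking meant here, the following is a minimal
sketch in Python that serializes a single foaf:depicts statement as
N-Triples. The example.org URIs and the helper name depicts_triple are
hypothetical and chosen only for illustration; only the foaf:depicts
property itself comes from the FOAF vocabulary.

```python
# Full URI of the FOAF "depicts" property.
FOAF_DEPICTS = "http://xmlns.com/foaf/0.1/depicts"


def depicts_triple(image_uri: str, resource_uri: str) -> str:
    """Return one N-Triples statement: <image> foaf:depicts <resource> ."""
    return f"<{image_uri}> <{FOAF_DEPICTS}> <{resource_uri}> ."


# Hypothetical image and resource URIs, for illustration only.
print(depicts_triple("http://example.org/photo.jpg",
                     "http://example.org/people#alice"))
```

Existing data that already uses foaf:depicts would need no transformation
under such a convention, which is the reuse argument made above.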

2) A common multimedia ontology framework should provide an agreed-upon  
way to localize sub-regions of multimedia objects (e.g.,  
sub-regions of images). Again, this terminology should take into account  
past approaches. To the best of our knowledge, this has been  
accomplished using bounding box coordinates and/or SVG snippets  
describing such regions.
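
As a rough illustration of the two localization styles mentioned above,
the sketch below turns bounding box coordinates into an SVG snippet
containing a single rect element. The function name bbox_to_svg, its
parameters, and the pixel values are assumptions made for this example;
they are not taken from any existing annotation tool.

```python
from xml.sax.saxutils import quoteattr


def bbox_to_svg(x, y, width, height, image_width, image_height,
                region_id="region1"):
    """Describe a rectangular image region as a small SVG document.

    The viewBox matches the image size, so the rect coordinates are
    expressed in image pixels.
    """
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="0 0 {image_width} {image_height}">'
        f'<rect id={quoteattr(region_id)} x="{x}" y="{y}" '
        f'width="{width}" height="{height}"/>'
        f'</svg>'
    )


# Hypothetical region in a 640x480 image.
print(bbox_to_svg(10, 20, 30, 40, 640, 480))
```

A bounding box is simply the special case of an SVG region with a single
rect, which suggests that data in either form could be mapped to a shared
vocabulary.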


Christian Halaschek-Wiener
PhD Student, Dept. of Computer Science
GRA, MINDSWAP Research Group,
University of Maryland, College Park
Web page:

On Feb 3, 2006, at 11:50 AM, Raphaël Troncy wrote:

> Hi all,
> Following the thread Jacco has begun, I bring my own contribution  
> about some
> requirements for a Common Multimedia Ontology Framework. This  
> contribution is
> strongly influenced by an archiving point of view.
> Regards.
>     Raphaël Troncy
> ------------
> Archive-Oriented Requirements for a Multimedia Ontology Framework
> by Raphaël Troncy and Antoine Isaac
> 1) Introduction.
> The following text provides a short list of requirements that  
> originate from the
> work at INA during our PhD Thesis. They have partly been previously  
> described in
> [Isaac and Troncy, 2004] where an Audio-Visual Description Core  
> Ontology has
> been proposed. In [Troncy, 2004a] and [Troncy, 2004b], we have proposed an  
> Extensible
> Audio-Visual Description Language that fulfills some of these  
> requirements,
> while overcoming the current proposal.
> 2) Using Audio-Visual Documents for Various Purposes.
> The applications that use audio-visual documents are interested in  
> different
> aspects. They have their own viewpoint on this complex media and  
> usually they
> are just concerned with selected pieces of information  
> corresponding to their
> needs. For instance:
>     - Many tools aim at automatically indexing audio-visual content by
> extracting low-level features from the signal. These features  
> concern video
> segmentation (in shots or in sequences), speech transcription,  
> detection and
> recognition of camera motion, faces, texts, etc. This family of  
> applications
> needs a common vocabulary to store and exchange the results of  
> their algorithms.
> The MPEG-7 standard defines such descriptors, without giving them a  
> formal
> semantics. Therefore, the common multimedia ontology framework  
> should provide
> this missing semantics.
>     - A TV (or radio) broadcaster may want to publish the program  
> listings on
> its web site. Therefore, it is interested in identifying and  
> cataloguing its
> programs. The channel would also like to know the details of its  
> audience and the
> peak viewing times in order to adapt its advertisement rates.  
> Broadcasters have
> recently adopted the TV Anytime (note: The TV Anytime Forum
> ( is an association of organizations  
> which seeks to
> develop specifications to provide value-added interactive services  
> in the
> context of TV digital broadcasting. The forum identified metadata  
> as one of the
> key technologies enabling their vision and have adopted MPEG-7 as the
> description language.) format and its terminologies to exchange all  
> these
> metadata. Again the lack of formal semantics of these metadata will  
> prevent many
> possible uses.
>      - A news agency may aim at delivering program information to  
> newspapers. It
> could receive the TV Anytime metadata, and enrich them with the  
> cast or the
> recommended audience of the program, the last minute changes in the  
> program
> listings, etc. The ProgramGuideML (note: the ProgramGuideML  
> initiative is
> developed by the International Press Telecommunications Council (IPTC)
> ( and aims to be the global XML  
> standard for the
> interchange of Radio/TV Program Information.) format is currently  
> developed for
> this purpose.
>     - Education and humanities research make increasing use of audio-visual
> media. Their needs concern the ability to analyse its production (e.g. number,
> position and angle of the cameras, sound recording) and to select and describe
> some excerpts in depth according to domain theories, focusing for example on
> action analysis (i.e. a praxeological viewpoint).
>     - Finally, an institute like INA has to collect and describe an  
> audio-visual
> cultural heritage. It is interested in all the aspects given above,  
> with a
> strong emphasis on a documentary archive viewpoint. Here, a multimedia ontology
> framework should make it possible to identify and classify each program and
> collection, to describe the way it has been shot, produced and broadcast, and to
> describe both its structure and its content. It should then be open enough to be
> linked with any domain-specific ontology in order to describe precisely the
> content of each program.
> 3) A Proposed Audio-Visual Description Core Ontology.
> Despite this variety, all these specific applications share common  
> concepts and
> properties when describing an AV document. For instance, the  
> concept of genre or
> some production and broadcast properties are always necessary,  
> either for
> cataloguing and indexing the document, or to parameterize an  
> algorithm whose
> goal is to automatically extract some features from the signal. We  
> also observe
> that the archive point of view is an aggregation of the usual  
> description
> facets. We have therefore formalized the practices of the  
> documentalists of INA
> as well as the terminology they use, in order to design an audio-visual
> description core ontology [Isaac and Troncy, 2004] which could be a  
> good
> starting point for a Common Multimedia Ontology Framework. This  
> ontology is also
> linked to the DOLCE foundational ontology, which gives it a sound  
> and consensual
> upper-level justification.
> 4) References.
> [Isaac and Troncy, 2004]
> Antoine Isaac and Raphaël Troncy. Designing and Using an Audio-Visual
> Description Core Ontology. In Workshop on Core Ontologies in Ontology
> Engineering held in conjunction with the 14th International Conference on
> Knowledge Engineering and Knowledge Management (EKAW'04), Whittlebury Hall,
> Northamptonshire, UK, October 8th.
> [Troncy, 2004a]
> Raphaël Troncy and Jean Carrive. A Reduced Yet Extensible Audio-Visual
> Description Language: How to Escape From the MPEG-7 Bottleneck. In 4th ACM
> Symposium on Document Engineering (DocEng'04), J. Y. Vion-Dury (editor), pages
> 87-89, Milwaukee, Wisconsin, USA, October 28-30.
> [Troncy, 2004b]
> Raphaël Troncy, Jean Carrive, Steffen Lalande, and Jean-Philippe Poli. A
> Motivating Scenario for Designing an Extensible Audio-Visual Description
> Language. In The International Workshop on Multidisciplinary Image, Video, and
> Audio Retrieval and Mining (CoRIMedia), Sherbrooke, Canada, October 25-26.
> --
> Raphaël Troncy
> CWI (Centre for Mathematics and Computer Science),
> Kruislaan 413, 1098 SJ Amsterdam, The Netherlands
> e-mail: &
> Tel: +31 (0)20 - 592 4093
> Fax: +31 (0)20 - 592 4312
> Web:

Received on Saturday, 4 February 2006 16:18:48 UTC