- From: Vassilis Tzouvaras <tzouvaras@image.ntua.gr>
- Date: Tue, 14 Feb 2006 15:01:21 +0200
- To: "'Christian Halaschek-Wiener'" <halasche@cs.umd.edu>, "'Raphaël Troncy'" <Raphael.Troncy@cwi.nl>, "'Jacco van Ossenbruggen'" <Jacco.van.Ossenbruggen@cwi.nl>, "'Giorgos Stamou'" <gstam@softlab.ntua.gr>
- Cc: "'swbp'" <public-swbp-wg@w3.org>
- Message-ID: <000301c63166$c36773c0$b60b6693@FARFALINA>
Hi all,

One more requirement covering multimedia analysis issues.

Regards,
Vassilis

Content-based analysis of multimedia requires methods that automatically segment video sequences and key frames into image areas corresponding to salient objects (e.g. cars, road, people, field), track these objects in time, and provide a flexible framework for object recognition, indexing, retrieval and further analysis of their relative motion and interactions. This problem can be viewed as relating symbolic terms to visual information by utilizing syntactic and semantic structure in a manner related to approaches in speech and language processing. More specifically, low-level multimedia features (e.g. MPEG-7 descriptors) must be assigned to semantic concepts, and visual processing algorithms must be assigned to object attributes, thus forming an a-priori knowledge base. Processing may then be performed by relating high-level symbolic representations to extracted features in the signal (image and temporal feature) domain. By basing such a representation on an ontology, one can capture both concrete and abstract relationships between salient visual properties.

The research on a low-level visual feature ontology must concentrate on modelling the concepts and properties that describe visual features of objects, especially the visualizations of still images and videos in terms of low-level features and media structure descriptions. The ontology must closely follow the specification of the MPEG-7 Visual part, with the appropriate adaptation of the complex data type representations. Sub-concepts include standard MPEG-7 features like colour, shape, texture, motion, localization and basic descriptors. Additional features that are not part of the MPEG-7 Visual part must also be modelled and included in the visual feature ontology, following a requirements collection process for low-level processing algorithms.
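As a rough illustration of the a-priori knowledge base described above, the sketch below assigns a semantic concept to each segmented region by comparing its extracted feature vector with one prototype per concept. Everything in it is an assumption made for illustration: the concept names, the three-number vectors (standing in for a low-level descriptor such as a mean colour), and the nearest-prototype rule are hypothetical and are not taken from MPEG-7 or from any existing analysis tool.

```python
import math

# Hypothetical a-priori knowledge base: semantic concept -> prototype feature
# vector. The numbers stand in for a low-level descriptor (here a mean RGB
# colour); they are illustrative values, not real MPEG-7 descriptor output.
KNOWLEDGE_BASE = {
    "road": (90.0, 90.0, 90.0),
    "field": (40.0, 160.0, 50.0),
    "sky": (120.0, 170.0, 230.0),
}


def label_region(features, knowledge_base=KNOWLEDGE_BASE):
    """Return the concept whose prototype is nearest (Euclidean) to `features`."""
    return min(
        knowledge_base,
        key=lambda concept: math.dist(features, knowledge_base[concept]),
    )


# Hypothetical output of a segmentation step: region id -> extracted features.
segments = {
    "region-1": (95.0, 88.0, 92.0),
    "region-2": (35.0, 150.0, 60.0),
}

# Relate each extracted feature vector to a high-level symbolic term.
labels = {region_id: label_region(f) for region_id, f in segments.items()}
print(labels)
```

A real system would replace the prototype table with descriptors and concepts defined in the visual feature ontology, and the distance rule with whatever matching function each descriptor prescribes.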
-----Original Message-----
From: public-swbp-wg-request@w3.org [mailto:public-swbp-wg-request@w3.org] On Behalf Of Christian Halaschek-Wiener
Sent: Saturday, February 04, 2006 6:12 PM
To: Raphaël Troncy; Jacco van Ossenbruggen
Cc: swbp
Subject: Re: [MM] First version of multimedia ontology requirements document

Hi all,

Seems that Raphael and Jacco have touched almost all of the points we would have made. That said, I will provide two additional points:

Additional Requirements for a Common Multimedia Ontology Framework

1) A common multimedia ontology framework should agree upon the terminology for linking multimedia objects to resources defined in Semantic Web representation languages. Further, this agreed-upon terminology should take into account past approaches, so that as much existing data as possible can be repurposed with minimal syntactic and semantic transformations. For example, to the best of our knowledge, the majority of approaches currently provide this linking via properties such as foaf:depicts. (Note that this point has probably already been made.)

2) A common multimedia ontology framework should provide an agreed-upon way to localize sub-regions of multimedia objects (e.g., sub-regions of images). Again, this terminology should take into account past approaches. To the best of our knowledge, this has been accomplished using bounding box coordinates and/or SVG snippets describing such regions.

Cheers,
Chris

--
Christian Halaschek-Wiener
PhD Student, Dept. of Computer Science
GRA, MINDSWAP Research Group, University of Maryland, College Park
Web page: http://www.mindswap.org/~chris

On Feb 3, 2006, at 11:50 AM, Raphaël Troncy wrote:

Hi all,

Following the thread Jacco has begun, I bring my own contribution about some requirements for a Common Multimedia Ontology Framework. This contribution is strongly influenced by an archiving point of view.

Regards.
Raphaël Troncy

------------

Archive-Oriented Requirements for a Multimedia Ontology Framework
by Raphaël Troncy and Antoine Isaac

1) Introduction.

The following text provides a short list of requirements that originate from the work done at INA during our PhD theses. They have partly been described previously in [Isaac and Troncy, 2004], where an Audio-Visual Description Core Ontology has been proposed. In [Troncy, 2004a; Troncy, 2004b], we have proposed an Extensible Audio-Visual Description Language that fulfills some of these requirements, while overcoming the current proposal.

2) Using Audio-Visual Documents for Various Purposes.

The applications that use audio-visual documents are interested in different aspects. They have their own viewpoint on this complex medium, and usually they are only concerned with selected pieces of information corresponding to their needs. For instance:

- Many tools aim at automatically indexing audio-visual content by extracting low-level features from the signal. These features concern video segmentation (in shots or in sequences), speech transcription, detection and recognition of camera motion, faces, texts, etc. This family of applications needs a common vocabulary to store and exchange the results of their algorithms. The MPEG-7 standard defines such descriptors, without giving them a formal semantics. Therefore, the common multimedia ontology framework should provide this missing semantics.

- A TV (or radio) broadcaster may want to publish its program listings on its web site. Therefore, it is interested in identifying and cataloguing its programs. The channel would also like to know the details of its audience and the peak viewing times in order to adapt its advertisement rates.
Broadcasters have recently adopted the TV Anytime (note: The TV Anytime Forum (http://www.tv-anytime.org/) is an association of organizations which seeks to develop specifications to provide value-added interactive services in the context of TV digital broadcasting. The forum identified metadata as one of the key technologies enabling its vision and has adopted MPEG-7 as the description language.) format and its terminologies to exchange all these metadata. Again, the lack of formal semantics for these metadata will prevent many possible uses.

- A news agency may aim at delivering program information to newspapers. It could receive the TV Anytime metadata and enrich them with the cast or the recommended audience of the program, the last-minute changes in the program listings, etc. The ProgramGuideML (note: the ProgramGuideML initiative is developed by the International Press Telecommunications Council (IPTC) (http://www.programguideml.org) and aims to be the global XML standard for the interchange of Radio/TV Program Information.) format is currently being developed for this purpose.

- Education and humanities research increasingly use audio-visual media. Their needs include analysing how a document was produced (e.g. number, position and angle of the cameras, sound recording) and selecting and describing some excerpts in depth according to domain theories, focusing for example on action analysis (i.e. a praxeological viewpoint).

- Finally, an institute like INA has to collect and describe an audio-visual cultural heritage. It is interested in all the aspects given above, with a strong emphasis on a documentary archive viewpoint. A multimedia ontology framework should make it possible here to identify and classify each program and collection, to describe the way it has been shot, produced and broadcast, and to describe both its structure and its content.
It should then be open enough to be linked with any domain-specific ontology in order to describe precisely the content of each program.

3) A Proposed Audio-Visual Description Core Ontology.

Despite this variety, all these specific applications share common concepts and properties when describing an AV document. For instance, the concept of genre or some production and broadcast properties are always necessary, either for cataloguing and indexing the document, or to parameterize an algorithm whose goal is to automatically extract some features from the signal. We also observe that the archive point of view is an aggregation of the usual description facets. We have therefore formalized the practices of the documentalists of INA, as well as the terminology they use, in order to design an audio-visual description core ontology [Isaac and Troncy, 2004], which could be a good starting point for a Common Multimedia Ontology Framework. This ontology is also linked to the DOLCE foundational ontology, which gives it a sound and consensual upper-level justification.

4) References.

[Isaac and Troncy, 2004] Antoine Isaac and Raphaël Troncy. Designing and Using an Audio-Visual Description Core Ontology. In Workshop on Core Ontologies in Ontology Engineering held in conjunction with the 14th International Conference on Knowledge Engineering and Knowledge Management (EKAW'04), Whittlebury Hall, Northamptonshire, UK, October 8th.

[Troncy, 2004a] Raphaël Troncy and Jean Carrive. A Reduced Yet Extensible Audio-Visual Description Language: How to Escape From the MPEG-7 Bottleneck. In 4th ACM Symposium on Document Engineering (DocEng'04), J. Y. Vion-Dury (editor), pages 87-89, Milwaukee, Wisconsin, USA, October 28-30.

[Troncy, 2004b] Raphaël Troncy, Jean Carrive, Steffen Lalande, and Jean-Philippe Poli. A Motivating Scenario for Designing an Extensible Audio-Visual Description Language.
In The International Workshop on Multidisciplinary Image, Video, and Audio Retrieval and Mining (CoRIMedia), Sherbrooke, Canada, October 25-26.

--
Raphaël Troncy
CWI (Centre for Mathematics and Computer Science),
Kruislaan 413, 1098 SJ Amsterdam, The Netherlands
e-mail: raphael.troncy@cwi.nl & raphael.troncy@gmail.com
Tel: +31 (0)20 - 592 4093
Fax: +31 (0)20 - 592 4312
Web: http://www.cwi.nl/ins2/
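The two additional requirements in Chris's message in this thread (linking a multimedia object to a Semantic Web resource via foaf:depicts, and localizing a sub-region with bounding box coordinates) can be sketched together as a handful of N-Triples statements. This is only a sketch under stated assumptions: foaf:depicts is the real FOAF property, but the `http://example.org/mm#` namespace and its `regionOf`, `x`, `y`, `width` and `height` properties are hypothetical placeholders for whatever terminology a common framework would eventually agree upon.

```python
FOAF_DEPICTS = "http://xmlns.com/foaf/0.1/depicts"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
EX = "http://example.org/mm#"  # hypothetical namespace, for illustration only


def uri(value):
    """Format a URI as an N-Triples term."""
    return f"<{value}>"


def integer(value):
    """Format an integer as a typed N-Triples literal."""
    return f'"{int(value)}"^^<{XSD_INTEGER}>'


def describe_region(image, region, depicted, box):
    """Return N-Triples lines saying that `region` of `image` depicts `depicted`.

    `box` is (x, y, width, height) in pixels. The regionOf/x/y/width/height
    properties are hypothetical, not part of any agreed vocabulary.
    """
    x, y, width, height = box
    triples = [
        (uri(region), uri(EX + "regionOf"), uri(image)),
        (uri(region), uri(FOAF_DEPICTS), uri(depicted)),
        (uri(region), uri(EX + "x"), integer(x)),
        (uri(region), uri(EX + "y"), integer(y)),
        (uri(region), uri(EX + "width"), integer(width)),
        (uri(region), uri(EX + "height"), integer(height)),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


lines = describe_region(
    image="http://example.org/photo.jpg",
    region="http://example.org/photo.jpg#r1",
    depicted="http://example.org/people#alice",
    box=(10, 20, 100, 50),
)
print("\n".join(lines))
```

An SVG snippet describing the region outline, the other approach Chris mentions, could be attached to the region resource in the same way instead of the four coordinate properties.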
Received on Tuesday, 14 February 2006 13:02:09 UTC