- From: Suzanne Little <Suzanne.Little@isti.cnr.it>
- Date: Wed, 14 Mar 2007 20:08:30 +0100
- To: public-xg-mmsem@w3.org
Hi all,

Sorry for the delay. Below are my comments on the Semantic Media Analysis for Intelligent Media Retrieval use case, as per action 16 from the last telecon. Talk to you tomorrow.

Regards, Suzanne

---------------------------------------------------------

Semantic Media Analysis for Intelligent Media Retrieval
(using version last edited 2007-03-01 14:06:26 by SofiaTsekeridou)

This use case was a little difficult to review since the problem it aims to address is very large and critically important in the field of multimedia semantics. I've tried to summarise what I think are the main points, looked at possible connections with other use cases and then made some general comments.

Disclaimer (if that's the right term!): Significant parts of my thesis and of my current post-doc project are concerned with relating low-level features to high-level semantics defined in ontologies (e.g. see [1]), so that probably colours how I interpret some of this use case.

*Summary*

Intelligent media retrieval means that users are shielded from needing an in-depth understanding of low-level descriptors and can instead query for media content based on high-level, meaningful concepts. To support this type of query, semantic indices and annotations are needed. This use case aims to identify the problems involved in automatically creating these "semantic indices". The use case looks at two main issues, or perhaps two approaches, for finding high-level semantics:

1. turning low-level descriptors into exchangeable high-level semantics (the "multimedia semantic gap") and
2. exploiting cross-modal data to infer higher-level semantics.

*The Multimedia Semantic Gap*

The 3rd paragraph summarises the standard multimedia issue of the gap between automatic, objective, low-level descriptors and manual, subjective, high-level descriptors, and the benefits and disadvantages of each. There's quite a lot of work and many different approaches in this field. You mention Naphade (in another context), but there are also Arnold Smeulders et al.; William Grosky and Rong Zhao; Oge Marques; Jane Hunter and myself; Guus Schreiber and Laura Hollink, etc., to name just a small set of the people I'm aware of. I'm not sure what the procedure is for referencing or reviewing related work like this in W3C documents.

*MPEG-7*

The 4th paragraph, which discusses MPEG-7, talks about how to integrate, align or link the MPEG-7 descriptors with domain semantics. This is a good overlap with the MPEG-7 deliverable. Perhaps (eventually) it should reference the discussion in that document. I think the salient point here is how "MPEG-7 metadata descriptions [can be] *properly* linked to domain-specific ontologies" (emphasis mine). I'm interested to see how people define 'properly' and what solutions are suggested for identifying and describing the links; a rough sketch of what one such link might look like follows the next section.

*Connections with the Algorithm Use Case*

There are obvious connections with the Algorithm Use Case, particularly the list of required attributes for the classifier ontology in the proposed solution for example 1, and the support for creating the "feature vectors" mentioned in example 2. The ontologies in each case would need to be related. At this stage I don't think it would be useful for the two use cases to be formally combined. It's more important to note that there is a connection and to consider how they affect each other. I view the algorithm ontology as a supporting layer below the constructs being discussed in this use case.
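As a side note, below is a minimal sketch of what one such "link" might look like in practice, written in Python with rdflib. To be clear, everything here is invented for illustration: the namespaces, the properties (dominantColour, depicts) and the domain concept are placeholders, not the actual MPEG-7 schema or any ontology proposed in the use case.

    from rdflib import Graph, Namespace, Literal, URIRef
    from rdflib.namespace import RDF

    # Hypothetical namespaces -- not the real MPEG-7 schema or an agreed ontology.
    MPEG7 = Namespace("http://example.org/mpeg7#")
    DOM = Namespace("http://example.org/aviation#")  # invented domain ontology

    g = Graph()
    g.bind("mpeg7", MPEG7)
    g.bind("dom", DOM)

    # A still region within a video, carrying a low-level colour descriptor...
    region = URIRef("http://example.org/media/video1#region01")
    g.add((region, RDF.type, MPEG7.StillRegion))
    g.add((region, MPEG7.dominantColour, Literal("0.12 0.80 0.33")))

    # ...and the link in question: relating that low-level description
    # to a high-level, domain-specific concept.
    g.add((region, DOM.depicts, DOM.Airplane))

    print(g.serialize(format="turtle"))

Even in this trivial form it raises the interesting questions: where should the linking property itself live, and what evidence (a classifier, a rule, a manual annotation) justifies the assertion?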
Since both these use cases deal with fundamental underlying semantics and are essentially domain independent, you could argue for them to be combined with almost any of the others.

*Overall*

The problem is a significant one and highly relevant to this group. It underpins many of the other problems that we're trying to address. The motivating examples are good: sufficiently realistic and challenging.

The possible solutions list a number of ontologies (classifier, multimedia core, visual/audio/textual descriptor, domain, upper multimedia, cross-modality) which should be related or linked together to address the two issues of this use case. This is a fairly complex mix of semantics. The specifics of where concepts (e.g. creator, capture device, environment, feature vector, etc.) are placed and how the ontologies are aligned are important to consider.

The solution discussed for example 1 (generating feature vectors and associating them with an object class) is a challenging task; a toy sketch of one such association appears at the end of this mail. What do you think the limitations of current technologies (e.g. the semantic web framework) are for supporting this task?

Some questions:

* what do the links or mappings (low-level to domain) look like and how are they created?
* how are spatial (or temporal) relationships defined?
* how are media segments or regions identified?
* what is meant by low-level, medium-level and high-level semantics, and how do these terms apply to the ontologies?
* some more details about which concepts belong in each ontology and where the connections between them are may also be useful for discussion.

[1] S. Little and J. Hunter, "Rules-By-Example - a Novel Approach to Semantic Indexing and Querying of Images", ISWC 2004. http://www.springerlink.com/index/2EQ603G8CCXF3E8Y.pdf
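Finally, the toy sketch promised above: one naive reading of the example 1 task, associating a feature vector with an object class by nearest-centroid matching. The descriptor, class names and prototype values are all made up; in a real system the prototypes (or a proper classifier) would be learned from annotated training data, and the resulting class label is what would then be linked into the domain ontology.

    import math

    # Toy "low-level descriptor": a 3-bin colour histogram for a region.
    # The per-class prototypes below are invented for illustration.
    PROTOTYPES = {
        "Sky":   [0.1, 0.2, 0.7],  # mostly blue
        "Grass": [0.1, 0.7, 0.2],  # mostly green
        "Fire":  [0.8, 0.1, 0.1],  # mostly red
    }

    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def classify(feature_vector):
        """Associate a feature vector with the nearest object class."""
        return min(PROTOTYPES, key=lambda c: euclidean(PROTOTYPES[c], feature_vector))

    print(classify([0.15, 0.25, 0.60]))  # -> "Sky"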