W3C

Image annotation on the Semantic Web

Editors' Draft $Date: 2005/10/17 09:20:45 $ $Revision: 1.17 $

This version:

N/A

Latest version:

N/A

Previous version:

N/A

Editors:

TO BE REVISED AT THE END

Giorgos Stamou, IVML, National Technical University of Athens, <gstam@softlab.ece.ntua.gr>

Jacco van Ossenbruggen, Center for Mathematics and Computer Science (CWI), <Jacco.van.Ossenbruggen@cwi.nl>

Raphaël Troncy, Center for Mathematics and Computer Science (CWI), <Raphael.Troncy@cwi.nl>

Additional Contributors and Special Thanks to:

TO BE REVISED AT THE END

Jane Hunter, DSTC, <jane@dstc.edu.au>

Guus Schreiber, VU,<schreiber@cs.vu.nl>

Vassilis Tzouvaras, IVML, National Technical University of Athens, <tzouvaras@image.ece.ntua.gr>

Nikolaos Simou, IVML, National Technical University of Athens, <nsimou@image.ece.ntua.gr>

Christian Halaschek-Wiener, UMD, <halasche@cs.umd.edu>

Jeff Pan, University of Manchester, <pan@cs.man.ac.uk>

Jeremy Carroll, HP, <jjc@hplb.hpl.hp.com>

John Smith, IBM,  <rsmith@watson.ibm.com>

Copyright © 2003 W3C ® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.


Abstract

Many applications that involve multimedia content make use of some form of metadata that describe this content. This document provides guidelines for using Semantic Web languages and technologies in order to create, store, manipulate, interchange and process image metadata. It gives a number of use cases to exemplify the use of Semantic Web technology for image annotation, an overview of RDF and OWL vocabularies developed for this task and an overview of relevant tools.

Note that many approaches to image annotation predate Semantic Web technology. Interoperability between these technologies and RDF and OWL-based approaches, however, will be addressed in a future document.

Target Audience

Institutions and organizations with research and standardization activities in the area of multimedia, as well as professional (museums, libraries, audiovisual archives, media production and broadcast industry, image and video banks) and non-professional (end-user) multimedia annotators.

Objectives

  • Provide use cases with examples of multimedia annotations
  • Collect currently used vocabularies for multimedia annotations (like Dublin Core, VRA, ...)
  • Investigate existing tools and other formats (ID3, EXIF, XMP etc)

Status of this document

This is a public (WORKING DRAFT) Working Group Note produced by the Multimedia Annotation in the Semantic Web Task Force of the W3C Semantic Web Best Practices & Deployment Working Group, which is part of the W3C Semantic Web activity.

Discussion of this document is invited on the public mailing list public-swbp-wg@w3.org (public archives). Public comments should include "comments: [MM]" at the start of the Subject header.

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress. Other documents may supersede this document.

1. Introduction

The need for annotating digital image data is recognized in a wide variety of applications, covering both professional and personal usage of image data. At the time of writing, most work done in this area is not based on Semantic Web technology, either because it predates the Semantic Web or for other reasons. This document explains the advantages of using Semantic Web languages and technology for image annotation and provides guidelines for doing so. It is organized around a number of representative use cases, and a description of Semantic Web vocabularies and tools that could be used to help accomplish the tasks mentioned in the use cases. The remainder of this introductory section first gives an overview of image annotation in general, followed by a short description of the key Semantic Web concepts that are relevant for image annotation.

1.1 Image annotation basics

Annotating images on a small scale for personal usage can be relatively simple. The reader should be aware, however, that large-scale, industrial-strength image annotation is notoriously complex. Trade-offs along several dimensions make the task difficult:

·         Generic vs. task-specific annotation. Annotating images without a specific goal or task in mind is often not cost effective: after the target application has been developed, it may turn out that images have been annotated using the wrong type of information, or on the wrong abstraction level, etc. Redoing the annotations is then an unavoidable, but costly, solution. On the other hand, annotating with only the target application in mind may also not be cost effective. The annotations may work well with that one application, but if the same metadata is to be reused in the context of other applications, it may turn out to be too specific, and unsuited for reuse in a different context. In most situations the range of applications in which the metadata will be used in the future is unknown at the time of annotation. Lacking a crystal ball, the best the annotator can do in practice is use an approach that is sufficiently specific for the application under development, while avoiding unnecessary application-specific assumptions as much as possible.

·         Manual versus automatic annotation and the "Semantic Gap". In general, manual annotation can provide image descriptions at the right level of abstraction. It is, however, time consuming and thus expensive. In addition, it proves to be highly subjective: different human annotators tend to "see" different things in the same image. On the other hand, annotation based on automatic feature extraction is relatively fast and cheap, and free of human bias. It tends to result, however, in image descriptions that are too low level for many applications. The difference between the low-level feature descriptions provided by image analysis tools and the high-level content descriptions required by applications is often referred to in the literature as the Semantic Gap. In the remainder, we will discuss use cases, vocabularies and tools for both manual and automatic image annotation.

·         Different vocabularies for different types of metadata. While various classifications of metadata have been described in the literature, every annotator should at least be aware of the difference between annotations describing properties of the image itself, and those describing the subject matter of the image, that is, the properties of the objects, persons or concepts depicted by the image. In the first category, typical annotations provide information about title, creator, resolution, image format, image size, copyright, year of publication, etc. Many applications use a common, predefined and relatively small vocabulary defining such properties. Examples include the Dublin Core and VRA Core vocabularies [add refs]. The second category describes what is depicted by the image, which can vary widely with the type of image at hand. As a result, one sees a large variation in the vocabularies used for this purpose. Typical examples range from domain-specific vocabularies (for example, with terms that are very specific to astronomy images, or sport images, etc.) to domain-independent ones (for example, a vocabulary with terms that are sufficiently generic to describe any news photo). In addition, vocabularies tend to differ in size, granularity, formality, etc. In the remainder, we discuss the above metadata categories. Note that in the first category it is not uncommon that a vocabulary only defines the properties and defers the definitions of the values of those properties to another vocabulary. This is true, for example, for both Dublin Core and VRA Core. This means that typically, in order to annotate a single image, one needs terms from multiple vocabularies.
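To make the two metadata categories concrete, the following sketch models RDF triples as plain Python tuples. The Dublin Core namespace is the real one; the image URI and the domain vocabulary are invented for illustration only.

```python
# RDF-style (subject, predicate, object) triples as plain tuples.
DC = "http://purl.org/dc/elements/1.1/"        # Dublin Core elements
EX = "http://example.org/astronomy/"           # hypothetical domain vocabulary
IMG = "http://example.org/images/m31.jpg"      # hypothetical image URI

triples = [
    # Category 1: properties of the image itself (Dublin Core terms)
    (IMG, DC + "title",   "The Andromeda Galaxy"),
    (IMG, DC + "creator", "J. Doe"),
    (IMG, DC + "format",  "image/jpeg"),
    # Category 2: subject matter, from a domain-specific vocabulary
    (IMG, EX + "depicts", EX + "SpiralGalaxy"),
]

# Even this single image needs terms from multiple vocabularies:
namespaces = {p.rsplit("/", 1)[0] + "/" for _, p, _ in triples}
print(len(namespaces))  # 2 distinct property vocabularies
```

The point is structural, not Python-specific: one image, two vocabularies, one for the image-as-artifact and one for its subject matter.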

·         Lack of syntactic and semantic interoperability. Many different file formats and tools for image annotations are currently in use. Reusing metadata developed for one set of tools in another is often hindered by a lack of interoperability. First, different tools use different file formats, so tool A may not be able to read in the metadata provided by tool B (syntactic interoperability). This problem is relatively easy to solve when the internal structure of both file formats is known: one can develop a conversion tool. Second, tool A may assign a different meaning to the same annotation than tool B does (semantic interoperability). Solving this problem is much harder, and can be done automatically only when the semantics of the vocabulary used is explicitly defined for both tools.
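A minimal sketch of the syntactic side of the problem: once the structure of both formats is known, conversion reduces to a mapping table. The EXIF-style tag names are illustrative; the target keys are Dublin Core element names.

```python
# Syntax-level conversion between two known metadata structures.
EXIF_TO_DC = {
    "Artist":           "creator",
    "ImageDescription": "description",
    "DateTime":         "date",
}

def convert(exif_record):
    """Translate tool B's EXIF-style metadata into tool A's DC-style keys;
    unmapped fields are simply dropped."""
    return {EXIF_TO_DC[k]: v for k, v in exif_record.items() if k in EXIF_TO_DC}

print(convert({"Artist": "J. Doe", "DateTime": "2005:10:17", "XResolution": "300"}))
# {'creator': 'J. Doe', 'date': '2005:10:17'}
```

The semantic side of the problem cannot be solved by such a table alone, since it requires agreement on what the mapped terms mean.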

1.2 Semantic Web Basics

TO BE DONE [By Jacco]

While much of the current work in this area is not (yet) based on Semantic Web languages and technology, we believe that using them has many potential advantages:

  1. Now that digital images are increasingly published and exchanged over the Web, it is important that the associated annotations can also be shared and published over the Web. The Semantic Web is designed with this goal in mind.
  2. Because the Semantic Web is inherently Web-based, it is built on top of open, platform- and application-neutral languages, which reduces the syntactic interoperability problems mentioned above ([add ref to other document about SemWeb/Non-SemWeb interoperability]).
  3. Because the Semantic Web allows for machine-readable and explicitly defined semantics, it also provides a practical solution to the semantic interoperability problems. For example, a very specific term in an annotation produced by tool A may be recognized by tool B, which requires more generic terminology, by using the explicit BroaderTerm/NarrowerTerm relations from a predefined thesaurus [make sure this matches with SKOS].
  4. By (re)using the Web's URI ([IRI?]) scheme for identifying the resources that are annotated, the annotations, and the definitions of the concepts used in the annotations, everyone can unambiguously publish and exchange annotations and annotation vocabulary.
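Point 3 can be illustrated with a toy sketch. The thesaurus below is a hand-made stand-in for the broader-term relations that SKOS expresses with skos:broader; the terms are invented.

```python
# Toy thesaurus: each term maps to its broader term (cf. skos:broader).
BROADER = {
    "tabby cat": "cat",
    "cat": "mammal",
    "mammal": "animal",
}

def generalize(term, known_terms):
    """Climb broader-term links until a term the consuming tool
    understands is found; return None if no known ancestor exists."""
    while term not in known_terms and term in BROADER:
        term = BROADER[term]
    return term if term in known_terms else None

# Tool A annotated with "tabby cat"; tool B only knows generic terms.
print(generalize("tabby cat", {"animal", "building", "landscape"}))  # animal
```

This only works because the broader/narrower relations are explicit and shared, which is exactly what a Semantic Web thesaurus provides.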

2. Use Cases

Use case: Management of Personal Digital Photo Collections

Recent advances in digital technologies (cameras, computers, storage, communication, etc.) have caused a huge increase in the amount of digital multimedia information captured, stored and distributed by personal users over the Web. Digital formats now provide the cheapest, safest and easiest way to capture, store and deliver multimedia content on a broad scale. Most personal users have thousands of photos (from vacations, parties, traveling, conferences, everyday life, etc.), usually stored in several resolutions on the hard disk of their computer in a simple directory structure, without any metadata. Ideally, the user wants to easily access this content, view it, create presentations, use it on a homepage, deliver it over the Internet to other people, make part of it accessible to others, or even sell part of it to image banks. Unfortunately, the only way to access this content is by browsing the directories, whose names usually provide the date and describe in one or two words the original event captured by the photos. This kind of access becomes more and more difficult as the number of photos increases every day, and very soon the content will in practice become inaccessible. The best solution to this problem, covering almost all present and future uses of the content, is to describe and archive each photo with the aid of a tool providing a semantic metadata structure using Semantic Web technologies (see, for example, the tools below). Using such tools, users can access the photos with the aid of virtual views (taxonomies), keywords describing their content, or administrative information (like the date of archiving or the resolution of the photo). Most importantly, the standardization of the metadata format ensures that the content is accessible to other people and can be shared and used over the Web.
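The retrieval problem described above can be illustrated with a toy sketch (all file names and metadata fields are invented): a keyword query over per-photo metadata cuts across the directory structure in a way that folder names cannot.

```python
# A small photo collection with per-photo metadata (invented data).
photos = [
    {"file": "2005-07-crete/img001.jpg", "keywords": {"beach", "sunset"},
     "date": "2005-07-12", "resolution": "2048x1536"},
    {"file": "2005-07-crete/img002.jpg", "keywords": {"taverna", "family"},
     "date": "2005-07-13", "resolution": "2048x1536"},
    {"file": "2004-conf/img001.jpg", "keywords": {"sunset", "harbour"},
     "date": "2004-05-20", "resolution": "1024x768"},
]

# Directory browsing finds only what the folder name happens to say;
# a metadata query retrieves matches from the whole collection:
sunsets = [p["file"] for p in photos if "sunset" in p["keywords"]]
print(sunsets)  # ['2005-07-crete/img001.jpg', '2004-conf/img001.jpg']
```

A Semantic Web annotation tool plays the role of the metadata records here, with the added benefit that the vocabulary and format are shared beyond one user's disk.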

Use case: Cultural Heritage

A museum of fine arts has asked a specialized company to produce high-resolution digital scans of the most important art works in its collections. The museum's quality assurance requires the possibility to track when, where and by whom every scan was made, with what equipment, etc. The museum's internal IT department, maintaining the underlying image database, needs the size, resolution and format of every resulting image. It also needs to know the repository ID of the original work of art. The company developing the museum's website additionally requires copyright information (which varies for every scan, depending on the age of the original work of art and the collection it originates from). It also wants to give the users of the website access to the collection, not only based on the titles of the paintings and the names of their painters, but also based on the topics depicted ('sunsets'), genre ('self portraits'), style ('post-impressionism'), period ('fin de siècle') and region ('West European').

Use case: Scientific/medical image annotations

To be done [by Jane].

Use case: Television news archive

Audiovisual archives manage very large multimedia databases. For instance, INA, the French Audiovisual National Institute, has been archiving TV documents for 50 years and radio documents for 65 years, and stores more than 1 million hours of broadcast programs. The image and sound archives kept at INA are either intended for professional use (journalists, film directors, producers, audiovisual and multimedia programmers and publishers, in France and worldwide) or communicated for research purposes (for a public of students, research workers, teachers and writers). In order to allow efficient access to the stored data, most parts of these video documents are described and indexed by their content. The global multimedia information system must then be detailed at a fine enough grain to support some very complex and precise queries. For example, a journalist or a film director might ask for an excerpt of a previously broadcast program showing the first goal of a given football player in his national team, scored with his head. The query could additionally contain some more technical requirements, such as that the goal action should be available from both the front camera view and the reverse-angle camera view. Finally, the client might or might not remember some general information about the football game, such as the date, the place and the final score.
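The kind of fine-grained query described above can be sketched as follows. This is a toy example with invented segment metadata and field names, not INA's actual data model; it only shows why segment-level, structured descriptions are needed.

```python
# Invented segment-level metadata for archived broadcast programs.
segments = [
    {"program": "match-1998-07-12", "player": "player7", "event": "goal",
     "body_part": "head", "team": "national", "camera": "front",   "order": 1},
    {"program": "match-1998-07-12", "player": "player7", "event": "goal",
     "body_part": "head", "team": "national", "camera": "reverse", "order": 1},
    {"program": "match-2000-09-03", "player": "player7", "event": "goal",
     "body_part": "foot", "team": "national", "camera": "front",   "order": 2},
]

def first_headed_goal(player, camera):
    """Find the program holding the player's first headed goal for his
    national team, filmed from the requested camera view."""
    hits = [s for s in segments
            if s["player"] == player and s["event"] == "goal"
            and s["body_part"] == "head" and s["team"] == "national"
            and s["camera"] == camera]
    return min(hits, key=lambda s: s["order"])["program"] if hits else None

print(first_headed_goal("player7", "front"))    # match-1998-07-12
print(first_headed_goal("player7", "reverse"))  # match-1998-07-12
```

No file-level description ("football match, 1998") could answer this; the answer depends on properties of individual segments.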

Use case: Media Production Services

A media production house requires several web services in order to organise and implement its projects. Usually, pre-production and production start with location, people, image and footage search and retrieval, in order to speed up the process and reduce the cost of the production as much as possible. For that reason, several multimedia archives (image and video banks, location management databases, casting houses, etc.) provide the above information through the Web. Every day, media producers, location managers, casting managers, etc. search these archives in order to find the appropriate resources for their projects. The quality of this search and retrieval process directly affects the quality of the service that the archives provide to their users. In order to facilitate this process, the annotation of image content should make use of Semantic Web technologies, while also following the multimedia standards in order to be interoperable with other archives. Using, for example, the tools described below, the people who archive content in the media production chain can provide all the necessary information (administrative, structural and descriptive metadata) in a standard form (RDF, OWL) that will be easily accessible to other people over the Web. Using the Semantic Web standards, the archiving, search and retrieval processes will then make use of semantic vocabularies (ontologies) describing the structure of the content, from thematic categories to descriptions of the main objects appearing in the content with their main visual characteristics. In this way, multimedia archives will make their content easily accessible over the Web, providing a unified framework for media production resource allocation.

Use Case: large-scale image collections at NASA

Many organizations maintain extremely large-scale image collections. The National Aeronautics and Space Administration (NASA) is one such example: it holds hundreds of thousands of images, stored in different formats, levels of availability and resolution, and with associated descriptive information at various levels of detail and formality. Such an organization also generates thousands of images on an ongoing basis that must be collected and cataloged. Thus, a mechanism is needed to catalog all the different types of image content across various domains. Information about both the image itself (e.g., its creation date, dpi, source) and about the specific content of the image is required. Additionally, the associated metadata must be maintainable and extensible, so that the relationships between images and data can evolve cumulatively. Lastly, management functionality should provide mechanisms flexible enough to enforce restrictions based on content type, ownership, authorization, etc.

This section needs to be moved to a future section about solutions to use cases.

One possible solution for such image management requirements is an annotation environment that enables users to annotate information about images and/or their regions using concepts in ontologies (OWL and/or RDFS). More specifically, subject matter experts will be able to assert metadata elements about images and their specific content. Multimedia-related ontologies can be used to localize and represent regions within particular images. These regions can then be related to the image via a depiction/annotation property. This functionality can be provided, for example, by the MINDSWAP digital-media ontology (to represent image regions), in conjunction with FOAF (to assert image depictions). Additionally, in order to represent the low-level image features of regions, the aceMedia Visual Descriptor Ontology can be used.

Existing toolkits, such as PhotoStuff and M-OntoMat-Annotizer, currently provide graphical environments to accomplish the tasks defined above. Using such tools, users can load images, create regions around parts of an image, automatically extract low-level features of selected regions (via M-OntoMat-Annotizer), assert statements about the selected regions, etc. Additionally, the resulting annotations can be exported as RDF/XML, thus allowing them to be shared, indexed, and used by advanced annotation-based browsing (and search) environments.
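The region-based annotation workflow described above can be sketched as follows, with triples modeled as plain tuples. The property names (ex:regionOf, ex:boundingBox) are hypothetical stand-ins for the region vocabulary of the MINDSWAP digital-media ontology; foaf:depicts stands for the FOAF depiction property; all URIs are invented.

```python
triples = []
IMG    = "http://example.org/photos/team.jpg"   # hypothetical image URI
REGION = IMG + "#region1"                       # a region within the image

# Localize a rectangular region within the image (pixel coordinates).
triples.append((REGION, "ex:regionOf", IMG))
triples.append((REGION, "ex:boundingBox", (40, 60, 120, 180)))

# Relate the region to what it depicts (a person, in this case).
triples.append((REGION, "foaf:depicts", "http://example.org/people/alice"))

# Annotation-based browsing can then retrieve every region depicting Alice:
hits = [s for s, p, o in triples
        if p == "foaf:depicts" and o.endswith("/alice")]
print(hits)  # ['http://example.org/photos/team.jpg#region1']
```

Because the region has its own URI, statements about it (features, depictions) stay distinct from statements about the whole image.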

3. Vocabularies for image annotation

MPEG-7 translations to RDFS and OWL

The "Multimedia Content Description" standard, widely known as MPEG-7, aims to be the standard for describing any multimedia content. MPEG-7 standardizes tools for defining multimedia Descriptors (Ds), Description Schemes (DSs) and the relationships between them. The descriptors correspond to the data features themselves, generally low-level features such as visual (e.g. texture, camera motion) or audio (e.g. melody) features, while the description schemes refer to more abstract description entities. These tools, as well as their relationships, are represented using the Description Definition Language (DDL), the core part of the standard. The W3C XML Schema recommendation has been adopted as the most appropriate schema language for the MPEG-7 DDL. Note that several extensions (array and matrix datatypes) have been added in order to satisfy specific MPEG-7 requirements.

The set of MPEG-7 XML Schemas defines 1182 elements, 417 attributes and 377 complex types, which is usually seen as a difficulty when managing MPEG-7 descriptions. Moreover, several works have already pointed out the lack of formal semantics in the standard that could extend the traditional text descriptions into machine-understandable ones. The attempts that aim to bridge this gap between the multimedia community and the Semantic Web are detailed below.

MPEG-7 Upper MDS Ontology by Hunter et al.

Link: http://maenad.dstc.edu.au/slittle/mpeg7.owl

Summary: Chronologically the first one, this MPEG-7 ontology was initially developed in RDFS [1], then converted into DAML+OIL, and is now available in OWL. This is an OWL Full ontology (note: except for the correction of three small mistakes inside the OWL file: the occurrences of &xsd;nil should be replaced by &rdf;nil, otherwise the file is not valid OWL).

The ontology covers the upper part of the Multimedia Description Scheme (MDS) part of the MPEG-7 standard. It consists of about 60 classes and 40 properties.

References:

MPEG-7 MDS Ontology by Tsinaraki et al.

Link: http://elikonas.ced.tuc.gr/ontologies/av_semantics.zip.

Summary: Starting from the previous ontology, this MPEG-7 ontology covers the full Multimedia Description Scheme (MDS) part of the MPEG-7 standard. It contains 420 classes and 175 properties. This is an OWL DL ontology.

References:

MPEG-7 Ontology by DMAG

Link: http://dmag.upf.edu/ontologies/mpeg7ontos/.

Summary: This MPEG-7 ontology has been produced fully automatically from the MPEG-7 standard in order to give it a formal semantics. For this purpose, a generic XSD2OWL mapping has been implemented. The definitions of the XML Schema types and elements of the ISO standard have been converted into OWL definitions according to the table given in [3]. This ontology could then serve as a top ontology, thus easing the integration of other, more specific ontologies such as MusicBrainz. The authors have also proposed to transform the XML data (instances of MPEG-7) automatically into RDF triples (instances of this top ontology).

This ontology aims to cover the whole standard and is thus the most complete one (with respect to the previously mentioned ontologies). It contains 2372 classes and 975 properties. This is an OWL Full ontology, since it employs the rdf:Property construct to cope with the fact that there are properties that have both datatype and object-type ranges.

References:

  • [3] Semantic Integration and Retrieval of Multimedia Metadata: R. Garcia and O. Celma. In Proc. of the 5th International Workshop on Knowledge Markup and Semantic Annotation (SemAnnot 2005) to be held with ISWC 2005, Galway, Ireland, 7 November 2005.

INA Ontology

Link: store this ontology on CWI for ease of reference?

Summary: This ontology is not really an MPEG-7 ontology, since it does not cover the whole standard. It is rather a core audio-visual ontology inspired by several terminologies, either standardized (like MPEG-7 and TV-Anytime) or still under development (ProgramGuideML). Furthermore, this ontology benefits from the practices of the French INA institute, the British BBC and the Italian RAI channels, which have also developed complete terminologies for describing radio and TV programs.

This core ontology currently contains 1100 classes and 220 properties, and it is represented in OWL Full.

References:

  • [4]Designing and Using an Audio-Visual Description Core Ontology: A. Isaac and R. Troncy. In Workshop on Core Ontologies in Ontology Engineering held in conjunction with the 14th International Conference on Knowledge Engineering and Knowledge Management (EKAW'04), Whittlebury Hall, Northamptonshire, UK, 8 October 2004.
  • [5] Integrating Structure and Semantics into Audio-visual Documents: R. Troncy. In Proc. of the 2nd International Semantic Web Conference (ISWC'03), LNCS 2870, pages 566-581, Sanibel Island, Florida, USA, 21-23 October 2003.

Visual Ontologies

The MPEG-7 standard is divided into several parts, reflecting the various media one can find in multimedia content. This section focuses on various attempts to design ontologies that correspond to the visual part of the standard.

2.1 - aceMedia Visual Descriptor Ontology

Link: http://www.acemedia.org/aceMedia/reference/resource/index.html, the current version is 9.0.

Summary: The Visual Descriptor Ontology (VDO), developed within the aceMedia project for semantic multimedia content analysis and reasoning, contains representations of MPEG-7 visual descriptors and models concepts and properties that describe the visual characteristics of objects. By the term descriptor we mean a specific representation of a visual feature (color, shape, texture, etc.) that defines the syntax and the semantics of a specific aspect of the feature. For example, the dominant color descriptor specifies, among others, the number and value of the dominant colors that are present in a region of interest and the percentage of pixels that each associated color value has. Although the construction of the VDO is tightly coupled with the specification of the MPEG-7 Visual Part, several modifications were carried out in order to adapt the XML Schema provided by MPEG-7 to an ontology and to the data type representations available in RDF Schema.

References:

2.2 - MINDSWAP Image Region Ontology

Link: http://www.mindswap.org/2005/owl/digital-media.

Summary:

References:

  • [7] A Flexible Approach for Managing Digital Images on the Semantic Web: C. Halaschek-Wiener, A. Schain, J. Golbeck, M. Grove, B. Parsia and J. Hendler. In Proc. of the 5th International Workshop on Knowledge Markup and Semantic Annotation (SemAnnot 2005) to be held with ISWC 2005, Galway, Ireland, 7 November 2005.

2.3 - Hollink Visual Ontology

Link: http://www.cs.vu.nl/~laurah/VO/visualWordnetschema2a.rdfs.

Summary:

References:

  • [8] Building a Visual Ontology for Video Retrieval: L. Hollink, M. Worring and G. Schreiber. In Proc. of the ACM Multimedia, Singapore, November 2005.

4. Tools

TO BE DONE: Categorisation of important tools [by Nikolaos]:

      1) Type of content: JPEG images, video, etc.

      2) Type of metadata: descriptive, administrative, structural, etc.

      3) Format of metadata: OWL, RDF

      4) Annotation level: extraction of visual characteristics and association with domain ontology concepts - operation using ontologies

      5) Operation mode: plug-in, stand-alone

      6) Open source: yes, no

      Suggestions by Jane:

      7) Collaborative or individual

      8) Granularity: file-based or segment-based (and sub-categories of types of segmentation)

      9) Threaded or unthreaded (by threaded I mean the ability to respond or add to a previous annotation and to stagger/structure the presentation of annotations to reflect this)

      10) Access controlled or open access

    
  • flickr2rdf This Web-based service, by Masahide Kanzaki, uses the Flickr API to extract metadata from Flickr's photo repository and generates an RDF description. Flickr is an online photo management and sharing application in which users can upload their photos and also annotate them (http://www.flickr.com/). The flickr2rdf service mainly converts the tags of a Flickr image's metadata to FOAF (Friend of a Friend) RDF.

 

 

Type of content: Images
Type of metadata: Administrative
Format of metadata: FOAF RDF
Annotation Level: Low
Operation mode: Web-based
Open source: No (uses the Flickr API)
Collaborative or Individual: Individual
Granularity: Segment-based
Threaded or unthreaded: Unthreaded
Access controlled or open access: Open access

 

  • jpegRDF An open-source Java tool by Norman Walsh. JpegRDF reads and manipulates RDF metadata stored in the comment section of JPEG images. It can extract, query, and augment the data. Manipulating JPEG images with jpegrdf does not modify the actual image data or any other sections of the file. JpegRDF can also be used to convert EXIF metadata to RDF.

 

Type of content: Images
Type of metadata: Administrative/Structural
Format of metadata: RDF
Annotation Level: Low
Operation mode: Stand-alone
Open source: Yes
Collaborative or Individual: Individual
Granularity: File-based
Threaded or unthreaded: Unthreaded
Access controlled or open access: Open access

 

  • M-OntoMat-Annotizer (M stands for Multimedia) is a user-friendly tool developed inside the aceMedia project that allows the semantic annotation of images and videos for multimedia analysis and retrieval. It is an extension of the CREAM (CREAting Metadata for the Semantic Web) framework and its reference implementation, OntoMat-Annotizer. The Visual Descriptor Extraction Tool (VDE)  was developed as a plug-in to OntoMat-Annotizer and is the core component for extending its capabilities and supporting the initialization and linking of RDF(S) domain ontologies with low-level MPEG-7 visual descriptors.

 

Type of content: Images and videos
Type of metadata: All
Format of metadata: RDF
Annotation Level: High
Operation mode: Stand-alone
Open source: No
Collaborative or Individual: Collaborative
Granularity: Segment-based
Threaded or unthreaded: Threaded
Access controlled or open access: Open access

 

  • PHP JPEG Metadata Toolkit The PHP JPEG Metadata Toolkit is a library of functions which allows manipulation of many types of metadata that reside in a JPEG image file. Its main advantages are that it has been tested with over 450 popular digital cameras, it provides access to a lot of metadata for which PHP has no built-in support, it works with many files that have corrupted metadata, and it also works with PHP4 and does not require the EXIF extension to be enabled.

 

Type of content: Images
Type of metadata: Administrative/Structural
Format of metadata: RDF
Annotation Level: Low
Operation mode: Stand-alone
Open source: Yes
Collaborative or Individual: Individual
Granularity: File-based
Threaded or unthreaded: Unthreaded
Access controlled or open access: Open access

 

  • PhotoStuff. This image annotation tool for the Semantic Web allows users to annotate images, and the contents of specific regions in images, according to several OWL ontologies of any domain. Moreover, the metadata embedded inside JPEG files is converted into RDF. The annotations can then be published and shared on the Web.

 

Type of content: Images
Type of metadata: All
Format of metadata: RDF
Annotation Level: Low
Operation mode: Stand-alone
Open source: Yes
Collaborative or Individual: Individual
Granularity: Segment-based
Threaded or unthreaded: Threaded
Access controlled or open access: Open access

 

  • SWAD The tool is written in JavaScript and uses RESTful web services to access remote information. It is designed to be a quick and easy means of creating structured information about images, including who or what is depicted in the image, where and when it was created, and creator and licensing information. The aim is to create and enable the reuse of alternative formats for both text and images for use in an accessibility context, although the potential application is much wider.

 

Type of content: Images
Type of metadata: Administrative
Format of metadata: RDF
Annotation Level: Low
Operation mode: Web-based
Open source: No
Collaborative or Individual: Individual
Granularity: File-based
Threaded or unthreaded: Unthreaded
Access controlled or open access: Open access

  • SCHEMA This tool integrates a non-normative part of the MPEG-7 standard and the MPEG-7 XM software for extracting, coding and storing standardized descriptors, based on the output of the analysis modules, in a database. The system can also support high-level (semantic) descriptors and the integration of visual media indexing and retrieval with other modalities (like text- and audio-based indexing and retrieval).
  • Vannotea - a Jabber-based system for real-time collaborative indexing, annotation and discussion of high-quality (MPEG-1, MPEG-2, MPEG-4, JPEG, TIFF, JPEG-2000) images and video. http://metadata.net/filmed/

 

  • Rules-By-Example - a graphical user interface that uses examples to interactively define rules for inferring high-level semantic descriptions of image regions from combinations of low-level, automatically extracted features. http://maenad.dstc.edu.au/papers/2004/iswc2004-rbe.pdf

5. Examples of image annotations on the Semantic Web

TO BE DONE: Rewrite this section in terms of solutions to the use cases based on the vocabularies and tools described above. Template proposed by Raphael and Jacco:

    a) how to localize some parts of the target media content

    b) how to characterize the annotation link between the annotation and the media (distinction between work and representation, à la VRA)

    c) how to distinguish the domain-specific part and the multimedia part of the annotation => different ontologies

    d) which annotation tools should be used for which purpose

    

[the work should be done primarily by use case owners, helped by others]

  • Photo annotation and social networking
    • Introduction & background reading (see Easy Image Annotation for the Semantic Web, ILRT Tech report)
    • FOAF co-depiction "Co-depiction is simply the state of being depicted in the same picture as someone else. We're cataloguing this using FOAF RDF documents, sharing and collecting these in the Web, as a way of documenting in a visual way some connections between people."
    • w3photo "envisions a royalty-free archive of conference pictures from WWW1 to today -- searchable by the Semantic Web and ready for your tools". It uses various vocabularies, including Dublin Core, FOAF, CYC, Creative Commons, FotoNotes, etc.
    • CONFOTO "is an experimental sharing and annotation service for conference photos. It utilizes common RDF vocabularies (dc, foaf, rev, cc, ical, w3photo) to combine simple tagging with rich annotations (e.g. depicted persons, related events, ratings). RDF data is accessible via SPARQL, URIQA, or a link at the bottom of each page."
    • FotoNotes "The goal of the Fotonotes specification is to make it significantly easier for individuals and groups to share meaningful information about (a) what is visually depicted within the photograph and (b) what is contextually (and/or personally) significant about what is (or is not) visually represented."
  • Other image annotation projects
  • Combining RDF and MPEG-7
    • Introduction and background reading (IEEE Multimedia papers, Part I and Part II)
    • Jane Hunter et al on annotation of fusion cell images (see SWWS 2001 paper, ISWC04 paper, etc)
    • Troncy et al on combining XML and RDF for audio visual archiving within INA (see ISWC2003 paper) (note: AV-annotation is also relevant for images)
    • Chrisa Tsinaraki et al on MPEG-7 and OWL (Coupling OWL with MPEG-7 and TV-Anytime for Domain-specific Multimedia Information Integration and Retrieval, see RIAO 2004 paper)
  • Using RDF for describing visual resources in the art domain
    • Laura Hollink et al on spatial semantics, also using WordNet, SUMO, VRA, AAT etc. in RDF (ISWC 04 workshop paper, K-CAP 2003 paper)
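The FOAF co-depiction idea listed above can be sketched as follows (a toy example with invented image URIs and person names): two people are co-depicted when the same image depicts both of them.

```python
from itertools import combinations

# (image, person) pairs, as would be asserted with foaf:depicts.
depicts = [
    ("http://example.org/img/www2004.jpg", "alice"),
    ("http://example.org/img/www2004.jpg", "bob"),
    ("http://example.org/img/party.jpg",   "alice"),
]

# Group the depicted people by image.
by_image = {}
for img, person in depicts:
    by_image.setdefault(img, set()).add(person)

# Every unordered pair sharing an image is a co-depiction.
codepicted = {frozenset(pair)
              for people in by_image.values()
              for pair in combinations(sorted(people), 2)}
print(len(codepicted))  # 1 co-depicted pair: alice and bob
```

Collecting such FOAF documents across the Web makes these person-to-person connections queryable, which is exactly the point of the co-depiction experiment.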

6. Other (non-RDF) Relevant Work

TO BE DONE: Short description and categorisation of important relevant work

7. Relevant Projects and Events

TO BE DONE: Short description and categorisation of important projects and events

8. Acknowledgments

  Thanks to ...

Appendix A. Informative References

[Hunter]

[Stamou05]

[Troncy2003]

[Ossenbruggen04]

[Ossenbruggen05]