Expressing complex regions with media fragments - use cases + possible solution

Dear all,

We are currently working on media annotation tools for the European digital library (http://www.europeana.eu), which will allow users to contribute their knowledge to digital items. So far we have developed four different tools: one for the annotation of images, one for historic maps (which are a special kind of image), one for video, and one for audio annotation. Demos and screencasts are available at: http://dme.arcs.ac.at/annotation/

In all these tools - except in the audio tool, of course - we allow the user to define spatial regions in a media object; these regions are the segments an annotation is actually about. The supported segment shapes range from simple rectangles and ellipses to polygons within a single media object. Support for complex segments was a central requirement coming from our users.

We also want to exchange the created annotations as raw data and have chosen the linked data approach to do that. The question now is how to represent the annotation data, and the targeted segments within a media object, as RDF. So far, the Europeana clients follow the Annotea standard (http://www.w3.org/2001/Annotea/) and represent information about the annotated segment in the fragment identifier part of the media object's URI. We use the syntax defined by the MPEG-21 standard and have introduced our own syntax for complex segments.
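To make the fragment-identifier approach concrete, here is a minimal sketch of how a client separates the segment description from the media object's URI. The segment syntax shown is purely illustrative (not the actual MPEG-21 or Europeana syntax):

```python
# Sketch: an Annotea-style client keeps the annotated segment in the
# fragment part of the media object's URI. The "rect(...)" syntax below
# is an illustrative placeholder, not the real MPEG-21 fragment syntax.
from urllib.parse import urldefrag

target = "http://www.example.com/map1.jpg#rect(10,10,100,50)"
base, fragment = urldefrag(target)

print(base)      # http://www.example.com/map1.jpg
print(fragment)  # rect(10,10,100,50)
```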

For interoperability purposes we would like to implement the W3C Media Fragments specification for addressing our segments. At the moment, however, we have the problem that the syntax defined for the spatial dimension is insufficient for our use cases.

Therefore I would like to contribute two use-case scenarios and propose a possible technical solution. It would be great if the specification could support our use cases in the foreseeable future.

Regards,
Bernhard


== Use Cases ==

Use Case: Annotating Media Fragments

- Scenario 1: Spatial Annotation of Historic Maps (Images) with complex region-shapes

Rainer annotates a region in an online historic map using a map annotation tool. He draws a polygon around the geographical area he wants to address with his annotation and starts writing a note about this specific region on the map. The system exposes his annotation as an RDF document, where the annotated map region is identified by a media fragment URI.

- Scenario 2: Spatial and Temporal Annotation of Videos with complex region-shapes

Bernhard selects a video sequence (start and end point) in an online video and creates a new annotation for that sequence. He draws an ellipse around a specific region in a frame in order to identify the spatial dimension of his annotation. Then he writes a note for this region. The system exposes his annotation as an RDF document, where the annotated video region is identified by a media fragment URI.


== Possible Technical Solution ==

Since it is hardly possible to address all conceivable segment shapes in a fragment identification specification, we propose to introduce a new fragment key/value pair for the spatial dimension that enables fragment identification by reference. The key could be "ptr", "ref", or something similar, and the value a URI. The URI points to a resource that provides further information about the properties of the spatial region/segment.

For example:

http://www.example.com/map1.jpg#ref=http://www.example.com/region/1 addresses a complex segment (polygon) in a map (image)

http://www.example.com/video1.avi#t=10,20&ref=http://www.example.com/region/2 addresses a complex segment (ellipse) within a video sequence
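Such by-reference URIs could be assembled as sketched below. The "ref" key is the proposal above; percent-encoding the referenced URI in the fragment value is our own added assumption, so that any "&" or "=" inside it cannot be confused with the fragment's pair delimiters:

```python
from urllib.parse import quote

def fragment_uri(media_uri, ref_uri, time_range=None):
    """Build a hypothetical by-reference media fragment URI.

    The "ref" key is the proposal made in this mail; percent-encoding
    the referenced URI is an assumption on our part, not part of the
    current Media Fragments syntax.
    """
    pairs = []
    if time_range is not None:
        start, end = time_range
        pairs.append("t=%s,%s" % (start, end))
    pairs.append("ref=" + quote(ref_uri, safe=""))
    return media_uri + "#" + "&".join(pairs)

print(fragment_uri("http://www.example.com/video1.avi",
                   "http://www.example.com/region/2",
                   time_range=(10, 20)))
# http://www.example.com/video1.avi#t=10,20&ref=http%3A%2F%2Fwww.example.com%2Fregion%2F2
```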

We propose to leave the interpretation of by-reference fragments to the client. In our annotation use cases this information will typically be delivered as part of the annotation RDF document, and the fragment nodes (http://www.example.com/region/1, http://www.example.com/region/2) will have types assigned (e.g., xyz:SVGFragment, xyz:MPEG7Fragment, etc.) that indicate how to correctly interpret the information. If clients do not understand the fragment identification type used, they can still fall back to displaying the annotation for the full media object.
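The client-side interpretation and fallback described above could look roughly like this (again a sketch under the assumptions of this proposal; the parsing of name/value pairs follows the general fragment syntax of the Media Fragments draft):

```python
from urllib.parse import unquote

def parse_fragment(uri):
    """Split a media fragment URI into (base URI, dict of key/value pairs)."""
    base, _, frag = uri.partition("#")
    pairs = {}
    for part in frag.split("&"):
        if "=" in part:
            key, _, value = part.partition("=")
            pairs[key] = unquote(value)
    return base, pairs

base, pairs = parse_fragment(
    "http://www.example.com/video1.avi#t=10,20&ref=http://www.example.com/region/2")

if "ref" in pairs:
    # Dereference pairs["ref"] (e.g. via the annotation RDF document) to
    # obtain the region description; its rdf:type (xyz:SVGFragment,
    # xyz:MPEG7Fragment, ...) tells the client how to interpret it.
    region_uri = pairs["ref"]
else:
    # Fallback: no understood spatial information -- display the
    # annotation for the full media object identified by `base`.
    region_uri = None
```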


______________________________________________________
Research Group Multimedia Information Systems
Department of Distributed and Multimedia Systems
Faculty of Computer Science
University of Vienna

Postal Address: Liebiggasse 4/3-4, 1010 Vienna, Austria
Phone: +43 1 42 77 39635 Fax: +43 1 4277 39649
E-Mail: bernhard.haslhofer@univie.ac.at
WWW: http://www.cs.univie.ac.at/bernhard.haslhofer

Received on Tuesday, 7 September 2010 15:51:38 UTC