FPWD: EMMA: Extensible MultiModal Annotation markup language Version 2.0

EMMA: Extensible MultiModal Annotation markup language Version 2.0

http://www.w3.org/TR/2015/WD-emma20-20150908/

Abstract

The W3C Multimodal Interaction Working Group aims to develop specifications to enable access to the Web using multimodal interaction. This document is part of a set of specifications for multimodal systems, and provides details of an XML markup language for containing and annotating the interpretation of user input and production of system output. Examples of interpretation of user input are a transcription into words of a raw signal, for instance derived from speech, pen or keystroke input, a set of attribute/value pairs describing their meaning, or a set of attribute/value pairs describing a gesture. The interpretation of the user's input is expected to be generated by signal interpretation processes, such as speech and ink recognition, semantic interpreters, and other types of processors for use by components that act on the user's inputs such as interaction managers. Examples of stages in the production of a system output are creation of a semantic representation, an assignment of that representation to a particular modality or modalities, and a surface string for realization by, for example, a text-to-speech engine. The production of the system's output is expected to be generated by output production processes, such as a dialog manager, multimodal presentation planner, content planner, and other types of processors such as surface generation.

Status of the Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is the 8 September 2015 First Public Working Draft of "EMMA: Extensible MultiModal Annotation markup language Version 2.0". It has been produced by the Multimodal Interaction Working Group, which is part of the Multimodal Interaction Activity.

Publication as a First Public Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This specification describes markup for representing interpretations of user input (speech, keystrokes, pen input etc.) and productions of system output together with annotations for confidence scores, timestamps, medium etc., and forms part of the proposals for the W3C Multimodal Interaction Framework.
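As a rough illustration of the kind of markup described above, the following Python sketch parses a small N-best EMMA document and picks the interpretation with the highest confidence score. The sample document, its payload elements (origin, destination), the function name best_interpretation, and the version="2.0" value are assumptions made for this example; the emma:one-of, emma:interpretation, emma:confidence, emma:tokens, emma:medium and emma:mode names and the namespace URI are taken from EMMA 1.0.

```python
import xml.etree.ElementTree as ET

# EMMA 1.0 namespace URI; assumed here to apply to the 2.0 draft as well.
EMMA_NS = "http://www.w3.org/2003/04/emma"

# Hypothetical sample document: two competing speech recognition results.
SAMPLE = """\
<emma:emma version="2.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:one-of id="nbest1" emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation id="int1" emma:confidence="0.75"
                         emma:tokens="flights from boston to denver">
      <origin>Boston</origin>
      <destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.68"
                         emma:tokens="flights from austin to denver">
      <origin>Austin</origin>
      <destination>Denver</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
"""


def q(name):
    """Return the namespace-qualified name ElementTree uses for emma:* names."""
    return f"{{{EMMA_NS}}}{name}"


def best_interpretation(xml_text):
    """Return the interpretation with the highest emma:confidence."""
    root = ET.fromstring(xml_text)
    interps = list(root.iter(q("interpretation")))
    # Interpretations without a confidence annotation are treated as 0.0.
    best = max(interps, key=lambda e: float(e.get(q("confidence"), "0")))
    return {
        "id": best.get("id"),
        "confidence": float(best.get(q("confidence"), "0")),
        "tokens": best.get(q("tokens")),
        # Application payload: the non-EMMA child elements of the interpretation.
        "payload": {child.tag: child.text for child in best},
    }


if __name__ == "__main__":
    print(best_interpretation(SAMPLE))
```

A consumer such as an interaction manager might use the returned payload directly and fall back to the remaining interpretations if the top one is rejected.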

The EMMA: Extensible Multimodal Annotation 1.0 specification was published as a W3C Recommendation in February 2009. Since then there have been numerous implementations of the standard and extensive feedback has come in regarding desired new features and clarifications requested for existing features. The W3C Multimodal Interaction Working Group examined a range of different use cases for extensions of the EMMA specification and published a W3C Note on Use Cases for Possible Future EMMA Features [EMMA Use Cases]. In this working draft of EMMA 2.0, we have developed a set of new features based on feedback from implementers and have also added clarification text in a number of places throughout the specification. The new features include: support for adding human annotations (emma:annotation, emma:annotated-tokens), support for inline specification of process parameters (emma:parameters, emma:parameter, emma:parameter-ref), support for specification of models used in processing beyond grammars (emma:process-model, emma:process-model-ref), extensions to emma:grammar to enable inline specification of grammars, a new mechanism for indicating which grammars are active (emma:grammar-active, emma:active), support for non-XML semantic payloads (emma:result-format), support for multiple emma:info elements and reference to the emma:info relevant to an interpretation (emma:info-ref), and a new attribute to complement the emma:medium and emma:mode attributes that enables specification of the modality used to express an input (emma:expressed-through). In addition, we have extended the specification to handle the production of system output by adding the new element emma:output, and have added a series of annotations enabling the use of EMMA for incremental results (Section 4.2.24).

Not addressed in this draft, but planned for a later Working Draft of EMMA 2.0, is a JSON serialization of EMMA documents for use in contexts where JSON is better suited than XML for representing user inputs and system outputs.

Comments are welcome on www-multimodal@w3.org (archive). See W3C mailing list and archive usage guidelines.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

The sections in the main body of this document are normative unless otherwise specified. The appendices in this document are informative unless otherwise indicated explicitly.

This document is governed by the 1 September 2015 W3C Process Document.

Changes from EMMA 1.1

This working draft also adds a new emma:location element for specification of the location of the device or sensor which captured the input. The ref attribute was added to a number of elements, allowing for shorter EMMA documents which use URIs to point to content stored outside of the document: emma:one-of, emma:sequence, emma:group, emma:info, emma:parameters, emma:lattice. A new attribute emma:partial-content is introduced which indicates whether the content in an element with ref is the full content or whether it is partial and more can be retrieved by following the URI in ref. The emma:emma element is extended with doc-ref and prev-doc attributes that indicate where the document can be retrieved from and where the previous document in a sequence of inputs can be retrieved from. The application of emma:lattice is also extended so that an EMMA document can contain both an N-best and a lattice side-by-side. A new Section 3.3 includes an initial proposal for the extension of EMMA to output and the new element emma:output. A new Section 4.2.24 describes new attributes that extend EMMA so that it supports incremental results.
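The ref, emma:partial-content, doc-ref and prev-doc attributes described above can be sketched from a consumer's point of view as follows. This is only an illustration under stated assumptions: the sample document, the example.com URIs, the version="2.0" value, the "true" value for emma:partial-content, and the function name summarize_refs are all hypothetical, and the exact attribute syntax should be checked against the working draft itself.

```python
import xml.etree.ElementTree as ET

# EMMA 1.0 namespace URI; assumed here to apply to the 2.0 draft as well.
EMMA_NS = "http://www.w3.org/2003/04/emma"

# Hypothetical document: the N-best list is only partially inlined and the
# full content is assumed to be retrievable from the URI in ref.
SAMPLE_REF = """\
<emma:emma version="2.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    doc-ref="http://example.com/emma/doc2"
    prev-doc="http://example.com/emma/doc1">
  <emma:one-of id="nbest1"
      ref="http://example.com/emma/doc2#nbest1"
      emma:partial-content="true">
    <emma:interpretation id="int1" emma:confidence="0.75"
        emma:tokens="flights from boston to denver"/>
  </emma:one-of>
</emma:emma>
"""


def summarize_refs(xml_text):
    """Collect document-level links and elements flagged as partial content."""
    root = ET.fromstring(xml_text)
    partial = []
    for elem in root.iter():
        ref = elem.get("ref")
        is_partial = elem.get(f"{{{EMMA_NS}}}partial-content") == "true"
        if ref is not None and is_partial:
            # More content for this element can be fetched from ref.
            partial.append((elem.get("id"), ref))
    return {
        "doc_ref": root.get("doc-ref"),    # where this document can be retrieved
        "prev_doc": root.get("prev-doc"),  # previous document in the sequence
        "partial": partial,
    }


if __name__ == "__main__":
    print(summarize_refs(SAMPLE_REF))
```

The sketch does not dereference any URI; a real consumer would decide when fetching the remaining content is worth the extra request.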

A diff-marked version from EMMA 1.1 is available for comparison purposes. Changes from EMMA 1.0 can also be found in Appendix F.

Received on Tuesday, 8 September 2015 13:44:31 UTC