InkXML Specification

Version:
2002-08-16
Authors:
Yi-Min Chee, Gregory F. Russell, Zon-Yin Shae, Jayashree Subrahmonia, Thomas Zimmerman - IBM
Jim A. Larson, Jiming Sun - Intel
Giovanni Seni - Motorola
Lambert Schomaker - International Unipen Foundation
Carol Larson - Technical Editor

Copyright ©2002 IBM, Intel, Motorola, International Unipen Foundation. All Rights Reserved.

Abstract

An often overlooked means of input, the pen can be used for handwriting, gestures, drawings, and specific notations for mathematics, music, chemistry and other fields. This specification defines InkXML - a markup language for the exchange of virtual ink, conveying such information as the kind of pen, the color of the ink and the nature of the medium, the pressure applied to the pen, its position and speed. InkXML can be used to exchange virtual ink among devices, such as handhelds, laptops, desktops, and servers. InkXML is intended to provide the ink component of Web-based multimodal applications.

Table of Contents

1. Introduction
1.1 Motivation
1.2 General Requirements
1.3 Role of Ink Standards
1.4 Benefits
1.5 Approach
2. High-level Architecture
2.1 Elements
2.2 InkXML Exchange Modes
2.3 Layers of an Ink-enabled System
3. Primitive Elements
3.1 Device Level
3.2 ScreenContext
3.3 Encoding and Traces
3.4 Events
3.5 Chunks
4. Illustrative Application-specific Elements
4.1 Document Storage and Retrieval
4.2 Command and Control
4.3 Forms Processing
4.4 Handwriting Recognition
4.5 Ink Messaging
4.6 Electronic Signatures and Authentication
4.7 Content Representation/Retrieval
5. Binary Ink (Compression)
5.1 Background
5.2 Compression Approach
6. Relationship to Other Standards
6.1 MPEG-7
6.2 SVG
6.3 JOT
6.4 UNIPEN
6.5 XForms
6.6 SMIL
6.7 VML
6.8 ITU T.150
6.9 Recognition Grammars
7. Appendices
Appendix A Primitive Schema
Appendix B Application-specific Schema
Appendix C Glossary

1. Introduction

Contents:
1.1 Motivation
1.2 General Requirements
1.3 Role of Ink Standards
1.4 Benefits
1.5 Approach
InkXML's primary goal is to bring the full power of Web development and content delivery to ink applications. It enables the exchange of virtual ink among devices, such as handhelds, laptops, desktops, and servers. It also supports ink-related services, including ink capture, compression and decompression, ink rendering, ink gesture and handwriting recognition, ink presentation, user verification and identification, and ink searching and retrieval. InkXML will provide the ink component of Web-based multimodal applications.

1.1 Motivation

Information technology has significantly changed the way people record information. Dramatically more people are now keyboard literate and use computers to record a wide variety of information formerly written by hand on paper. While there has been much speculation that handwriting input to computer applications could significantly expand the use of computers to other domains and to users who will not take the time to become comfortable with keyboards, no movement has been made in this direction.

While handwriting is still used extensively in many domains, it has not, in general, moved to digital form in any widespread way. Why, then, is there a significant need for a digital ink standard?

1.1.1 Growth in mobile computing

Two broad domains, mobile devices and expansion into new markets, are expected to be the principal drivers for the increased use of digital ink, both as an input means and as a representation.

One significant trend influencing the exploration of digital ink is the rapidly growing demand for mobile computing. Small devices requiring some input mechanism have proliferated over the last five years. Also, many companies are exploring new interactive applications for existing devices, such as cell phones and pagers. Many of these applications will require a means for the user to input information. These small devices and associated applications differ from traditional computing platforms because they are too small to accommodate a keyboard. Other input modalities, including pen input, are being used increasingly as a substitute for keyboards.

In the last five years, another trend has been the increased use of computing platforms for informal communications. Ten years ago, most writing on computers was destined for a formal report or semi-formal e-mail. Social expectations generally demanded that they be typed. With the expanding use of the Internet for informal communications, ranging from e-mails about groceries to online chatting among teens, there is now a large amount of communications traffic leaving behind former social requirements on form.

At the same time, when social expectations on form are becoming more relaxed, other social expectations are forcing people away from intrusive technologies. I.T. devices are becoming more mobile and ubiquitous, pushing into meetings and other settings where social considerations are important. In these settings, the use of keyboards often is socially unacceptable. Potential substitutes, such as voice and speech recognition, are even more impractical than keyboards. Additionally, the very low cognitive load required for taking handwritten notes is attractive in settings where the user's attention must be directed elsewhere.

These two trends, toward smaller devices and more informal use, are likely to drive the increased use of pen input and, in many cases, direct ink representations.

Another completely independent area of development that may fuel additional digital ink applications is the expansion of computer use in countries where ideographic writing is prevalent. Markets, such as Asia, are growing rapidly so there is demand for direct handwritten input of ideographic characters.

Other applications may become more popular as the technology becomes more ubiquitous. Currently, the increasing demand for collaboration with widely distributed personnel, such as distance education, motivates shared whiteboard applications. Some applications, including direct capture of freeform drawings, illustrations, and sketches, are inconvenient or impossible to use with a traditional keyboard and mouse interface. Even applications such as markup and annotation are difficult to use without a stylus.

1.1.2 Why another standard?

Numerous standards already exist that are closely related to or could be used to represent digital ink. Most notably, ITU T.150, UNIPEN, and Jot are targeted directly at the representation of digitized handwriting. Additionally, SVG and VML are designed for vector representations of data and are applicable for representing and rendering handwriting-like data. Also, ink could be rendered into many image representations, including JPEG, BMP, and GIF.

So why is another standard necessary?

Several factors are important for the digital ink standard. The standard should:

Currently, no existing standard addresses these concerns. ITU T.150 and UNIPEN probably come the closest to meeting the specific needs of ink applications, and each is sufficient for some class of applications. ITU T.150 is excellent for low bandwidth communications and storage of simple medium resolution handwriting. However, ITU T.150 is limited to 10-bit coordinate representations, so it does not support sufficient annotation and is not extensible to handle additional channels or higher resolution data. UNIPEN is very focused on handwriting recognition requirements with features to support labeling of ink data, but UNIPEN is not flexible enough to meet the requirements of other applications. Neither has a description language rich enough for specification of capture device characteristics.

As proposed by the (now defunct) Slate Corporation, Jot is a proprietary format that avoids any abstract characterization of ink.

The Scalable Vector Graphics standard (SVG) is an extremely rich language for describing vector images. This specification is easy to use when describing a page of ink data and how it should be rendered. However, it does not have any means for describing device characteristics and appears to be fairly heavyweight, both in viewer implementation and in data bandwidth.

The Vector Markup Language (VML) supported by some Microsoft products is similar to SVG and shares similar limitations. VML is not a standard.

Standards, such as SMIL and MPEG-7, do not actually address data representation. Instead, they provide for high-level descriptions of presentation scripts and searchable semantic features, respectively.

Image formats, such as JPEG and GIF, are relatively inefficient representations for ink from a size perspective. More importantly, image formats do not preserve any vector information required for many handwriting applications.

There is no existing standard with all the capabilities important for a digital ink standard. Nor does there appear to be any standard that could be readily extended to become a comprehensive digital ink standard. However, several standards should be considered while developing a new digital ink standard because they are complementary. SVG, MPEG-7, and SMIL are relevant directly to digital ink. The new standard should address how these existing and emerging standards should be used in relation to the new standard. Additionally, ITU T.150 and UNIPEN, as well as other proprietary representations, should be considered from the point of view of feature coverage and transcription in the new representation.

1.2 General Requirements

There are two types of general requirements: functional and pragmatic. Functional requirements enumerate functions required by ink applications. Pragmatic requirements make InkXML usable and efficient for developing ink applications.

1.2.1 Functional Requirements

The following tables introduce classes of attributes that characterize electronic ink, and then present a variety of applications and their functional requirements in terms of these attributes. Applications have been grouped into six broad categories: Command and Control, Handwriting Recognition, Authentication, Communication, Multimodal, and Document Management. Each broad category contains more specific application categories.

Attributes have been organized into seven different levels of abstraction, from low to high: Device Level, Point Level, Trace Level, Screen Context Level, Derived Level, Packing Level, and Meta Level. Device level information describes the digitizer. Point level information describes an individual ink point. Trace level information describes a contiguous pen trace. Screen Context level information describes the user interface when the ink was collected. Derived level information corresponds to features that can be derived from other lower-level features. Packing level information relates to transmission and access issues. Finally, Meta level information corresponds to data that typically conveys some semantics about the ink (for example, who wrote the ink).
 

Attribute classes
  Device
    x- and y-resolution
    z-resolution
    pressure resolution
    tilt resolution
    sampling rate
    device-id
  Point
    x- and y-coordinate
    z-coordinate (altitude)
    p-coordinate (pressure)
    tilt along x and y axes
  Trace
    color
    brush size and shape
    inking mode
  Screen Context
    input region bounding box
    input region color
    writing guideline
    ink bounding box (relative)
  Derived
    ink bounding box (absolute)
    velocity
    acceleration
    direction
    curvature
  Packing
    compression
    integrity checking
    streaming data
    random access
  Meta
    embedded text
    embedded shapes
    writing style
    writer info
    "truth" label
    segmentation hierarchy
    interpretation result
    grammar
    semantic tags
    time tags (e.g. synch)

The following tables list the attributes needed for various categories of applications:
 

Command and Control
  Pointing, Cursor Control
    x- and y-coordinate
    z-coordinate (optional)
  Drawing Pen
    x- and y-coordinate
    z-coordinate (optional)
    pressure
    tilt (along x and y axes)
    color
    brush (size and shape)
    inking mode
  Gesture Analysis
    x- and y-coordinate
    z-coordinate (optional)
    inking mode (optional)
    input region bounding box
    ink bounding box (relative)
    "truth" label (optional)

Handwriting Recognition
  Isolated Characters
    x- and y-resolution
    sampling rate
    x- and y-coordinate
    z-coordinate (optional)
    brush size and shape (optional)
    input region bounding box
    ink bounding box (relative)
    writer info (optional)
    "truth" label (optional)
    segmentation hierarchy (optional)
  Continuous Script
    x- and y-resolution
    sampling rate
    x- and y-coordinate
    z-coordinate (optional)
    brush size and shape (optional)
    input region bounding box
    writing guideline
    ink bounding box (relative)
    writing style (optional)
    writer info (optional)
    "truth" label (optional)
    segmentation hierarchy (optional)
    grammar
  2D Languages (e.g. Math, Chemistry and Music)
    x- and y-resolution
    sampling rate
    x- and y-coordinate
    z-coordinate (optional)
    brush size and shape (optional)
    input region bounding box
    writing guideline
    ink bounding box (relative)
    writing style (optional)
    writer info (optional)
    "truth" label (optional)
    segmentation hierarchy (optional)
    grammar

Authentication
  Signature Verification
    x- and y-resolution
    z-resolution
    pressure resolution
    tilt resolution
    sampling rate
    x- and y-coordinate
    z-coordinate (altitude)
    p-coordinate (pressure)
    tilt along x and y axes
  Writer/Forgery Identification
    (unidentified requirements)

Communication
  Ink Messaging
    x- and y-resolution
    z-resolution (optional)
    pressure resolution
    x- and y-coordinate
    z-coordinate (optional)
    color
    brush size and shape
    input region bounding box
    input region color
    compression (optional)
    integrity checking (optional)
    streaming data (optional)
    embedded text
    embedded shapes
    time tags (optional)
  Electronic Whiteboard
    x- and y-coordinate
    z-coordinate (optional)
    color
    brush size and shape
    inking mode
    compression (optional)
    integrity checking (optional)
    streaming data (optional)
    embedded text
    embedded shapes
    semantic tags
    time tags (optional)

Multimodal
  SMIL
    x- and y-coordinate
    z-coordinate (optional)
    color
    brush size and shape
    compression (optional)
    integrity checking (optional)
    streaming data (optional)
    embedded text
    embedded shapes
  Command & Control
    x- and y-resolution (optional)
    device-id (optional)
    x- and y-coordinate
    z-coordinate (optional)
    inking mode
    input region bounding box
    ink bounding box (relative)
    time tags (optional)

Document Management
  Form Filling
    x- and y-resolution
    sampling rate
    device-id (optional)
    x- and y-coordinate
    z-coordinate (optional)
    brush size and shape (optional)
    inking mode (optional)
    input region bounding box
    writing guideline
    ink bounding box (relative)
    segmentation hierarchy
    interpretation result
    semantic tags
  Note Taking - PIM
    x- and y-resolution (optional)
    sampling rate (optional)
    device-id (optional)
    x- and y-coordinate
    z-coordinate (optional)
    color
    brush size and shape
    inking mode (optional)
    input region bounding box
    ink bounding box (relative)
    embedded text
    embedded shapes
    segmentation hierarchy
    interpretation result (optional)
    semantic tags
  Archiving & Retrieval
    x- and y-resolution (optional)
    z-resolution (optional)
    sampling rate (optional)
    device-id (optional)
    x- and y-coordinate
    z-coordinate (optional)
    color (optional)
    brush size and shape (optional)
    compression (optional)
    integrity checking (optional)
    segmentation hierarchy
    interpretation result (optional)
    semantic tags

1.2.2 Pragmatic Requirements

InkXML is a language that:

1.3 The Role of Ink Standards

While the market will continue to move ahead, with or without a comprehensive ink standard, there will be many advantages to establishing an ink standard at this point in time. (See Section 6, "Relationship to Other Standards.")

The opportunities for using ink as a communications medium are severely limited by the small number of users who are capable of exchanging ink data. There are a modest number of ink applications, each using a proprietary ink representation. Consequently, the audience for any given ink document that someone may author is exceedingly small, so the motivation for a user to invest in an ink-enabled device or application is limited.

With the establishment of a non-proprietary ink standard, it will be possible for device and application developers to support a common ink representation, in place of or in addition to their own proprietary representation. This would expand the audience for any particular user from the currently installed base of one device or application to the combined installed base of all devices and applications that have implemented the standard.

1.4 Benefits

The primary benefits of InkXML include:

1.5 Approach

IBM, Intel, and Motorola created this draft of InkXML to determine if the concept for InkXML is feasible. The resulting document, although incomplete, convinced this committee that it is possible to define InkXML to meet the requirements stated above. Also, schemas have been written (see Appendices A and B), and several InkXML specifications have been compiled.

Intel intends to use this version of InkXML in an InkChat application to demonstrate the utility of InkXML. All of the code for this application will be made available via a royalty-free license to any ink application developers.

Currently, IBM and Motorola are using InkXML internally.

The following steps for the design and standardization of the next version of InkXML are recommended:

  1. IBM, Intel, and Motorola submit this version of InkXML to an appropriate standards body.
  2. The standards body establishes a working group of interested ink application developers.
  3. The working group establishes requirements for InkXML.
  4. The working group establishes an InkXML specification and follows the process to publish it as a standard specification.

2. High-level Architecture

Contents:
2.1 Elements
2.2 InkXML Exchange Modes
2.3 Layers of an Ink-enabled System
InkXML provides a framework for sharing digital ink data between applications. InkXML applications fall into two broad categories:
  1. Applications that create and/or parse InkXML files can share InkXML files directly.
  2. When digital ink is shared between applications that create/parse InkXML files and applications that create/parse proprietary ink files, converters from InkXML to proprietary ink file formats (and vice versa) must be written for exchanging digital ink data.
     

2.1 Elements

The InkXML file format consists of two types of elements:
  1. Primitive elements. Primitive elements are a set of rudimentary elements sufficient for all basic ink applications. Few semantics are attached to primitive elements. Primitive element parsers must be able to parse any InkXML document, ignoring any element that they do not recognize or understand. Examples of primitive elements are device characteristics, screen context characteristics, pen traces, and events.
  2. Application-specific elements. Application-specific elements provide a higher-level description of the digital ink captured in the primitive elements. Application-specific elements reference the primitive elements. For example, a segment tag may be an application-specific element to indicate a group of traces that have a semantic meaning. In a document management application, a segment tag is useful to indicate groups of traces belonging to a particular page. In a form processing application, a segment tag indicates a group of traces belonging to a field in a particular form. The application-specific elements are ignored by an InkXML parser if they are unknown or unrecognized by the parser.
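
The rule that parsers ignore elements they do not recognize can be sketched as follows. This is a minimal illustration in Python, not part of the specification: the root element name ink, the segment attributes, and the set of known element names are all hypothetical, since the actual names are defined by the schemas in the appendices.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of element names this parser understands; the real
# names are defined by the schemas, not by this sketch.
KNOWN_ELEMENTS = {"trace", "traceFormat"}

def collect_known(root):
    """Return (found, skipped): recognized children and ignored tag names."""
    found, skipped = [], []
    for child in root:
        if child.tag in KNOWN_ELEMENTS:
            found.append((child.tag, (child.text or "").strip()))
        else:
            # Unknown or application-specific element: ignore it, do not fail.
            skipped.append(child.tag)
    return found, skipped

# Hypothetical document mixing a primitive element with an
# application-specific one.
doc = """<ink>
  <trace id="t1">10 20'1'2</trace>
  <segment ref="t1" label="field-3"/>
</ink>"""

found, skipped = collect_known(ET.fromstring(doc))
```

Here the trace is kept and the segment element is skipped, so a primitive-only parser can still read a document written by a forms application.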

2.2 InkXML Exchange Modes

Applications using InkXML fall into the following broad categories:

2.3 Layers of an Ink-enabled System

The following is an example of the layers of an ink-enabled system.

The structure of an ink-enabled system depends on constraints such as memory, processing, communication link speed, and application requirements. These constraints determine which primitive elements are created at each stage of the system.

In some instances, the pen hardware creates some of the InkXML primitive elements. However, if the pen hardware has small amounts of memory, it may present ink information in a proprietary serial format to the driver.

The driver may present primitive InkXML elements to the programming API (Application Programming Interface). However, if the driver also has limited memory, there might not be enough memory to create some of the primitive elements, such as chunks. In this case, other layers in the system might add these at a later time. The driver typically does not create any events other than time stamps.

The event handler and the programming API (in conjunction with the ink log generator) may add more information to the primitive elements, such as chunks and events.

The front-end application and SDK (Software Development Kit) library make further modifications, such as grouping ink traces into chunks or adding additional event tags.

3. Primitive Elements

Primitive elements are a set of rudimentary elements sufficient for all basic ink applications. These elements provide references for the application-specific elements. There are few semantics attached to primitive elements. Primitive element parsers must be able to parse any InkXML document, ignoring any element that they do not understand. The defined primitive elements include:
  1. Device specifications: These specifications describe the characteristics of devices that capture ink.
  2. ScreenContext specifications: ScreenContext describes the input conditions when digital ink data is written and allows applications to reconstruct the basic input area in which the digital ink was captured.
  3. Trace specifications: A trace is the trajectory of the pen as the user writes digital ink.
  4. Event specifications: Events are mechanisms to indicate that an action was performed.
  5. Chunk specifications: Chunks are low-level mechanisms for grouping pen traces that do not have an inherent semantic meaning.

3.1 Device Level

Device tags describe the characteristics of devices that capture ink. Primitive tags include device, latency, channel-list, channel-description, event-list, and hints.

The device tag specifies the following:

The channel-list section lists all data channels that the device is capable of reporting. Channels include:

In addition, each channel specifies the following (when appropriate):

The device block lists all possible events that this device reports (that may appear in the file). Events, such as erase, clear, color change, and line width, are interpretations of device events (for example, buttons and switches) and do not belong in a device description. They belong in the application layer.

An author may make comments next to the button, switch, and other device events to suggest how they might be interpreted by applications. For example, the switch on the end opposite the tip on a pen is typically mapped in the application as an erase event. However, the button should be recorded as a button event and not an erase event.

3.1.1 Error Calculations

This Error Calculations section is informative.

The following are some suggestions for how error estimates might be derived from the basic fidelity information in a spatial channel (x or y):

All errors are subject to additional distortion from a signal exceeding the channel bandwidth.

Open Issues

The current Device Specification only addresses conventional digitizing devices with a fixed Cartesian coordinate system. This group recommends that the standards body working group also consider descriptive elements to identify, if not describe, devices which do not have a fixed reference frame or use non-Cartesian coordinate systems.

One additional item should be considered for channel descriptions, specifically for the force channel. It might be desirable to allow a non-linear transfer function to be identified, such as transfer="log" or transfer="sigmoid", to allow the encoding to represent a function of the channel value, rather than the direct value. Alternatively, the channel units might specify "log-newtons" or "tanh-dynes".

3.2 ScreenContext

Raw digitizer coordinates alone are sometimes not sufficient to reconstruct the entire essence of the handwriting or drawing. A ScreenContext data block describes the basic screen characteristics that allow applications to reconstruct the conditions experienced by a user when entering ink data.

ScreenContext information is also essential to support interactive ink-enabled applications such as instant messaging and teleconferencing, where ink might be collected and displayed on at least two different devices. For example, consider a game of tic-tac-toe where one player is using a PDA device and the other player is using a tablet PC. Clearly, the size of the board (the drawing canvas) on each device is going to be different, and the "X"/"O" marks made with the pen on one device will need to be transformed (scaled) for proper rendering on the other device.

3.2.1 Elements

ScreenContext should contain information about the following basic elements:

3.2.2 Attributes

ScreenContext

  id             A unique identifier for this ScreenContext.
  device         Identifies the input device.

Canvas

  id             A unique identifier for this canvas.
  extent         The left, top, right, and bottom coordinates of the canvas. Specified as four coordinates, x1, y1, x2, and y2, where (x1,y1) is the top-left corner and (x2,y2) is the lower-right corner.

Mapping

  id             A unique identifier for this mapping.
  transform      The standard 2x3 matrix representation for basic transformations; default is an identity matrix.

View

  id             A unique identifier for this view.
  bounding_path  The sequence of the vertices of a polygon representing the outline of the viewable area in the canvas.
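
To illustrate how a canvas extent and a 2x3 transform might work together in the tic-tac-toe scenario, the following Python sketch builds a scaling transform from one canvas to another and applies it to a point. The row-major layout [[a, b, tx], [c, d, ty]], the function names, and the canvas sizes are assumptions made for illustration; this draft does not fix the element order of the matrix.

```python
def scale_mapping(src_extent, dst_extent):
    """Build a 2x3 matrix that maps the source canvas onto the destination canvas.

    Extents are (x1, y1, x2, y2) with (x1, y1) the top-left corner.
    Assumed layout: [[a, b, tx], [c, d, ty]] (row-major).
    """
    sx1, sy1, sx2, sy2 = src_extent
    dx1, dy1, dx2, dy2 = dst_extent
    kx = (dx2 - dx1) / (sx2 - sx1)   # horizontal scale factor
    ky = (dy2 - dy1) / (sy2 - sy1)   # vertical scale factor
    return [[kx, 0.0, dx1 - kx * sx1],
            [0.0, ky, dy1 - ky * sy1]]

def apply_mapping(m, x, y):
    """Apply a 2x3 affine matrix to the point (x, y)."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

# Hypothetical canvases: a 160x160 PDA board and an 800x800 tablet board.
pda_canvas = (0, 0, 160, 160)
tablet_canvas = (0, 0, 800, 800)
m = scale_mapping(pda_canvas, tablet_canvas)
mark = apply_mapping(m, 80, 40)   # a pen sample taken on the PDA
```

If the two canvases have different aspect ratios, this mapping stretches the ink. An application that must preserve shape would use a single scale factor for both axes instead.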

Open Issues

Are traces to be included within a <ScreenContext>...</ScreenContext> block?

This group believes that ScreenContext as a separate (optional) block can be referenced by traces or chunks, using the id field. In this case, a default behavior for determining the scope of a given ScreenContext should be specified.

3.3 Encoding and Traces

A trace records the trajectory of the pen as the user writes digital ink. More specifically, this recording describes a sequence of contiguous points, bounded by pen contact change events (pen-up and pen-down).

The simplest form of encoding specifies the x- and y-coordinates of each sample point. For compactness, it may be desirable to specify absolute coordinates only for the first point in the trace and use delta-x and delta-y values to encode subsequent points. Some devices record acceleration rather than absolute or relative position; some provide additional data that may be encoded in the trace, including z-coordinates or pressure or the state of side switches or buttons.
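
The compaction idea above (absolute coordinates for the first point, deltas afterwards) can be sketched in a few lines of Python. The function names are hypothetical and the sketch works on plain coordinate tuples rather than on the InkXML text encoding.

```python
def to_deltas(points):
    """Keep the first point absolute; encode later points as (dx, dy)."""
    if not points:
        return []
    encoded = [points[0]]
    for (px, py), (x, y) in zip(points, points[1:]):
        encoded.append((x - px, y - py))
    return encoded

def from_deltas(encoded):
    """Invert to_deltas by accumulating the differences."""
    if not encoded:
        return []
    points = [encoded[0]]
    for dx, dy in encoded[1:]:
        x, y = points[-1]
        points.append((x + dx, y + dy))
    return points

absolute = [(1125, 18432), (1148, 18475), (1178, 18510)]
deltas = to_deltas(absolute)
```

Because consecutive samples are close together, the deltas are small numbers that take fewer characters than the absolute coordinates.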

Within an InkXML file, traces are encoded using two tags. The <traceFormat> tag specifies the encoding format for each sample of the recorded traces, while the <trace> tags are used to represent the actual trace data.

3.3.1 <traceFormat>

The <traceFormat> tag describes the format used to encode points within <trace> tags. The <traceFormat> element defines the sequence of channels within the trace elements. There are two sections of channels: the <requiredChannels> and the <optionalChannels>. The order of declaration in the <traceFormat> element determines the order of appearance in the <trace> elements.

For each channel, there is a <channel> element with the attributes name, type, default, and wildcard. These give the channel name, the encoding type (Boolean, decimal, or integer), the default value, and how to interpret the wildcard character. The name attribute is required and specifies the channel described in the DeviceSpec (or the default channel described in the DeviceSpec section) to which this position in the traceFormat corresponds. The other attributes are optional. If omitted, the default type is decimal, the default value is zero, and the default wildcard interpretation is "lastValue" for required channels and "defaultValue" for optional channels.

A required channel must contain a value for each point. For example, x- and y-coordinates are likely to be required. Some channels may be recorded on an intermittent basis because their state changes infrequently; for example, the state of a pen switch is not likely to change often. In order to prevent the repeated recording of static channel values, these channels can be specified as optional channels. Required channels appear first in the <trace> followed by optional channels, if there are any. The optional channel values may be completely omitted, and a new point started immediately. In this case, it is assumed that all optional channels have been reported with wildcards. If optional channel values are reported, the optional group is preceded by a colon and ended with a semicolon. Optional channels are represented in order between the colon and semicolon. The list may be terminated early with the semicolon, and the unreported optional channels are interpreted with wildcards.

If there is no <optionalChannels> element, then there are no optional channels. In this case, the colon and semicolon delimiters are still allowed.
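
As an illustration of the attribute defaults described in this section, the following Python sketch reads a <traceFormat> element and fills in any omitted attributes. The function name and the dictionary layout are hypothetical; only the element names, attribute names, and default rules come from the text above.

```python
import xml.etree.ElementTree as ET

def read_trace_format(xml_text):
    """Return the channels of a <traceFormat> in declaration order."""
    root = ET.fromstring(xml_text)
    channels = []
    # The default wildcard interpretation depends on the section.
    sections = (("requiredChannels", "lastValue"),
                ("optionalChannels", "defaultValue"))
    for section, wildcard_default in sections:
        group = root.find(section)
        if group is None:          # e.g. no optional channels declared
            continue
        for ch in group.findall("channel"):
            channels.append({
                "name": ch.attrib["name"],            # required attribute
                "required": section == "requiredChannels",
                "type": ch.get("type", "decimal"),    # default type
                "default": ch.get("default", "0"),    # default value is zero
                "wildcard": ch.get("wildcard", wildcard_default),
            })
    return channels

FORMAT = """<traceFormat>
  <requiredChannels>
    <channel name="X"/>
    <channel name="Y" type="integer"/>
  </requiredChannels>
  <optionalChannels>
    <channel name="S1" type="boolean" default="F"/>
  </optionalChannels>
</traceFormat>"""

channels = read_trace_format(FORMAT)
```

The resulting list gives a trace decoder both the number of required values per point and the order of the optional values.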

3.3.2 <trace>

The <trace> tag is used to record the data captured by the digitizer. It contains a sequence of points encoded according to the specification given by the <traceFormat> tag.

Required channels may be reported as explicit values, differences, or second differences. The default is explicit. There are prefix symbols that indicate the interpretation. The exclamation point indicates an explicit value, a single quote indicates a single difference, and a double quote prefix indicates a second difference. If there is no prefix, then the channel value is interpreted as explicit, difference, or second difference based on the last prefix for the channel.

A second difference encoding must be preceded by a single difference encoding, which, in turn, must be preceded by an explicit encoding.

Optional channels are always encoded explicitly, and prefixes are not allowed.

Both required and optional channels may be encoded with a wildcard character *.

The wildcard character means either that the value of the channel is the default value, which is the previous channel value (if explicit), or the channel continues integrating the previous velocity and acceleration values.

Booleans are encoded as "T" or "F".

With this trace example, assume the traceFormat is:

<traceFormat>
  <requiredChannels>
     <channel name="X" type="decimal" wildcard="lastValue"/>
     <channel name="Y" type="decimal" wildcard="lastValue"/>
  </requiredChannels>
  <optionalChannels>
     <channel name="S1" type="boolean" default="F" wildcard="lastValue"/>
     <channel name="S2" type="boolean" default="F" wildcard="lastValue"/>
  </optionalChannels>
</traceFormat>
Then, this trace:
<trace id = "4525BCD">
1125 18432'23'43"7"-8 3-5+7  -3+6+2+6 8+3+6:T;+2+4:*T;+3+6+3-6:FF;
</trace>
The trace is interpreted as follows:
 

Trace       X     Y      vx  vy  S1  S2  Comments
1125 18432  1125  18432  ?   ?   F   F   //switch default values
'23'43      1148  18475  23  43  F   F   //velocity values
"7"-8       1178  18510  30  35  F   F   //acceleration values
3-5         1211  18540  33  30  F   F   //implicit acceleration, whitespace token sep
+7 -3       1251  18567  40  27  F   F   //optional whitespace
+6+2        1297  18596  46  29  F   F   //
+6 8        1349  18633  52  37  F   F   //space instead of +
+3+6:T;     1404  18676  55  43  T   F   //an optional value
+2+4:*T;    1461  18723  57  47  T   T   //wildcard
+3+6        1521  18776  60  53  T   T   //optional keep last
+3-6:FF;    1584  18823  63  47  F   F   //optionals

One would not typically see both a "+" and a space used as a separator in the same trace or document, but it is legal.
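
The decoding rules can be checked mechanically. The following Python sketch is one possible decoder under stated assumptions: two required channels, two Boolean optional channels with wildcard="lastValue" and default "F" (as in the traceFormat above), no wildcards in required channels, and no letter shorthand codes. The function name and the regular expression are illustrative and are not taken from the draft grammar.

```python
import re

# One token is either an optional group ":...;" or a required value with an
# optional prefix: ! (explicit), ' (difference), " (second difference).
TOKEN = re.compile(
    r"""\s*(?::(?P<opt>[^;]*);|(?P<pre>[!'"]?)\s*(?P<num>[+-]?\d+(?:\.\d+)?))"""
)

def decode_trace(text, n_required=2, optional_defaults=("F", "F")):
    """Decode a trace body into tuples (required values..., optional values...)."""
    mode = ["!"] * n_required            # last prefix seen for each channel
    value = [0.0] * n_required           # current channel value
    velocity = [0.0] * n_required        # current first difference
    optional = list(optional_defaults)   # optional channels keep their last value
    points, i, pos = [], 0, 0
    text = text.strip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        pos = m.end()
        if m.group("opt") is not None:
            # Optional group belongs to the point just completed; "*" and
            # unreported trailing channels keep their last value.
            for k, ch in enumerate(c for c in m.group("opt") if not c.isspace()):
                if ch != "*":
                    optional[k] = ch
            points[-1] = points[-1][:n_required] + tuple(optional)
            continue
        if m.group("pre"):
            mode[i] = m.group("pre")
        n = float(m.group("num"))
        if mode[i] == "!":               # explicit value
            value[i] = n
        elif mode[i] == "'":             # first difference (velocity)
            velocity[i] = n
            value[i] += velocity[i]
        else:                            # second difference (acceleration)
            velocity[i] += n
            value[i] += velocity[i]
        i += 1
        if i == n_required:              # all required values seen: emit point
            points.append(tuple(value) + tuple(optional))
            i = 0
    return points

TRACE = """1125 18432'23'43"7"-8 3-5+7  -3+6+2+6 8+3+6:T;+2+4:*T;+3+6+3-6:FF;"""
points = decode_trace(TRACE)
```

Each required channel is integrated independently: an explicit value replaces the channel value, a difference replaces the velocity, and a second difference is added to the velocity before the velocity is added to the value.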

An InkXML generator might also include additional whitespace formatting for clarity. The following trace specification is identical in meaning to the more compact version shown above:

<trace id = "4525BCD">
1125  18432
'23  '43
"7  "-8
3  -5
7  -3
6  2
6  8
3  6  :T;
2  4  :  *T;
3  6
3-6  :F  F;
</trace>
In addition, alphabetic characters may be used to encode small negative and positive values. These may be substituted anywhere for an integer value between -25 and +25. Using these shorthand codes, the above trace could be encoded:
<trace id="4525BCD">
1125 18432'W'43"G"hCeGcFBFHCF:T;BD:*T;CFCf:FF;
</trace>
Note that the true and false values for the side switches use symbols that are also used to encode numbers. However, they are unambiguous because of their location.
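The interpretation above can be sketched in Python. This is a minimal illustration, not a normative decoder. It assumes two required channels (X, Y) and two boolean optional channels, as in the example traceFormat. It takes the letter codes to mean A..Y = +1..+25 and a..y = -1..-25, which is inferred from the example (where 'W stands for '23 and "h for "-8) rather than stated in the text. It does not handle the wildcard on required channels. All function and variable names are hypothetical.

```python
import re

# One required value: an optional qualifier (! ' ") followed by a number or a letter code.
REQUIRED_VALUE = re.compile(r"""\s*([!'"]?)([+-]?\d+(?:\.\d+)?|[A-Ya-y])""")

COMPACT = """1125 18432'23'43"7"-8 3-5+7  -3+6+2+6 8+3+6:T;+2+4:*T;+3+6+3-6:FF;"""
CODED = """1125 18432'W'43"G"hCeGcFBFHCF:T;BD:*T;CFCf:FF;"""


def letter_or_number(text):
    """Convert a value token to a number (letter mapping is an inferred assumption)."""
    if text.isalpha():
        offset = ord(text.upper()) - ord("A") + 1   # A/a -> 1 ... Y/y -> 25
        return offset if text.isupper() else -offset
    return float(text) if "." in text else int(text)


def decode_trace(body, n_required=2, optional_defaults=(False, False)):
    """Return a list of (required values, optional values) tuples, one per point."""
    mode = ["!"] * n_required      # last qualifier seen for each required channel
    value = [0] * n_required       # current channel value
    velocity = [0] * n_required    # current first difference
    optional = list(optional_defaults)
    points, pos = [], 0
    while body[pos:].strip():
        for ch in range(n_required):
            match = REQUIRED_VALUE.match(body, pos)
            if match is None:
                raise ValueError(f"bad required value at offset {pos}")
            pos = match.end()
            mode[ch] = match.group(1) or mode[ch]   # no prefix: keep the last one
            number = letter_or_number(match.group(2))
            if mode[ch] == "!":        # explicit value
                value[ch] = number
            elif mode[ch] == "'":      # single difference (velocity)
                velocity[ch] = number
                value[ch] += velocity[ch]
            else:                      # second difference (acceleration)
                velocity[ch] += number
                value[ch] += velocity[ch]
        if body[pos:].lstrip().startswith(":"):     # optional part, closed by ';'
            start = body.index(":", pos)
            end = body.index(";", start)
            for i, flag in enumerate("".join(body[start + 1:end].split())):
                if flag != "*":        # '*' keeps the last value
                    optional[i] = (flag == "T")
            pos = end + 1
        points.append((tuple(value), tuple(optional)))
    return points


for required, switches in decode_trace(COMPACT):
    print(required, switches)
```

Under these assumptions, the compact trace and the letter-coded trace decode to the same list of points, because a letter in the required part is read as a number while T and F inside the optional part are read as booleans.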

3.3.2.1 Grammar

The following notation is used to represent grammars:
 

Notation Meaning
| The vertical bar means logical "OR".
_ The underbar means explicit whitespace (all other spaces are for legibility only).
[ ] Empty brackets mean optional whitespace.
( ) Empty parentheses mean mandatory whitespace.
(a | b | c) Exactly one of a or b or c must occur.
(a | b | c)+ One or more of the options in ( ) must occur.
(a | b | c)N Exactly N of the options in ( ) must occur.
(a | b | c)+N One to N of the options in ( ) must occur.
[a | b | c] Zero or one of the options in [ ] may occur.
[a | b | c]+ Any number of the options in [ ] may occur.
[a | b | c]+M Zero to M of the options in [] may occur.
Italics are symbols.
Non-italic bold are literals.

The following is a draft grammar for the encoding scheme using the above notation:
 

Grammar Rules Description
digit::= (0..9) Any single digit zero through nine
sign::= [ + | - ] A plus or minus sign
integer::= [sign] (digit)+ Leading zeros OK
decimal::= [sign] (digit)+.[digit]+ Mandatory leading digit, mandatory decimal point, leading zeros OK
code::= (a..y | A..Z | *) Single character code
point::= (requiredPart)[optionalPart]  
requiredPart::= (requiredValue)N Exactly N requiredValues
optionalPart::= : [optionalValue]M ; Required colon…up to M optionalValues, then a required semicolon
requiredValue::= [ ][qualifier] (value)[ ]  
optionalValue::= [ ](value)[ ]  
qualifier::= ( ! | ' | " ) An exclamation point, single quote, or double quote
value::= (integer | decimal | code)  
token::= (requiredValue | optionalValue | : | ; )  
trace::= <trace ...> [[ ]point[ ]]++ </trace>  

Whitespace is optional before and after "requiredValue" and "optionalValue" tokens (unless required to separate two adjacent positive integer or decimal token values without + signs).
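The whitespace rule can be illustrated with a small tokenizer. The sketch below is one possible reading of the draft grammar as a regular expression, not part of the specification, and the names TOKEN and tokenize are hypothetical. It shows that a sign is enough to separate two values, whereas two unsigned numbers written without whitespace collapse into a single token.

```python
import re

# token ::= requiredValue | optionalValue | ':' | ';'
# A value is an integer, a decimal, or a single-character code,
# optionally preceded by a qualifier (! ' ").
TOKEN = re.compile(r"""[!'"]?(?:[+-]?\d+(?:\.\d*)?|[A-Za-y*])|[:;]""")


def tokenize(body):
    """Split trace text into tokens, rejecting characters the grammar does not allow."""
    leftover = TOKEN.sub("", body).strip()
    if leftover:
        raise ValueError(f"unexpected characters: {leftover!r}")
    return TOKEN.findall(body)


print(tokenize("3-6"))      # the sign separates the two values
print(tokenize("3 6"))      # whitespace separates two unsigned values
print(tokenize("36"))       # without whitespace this is a single value
print(tokenize("+3+6:T;"))  # required part followed by an optional part
```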

3.3.3 Events and Traces

Allowing events to be placed inside of a trace means that searching for events takes longer because the parser must search through all points rather than all traces. One study of English handwriting sampled at 100 Hz measures an average of 30 samples per trace, making a search 30 times as slow. The inclusion of events within a trace also adds complexity to the parsing of point data (since it may be interspersed with events) and may have an impact on point data compression schemes.

On the other hand, if traces are not allowed to contain events, an event which takes place in the middle of a trace (for example, if the user pushes a side switch while writing) cannot be recorded in its order of occurrence.

Open Issues

An open issue is the recording of proximity data which may be generated by digitizer hardware. One alternative is to encode proximity data as a "pen-up" trace with the same format as a "pen-down" trace, using an attribute on the <trace> tag. Another is to introduce a separate tag for encoding proximity data. In a layered system, the low-level digitizer may not provide pen contact state (for example, a digitizer which records pressure); in this case, the format should support a generic "trace" type that does not imply either "pen-up" or "pen-down."

Since many sources of digital ink are temporal, many digital ink records will have significant time information. The "current" or "cumulative" time may be expressed in several ways, depending on what is available at the time of capture. The most explicit expression of time is the use of the startTime attribute in any element. This is not an ideal solution and should be considered more carefully by the working group.

3.4 Events

Events are mechanisms to indicate that an action was performed. Examples include a button-press action or tapping a button in a certain region on a device.

3.4.1 Event Specification

Events are self-contained. An event consists of an event identifier, an event type, an optional event value, and an optional timestamp that identifies when the event occurred.

3.4.2 Attributes


id A unique identifier for this event
type Event type (e.g., left-button press, change pen)
value The optional value specific to the event type (e.g., red)
timestamp The optional time at which the event occurred
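As an illustration of the attribute list above, the following Python sketch builds an event element with the standard xml.etree.ElementTree module. The element name "event", the helper name make_event, and the sample attribute values are assumptions made for the example; the text above only defines the four attributes.

```python
import xml.etree.ElementTree as ET


def make_event(event_id, event_type, value=None, timestamp=None):
    """Build a self-contained event element; value and timestamp are optional."""
    attributes = {"id": event_id, "type": event_type}
    if value is not None:
        attributes["value"] = value
    if timestamp is not None:
        attributes["timestamp"] = timestamp
    return ET.Element("event", attributes)   # "event" is a hypothetical tag name


# A pen-change event carrying a value but no timestamp.
event = make_event("e17", "change pen", value="red")
print(ET.tostring(event, encoding="unicode"))
```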

3.4.3 Layers

An event is self-contained. However, an event might indicate an action that modifies a set of traces and/or chunks. If it does, a post-processor can interpret the event and write out modified primitive elements.

Open Issues

The working group should address the issue of allowing events within a trace; alternative solutions include the following:

3.5 Chunks

Chunks are a low-level mechanism for grouping pen traces. They are primitive elements whose tags group one or more temporally sequential traces, that is, traces occurring next to each other in time. Chunks do not have an inherent semantic meaning (character, word, sentence, or paragraph), nor is there a method to define a semantic meaning in a chunk tag. An application cannot infer a semantic meaning from a chunk. Chunks are benign: chunk tags can be removed or ignored without corrupting the primitive file or changing the meaning of the file/stream.

Chunks are a means to group, access, and reference groups of traces and allow applications to refer to and manipulate a group of traces as a single entity. Chunks facilitate I/O and streaming by marking a group of traces to be referenced as an entity. Chunks are a building block for semantic groupings and increase the speed of parsers and searches, since every trace inside a group does not need to be examined.

Chunking is the process of creating and inserting chunk tags in the primitive file. Any method, criteria, or algorithm may be used to create chunks. Each chunk tag contains one or more trace events. Chunking can be done at any layer or stage of capture or analysis.

3.5.1 Creating Chunks

There is no standard way to specify the method, criteria, or algorithm used to create chunks. Because comments can be included in a primitive or an application-specific file, a file author or editor can add comments describing the method, criteria, or algorithm used to create chunks. An application-specific file contains segments that refer to one or more chunks and includes comments on how the chunks are created. Some examples of chunking methods include:

Each chunk tag has a chunk ID unique to the primitive file, while segments in the application-specific file point to the chunk ID numbers in the primitive file. Chunk tags can contain optional attributes, such as the number of traces in the chunk and the number of bytes to the end of the chunk tag. If the number of bytes to the end of the chunk tag is used, anything that modifies the primitive file, such as adding tags or comments, must recalculate and adjust the value. To facilitate fast searching and parsing, it is a good idea to have an attribute that specifies the number of bytes before the end tag.

3.5.2 Using Coordinate Space

Traces that overlap and are part of the same character or graphic extend the coordinate space, such as the trace of crossing a "t". Traces that overlap, but are not part of the same character or graphic, can reuse the coordinate space. Examples include crossing out a word or overwriting a word. Traces reusing the coordinate space typically do not make sense when rendered together.

Chunking should avoid incorrect rendering. Therefore, traces that reuse the coordinate space should not occur in the same chunk.

3.5.3 Chunk Requirements

Traces within a chunk must be temporally sequential (captured contiguously). The criteria for creating chunks can be anything, including temporal or spatial methods. Chunking requirements include:

Events, such as color change and page turn, cannot occur in chunks. However, a trace within a chunk may have a different color attribute; that attribute does not change the current state of the event. Events in chunks only affect traces belonging to the chunk and not the elements outside the chunk. If an event does affect traces outside the chunk, the current chunk should be terminated and a new one begun. Non-trace events are excluded from chunks so event handlers do not have to look inside chunks to find non-trace events.

If a non-trace event occurs during the streaming of trace events, place the event between chunk tags. This is done by ending the current chunk tag, stating the non-trace event, and beginning a new chunk tag.
 
Attributes

id A unique identifier for this chunk
numTraces The number of traces in a chunk
numBytes The number of bytes to the end of the chunk tag
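The rule that a non-trace event ends the current chunk can be sketched as follows. This is an illustrative Python fragment, not a prescribed algorithm: the tag names "chunk", "event", and the container "ink", the ID scheme, and the function name chunk_stream are assumptions. The numBytes attribute is left out because it would have to be recalculated whenever the file is modified.

```python
import xml.etree.ElementTree as ET


def chunk_stream(items):
    """Group consecutive trace elements into chunks; place other elements between chunks."""
    root = ET.Element("ink")              # hypothetical container element
    current, count = None, 0
    for item in items:
        if item.tag == "trace":
            if current is None:           # start a new chunk on the first trace
                count += 1
                current = ET.SubElement(root, "chunk", {"id": f"c{count}"})
            current.append(item)
            current.set("numTraces", str(len(current)))
        else:                             # a non-trace event ends the current chunk
            current = None
            root.append(item)
    return root


stream = [
    ET.Element("trace", {"id": "t1"}),
    ET.Element("trace", {"id": "t2"}),
    ET.Element("event", {"id": "e1", "type": "page turn"}),
    ET.Element("trace", {"id": "t3"}),
]
print(ET.tostring(chunk_stream(stream), encoding="unicode"))
```

Because the chunk tags carry no semantic meaning, stripping them from the output leaves the traces and the event in their original order.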

4. Illustrative Application-specific Elements

Contents:
4.1 Document Storage and Retrieval
4.2 Command and Control
4.3 Forms Processing
4.4 Handwriting Recognition
4.5 Ink Messaging
4.6 Electronic Signatures and Authentication
4.7 Content Representation/Retrieval

This Illustrative Application-specific Elements section is informational.

Application-specific elements provide a higher-level description of the digital ink captured in the primitive elements and reference the primitive elements. For example, a segment tag is an application-specific element to indicate a group of traces that have a semantic meaning. In a document management application, a segment tag might indicate groups of traces belonging to a particular page. In a forms processing application, a segment tag might indicate a group of traces belonging to a field in a particular form. An InkXML parser ignores application-specific elements if they are unknown or unrecognized by the parser.

The goal of this section is to provide examples of how the following commonly used applications reference primitive elements:

4.1 Document Storage and Retrieval

One natural application for digital ink is the recording of handwritten ink documents. Tasks such as note-taking and filling-in-forms can potentially generate large volumes of ink documents, which require convenient and efficient mechanisms for their management, storage, and retrieval. An InkXML document storage and retrieval system contains documents, which use InkXML primitive elements to represent handwritten ink; application-specific tags provide mechanisms for organizing and retrieving those documents.

4.1.1 Storage

In a typical ink document management system, traces are grouped into "pages" of arbitrary size. Pages may be composed of non-contiguous groups of traces. For example, the user might write on Page A, then switch to Page B to write more traces, and return later to Page A to add additional traces to the page. Within a page, traces are typically ordered. The order should be preserved across editing operations, which provide a modification history for the page.

Within a page, traces may be tagged and grouped to form larger semantic units for the purposes of searching and indexing, such as "keyword," "to-do," and "message." The same group of traces may be labeled with multiple tags, and tags may be applied to overlapping or nested trace groups. For example, within the handwritten ink for a message with the text "Call Jane at 1 p.m.", the word "Jane" may also be tagged as a keyword. XML document structure lends itself to the creation of nested tag structures. Overlapping tags can be handled by introducing separate tags, which reference the same set of traces. As a general rule, if two groupings of common traces have containment semantics, such as a sentence or a paragraph, it would be appropriate to use nested tags. Otherwise, the use of separate tags is preferable.

Pages can also be assigned one or more tags, which apply to all of the traces on the page. Pages are accumulated into ink documents, which may contain pages of different sizes. Pages or traces from many ink documents may be composed arbitrarily to form new ink documents, which contain either referencing links back to the original pages or actual copies of the page data. Ink documents are accumulated into archives, which may be either shared (for example, a single archive containing all the documents for an entire department) or private (for example, Gary's ink documents).

4.1.2 Devices

Digital ink can be captured with a variety of different devices. An ink document management system should support the storage and retrieval of digital ink from multiple devices with different displays and capture characteristics. For example, an ink document written on a large tablet device should be viewable and editable on a small handheld PDA, despite the different capabilities and document presentations provided by the two platforms.

Some devices, such as the CrossPad, SmartPad, and ThinkPad TransNote, create both a paper copy and the digital ink. In such cases, the ink document storage system may restrict the modification of ink documents to the physical modifications that can be performed to the paper copy (if synchronization of the two is important). If the paper contains a background against which the ink is written such as a form, the system must also be able to handle problems of registration such as the alignment of the handwritten ink with the fixed background image.

4.1.3 Retrieval

Users retrieve information from the ink document management system using attributes. These attributes include basic properties such as document date, title, author, and originating device, as well as the manually created tags described above. In addition to manually created tags, background processes such as handwriting recognition or document classification may automatically generate additional trace or page tags for retrieval.

Retrieved documents may be exchanged with other InkXML applications; for example, traces in a signed ink document can be submitted for signature verification, traces can be sent via email or instant messaging, or ink documents can be annotated or marked up with ink, text, or audio.

4.1.4 Specification Tags

Specification tags for document storage and retrieval include document, page, and keyword.
4.1.4.1 Document
A document consists of an ordered set of "pages" containing ink traces. The document tag specifies the name for the document, its creation date, author(s), and a unique identifier.
4.1.4.2 Page
A page groups traces that are either collected or intended to be displayed together. A page is characterized by its page size, as well as its creation and modification timestamps. In order to facilitate retrieval, a set of topics may be associated with a page.
4.1.4.3 Keyword
A keyword denotes a group of traces with an associated text string, which can be used to search for the ink traces. Keywords have a type, which describes the relationship between the text of the keyword and the ink traces. The text can be a word-by-word transcription of the ink traces, a summary of the handwritten ink, or an arbitrary annotation of the traces. For example, a drawing of a dog might be annotated with the text "Fido" (a generic American name for a dog) or with another name for a dog. The individual or process that created the tag is identified by a creator attribute, and its creation date is also stored. Where applicable, a quality attribute describes the level of confidence in the "accuracy" of the text. Keywords also can be grouped into user-defined classes, such as "name" or "picture," to assist further searching.

4.2 Command and Control

A typical command and control application achieves two tasks. First, it captures the user's pen input (pen gesture data), selects a gesture interpreter based on the operation context, and sends the recorded gesture data to that interpreter. Second, according to its scheme, the interpreter calls the corresponding recognizer to translate the pen gesture into a command (or command candidates), and the command and control application executes the events in the commands sequentially.

In the context of pen command and control, InkXML can be used to represent both the input of a gesture or command and, perhaps, prototypes for gestures. Details of the action to be performed will be indicated by application-specific elements.

The pen input device can also be used to control a cursor, much as a mouse does.

4.3 Forms Processing

Even with the shift to an e-centric world, paper forms filled out with ink continue to play an important role in many businesses. The growing number of enterprise-connected handheld devices will also drive the deployment of mobile forms filling applications. With their pen-enabled interfaces, these devices will demand forms that take advantage of the natural input modality of ink as well as its expressiveness. Ink-enabled forms can be employed in situations where keyboards are not convenient (whether due to environmental constraints or user preference). They also allow for the easy input of ideographic characters and non-textual data.

An InkXML forms processing application captures user input in the form of stroke data formatted using InkXML. The captured data is sent to a handwriting recognition engine, and the resulting transcription is presented to the user (or a separate validator) for possible correction. This validation process can be made more efficient and accurate through the use of confidence information returned by the recognizer. Also, since it is often necessary to be able to recreate the appearance of the form as it was filled out, the InkXML stroke data may be archived along with the validated text.

In many cases, field data can be collected using form elements that do not require transcription, such as check boxes. In other cases, with a graphical format being both more compact and more easily understood, a textual representation may not be the best way to convey information. Diagrams and charts have been used to capture data on paper forms, and their use will continue when electronic forms are enabled for pen input. In these situations, the InkXML stroke data serves not only as an intermediate format, but also as the final format for the captured data.

One advantage of electronic forms over their paper counterparts is that they make it easier for multiple parties at separate locations to fill out a particular form instance or for a single individual to fill out a particular form instance at multiple locations or over an extended period of time. In these cases, where the form is presented and its data captured on devices with differing capabilities, multiple InkXML screen contexts may be composed when combining the ink data from the different form input sessions. A session is the time between when a user begins working with the system and when he or she stops working.

An ink-enabled forms processing application should integrate with existing and emerging standards for electronic forms. One such standard is XForms, which is discussed in Section 6.5.

4.4 Handwriting Recognition

This application-specific element is intended primarily to support the needs of online handwriting recognition developers requiring large corpora of handwriting samples stored in a common format with provisions for data annotation about recording conditions, writers, segmentation, data layout, data quality, labeling and recognition results. Therefore, this element is intended to offer the functionality previously available with the UNIPEN format [1], while adding a number of improvements.

The first improvement comes from the use of XML complex types to group related annotations. For instance, a data type "writerInfoType" is defined with elements such as <hand>, <sex>, <age>, <style>, <skill>, and <country>, which generally describe a given writer. Other complex types will be defined to group information describing the source of a set of ink images, information describing user interface elements affecting the ink capture (for example, guidelines as watermarks) and information describing the nature and structure of the data.

A second improvement will be realized through a more sophisticated labeling scheme. Labels can generally be of two main types: "machine"-type (for example, an interpretation generated by a recognition engine) and "human"-type (for example, a truth value assigned by the writer of the ink). Each label also has an associated (probability) score, which is indicative of the likelihood that the handwritten ink matches that particular label. Thus, a basic label might look like this:

<label type="machine" source="NeuroScript_v1" score="0.85">hello</label>
An unlimited number of labels can be associated with a given ink image.
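When several labels are attached to the same ink, a consumer will often want the most likely one. The Python sketch below shows one way to select it. The wrapper element "labels", the second and third sample labels, and the function name best_label are made up for illustration; only the attributes type, source, and score come from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: three labels for the same ink image.
SAMPLE = """
<labels>
  <label type="machine" source="NeuroScript_v1" score="0.85">hello</label>
  <label type="machine" source="NeuroScript_v1" score="0.10">hells</label>
  <label type="human" source="writer" score="1.0">hello</label>
</labels>
"""


def best_label(xml_text, label_type="machine"):
    """Return the highest-scoring label element of the given type, or None."""
    labels = ET.fromstring(xml_text.strip()).iter("label")
    candidates = [label for label in labels if label.get("type") == label_type]
    return max(candidates, key=lambda label: float(label.get("score")), default=None)


top = best_label(SAMPLE)
print(top.text, top.get("score"))
```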

The third improvement will be the result of leveraging other industry standards. In particular, activities within the W3C Voice Browser Working Group have resulted in a specification of a grammar format [2] and work has been initiated on a lexicon format. Both grammars and lexicons are necessary elements for a handwriting recognition schema. For instance, a machine-generated label can have as an attribute the URI of the grammar used to generate the label.

References

[1]   I. Guyon, L. Schomaker, R. Plamondon, M. Liberman, and S. Janet. UNIPEN project of on-line data exchange and recognizer benchmarks. Proceedings of the 12th International Conference on Pattern Recognition (ICPR). Jerusalem. Vol. II, 1994, pp. 29-33.

[2]   W3C Voice Browser Working Group. Speech Recognition Grammar Specification for the Speech Interface Framework. January 2001. This can be found on the Web at http://www.w3.org/TR/grammar-spec.

4.5 Ink Messaging

People have been communicating with each other using voice and text for a number of years via different types of electronic media such as telephones, e-mail, and instant messaging. However, people communicated with each other using handwritten notes for many centuries before these other media were invented. There is always a need to convey a message using drawings. Sometimes, a handwritten message provides a more personalized, creative, and expressive way to represent our ideas and emotions. In addition, handwritten notes are multilingual by nature. Currently, people in many parts of the world, who do not use Roman symbols to represent their languages, have trouble communicating using modern technologies. Therefore, "digital ink" is a technology that will continue to be an important type of communication media that people need.

4.5.1 Raw Ink Capture

People frequently associate ink with handwriting recognition, but raw ink possesses unique characteristics that text cannot match. For people who currently do not have a way to write in their native languages using a keyboard, raw ink provides a more efficient and accurate way to exchange ideas and information.

The most efficient way of representing raw ink is to use stroke (or trace) information. Storing bitmaps or any other graphical formats will either waste memory or lose relevant information.

To enrich the experience of using raw ink to communicate, one needs to capture information about the brush type (to preserve calligraphic effects), pen color, and pen width, as well as the x/y coordinates.

For a typical ink mail or ink messaging type of application, traces are grouped into messages of arbitrary size. Consider the following possibilities and requirements:

4.5.2 Ink Rendering

Once the raw ink data is collected and transmitted to a remote recipient, the data will need to be displayed. The goal is to preserve the original ink related information (such as brush type, color, and pen width). The requirements for ink rendering are:

4.5.3 Transmission Efficiency

See Section 5—Binary Ink for a discussion on this topic.

Open Issues

Embedded ASCII text and shapes: It is easy to conceive an ink messaging application that allows raw ink to be mixed together with text and standard shapes (such as a rectangle or a circle) into a single message. Ideally, developers of ink messaging applications will leverage the SVG standard in this respect.

4.6 Electronic Signatures and Authentication

An individual uses electronic signatures to electronically authorize a business transaction. An ink application uses electronic signatures to verify and identify an individual. As more business is conducted electronically (for example, on the internet, point-of-sale terminals, and handheld devices), the need for electronic signatures will increase. Standards are necessary to exchange digital signatures across heterogeneous platforms and enable interoperability among hardware and software provided by multiple vendors.

4.6.1 Electronic Signatures

An electronic signature is created when a person writes and the information is captured electronically. The two types of electronic signatures are online and offline. Online signature capture records the dynamic movement of the pen stylus when writing a name. Parameters recorded during the signature may include the stylus position, pressure, and tilt. Offline signature capture produces an electronic image of the signature, which is stored in a bitmap or JPEG image format.

A digital signature is a binary code produced by a device to represent a person. This type of signature is an electronic version of a wax seal. Some methods of producing a digital signature include using a password, a PIN, a computer key, an electronic signature, an electronic key, a magnetic strip card or smart card, or other physical token. Typically, a digital signature is generated with a private key known only by the user.

4.6.2 Signature Authentication

Signature authentication and verification is a 1:1 matching of a reference to a sample of handwriting. Signature identification is a 1:N matching, which determines who the signer is from a database of N handwriting templates. While authentication and verification are often considered the same thing, a distinction can be made: authentication is a real-time process, whereas verification is an offline process.
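The difference between 1:1 and 1:N matching can be made concrete with a toy example. The sketch below is purely illustrative: it treats a signature as a small feature vector and compares vectors by Euclidean distance, which is an assumption of this example and not a method proposed here. The threshold, the feature values, and the function names are likewise hypothetical.

```python
import math


def verify(reference, sample, threshold=1.0):
    """1:1 matching: does the sample match this one reference closely enough?"""
    return math.dist(reference, sample) <= threshold


def identify(sample, templates):
    """1:N matching: which of the N stored templates is closest to the sample?"""
    return min(templates, key=lambda name: math.dist(templates[name], sample))


# Hypothetical feature vectors derived from signature traces.
templates = {"alice": (0.0, 0.0), "bob": (3.0, 4.0)}

print(verify(templates["alice"], (0.5, 0.5)))   # compare against one claimed identity
print(identify((2.5, 3.5), templates))          # search the whole template database
```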

On June 30, 2000, President Clinton signed the "Electronic Signatures in Global and National Commerce Act" (E-Sign Act) that gives digital signatures the same legal status as handwritten signatures. The act is technology-neutral, stating that an electronic signature is whatever is agreed upon by two parties. It can be an "electronic sound, symbol or process, attached to or logically associated with a contract or other record and executed or adopted by a person with the intent to sign the record." It could be anything from a PIN to a digital certificate accompanying an electronic signature to verify the identity of the signer.

4.6.3 Standards for Authentication

All features used to verify an electronic signature can be represented in or derived from InkXML elements. Currently, there is no standard for which features are examined or how they will be examined to verify a signature. The following are features that may be used to verify an electronic signature:

Standards need to be established for businesses and consumers (business-to-business and business-to-consumer) to agree on the minimum requirements for a valid (trustworthy) electronic signature. InkXML will be a sufficient and important standard to represent digital signatures.

Because an electronic signature is the electronic binding of an individual's identity to a contract, assurances must be made that the electronic signature is authentic (generated by the individual) and that the electronic contract will not be altered after signing. The establishment of standards for public key infrastructure (PKI) will assist in providing these assurances.

4.7 Content Representation/Retrieval

An immense amount of audio-visual information is becoming available in digital form: in digital archives, on the World Wide Web, in broadcast data streams, and in personal and professional databases. The amount of audio-visual information is growing rapidly. The value of information often depends upon how easily it can be found and retrieved.

In spite of the fact that users have increasing access to these resources, identifying and managing them efficiently is becoming more difficult because of the sheer volume of the information. The question of identifying and managing content is not restricted to just database retrieval applications such as digital libraries, but also extends to other areas such as broadcast channel selection, multimedia editing, and multimedia directory services.

Digital ink can be used for representing content and/or retrieving existing content. Examples of representing content include creating handwritten notes, filling out predefined forms using an electronic pen, and annotating other content. Examples of using digital ink to retrieve existing content include presenting handwritten queries to a retrieval system.

The MPEG-7 standard, formally called "Multimedia Content Description Interface," provides a rich set of standardized tools to describe multimedia content. Both human users and automatic systems that process audiovisual information are within the scope of MPEG-7, which is described in Section 6.1.

Open Issues

Application-specific segments: This working committee is not satisfied with the current discussion of application-specific segments. We believe that the working group should consider an abstract element (perhaps called abstract-segment), from which an application-specific schema may define derived or extended elements for their own use. This abstract-segment could perhaps include minimal parser syntax covering trace references (using XPointers), and some simple attributes.

5. Binary Ink (Compression)

Contents:
5.1 Background
5.2 Compression Approach

5.1 Background

An important application of electronic ink, briefly mentioned in the introduction section, is telewriting. Telewriting is the exchange of handwritten information through telecommunications means. Users draw or write on a PDA screen, electronic whiteboard, or other ink-capture device to compose a message in their own handwriting. Then, the ink message is addressed and delivered to other PDA or whiteboard users, Internet e-mail users, or networked devices. The recipient views the message as the sender composed it, including text in any mix of languages and drawings. In the case of a wireless channel, the two-way transmission of these ink messages will offer mobile users a compelling new way to communicate. When the bandwidth of the network connection is low (as with the cellular infrastructure of today), the question arises of how to reduce the transmission size of InkXML documents. One possible answer lies in a binary InkXML format.

There is still an ongoing debate about the pros and cons of binary XML. Some developers within the XML community express concerns that the original XML design goals will be compromised with a binary encoding (for example, being readable by people) and suggest that there is a lack of real-world test profiles documenting the space- and/or time-saving benefits [1]. Others say that the processing inefficiencies stemming from the document size will preclude widespread adoption of XML. Although deriving a generally useful encoding (one that is most effective on a large variety of documents) is considered a difficult problem, some research and development work in this area is available. The XMill system [2], developed at the AT&T Labs, is based on a user-controllable transformation that exposes document redundancy followed by a standard text compression operation. For instance, users can specify a "pre-compress" transformation that converts strings corresponding to numeric values into their binary integer representation.

One drawback of XMill discussed by Cheney [3] is that it precludes the incremental processing of the XML document. Cheney proposed an online encoder called ESAX that compresses better and faster than XMill. ESAX leverages the work of a general SAX parser by using an encoding scheme where element start tags, end tags, attribute names, and events are represented by single byte code-words. The encoder and decoder both maintain a table of known symbols. The encoder informs the decoder whenever a new symbol is encountered.
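
The shared symbol table described above can be sketched as follows. This is a simplified illustration of the general technique, not the ESAX wire format: the token tuples, the names `encode_symbols` and `decode_symbols`, and the absence of a 256-symbol limit are all assumptions of this sketch.

```python
from typing import Dict, List, Tuple, Union

# A token either announces a new symbol ("NEW", code, name)
# or refers to a symbol both sides already know ("REF", code).
Token = Union[Tuple[str, int, str], Tuple[str, int]]

def encode_symbols(names: List[str]) -> List[Token]:
    """Replace each tag or attribute name with a small integer code."""
    table: Dict[str, int] = {}
    out: List[Token] = []
    for name in names:
        if name not in table:
            # First occurrence: assign the next code and tell the decoder.
            table[name] = len(table)
            out.append(("NEW", table[name], name))
        else:
            out.append(("REF", table[name]))
    return out

def decode_symbols(tokens: List[Token]) -> List[str]:
    """Rebuild the name sequence, learning symbols as they are announced."""
    table: Dict[int, str] = {}
    out: List[str] = []
    for token in tokens:
        if token[0] == "NEW":
            table[token[1]] = token[2]
        out.append(table[token[1]])
    return out

names = ["trace", "chunk", "trace", "trace"]
tokens = encode_symbols(names)
```

Because the decoder learns each symbol at the moment it is announced, the stream can be processed incrementally.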

Similar to ESAX, the WAP Binary XML (WBXML) Encoding format specifies how XML elements and attribute tags are to be tokenized with respect to a symbol table (the Code Space) [4]. The actual code-words corresponding to the different tags are not defined within the WBXML specification because they are specific to a given document-type. For instance, the Wireless Markup Language (WML) Specification lists the WBXML-based codes that represent the different WML tags [5]. Binary formats, such as WML, are considered useful because applications deal with pre-defined element tokens directly, so there is no overhead for building an on-the-fly dictionary.

Character data content is not compressed in the WBXML format. It is transmitted as inline strings or as a reference to an entry in a string table, which is included at the beginning of the document. The Millau system [6] proposes an extension to WBXML where character data is transmitted on a separate stream (the Content Stream), so a standard text compression algorithm can be used. A special token inside the main data stream (the Structure Stream) indicates the presence of compressed character data.

However, for a typical InkXML document, the ink data (not the character data) must be considered when designing an efficient binary encoding format. There are two distinct modes for coding digital ink: raster scanning and curve tracing. Facsimile coding algorithms belong to the first mode and exploit the correlation within consecutive scan lines. Chain Coding (CC), belonging to the second mode, represents the pen trajectory as a sequence of transitions between successive points in a regular lattice. It is known that curve tracing algorithms result in a higher coding efficiency if the total trace length is not too long. Furthermore, the use of a raster-based technique implies the loss of all time-dependent information.
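
As a minimal illustration of curve tracing, the following Python sketch implements a plain eight-direction chain code on a unit lattice. It is not the DCC or zone coding algorithm of ITU T.150; the direction numbering and the restriction to neighbouring lattice points are assumptions of this sketch.

```python
from typing import List, Tuple

Point = Tuple[int, int]

# Eight-neighbour directions on a unit lattice, numbered from the +x axis.
DIRECTIONS: List[Point] = [(1, 0), (1, 1), (0, 1), (-1, 1),
                           (-1, 0), (-1, -1), (0, -1), (1, -1)]

def chain_encode(points: List[Point]) -> Tuple[Point, List[int]]:
    """Return the start point and one direction code per transition.

    Assumption: consecutive points are neighbours on the lattice;
    otherwise DIRECTIONS.index raises ValueError.
    """
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        codes.append(DIRECTIONS.index((x1 - x0, y1 - y0)))
    return points[0], codes

def chain_decode(start: Point, codes: List[int]) -> List[Point]:
    """Replay the transitions from the start point to recover the trajectory."""
    points = [start]
    for code in codes:
        dx, dy = DIRECTIONS[code]
        x, y = points[-1]
        points.append((x + dx, y + dy))
    return points

stroke = [(5, 5), (6, 5), (7, 6), (7, 7), (6, 8)]
start, codes = chain_encode(stroke)
```

Each transition needs only three bits here, and the order of the points (and therefore the direction of writing) is preserved, which a raster representation would lose.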

5.2 Compression Approach

Two curve-tracing algorithms have been standardized in the ITU T.150 recommendation: DCC and Zone Coding [7]. Either one can be used as part of a binary InkXML format. DCC is lossy and belongs to the family of multi-ring and multi-grid differential chain coding algorithms. The curve trajectory is represented by recording only transitions (vectors) between successive points in a regular lattice. Four concentric square lattices, or rings, and four different ways to allocate quantization points around the rings are allowed. The labels of the difference vectors are encoded using a given Huffman table; special code-words invoke the change of coding parameters (for example, radius and grid size). Zone coding is quasi-lossless and very low in complexity (a simple modification is possible to make this encoder truly lossless). Within a given ink trace, vectors linking consecutive points are encoded with three attributes: quadrant number, zone number, and relative address. The quadrant and zone numbers are represented with pre-defined single code-words of variable length.

In summary, this working group believes that a binary encoding of InkXML (say InkXMLb) can be built from the experiences summarized above. In particular, this group suggests that a WBXML-compliant code space be designed for the tags and attribute names and values defined in the InkXML Schema. Compressed ink traces can be included as opaque binary data. The specific algorithm used to compress the ink will be indicated using an ID field. Based on the ITU T.150 recommendation, two standardized algorithms will be offered: one lossy and one lossless. Recommendation of these standardized trace encoding algorithms does not preclude the use of other proprietary ones that might offer higher efficiency, as long as such offerings operate on traces and adhere to the InkXMLb Code Space specification.

References

[1]   L. Dodds. Intuition and Binary XML. www.xml.com/pub/a/2001/04/18/binaryXML.html

[2]   H. Liefke and D. Suciu. "XMill: An efficient compressor for XML data." Proceedings of the 2000 ACM SIGMOD International Conference on the Management of Data. 2000. See also www.research.att.com/sw/tools/xmill.

[3]   J. Cheney. Compressing XML with Multiplexed Hierarchical PPM Models. www.cs.cornell.edu/People/jcheney/xmlppm/paper/paper.html.

[4]   Wireless Application Protocol Forum. Binary XML Content Format Specification. Version 1.3, May 2000. Available at www.wapforum.org/what/technical.htm.

[5]   Wireless Application Protocol Forum. Wireless Markup Language Specification. Version 1.3, February 2000. Available at www.wapforum.org/what/technical.htm.

[6]   M. Girardot and N. Sundaresan. "Millau: An encoding format for efficient representation and exchange of XML over the Web." Proceedings of the Ninth International World Wide Web Conference (WWW9). Amsterdam, The Netherlands. May 2000.

[7]   International Telecommunication Union. T-150 Telewriting Terminal Equipment. 1993. Available at www.itu.int/itudoc/itu-t/rec/t/t150.html.

6. Relationship to Other Standards

Contents:
6.1 MPEG-7
6.2 SVG
6.3 JOT
6.4 UNIPEN
6.5 XForms
6.6 SMIL
6.7 VML
6.8 ITU T.150
6.9 Recognition Grammars

6.1 MPEG-7

The Multimedia Content Description Interface (MPEG-7) [ ipsi.fhg.de/delite/Projects/MPEG7/ ] is the fourth major standard developed by the Moving Picture Experts Group. In contrast to previous MPEG standards that were designed to represent the content itself, MPEG-7 is intended to represent information about the content. Planned to achieve final status in 2001, MPEG-7 provides a comprehensive set of audio-visual descriptions and a flexible framework for describing the content of multimedia material. The motivation is to enable fast and efficient searching for material of a user's interest by making the same content accessible to more search engines and by making the same search engine capable of searching more sources.

6.1.1 Descriptors and Description Schemes

MPEG-7 specifies a standard set of Descriptors and Description Schemes. A Descriptor is simply the representation of a feature, such as color. Description Schemes are predefined structures of Descriptors and their relationships. Users define their own structures with a special language called the Description Definition Language (DDL). The description (a set of instantiated Description Schemes) may be physically located with the associated content itself, in the same data stream, or on the same storage system. However, the descriptions also could live somewhere else on the globe.

6.1.2 Relationships Between InkXML and MPEG-7

There are at least two relationships between InkXML and MPEG-7. In one scenario, ink traces can be used in the definition of a Descriptor. For instance, users might use a pen to generate shape information about the objects within an image. In a second scenario, a standardized Description Scheme can be defined to describe the content of an ink document stored in InkXML format, which may contain handwritten text in any mix of languages and drawings. One such Description Scheme is termed InkSegmentDS [1]. In MPEG-7 terminology, segments represent physical spatial, temporal, or spatial-temporal components of the audio-visual content.

6.1.3 InkSegmentDS

InkSegmentDS is a subclass of the "SegmentDS" which defines general properties of segments, such as media (format), creation (date), and usage (access rights). It adds provisions for describing interpretations of the ink generated by a handwriting recognition engine. Similar information is being proposed in the context of InkXML application-specific files (see Handwriting Recognition, Section 4.4). Information already contained in the InkXML primitive file, such as the input device characteristics, is being included in the media description here.

The working group should seek recognition of InkXML by the MPEG group as the preferred format for electronic ink, while the MPEG group provides feedback for the refinement of the InkSegmentDS.

References

[1] International Organization for Standardization. ISO/IEC 15938-5 Multimedia Description Schemes. 2001. Available at ipsi.fhg.de/delite/Projects/MPEG7/.

6.2 SVG

Scalable Vector Graphics (SVG) [ http://www.w3.org/Graphics/SVG/Overview.htm8 ] is a markup language for 2D graphics written in XML. It can be used to represent graphic shapes (lines and curves), images, and text. SVG is a rich language for vector graphics, but it is heavyweight and is not appropriate for small handheld devices such as PDAs and mobile phones. Also, SVG does not have sufficient means to:

These and other attributes are required for applications, such as:

6.2.1 Synergy Between InkXML and the SVG

SVG is well suited for rendering handwriting since SVG supports the concepts of a vector path, width, and color. This corresponds to a stroke in InkXML.

The SVG statement <path d="M 100 100 L 300 100 L 200 300" style="stroke: blue; stroke-width:0.1"/> describes two sides of a triangle. The Move command (M) sets the starting point at (100,100). The first Lineto (L) command draws a line from the starting point to the second point (300,100), and the second Lineto command draws a line from the second point to the third point (200,300). The line color is defined as blue, with a line width of 0.1. Because the path is not closed, the line segments form an open path. An uppercase "M" or "L" means absolute position, and a lowercase "m" or "l" means relative position.

6.2.2 Recommendations About Ways to Interface InkXML With SVG

InkXML strokes and page numbers can be easily converted into SVG notation by:

6.2.3 Example

An InkXML file stores the activity of the pen stylus, while SVG describes how the ink should be displayed. In the example below, an InkXML trace is converted into an SVG path. The stylus pressure represented in InkXML (value=15) is converted into the stroke width (value=0.1") represented in the SVG file. Alternatively, the stylus pressure in the InkXML file can be ignored and the stroke width held constant in the SVG file.

The InkXML trace first gives an absolute position followed by deltas, each of which is relative to the preceding point. This encoding is maintained in the SVG representation.

InkXML
<trace color="0  0  255"  brushShape="SQUARE"  brushSize="3">
234  122  12
2  12  14
-3  0  15
</trace>
SVG
<path d="M 234 122 l 2 12 -3 0" stroke="blue"
stroke-width="0.1" stroke-linecap="square"/>
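
The conversion in this example can be sketched in Python. This is an illustrative sketch, not a normative algorithm: the function name `trace_to_svg_path` is hypothetical, and it assumes three channels per point (X, Y, and pressure), an absolute first point, and subsequent deltas that are each relative to the preceding point. Pressure is ignored, which corresponds to holding the stroke width constant.

```python
def trace_to_svg_path(trace_text: str, channels_per_point: int = 3) -> str:
    """Build an SVG path 'd' string from the text content of an InkXML trace.

    Assumptions for this sketch: the first two channels of each point are
    X and Y; the first point is absolute and the rest are deltas; any
    extra channel (such as pressure) is dropped.
    """
    values = [int(tok) for tok in trace_text.split()]
    # Keep only the X and Y channel of every point.
    points = [values[i:i + 2] for i in range(0, len(values), channels_per_point)]
    (x0, y0), deltas = points[0], points[1:]
    d = f"M {x0} {y0}"
    if deltas:
        # A lowercase "l" draws lines using relative coordinates.
        d += " l " + " ".join(f"{dx} {dy}" for dx, dy in deltas)
    return d

path_d = trace_to_svg_path("234 122 12\n2 12 14\n-3 0 15")
```

For the trace shown above, `path_d` is the string "M 234 122 l 2 12 -3 0".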

6.3 JOT

Jot [ http://hwr.nici.kun.nl/unipen/jot.html ] is a binary format for compact storage and exchange of electronic ink. The (defunct) Slate Corporation created Jot in 1992 with input from Apple, General Magic, GO, Lotus, and Microsoft. In addition to the X, Y, Z, and theta coordinates, it supports drawing attributes such as nib type, ink color, bounding information, scale and offset. It also includes support for relative time between strokes and information about buttons on the stylus. An optional (lossless) compression of stroke data is available.

This proprietary format allowed for a very accurate description of electronic ink. Furthermore, Jot was "lightweight" because of its binary representation. On the other hand, InkXML is an open format supporting a wide variety of ink properties including a binary mode as an optional layer, which still appeals to application developers who object to a binary encoding of ink.

Finally, Jot does not support any abstract characterization of the ink in the way it will be possible through InkXML application-specific schemas.

6.4 UNIPEN

UNIPEN [ http://hwr.nici.kun.nl/unipen/ ] is an ASCII format used primarily by the technical and scientific community to store handwriting samples. It was developed in 1993 with the participation of over 40 institutions. It supports rich annotations that suit the needs of people who develop, train, and test handwriting recognition algorithms on large amounts of data. It is not optimized for data storage or real-time data transmission. There are no provisions for ink width, ink color, or similar requirements for ink manipulation applications. Supported annotation includes recording conditions, writer information, writing style, segmentation, data layout, labeling, and recognition results. Segmentation information is de-coupled from the stream of coordinates, thus allowing multiple "views" (for example, paragraph, line, and word) on a single ink file.

InkXML will be able to provide the same rich annotation possible with the current UNIPEN format by means of an application-specific file definition (see Section 4.4, Handwriting Recognition). InkXML is an improvement over UNIPEN because it replaces UNIPEN's flat attribute organization with a record-like structure, supports a more sophisticated labeling scheme, and leverages other standards.

6.5 XForms

XForms [ www.w3.org/TR/xforms/ ], the next generation of Web forms, defines a vendor-neutral XML representation for forms, their user interfaces, and the data that forms collect. It replaces server- and client-side scripting with a declarative language for describing forms. It separates purpose (embodied by the XForms Model) from presentation, making the form description distinct from the instance data for the form [1].

This group envisions several points of intersection between InkXML and XForms. Although the working group is aware of many others (such as ink signature capture for authentication), this draft addresses two points: (1) the use of ink to collect alphanumeric data in an XForms processor and (2) the collection of ink as ink instance data in an XForms model.

6.5.1 Ink Forms Presentation

In XForms, a particular model can be rendered in a number of different presentations, including the XForms User Interface, XHTML, WML, and VoiceXML. Along these lines, an XForms model can also be rendered using a processor that is capable of capturing ink input. In an ink-enabled XForms processor, facets that restrict the value spaces of model items may be converted into constraints for a handwriting recognizer. Additional constraints (such as guidelines, boxes, and character shape constraints) may also be introduced for the presentation and passed on when invoking recognition processes.

In such a system, the ink capture module would collect the users' input ink as InkXML data, which would be passed to the handwriting recognizer along with any syntactic constraint information provided by the form model and any additional constraints added by the ink-centric presentation. Then, the returned results and confidence scores would be presented to the user for verification. When the form is completed or suspended, the instance data would be submitted with the ink stored optionally in an InkXML format for archival purposes.

Structurally, the recognizer may be a module within the XForms processor, or the recognizer could be a service that the processor invokes remotely. Recognition may be performed immediately, as in the previous scenario. This facilitates the presentation of dynamic user interfaces supported by XForms. On the other hand, the recognition also might be done offline if recognition services are not available to the client.

Example High-level Architecture for an Ink-enabled XForms Processor

Open Issues

The working group should interact with the XForms community to resolve several issues that arise in this model:

6.5.2 Ink as Instance Data

XForms has a binary data type for representing binary data. Since XForms supports custom datatypes based on XML Schema syntax, InkXML can be used as a custom datatype for ink instance data.

Ink instance data may require the introduction of ink-specific static facets, such as the dimensions of the input area, the number of strokes in the input, or the duration of the input.

Open Issues

The working group should attempt to define a number of useful facets for ink instance data.

The group should also work to address any ink-specific concerns with respect to XForms' suspend/resume feature. For example, when a form is suspended and later resumed on a device with different display or capture characteristics, the screen contexts for the two devices must be combined appropriately. Also, if a form is suspended by one writer and resumed by another, it may be useful to preserve the writer information (who wrote this ink?) along with the ink data for the purposes of customizing the handwriting recognition process.

References

[1]   W3C, XForms 1.0, W3C Working Draft, 08 June 2001. Available at http://www.w3.org/TR/2001/WD-xforms-20010608

6.6 SMIL

The Synchronized Multimedia Integration Language (SMIL) [ http://www.w3.org/TR/REC-smil ] is a W3C recommendation defining an XML-compliant language that allows the integration of a set of independent media objects into a spatially and temporally synchronized multimedia presentation [1]. With SMIL, authors can "choreograph" multimedia presentations where audio, video, text and graphics are combined in real-time. A SMIL document also interacts with a standard HTML page. (For a complete introduction, see [2].) SMIL documents might become very common on the Web thanks to streaming technologies.

6.6.1 Basic Elements

The basic elements in a SMIL presentation are:

Currently, digital ink is not supported as a SMIL native media type. One option would be to convert the ink into a static image, such as a GIF file, and render it as an img element. However, this would preclude the possibility of displaying the ink as a continuous medium with animated ink. Another option is to use the SMIL generic media reference tag:
<ref src="file" type="MIME-Type/Subtype" region="r"
begin="3s" ...>
This option requires the existence of an appropriate MIME content-type/subtype registered with the IANA group. Among the currently recognized types and subtypes, none appears to fit the needs of electronic ink; however, "video" might be the closest.

In addition to the desirability of having an InkXML MIME type, there are some potential benefits to having a special SMIL tag instead of using ref. For instance, such a tag could have an attribute that allows the control of the duration of the animated ink in a non-standard way, such as over-writing the time information already encoded within the ink file.

References

[1] W3C. Synchronized Multimedia Integration Language (SMIL) Specification. June 1998. Available at http://www.w3.org/TR/REC-smil.

[2] L. Hardman. A SMIL Tutorial. 1998. Available at http://www.cwi.nl/~media/SMIL/Tutorial.

6.7 VML

Vector Markup Language (VML) [ http://www.w3.org/TR/NOTE-VML ] is a markup language for 2D graphics written in XML. In 1998, Autodesk Inc., Hewlett-Packard Company, Macromedia Inc., Microsoft Corporation, and Visio Corporation proposed VML to the W3C. It renders graphic shapes composed of lines and curves, images, and text.

VML, like SVG, is heavyweight and does not have sufficient means to represent a multitude of handwriting applications. VML is not a standard.

6.7.1 Synergy between InkXML and VML

VML can render handwriting because it supports the concept of a polyline: a collection of line segments defined by a set of points, with width and color. A polyline corresponds to a stroke in InkXML. The polyline element defines shapes made from connected line segments.

The following is the VML polyline template:

<polyline
 points="0 0 10 10 20 0"
 id=null
 href=null
 target=null
 class=null
 title=null
 alt=null
 style='visibility: visible'
 opacity="1.0"
 chromakey="null"
 stroke="true"
 strokecolor="blue"
 strokeweight="1"
 fill="true"
 fillcolor="white"
 print="true"
 coordsize="1000,1000"
 coordorigin="0 0"
 />

6.7.2 Recommendations About Ways to Interface InkXML With VML

InkXML strokes and page numbers can be easily converted into VML by:

6.7.3 Example

An InkXML file stores the activity of the pen stylus, while VML describes how the ink should be displayed. In the example below, an InkXML trace is converted into a VML path. The stylus pressure represented in InkXML (value=15) is converted into the stroke width (value=0.1") represented in the VML file. Alternatively, the stylus pressure in the InkXML file can be ignored and the stroke width held constant in the VML file.

The InkXML trace first gives an absolute position followed by deltas, each of which is relative to the preceding point. The first VML example represents the InkXML trace as a path that starts at an absolute position and continues with relative moves. The second example defines a polyline as a sequence of absolute points.

InkXML
<trace color="0 0 255" brushShape="SQUARE" brushSize="3">
234  122  12
2  12  14
-3  0  15
</trace>
VML
<path d="M 234 122 l 2 12 -3 0" stroke="blue"
stroke-width="0.1" stroke-linecap="square"/>

<polyline
 points="234 122 236 134 233 134"
 style='visibility: visible'
 opacity="1.0"
 chromakey="null"
 stroke="true"
 strokecolor="blue"
 strokeweight="0.1"
 fill="false"
 print="true"
 coordsize="1000,1000"
 coordorigin="0 0"
 />
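
The absolute points of the polyline can be derived from the trace by accumulating the deltas. The following Python sketch illustrates this under stated assumptions: the function name `trace_to_polyline_points` is hypothetical, each point carries three channels (X, Y, and pressure), and each delta is taken relative to the preceding point, which is what the polyline example above implies.

```python
from itertools import accumulate

def trace_to_polyline_points(trace_text: str, channels_per_point: int = 3) -> str:
    """Turn an InkXML trace (absolute first point, then deltas) into the
    value of a polyline 'points' attribute made of absolute coordinates.

    Assumption for this sketch: channels beyond X and Y are ignored.
    """
    values = [int(tok) for tok in trace_text.split()]
    xs = values[0::channels_per_point]  # X channel of every point
    ys = values[1::channels_per_point]  # Y channel of every point
    # A running sum converts "first absolute, then deltas" into absolutes.
    abs_x = accumulate(xs)
    abs_y = accumulate(ys)
    return " ".join(f"{x} {y}" for x, y in zip(abs_x, abs_y))

points_attr = trace_to_polyline_points("234 122 12\n2 12 14\n-3 0 15")
```

For the trace shown above, `points_attr` is "234 122 236 134 233 134", matching the points attribute of the polyline.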

6.8 ITU T.150

Like Jot, the ITU-T.150 [ www.itu.int/itudoc/itu-t/rec/t/t150.html ] recommendation is a binary format for compact storage and exchange of electronic ink [1]. However, T.150 also describes how binary ink, possibly in combination with voice, can be converted into a signal suitable for transmission over a telephone network.

The two methods for encoding the handwritten ink that are included in the ITU-T.150 recommendation are:

ITU T.150 is relevant to InkXML because it provides standardized algorithms for compressing ink trace coordinate data. These algorithms can be used by InkXML compression. Only minor modifications of these encoders might be necessary. For instance, zone coding can be upgraded easily to make it truly lossless, both in time and space, and to remove a restriction on the maximum resolution supported.

References

[1] International Telecommunication Union. T-150 Telewriting Terminal Equipment. 1993.

6.9 Recognition Grammars

One way to increase the performance of a handwriting recognition system in terms of accuracy and/or speed is to use contextual information in the recognition process. Contextual knowledge can be in the form of lexical constraints, such as a dictionary of known words in the language to restrict interpretations of the input ink. Without some constraints, factors such as segmentation ambiguity, letter co-articulation, and ligatures make exact recognition of continuous handwritten input a very difficult task.

Applications of contextual information in the form of grammars are used by speech recognition systems. A grammar allows the specification of words and patterns of words that the recognizer should expect. Recently, the W3C Voice Browser Working Group suggested an XML-based syntax for representing BNF-like grammars [1] [ www.w3.org/TR/grammar-spec ].

The current W3C grammar specification includes a mode attribute with "speech" or "dtmf" as possible values. The mode attribute indicates how to interpret tokens contained by the grammar. For instance, speech tokens are expected to detect speech audio that sounds like the token. One possible way to increase awareness of the important role of grammars in handwriting recognition systems is to suggest a third value for the mode attribute, namely "hwr".

Then, APIs for handwriting recognition engines could be standardized to take two inputs: ink in InkXML format and a grammar in W3C format. Thus, it is desirable that members of this committee be invited to participate in the W3C activities related to the grammar specification.

References

[1]   W3C Voice Browser Working Group. Speech Recognition Grammar Specification for the Speech Interface Framework. January 2001.

7. Appendices

Contents:
Appendix A Primitive Schema
Appendix B Application-specific Schema
Appendix C Glossary

Appendix A—Primitive Schema

This is a (partial) first draft of a possible schema for the InkXML format. It is provided for illustrative purposes only. It incorporates in some rudimentary way almost all of the primitive elements discussed in this document, such as "deviceInfo", "channelList", "screenContext", "trace", and "chunk". Events are missing at this point. It is a well-formed XML schema that can be used to validate sample documents.
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
 
<xsd:annotation>
   <xsd:documentation xml:lang="en">
      Primitive file
      InkXml Schema
   </xsd:documentation>
</xsd:annotation>
 
<xsd:element name="inkxml" type="inkxmlType"/>
 
<!--                                                          -->
<!-- Main document type declaration                           -->
<!--                                                          -->

<xsd:complexType name="inkxmlType">
   <xsd:sequence>
      <xsd:element name="deviceInfo" minOccurs="0" maxOccurs="1"
       type="deviceInfoType"/>
      <xsd:element name="channelList" type="channelNameListType"/>
      <xsd:element name="screenContext" minOccurs="0" maxOccurs="unbounded"
       type="screenContextType"/>
      <xsd:choice maxOccurs="unbounded">
         <xsd:element name="trace" minOccurs="0" maxOccurs="unbounded"
       type="traceType"/>     
         <xsd:element name="chunk" minOccurs="0" maxOccurs="unbounded"
       type="chunkType"/>
      </xsd:choice>
   </xsd:sequence>         
</xsd:complexType>
 
 
<!--                                                          -->
<!-- Information about the transducer device                  -->
<!--                                                          -->

<xsd:complexType name="deviceInfoType">
   <xsd:sequence>
      <xsd:element name="sampleRate" minOccurs="0" maxOccurs="1"
       type="xsd:decimal"/>
      <xsd:element name="sampleMode" minOccurs="0" maxOccurs="1"
       type="samplingModeType"/>
      <xsd:element name="channelInfo" minOccurs="0" maxOccurs="1"
       type="channelsInfoType"/>
   </xsd:sequence>
   <xsd:attribute name="manufacturer" type="xsd:string"/>
   <xsd:attribute name="model" type="xsd:string"/>
</xsd:complexType>
 
 
<!--                                                          -->
<!-- Name of known sampling modes                             -->
<!--                                                          -->

<xsd:simpleType name="samplingModeType">
   <xsd:restriction base="xsd:string">
      <xsd:enumeration value="UNIFORM"/>
      <xsd:enumeration value="NONUNIFORM"/>
   </xsd:restriction>
</xsd:simpleType>
 
 
<!--                                                          -->
<!-- Name of data channels a device can be capable of reporting -->
<!--                                                          -->

<xsd:simpleType name="channelNameType">
   <xsd:restriction base="xsd:string">
      <xsd:enumeration value="X"/>
      <xsd:enumeration value="Y"/>
      <xsd:enumeration value="F"/>
      <xsd:enumeration value="U"/>
      <xsd:enumeration value="V"/>
   </xsd:restriction>
</xsd:simpleType>
 
 
<!--                                                          -->
<!-- List of data channels that do appear in the file         -->
<!--                                                          -->

<xsd:simpleType name="channelNameListType">
   <xsd:list itemType="channelNameType"/>
</xsd:simpleType>
 
 
<!--                                                          -->
<!-- Name of event types a device can be capable of reporting -->
<!--                                                          -->

<xsd:simpleType name="eventNameType">
   <xsd:restriction base="xsd:string">
      <xsd:enumeration value="time"/>
      <xsd:enumeration value="penChange"/>
      <xsd:enumeration value="switchButton"/>
      <xsd:enumeration value="modeChange" />
   </xsd:restriction>
</xsd:simpleType>
 
 
<!--                                                          -->
<!-- Information on channel(s) characteristics                -->
<!--                                                          -->

<xsd:complexType name="channelsInfoType">
   <xsd:sequence>
      <xsd:element name="channel" minOccurs="1" maxOccurs="unbounded">
         <xsd:complexType>
             <xsd:sequence>
                <xsd:element name="range" type="xsd:decimal"/>
                <xsd:element name="resolution" type="xsd:integer"/>
                <xsd:element name="accuracy" type="xsd:decimal"/>
                <xsd:element name="eventList">
                   <xsd:simpleType>
                      <xsd:list itemType="eventNameType"/>
                   </xsd:simpleType>
                </xsd:element>
             </xsd:sequence>
             <xsd:attribute name="chName" type="channelNameType"
              use="required"/>
             <xsd:attribute name="type">
                <xsd:simpleType>
                   <xsd:restriction base="xsd:string">
                      <xsd:enumeration value="BOOLEAN"/>
                      <xsd:enumeration value="INTEGER"/>
                      <xsd:enumeration value="DECIMAL"/>
                   </xsd:restriction>
                </xsd:simpleType>
             </xsd:attribute>
         </xsd:complexType>
      </xsd:element>
   </xsd:sequence>
</xsd:complexType>
 
 
<!--                                                          -->
<!-- The possible states of the pen for a given ink trace     -->
<!--                                                          -->

<xsd:simpleType name="inkStateType">
   <xsd:restriction base="xsd:string">
      <xsd:enumeration value="penUp"/>
      <xsd:enumeration value="penDown"/>
      <xsd:enumeration value="continue"/>
   </xsd:restriction>
</xsd:simpleType>
 
 
<!--                                                          -->
<!-- The possible colors of a "penDown" ink trace             -->
<!--                                                          -->

<xsd:simpleType name="inkColorType">
   <xsd:restriction base="inkColorChList">
      <xsd:length value="3"/>
   </xsd:restriction>
</xsd:simpleType>
 
<xsd:simpleType name="inkColorChList">  
   <xsd:list itemType="inkColorCh"/>
</xsd:simpleType>  
 
<xsd:simpleType name="inkColorCh">
   <xsd:restriction base="xsd:nonNegativeInteger">
      <xsd:maxInclusive value="255"/>
   </xsd:restriction>
</xsd:simpleType>
 
 
<!--                                                          -->
<!-- Attributes associated with ink traces                    -->
<!--                                                          -->

<xsd:attributeGroup name="traceAttributesType">
   <xsd:attribute name="type" type="inkStateType"/>
   <xsd:attribute name="color" type="inkColorType"/>
   <xsd:attribute name="brushShape">
      <xsd:simpleType>
         <xsd:restriction base="xsd:string">
            <xsd:enumeration value="DISC"/>
            <xsd:enumeration value="SQUARE"/>
         </xsd:restriction>
      </xsd:simpleType>
   </xsd:attribute>
   <xsd:attribute name="brushSize" type="xsd:decimal"/>
   <xsd:attribute name="screenContextRef" type="xsd:string"/>
</xsd:attributeGroup>
 
 
<!--                                                          -->
<!-- Simple list of point coordinates                         -->
<!--                                                          -->

<xsd:simpleType name="basePtCoordListType">
   <xsd:list itemType="xsd:integer"/>
</xsd:simpleType>
 
 
<!--                                                          -->
<!-- Holder of ink point coordinate sequence                  -->
<!--                                                          -->

<xsd:simpleType name="ptCoordListType">
   <xsd:restriction base="basePtCoordListType">
      <xsd:minLength value="2"/>
   </xsd:restriction>
</xsd:simpleType>
 
 
<!--                                                          -->
<!-- Main ink trace definition.  Reader must use 'channelList'-->
<!-- field in order to properly parse this list.              -->
<!--                                                          -->

<xsd:complexType name="traceType">
   <xsd:simpleContent>
      <xsd:extension base="ptCoordListType">
         <xsd:attribute name="id" type="xsd:ID"/>
         <xsd:attributeGroup ref="traceAttributesType" />
      </xsd:extension>
   </xsd:simpleContent>
</xsd:complexType>
 
 
<!--                                                          -->
<!-- Main ink chunk definition                                -->
<!--                                                          -->

<xsd:complexType name="chunkType">
   <xsd:sequence>
      <xsd:element name="trace" minOccurs="1" maxOccurs="unbounded"
       type="traceType"/>
   </xsd:sequence>
   <xsd:attribute name="id" type="xsd:ID"/>
</xsd:complexType>
 
 
<!--                                                          -->
<!-- Main screenContext type declaration                      -->
<!--                                                          -->

<xsd:complexType name="screenContextType">
   <xsd:sequence>
      <xsd:element name="canvas">
         <xsd:complexType>
            <xsd:sequence>
               <xsd:element name="x1" type="xsd:nonNegativeInteger"/>
               <xsd:element name="y1" type="xsd:nonNegativeInteger"/>
               <xsd:element name="x2" type="xsd:nonNegativeInteger"/>
               <xsd:element name="y2" type="xsd:nonNegativeInteger"/>
            </xsd:sequence>
            <xsd:attribute name="id" type="xsd:ID"/>
         </xsd:complexType>
      </xsd:element>
      <xsd:element name="mapping" minOccurs="0" maxOccurs="1">
         <xsd:complexType>
            <xsd:sequence>
               <xsd:element name="t00" type="xsd:integer"/>  
               <xsd:element name="t01" type="xsd:integer"/>  
               <xsd:element name="t10" type="xsd:integer"/>  
               <xsd:element name="t11" type="xsd:integer"/>  
               <xsd:element name="t20" type="xsd:integer"/>  
               <xsd:element name="t21" type="xsd:integer"/>  
            </xsd:sequence>
            <xsd:attribute name="id" type="xsd:ID"/>
         </xsd:complexType>
      </xsd:element>
   </xsd:sequence>         
   <xsd:attribute name="id" type="xsd:ID" use="required"/>
</xsd:complexType>
</xsd:schema>
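
The six entries of the "mapping" element can be read as an affine transform from digitizer coordinates to canvas coordinates. The following Python sketch is illustrative only. The schema above fixes the entry names but not their arrangement, so the row-vector layout used here is an assumption, as are the function name "apply_mapping" and the example values.

```python
def apply_mapping(point, t):
    """Map a digitizer point (x, y) to canvas coordinates.

    Assumed layout (not stated by the schema):
        [x' y'] = [x y 1] * [[t00, t01],
                             [t10, t11],
                             [t20, t21]]
    """
    x, y = point
    return (
        t["t00"] * x + t["t10"] * y + t["t20"],
        t["t01"] * x + t["t11"] * y + t["t21"],
    )


# Hypothetical mapping: flip the y axis, then shift the origin by (10, 250).
flip_y = {"t00": 1, "t01": 0, "t10": 0, "t11": -1, "t20": 10, "t21": 250}

print(apply_mapping((232, -94), flip_y))  # (242, 344)
```

Because the schema declares every entry as an integer, a mapping written this way can represent only integer scale factors and offsets. An application that needs fractional scaling would have to extend the types.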

Sample Ink File

This is an uncompressed sample InkXML document. It includes ink and conforms to the schema above, which is named "inkxml.xsd". It contains three traces, one for each stroke in the letter "H". The first two traces are arbitrarily grouped into a single chunk. The third trace explicitly names the screen context to which it belongs. The example gives a quick sense of what a simple ink file looks like. For readability, it does not use any of the compression options.
<?xml version="1.0" encoding="UTF-8"?>
<!-- Edited with XML Spy v3.5 NT (http://www.xmlspy.com)
by Giovanni Seni (Motorola HIL) -->

<inkxml xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="H:\inkxml.xsd">
<deviceInfo  manufacturer="Wacom"  model="UD-0608-R">
<sampleRate>200</sampleRate>
</deviceInfo>
<channelList>X Y</channelList>
<screenContext id="s01">
<canvas id="w01">
<x1>0</x1>
<y1>0</y1>
<x2>750</x2>
<y2>250</y2>
</canvas>
</screenContext>
<chunk id="c01">
<trace id="t01">
232 -94 232 -95 232 -96 232 -97 232 -98 232 -99 232 -101 232 -102
232 -103 232 -105 232 -106 232 -108 232 -109 232 -110 232 -112 232
-113 232 -114 232 -116 232 -117 232 -118 232 -119 232 -120 232 -121
232 -122 232 -123 232 -124 232 -125 232 -126 232 -127 232 -129 232
-130 232 -132 232 -133 233 -134 233 -135 233 -136 233 -137 233 -138
233 -139 233 -140 233 -141 233 -140 233 -139
</trace>
<trace id="t02">201 -151 202 -151 203 -151 204 -152 206 -152
208 -152 211 -152 214 -152 216 -152 219 -152 221 -152 223 -152 225
-152 227 -152 229 -152 230 -152 231 -152 233 -152 234 -153 236 -153
237 -153 239 -153 240 -153 241 -153 242 -153 243 -153 244 -153 245
-153 246 -153 247 -153 248 -153 250 -153 251 -153 253 -153 254 -153
255 -153 257 -153 258 -153 260 -153 261 -153 263 -152 264 -152 265
-152 266 -152 266 -151
</trace>
</chunk>
<trace id="t03" screenContextRef="s01">
203 -90 204 -90 205 -90 207 -90 209 -91 211
-91 214 -91 217 -91 220 -91 223 -91 226 -91 229 -91 231 -91 234 -91
236 -91 238 -91 240 -91 242 -91 243 -91 244 -91 246 -91 247 -91 249
-91 251 -91 252 -91 254 -91 255 -91 256 -91 258 -91 259 -91 260 -91
261 -91 260 -91
</trace>
</inkxml>
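
As the schema comment on "traceType" notes, a reader needs the "channelList" element to interpret the flat list of numbers inside each trace. The Python sketch below shows one way to do this, assuming that channel values are interleaved point by point (X Y X Y ...), as the sample above suggests. The function name "read_traces" and the shortened sample data are illustrative, not part of the specification.

```python
import xml.etree.ElementTree as ET

# Shortened version of the sample file; only a few points per trace are kept.
SAMPLE = """<inkxml>
<channelList>X Y</channelList>
<chunk id="c01">
<trace id="t01">232 -94 232 -95 232 -96</trace>
</chunk>
<trace id="t03">203 -90 204 -90</trace>
</inkxml>"""


def read_traces(xml_text):
    """Return {trace id: [point, ...]}; each point maps channel name to value."""
    root = ET.fromstring(xml_text)
    channels = root.findtext("channelList").split()
    traces = {}
    # iter() finds traces at the top level as well as those nested in chunks.
    for trace in root.iter("trace"):
        values = [int(v) for v in (trace.text or "").split()]
        if len(values) % len(channels) != 0:
            raise ValueError(f"trace {trace.get('id')}: incomplete point")
        traces[trace.get("id")] = [
            dict(zip(channels, values[i:i + len(channels)]))
            for i in range(0, len(values), len(channels))
        ]
    return traces


traces = read_traces(SAMPLE)
print(traces["t01"][0])  # {'X': 232, 'Y': -94}
```

Rejecting a trace whose length is not a multiple of the channel count is a design choice of this sketch. The schema itself only requires at least two values per trace.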

Open Issue

There are many ways to define some of the elements. In preparing these examples, the overriding principle was simplicity, but a more thorough discussion will be necessary for the working group.

Appendix B—Application-specific Schema

This is a partial first draft of a working schema for a UNIPEN-like file format, provided for illustrative purposes only. It uses the XML Schema "include" element to bring in the definitions and declarations of the primitive schema ("inkxml.xsd") and make them available to an application-specific schema definition. In this UNIPEN-like format, a file is a sequence of data blocks ("dataBlockType"), each of which is a collection of images that share the same source. Within a data block, images are grouped by writer ("writerBlockType"). An image ("writerImage") is a sequence of traces and chunks (or references to them) and has zero or more labels associated with it.
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
 
<xsd:annotation>
  <xsd:documentation xml:lang="en">
     Application-specific file for UNIPEN-like data files
     Uses inkxml Schema
  </xsd:documentation>
</xsd:annotation>
 
<!-- For access to primitive file element definitions                              -->

<xsd:include schemaLocation="H:\inkxml.xsd"/>
 
 
<xsd:element name="inkxmlUNIPEN" type="inkxmlUNIPENtype"/>
 
<!--                                                          -->
<!-- Main document type declaration                           -->
<!--                                                          -->

<xsd:complexType name="inkxmlUNIPENtype">
   <xsd:sequence>
      <xsd:element name="dataBlock" type="dataBlockType" minOccurs="1"
maxOccurs="unbounded"/>
   </xsd:sequence>
</xsd:complexType>
 
 
<!--                                                          -->
<!-- A collection of images that share the same source        -->
<!--                                                          -->

<xsd:complexType name="dataBlockType">
   <xsd:sequence>
      <xsd:element name="dataBlockInfo" minOccurs="0" maxOccurs="1"
type="dataBlockInfoType" />  
      <xsd:element name="writerBlock" minOccurs="1" maxOccurs="unbounded"
type="writerBlockType"/>
   </xsd:sequence>
   <xsd:attribute name="id" type="xsd:ID"/>
   <xsd:attribute name="hierarchy" type="xsd:string" use="required"/>
</xsd:complexType>
 
 
<!--                                                          -->
<!-- Information describing the source of this data block     -->
<!--                                                          -->

<xsd:complexType name="dataBlockInfoType">
   <xsd:sequence>
      <xsd:element name="source" type="xsd:string" minOccurs="1"
       maxOccurs="1"/>
      <xsd:element name="date" type="xsd:date" minOccurs="1"
       maxOccurs="1"/>
   </xsd:sequence>
</xsd:complexType>
 
 
<!--                                                          -->
<!-- A collection of images written by the same writer        -->
<!--                                                          -->

<xsd:complexType name="writerBlockType">
   <xsd:sequence>
      <xsd:element name="writerInfo" type="writerInfoType"/>
      <xsd:element name="writerImage" minOccurs="1"
       maxOccurs="unbounded">
         <xsd:complexType>
            <xsd:sequence>
               <xsd:element name="label" type="labelType" minOccurs="0"
                maxOccurs="unbounded"/>
               <xsd:choice maxOccurs="unbounded">
                  <xsd:element name="traceRef" type="traceRefType"
                   minOccurs="0" maxOccurs="unbounded"/>
                  <xsd:element name="trace" minOccurs="0"
                   maxOccurs="unbounded" type="traceType"/>
                  <xsd:element name="chunkRef" type="traceRefType"
                   minOccurs="0" maxOccurs="unbounded"/>
                  <xsd:element name="chunk" minOccurs="0"
                   maxOccurs="unbounded" type="chunkType"/>
               </xsd:choice>
            </xsd:sequence>
         </xsd:complexType>
      </xsd:element>
   </xsd:sequence>
   <xsd:attribute name="id" type="xsd:ID"/>
</xsd:complexType>
 
 
<!--                                                          -->
<!-- Information about a given writer                         -->
<!--                                                          -->

<xsd:complexType name="writerInfoType">
   <xsd:sequence>
      <xsd:element name="hand">
         <xsd:simpleType>
            <xsd:restriction base="xsd:string">
               <xsd:enumeration value="L"/>
               <xsd:enumeration value="R"/>
            </xsd:restriction>
         </xsd:simpleType>
      </xsd:element>
      <xsd:element name="sex">
         <xsd:simpleType>
            <xsd:restriction base="xsd:string">
               <xsd:enumeration value="M"/>
               <xsd:enumeration value="F"/>
            </xsd:restriction>
         </xsd:simpleType>
      </xsd:element>
      <xsd:element name="country" type="xsd:string"/>
      <xsd:element name="age" type="xsd:integer"/>
      <xsd:element name="skill">
         <xsd:simpleType>
            <xsd:restriction base="xsd:string">
               <xsd:enumeration value="bad"/>
               <xsd:enumeration value="ok"/>
               <xsd:enumeration value="good"/>
            </xsd:restriction>
         </xsd:simpleType>
      </xsd:element>
      <xsd:element name="style">
         <xsd:simpleType>
            <xsd:restriction base="xsd:string">
               <xsd:enumeration value="print"/>
               <xsd:enumeration value="cursive"/>
               <xsd:enumeration value="mixed"/>
            </xsd:restriction>
         </xsd:simpleType>
      </xsd:element>
   </xsd:sequence>
   <xsd:attribute name="id" type="xsd:ID"/>
</xsd:complexType>
 
 
<!--                                                          -->
<!-- Information on an interpretation of some image           -->
<!--                                                          -->

<xsd:complexType name="labelType">
   <xsd:simpleContent>
      <xsd:extension base="xsd:string">
         <xsd:attribute name="id" type="xsd:ID"/>
         <xsd:attribute name="source" type="xsd:string"/>
         <xsd:attribute name="type">
            <xsd:simpleType>
               <xsd:restriction base="xsd:string">
                  <xsd:enumeration value="machine"/>
                  <xsd:enumeration value="human"/>
               </xsd:restriction>
            </xsd:simpleType>
         </xsd:attribute>
         <xsd:attribute name="score" >
            <xsd:simpleType>
               <xsd:restriction base="xsd:float">
                  <xsd:minInclusive value="0"/>
                  <xsd:maxInclusive value="1"/>
               </xsd:restriction>
            </xsd:simpleType>
         </xsd:attribute> 
      </xsd:extension>
   </xsd:simpleContent>
</xsd:complexType>
 
 
<!--                                                          -->
<!-- Reference to a trace or chunk, by URI                    -->
<!--                                                          -->

<xsd:complexType name="traceRefType">
   <xsd:attribute name="uri" type="xsd:string"/>
</xsd:complexType>
</xsd:schema>

Sample UNIPEN-like Ink File

This uncompressed sample document conforms to the schema described above ("inkxmlUNIPEN.xsd"). It contains information about a writer and information labels, as well as chunks and traces (some embedded in the document, and others referenced from an external primitive file). For readability, this example does not use any of the compression options.
<?xml version="1.0" encoding="UTF-8"?>
<!-- Edited with XML Spy v3.5 NT (http://www.xmlspy.com)
by Giovanni Seni (Motorola HIL) -->

<inkxmlUNIPEN xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="file://localhost/H:/inkxmlUNIPEN.xsd"
xmlns="file://localhost/H:/inkxml.xsd">
<dataBlock id="db1" hierarchy="CHARACTER">
<dataBlockInfo>
<source>Motorola HIL Palo Alto</source>
<date>2000-06-22</date>
</dataBlockInfo>
<writerBlock>
<writerInfo>
<hand>L</hand>
<sex>M</sex>
<country>UK</country>
<age>17</age>
<skill>good</skill>
<style>mixed</style>
</writerInfo>
<writerImage>
<label id="l01" source="me" score="1.0">I</label>
<traceRef uri="primitiveSample.xml#t01"/>
<traceRef uri="primitiveSample.xml#t02"/>
<traceRef uri="primitiveSample.xml#t03"/>
</writerImage>
<writerImage>
<label id="l02" source="you" score="1.0">I</label>
<chunk id="c01">
<trace id="t01">
232 -94 232 -95 232 -96 232 -97 232 -98 232 -99 232 -101
232 -102 232 -103 232 -105 232 -106 232 -108 232 -109 232
-110 232 -112 232 -113 232 -114 232 -116 232 -117 232 -118
232 -119 232 -120 232 -121 232 -122 232 -123 232 -124 232
-125 232 -126 232 -127 232 -129 232 -130 232 -132 232 -133
233 -134 233 -135 233 -136 233 -137 233 -138 233 -139 233
-140 233 -141 233 -140 233 -139
</trace>
<trace id="t02">
201 -151 202 -151 203 -151 204 -152 206 -152 208 -152 211
-152 214 -152 216 -152 219 -152 221 -152 223 -152 225 -152
227 -152 229 -152 230 -152 231 -152 233 -152 234 -153 236
-153 237 -153 239 -153 240 -153 241 -153 242 -153 243 -153
244 -153 245 -153 246 -153 247 -153 248 -153 250 -153 251
-153 253 -153 254 -153 255 -153 257 -153 258 -153 260 -153
261 -153 263 -152 264 -152 265 -152 266 -152 266 -151
</trace>
</chunk>
</writerImage>
</writerBlock>
</dataBlock>
</inkxmlUNIPEN>
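
The "traceRef" elements in this sample point into an external primitive file. The Python sketch below shows one way such a reference might be resolved, assuming that the part of the "uri" before "#" names the file and the part after it is the "id" of the target element. The specification text here does not define this behavior; it is inferred from the sample. The "FILES" dictionary is a hypothetical in-memory stand-in for the external file, and "resolve_ref" is an illustrative name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Hypothetical stand-in for the external primitive file, shortened.
FILES = {
    "primitiveSample.xml": """<inkxml>
<channelList>X Y</channelList>
<chunk id="c01"><trace id="t01">232 -94 232 -95</trace></chunk>
<trace id="t03">203 -90 204 -90</trace>
</inkxml>""",
}


def resolve_ref(uri, files=FILES):
    """Return the element that a traceRef or chunkRef 'uri' points at."""
    location, fragment = urldefrag(uri)  # split "file#id" into its two parts
    root = ET.fromstring(files[location])
    for element in root.iter():
        if element.get("id") == fragment:
            return element
    raise KeyError(f"no element with id {fragment!r} in {location!r}")


image = ET.fromstring(
    '<writerImage><label id="l01" source="me" score="1.0">I</label>'
    '<traceRef uri="primitiveSample.xml#t01"/>'
    '<traceRef uri="primitiveSample.xml#t03"/></writerImage>'
)
for ref in image.iter("traceRef"):
    target = resolve_ref(ref.get("uri"))
    print(target.tag, target.get("id"), target.text.split()[:2])
```

The same lookup works for "chunkRef", since the schema gives both reference elements the type "traceRefType".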

Appendix C—Glossary

Accuracy: (1) Percentage of words correctly transcribed by a handwriting recognition engine
(2) Error bounds of a coordinate measurement, relative to a physical reference frame

Annotation: Elements in an InkXML file that describe meta-data, or semantic information, about the traces themselves (See ink annotation)

Application-specific elements: Provide higher-level description of the digital ink captured in the primitive elements

Attribute (XML): Additional value associated with an XML element, such as ID, TIME, NAME, or VALUE. These appear in XML elements with the syntax name="value"

Bandwidth: Maximum frequency at which a digitizer can accurately track and report pen coordinates (or other channels). Bandwidth may be much lower than the sample rate.

Binary ink: Any file format for digital ink encoded as a sequence of bits but not consisting of a sequence of printable characters (text). After compression, binary ink is typically archived or transmitted.

Bounding box: A minimal-sized rectangle that encloses a group of traces

Canvas: Widget or window in a graphical user interface where ink is drawn during ink capture

Capture: Digitally recording physical measurements of handwriting, typically using a stylus

Chunk: A group of pen traces

Compression: The coding of data to save storage space or transmission time

Content: Actual data represented by an element in the XML document

Device: See digitizer

Digital ink: An electronic representation of the pen movement, pressure, and other characteristics of handwritten input using a digitizing device

Digital pen: A passive stylus containing no electronic components or an active stylus containing electronic components

Digitizer: A hardware device, also known as a tablet, capable of sensing the position of the digital pen tip. The digital pen can be a passive stylus containing no electronic components or an active stylus containing electronic components.

Electronic ink: See digital ink

Element: The basic construct in XML. An element begins with a "<element-type-name [attributes] >" tag and ends with a "</element-type-name>" tag. The intervening data is considered the element's content. An element without any content may also be written as <element-type-name [attributes] />.

Event: An action, either human or machine generated; for example, page turn, pen up, or ink color change

Force: The pressure applied to a writing implement, typically measured in grams, ounces, or newtons

Gesture: Collection of ink traces that indicate a certain action to be performed

Ideographic: A written language in which symbols represent words, rather than characters, such as Kanji (a.k.a. pictographic)

Ink: See digital ink

Ink annotation: A handwritten note or markup referencing (by proximity) another visible writing or printed matter

Ink archive: A collection of ink documents

Ink attribute: A basic named value for an ink trace, such as color and width

Ink document: A collection of one or more pages containing ink traces

Ink label: A descriptive or identifying word or phrase accompanying some ink traces

Ink point: An element in the stream of data recorded by a real-time digitizer of handwriting; for example, a tuple <x, y, pressure, tilt>

Ink-enabled system: A system capable of recording digital ink data

Instant messaging: Communication application that lets people see the presence information of other parties and participate in a near real-time chat session

Mapping: Transformation used to map from digitizer coordinates to canvas coordinates (See transform)

Page boundaries: The division of handwriting events by the page for which they are intended

Primitive elements: Set of rudimentary elements sufficient for all basic ink applications

Primitive file: Contains the raw output of the digitizer in temporal order

Recognition grammar: Specification of words and patterns of words that a recognizer should expect when processing input ink

Resolution: The minimal change or difference in a measurement (coordinate, force, tilt) that a digitizer reports

RMS noise: "Root mean square" noise, a measure of the actual ability of a digitizer to resolve position. Some digitizers report at high resolution, but have a lower effective resolution due to noise. RMS noise is usually linked to bandwidth, with noise increasing at higher bandwidths.

Sample rate: The frequency at which a digitizer reports coordinate (or other) information. Sample rate is not always directly related to the bandwidth.

ScreenContext: One of the primitive elements of InkXML. It reflects the characteristics of the display area and the correspondence between the display area and the ink-capturing device.

Semantic: A contextual interpretation of handwriting, such as character, word, sentence, and paragraph

Session: The span of time from when a user begins an interaction with the system to when that interaction ends; also, the data gathered during this span of time

Streaming: Continuously sending handwriting events over a communication channel

Stroke: Ink resulting from an elementary pen movement, such as bounded by two consecutive velocity extrema. A sequence of strokes constitutes a trace.

Tags: Description of a semantic component of an XML language

Temporally sequential: Items that occur next to each other in time

Tilt angle: The angle of the pen with respect to the writing surface, which is usually measured as angles of the projection onto x and y vertical planes

Trace: A complete pen-down movement bounded by two pen-up movements or a complete pen-up movement. A sequence of traces accumulates to meaningful units, such as characters and words.

Transform: An affine function applied to a point in order to stretch, rotate, skew, and translate it from one coordinate space into another; in 2D it is usually expressed as a 2 x 3 matrix. (See mapping)

Verification: Confirmation that a presented signature is the same as the one on file (a.k.a. one-to-one matching)

View: The portion of the canvas visible during ink capture


