"BX" Proposal for XML Binary Format

Paul R. Pierce

Abstract

A proposal for a binary format for XML, specified in terms of a set of independent reversible transformations between standard XML and a binary form.

Revision

February 5, 2006

1. Introduction

XML is a very successful syntax for carrying information, being a careful compromise between many conflicting goals to hit the broad center of possible uses. One significant non-goal for XML was terseness, so XML is completely text based and tends to be verbose. A binary format for XML would define an alternate syntax that carries the same information as XML, but in a form more suitable for the many valuable fringe uses for which standard XML is unsuitable, often because it is too verbose.

Other than a possible need for terseness, these other uses do not consistently have much in common. To achieve sufficient generality to cover more than one corner of the possible uses, the binary format must either be extremely clever or it must be flexible. This proposal goes in the direction of flexibility by specifying a number of options. In fact, the proposal is little more than a specification of a set of optional transformations on standard XML that yield a more terse syntax, together with a simple mechanism for selecting from the options.

The transformations specified here were selected in an attempt to meet the requirements set forth in the report of the W3C XML Binary Characterization working group. Most either represent existing practice or are an obvious, simple invention to directly address a need of the format. Each transformation is reversible and as much as possible independent of the others.

2. Overview

The binary format for XML is specified in terms of a set of transformations applied to standard XML, resulting in a new syntax that carries most or all of the original textual format and all of the original data. The transformations are ordered, mostly independent, optional, and reversible. Thus one simple use model is to encode standard XML by applying a sequence of selected transformations, then later decode by applying the reverse transformations in reverse order. However, the results of many of the transformations are straightforward and can be constructed and used directly without producing or processing the textual form. These transformations might be called translations, and they form the first part of the ordered sequence; the remaining transformations are for compression.

To accurately and efficiently translate names and different data types it is necessary to use tables or some sort of magic; if tables are used they can be explicitly included in the format or implicitly derived from schema or example. To avoid excessive invention this proposal specifies the use of tables implicitly derived from unmodified XML-Schema documents. To better achieve some of the desired characteristics, a schema for annotation of XML-Schema is also specified, described in the last section. The optional annotation provides more explicit direction for building the tables and configuring the transformations. This specification does not support use of DTDs unless they are first translated to and replaced by XML-Schema.

Only some of the transformations need schema information. Early translation transformations interpret the schema; later transformations inherit this information.

Some of the compression transformations are performed on blocks rather than the entire data set. When several of them are selected, they all operate on the same blocks instead of choosing blocks independently.

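Here are the specified transformations, in order: Declaration, Schema, Name, Entity, Value, Transpose, Octet, Block Sort, Ziv-Lempel and Huffman.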

Use of the binary format and the set of transformations to be used are specified in the encoding attribute of the XML declaration.

3. Specification

This section contains the complete specification of the "BX" binary format. It is presented as a process of encoding a presumed XML document into the "BX" format.

3.1 Basic Mechanisms

3.1.1 Encoding Declaration

The binary format is declared as an encoding in the "encoding" attribute of the XML declaration. This is consistent with XML 1.x if one takes a broad view of what it means to encode a stream of characters. The declaration indicates use of the binary format, provides a version number to indicate the specific specification in use, and indicates which of the optional transformations are selected.

The encoding name consists of the prefix "BX-" followed by a two-part version number such as "1.0" followed by the transformation selection suffix. The transformation selection suffix consists of a dash followed by the capitalized first letters of the selected transformations, in order.

BxEncName ::= 'BX-' Version '-' ['D']['S']['N']['E']['V']['T']['O']['B']['Z']['H']

An example declaration that selects all of the translation transformations except Declaration might be:

<?xml version="1.1" encoding="BX-1.0-SNEVO"?>
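The construction of the encoding name can be sketched in a few lines of Python. This is only an illustration of the BxEncName production above; the function names build_encoding_name and parse_encoding_name are hypothetical and not part of the proposal, and the version is assumed to be two decimal numbers separated by a dot.

```python
import re

# Suffix letters in the order fixed by the BxEncName production.
ORDER = "DSNEVTOBZH"

# Assumption: a version is two decimal numbers separated by a dot.
_ENC_RE = re.compile(r"BX-(\d+\.\d+)-(D?S?N?E?V?T?O?B?Z?H?)")


def build_encoding_name(version, letters):
    """Return e.g. 'BX-1.0-SNEVO' from a version and a collection of letters."""
    unknown = set(letters) - set(ORDER)
    if unknown:
        raise ValueError(f"unknown transformation letters: {sorted(unknown)}")
    # Emit the letters in specification order, whatever order they were given in.
    suffix = "".join(c for c in ORDER if c in letters)
    return f"BX-{version}-{suffix}"


def parse_encoding_name(name):
    """Return (version, letters); raise ValueError if name is not a BX name."""
    match = _ENC_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a BX encoding name: {name!r}")
    return match.group(1), match.group(2)
```

Because the letters must appear in order, a name such as "BX-1.0-OS" is rejected by the parser rather than silently reordered.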

3.1.2 Symbols

All the transformations are specified in terms of streams of symbols. Symbols are essentially integers. The initial symbols are the Unicode characters of the presumed input XML document. Transformations can produce different kinds of symbols. The final result of a sequence of transformations is the simple concatenation of the binary representation of all the symbols.

During intermediate processing steps individual symbols or groups of symbols might have associated type or other information. Any such information that remains at the end of the last selected transformation is lost, except for width. Each symbol has an associated width in bits that determines how much space it occupies in the final stream.

3.1.2.1 Characters

Unicode characters are symbols with width 21 bits and type "character". The range of values is that specified in XML 1.x and/or Unicode.

3.1.2.2 Octets

An octet is a symbol with width 8 bits and type "octet".  The range of values is 0-255.

3.1.2.3 Length symbols

Length symbols are used to define lists. They are only produced by translation transformations; in fact, the Name, Entity and Value translations always turn all character sequences into lists. They are integers that provide a count of the number of output 8-bit bytes used by the following symbols in the list, excluding the length symbol itself. There is a global width for length symbols that depends on the maximum block size. For the default maximum block size the width is 21 bits. Length symbols are always either the global width or, if redundant, width 0.

Lists can cross block boundaries. In this case the list is segmented with a length symbol at the head of each segment. The most significant bit of the value of the length symbol indicates whether there are more segments. 0 means this is the final (or only) segment, 1 means there are more segments.

3.1.2.4 Count symbols

Count symbols are produced in compression transformations. Like length symbols they often tell how many symbols follow, but their value is the number of symbols rather than the number of bytes taken up by the symbols.

3.1.2.5 Name Tokens

Name tokens represent element, attribute and entity names and delimit their respective constructs. They are symbols of type "name token". The value range for name tokens is built up from three contiguous subranges, one each for elements, attributes and entities. Once the name token symbol space is fully evaluated the global name token width is defined as the number of bits required to represent the maximum value.

There are two special name tokens. Token 0 is the end-of-element token, and token 1 is the end-of-attributes token.

There is also a pre-defined element name token, transpose, that takes the first number (2) of the element name range.

The width of an individual name token can be either the global token width or zero. A token with zero width still has a value and other properties, but is redundant and so won't appear unless it is subsequently converted to some other symbol.
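As an illustration of how the name token space and the global width might be built, here is a minimal Python sketch. It assumes the names are already available as three lists in schema or document order; the function name assign_name_tokens and the (kind, name) table keys are hypothetical.

```python
def assign_name_tokens(elements, attributes, entities):
    """Number names in three contiguous subranges; return (table, width).

    The table maps (kind, name) to a token value. Values 0 and 1 are the
    end-of-element and end-of-attributes tokens, and value 2 is the
    pre-defined transpose element.
    """
    table = {("element", "bx:transpose"): 2}
    next_value = 3
    for kind, names in (("element", elements),
                        ("attribute", attributes),
                        ("entity", entities)):
        for name in names:
            table[(kind, name)] = next_value
            next_value += 1
    max_value = next_value - 1
    # Global width: number of bits needed to represent the maximum value.
    width = max_value.bit_length()
    return table, width
```

For a schema with two elements and five attributes the maximum value is 9, so the global name token width would be 4 bits.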

3.1.2.6 Type Tokens

Type tokens represent XML-Schema primitive types and their encodings. They are symbols of type "type token" and width 8 or 0 where, as for name tokens, width zero indicates that the token's presence is not required in the output. The possible values of type tokens are given in Table 1. Each type token also has a property of whether or not it is a list, encoded in its value as even for plain and odd for list. A type token is followed by value symbols that encode the value of an attribute or element body.

Value  | Hex    | Primitive Type            | Encoding (Width)
-------|--------|---------------------------|-----------------------------
0, 1   | 00, 01 | string or anySimpleType   | list of characters
2, 3   | 02, 03 | nil                       | none
4, 5   | 04, 05 | boolean                   | integer (1)
6, 7   | 06, 07 | float                     | float (32)
8, 9   | 08, 09 | double                    | double (64)
10, 11 | 0A, 0B | decimal                   | unsigned integer (8)
12, 13 | 0C, 0D | decimal                   | signed integer (8)
14, 15 | 0E, 0F | decimal                   | unsigned integer (16)
16, 17 | 10, 11 | decimal                   | signed integer (16)
18, 19 | 12, 13 | decimal                   | unsigned integer (32)
20, 21 | 14, 15 | decimal                   | signed integer (32)
22, 23 | 16, 17 | decimal                   | signed integer (64)
24, 25 | 18, 19 | decimal                   | fraction (8), (32)
26, 27 | 1A, 1B | decimal                   | fraction (16), (64)
28, 29 | 1C, 1D | decimal                   | fraction (32), (64)
30, 31 | 1E, 1F | decimal                   | fraction (64), (64)
32, 33 | 20, 21 | duration                  | fraction (32), (64)
34, 35 | 22, 23 | dateTime                  | fraction (32), (64), tz (16)
36, 37 | 24, 25 | time                      | fraction (32), (64)
38, 39 | 26, 27 | date                      | integer (32)
40, 41 | 28, 29 | gYearMonth                | integer (16)
42, 43 | 2A, 2B | gYear                     | integer (16)
44, 45 | 2C, 2D | gMonthDay                 | integer (16)
46, 47 | 2E, 2F | gMonth                    | integer (8)
48, 49 | 30, 31 | gDay                      | integer (8)
50, 51 | 32, 33 | hexBinary or base64Binary | list of octets
52, 53 | 34, 35 | anyURI                    | list of characters
54, 55 | 36, 37 | QName                     | name token
56, 57 | 38, 39 | QName                     | list of characters
58, 59 | 3A, 3B | enumeration               | integer (8)
60, 61 | 3C, 3D | enumeration               | integer (16)
62, 63 | 3E, 3F | enumeration               | integer (32)

Table 1
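Since the list property is carried in the low bit of the type token value, a decoder can separate it from the type with simple arithmetic. The following Python sketch shows one way to do so; the function name split_type_token is hypothetical.

```python
def split_type_token(token):
    """Split a type token value into (plain_value, is_list).

    plain_value is the even value of the pair listed in Table 1;
    is_list is True when the token value is odd.
    """
    if not 0 <= token <= 63:
        raise ValueError(f"type token out of range: {token}")
    return token - (token % 2), token % 2 == 1
```

For example, the value hex 33 splits into plain value 50 (hexBinary or base64Binary) with the list property set.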

3.1.2.7 Value symbols

Value symbols encode the numeric data values shown in Table 1.

Integer is a signed or unsigned twos-complement integer value of the given width.

Float is a 32-bit IEEE-754-1985 floating-point value. Double is a 64-bit IEEE-754-1985 floating-point value.

Fraction is a pair of integers; dividing the second by the first yields the value. The first (divisor) is unsigned and non-zero; the second (dividend) is signed. When a fraction is derived from a decimal string literal, the divisor will always be a power of ten, but in general divisor values are not so limited.
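To make the fraction encoding concrete, here is a small Python sketch that derives the (divisor, dividend) pair from a decimal string literal. It assumes a plain literal with an optional sign and no exponent; the function name decimal_to_fraction is hypothetical, and choosing an integer width for each part is left out.

```python
def decimal_to_fraction(literal):
    """Return (divisor, dividend) for a decimal literal such as '-3.14'.

    The divisor is a power of ten given by the number of digits after the
    decimal point, so trailing zeros in the literal are preserved.
    """
    text = literal.strip()
    sign = -1 if text.startswith("-") else 1
    text = text.lstrip("+-")
    whole, _, frac = text.partition(".")
    divisor = 10 ** len(frac)
    dividend = sign * int((whole + frac) or "0")
    return divisor, dividend
```

With this sketch "3.14" yields the pair (100, 314) and "1.50" yields (100, 150), so the original digit count can be recovered when decoding.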

The primitive types duration, dateTime and time are encoded in seconds as fractions. For dateTime the fraction is followed by the timezone, which is a signed 16-bit integer. Zero timezone means UTC, while the special value -32768 means there is no timezone specified.

TODO: is there an issue for dateTime and date with regard to the epoch? Should the date instead be represented as gYear, gMonth, gDay? If so, should duration be the same?

3.1.2.8 Output

An implicit transformation in this specification is the rendering of symbols to the final bit stream of the binary format. Two of the transformations, Octet and Huffman, make this straightforward as their output is either octets or bits. However, they are both optional and if neither is selected there is a default rendering. This rendering is also referenced in creating blocks and length symbols.

Each symbol is padded on the left (most significant bits) with zeroes so that its width is a multiple of 8. Then it is placed in the stream in network byte order, most significant 8-bit bytes first.

All properties and type information are discarded. Symbols of width zero are skipped altogether.
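The default rendering can be sketched as follows in Python. Symbols are modeled here as (value, width) pairs, which is an assumption of this example; the function name render is hypothetical. Negative values are assumed to be stored in twos-complement form within the symbol's width.

```python
def render(symbols):
    """Render (value, width_in_bits) pairs to bytes with the default rule."""
    out = bytearray()
    for value, width in symbols:
        if width == 0:
            continue  # symbols of width zero are skipped altogether
        if value < 0:
            value &= (1 << width) - 1  # twos-complement within the width
        if value >> width:
            raise ValueError(f"value {value} does not fit in {width} bits")
        nbytes = (width + 7) // 8  # pad on the left to a multiple of 8 bits
        out += value.to_bytes(nbytes, "big")  # network byte order
    return bytes(out)
```

A 21-bit character symbol therefore always occupies three bytes in the default rendering, and a 1-bit boolean occupies one.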

3.1.3 Schema

Several transformations depend on or can make use of schema for the document. In all cases this is XML-Schema, optionally annotated with the BX Schema Annotation described in section 3.3. All such transformations are specified to operate properly with any valid XML-Schema. If no schema exists it is possible to automatically create a schema for a given document and either encapsulate it with the document (see XML Container below) or select the Octet transformation which preserves type and other needed information. If a DTD exists it might be possible to perform a canonical translation of the DTD into a schema, and use that.

The Schema transformation depends on a registration scheme for XML-Schema documents. The details are outside the scope of this specification, but we will assume a two-level mechanism similar to that used for PCI devices. A global entity such as the W3C registers parties who wish to register documents, and assigns each party a number. Each party then assigns its own numbers to its schema documents. The global entity maintains a web accessible database of the parties, accessible by number, providing for each a link to their web-accessible database of documents. Each party's database is accessible by document IRI and number, and given one provides the other and a link to the schema document itself.

The party number 0 is reserved for local, private and experimental use. For use in examples we will also assume that party number 1 is the W3C and the XML-Schema instance document is the W3C's document number 1.

3.1.4 XML Container

If it should happen that a recommendation is created for an XML container schema, then it would be possible to encapsulate required schema along with a document. In this case all translation transformations would recognize the container syntax and would, when necessary, treat encapsulated documents individually.

3.1.5 Blocks

Several transformations work on blocks of symbols. In addition, length symbols are limited to the block size, so transformations that produce length symbols also respect block boundaries. When any such transformations are selected, the document is divided into contiguous blocks at the input to the first transformation that uses blocks (or within the first transformation as an optimization) and all the transformations operate on the same blocks.

Unless configured differently by use of XML Schema Annotation, block boundaries are placed between the outermost elements possible. Maximum block size is 1 binary megabyte, 1048576 8-bit bytes, but can be configured to a smaller size.

Each block is preceded by a symbol of type "block" of width 20 bits. The value of the symbol is the length of the block in 8-bit bytes. These symbols do not participate in transformations, except that their value is kept up to date when the size of the block changes and some transformations alter the width to zero or back to 20.

3.2 Transformations

The transformations are specified in terms of how they map an input symbol stream to an output symbol stream. All the transformations are supposed to be reversible, so both encoding and decoding are possible as well as direct production and interpretation of translated symbol streams or serialization from and parsing to internal representations.

It is common practice in binary specifications to specify packed values in terms of bit fields, but here there are a few places where the packing is more complicated, so packed values are specified in terms of integer ranges and sums of integer products. In many cases it should be obvious that they are in fact bit fields.

3.2.1 Declaration Translation

The Declaration transformation recasts the XML declaration into six octets. The first two octets are the characters 'B' and 'X'. The third octet encodes the XML version as 16*<major version> + <minor version>. The fourth octet similarly encodes the binary format version. The fifth octet encodes the selected translation transformations except for Declaration, which is implicit when the BX prefix is present. The sixth octet similarly encodes the selected compression transformations.

The fifth octet flags are encoded as 16*<Schema> + 8*<Name> + 4*<Entity> + 2*<Value> + <Octet>.

The sixth octet flags are encoded as 8*<Transpose> + 4*<Block> + 2*<Ziv-Lempel> + <Huffman>.

Using the example above, the following are equivalent except in the use of the Declaration transformation, with the second shown in hex:

<?xml version="1.1" encoding="BX-1.0-SNEVO"?>

42 58 11 10 1F 00

All further transformations pass the declaration (in either form) through unmodified.
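The packing of the six octets can be checked with a short Python sketch. It assumes each version part fits in four bits and that the selected transformations are given by their suffix letters; the names pack_version and encode_declaration are hypothetical.

```python
# Flag weights taken from the fifth and sixth octet formulas above.
TRANSLATION_BITS = {"S": 16, "N": 8, "E": 4, "V": 2, "O": 1}
COMPRESSION_BITS = {"T": 8, "B": 4, "Z": 2, "H": 1}


def pack_version(version):
    """Encode 'major.minor' as 16*major + minor."""
    major, minor = (int(part) for part in version.split("."))
    if not (0 <= major <= 15 and 0 <= minor <= 15):
        raise ValueError(f"version part out of range: {version}")
    return 16 * major + minor


def encode_declaration(xml_version, bx_version, letters):
    """Return the six-octet form of the XML declaration."""
    fifth = sum(bit for c, bit in TRANSLATION_BITS.items() if c in letters)
    sixth = sum(bit for c, bit in COMPRESSION_BITS.items() if c in letters)
    return bytes([0x42, 0x58,  # 'B', 'X'
                  pack_version(xml_version),
                  pack_version(bx_version),
                  fifth, sixth])
```

Calling encode_declaration("1.1", "1.0", "SNEVO") reproduces the six octets of the example above.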

TODO: Should there be a bit for the Standalone Document Declaration? If so, a document that is not standalone might be identified by using the most significant bit of the sixth octet (128*<not standalone>).

3.2.2 Schema Translation

When registered XML-Schema are referenced in the document element, the Schema transformation removes no longer needed markup (i.e. xsi:schemaLocation, xsi:noNamespaceSchemaLocation) and inserts binary reference symbols after the declaration. In any case it inserts a length symbol before any reference symbols.

The Schema transformation introduces a special schema symbol, with no value and 0 width, followed by a list (beginning with a length symbol) of pairs of 32-bit unsigned integers. Each pair is the party number followed by the document number. The schema symbol and list immediately follow the XML declaration. The rest of the document remains unchanged except to eliminate from the document element any xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes for the listed schema.

In the case of container documents, the Schema transformation is performed at the document element of each encapsulated document, with all schema added to the single list following the XML declaration.

3.2.3 Name Translation

Element and attribute names and syntax map to token symbols. Namespace declarations are eliminated.

The transformation numbers each element and attribute name found in the schema in order, implicitly building a table of names to numbers, and determining the global token properties.

All namespace declaration attributes are removed from the document element.

Within the document body, including the document element, each element name together with its introductory angle bracket and whitespace maps to the corresponding element token. If the presence of a specific element can be deduced from the schema, its token width is zero. If an element has a body the closing angle bracket and surrounding white space maps to the end-of-attributes token, and the element's end-tag maps to the end-of-element token. If there is no body the closing angle bracket and surrounding white space maps to the end-of-element token. The width is zero on the final trail of end-of-element tokens in the document.

The attributes are sorted into the order in which they appear in the schema declaration for the element, required attributes first and then optional attributes. Each attribute name and its following equals sign and surrounding white space maps to the corresponding attribute token. For required attributes the token width is zero.
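The attribute ordering rule can be sketched in Python as a sort with a two-part key. The inputs are assumptions of this example: declared is the list of attribute names in schema declaration order and required is the set of required names. The function name order_attributes is hypothetical.

```python
def order_attributes(present, declared, required):
    """Return the attribute names in `present` in their encoded order.

    Required attributes come first, then optional ones; within each group
    the schema declaration order is kept.
    """
    rank = {name: index for index, name in enumerate(declared)}

    def key(name):
        # False sorts before True, so required attributes come first.
        return (name not in required, rank[name])

    return sorted(present, key=key)
```

Because the order is fully determined by the schema, a decoder can restore the name of a zero-width required attribute token from its position alone.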

All remaining contiguous sequences of characters in the document map to lists of characters beginning with a length symbol.

3.2.4 Entity Translation

Entity names and syntax map to token symbols and lists of characters.

Starting at the entity base number the transformation numbers each entity name found in the document, implicitly building a table of names to numbers, and determines or updates the global name token properties.

Note that if the Name transformation is not selected the entity base is 2.

All internal entity declarations are grouped at the head of the document, following all non-character symbols after the XML declaration. Each internal entity declaration maps to a sequence consisting of its name token followed by its replacement strings encoded as a string type token followed by a list of characters. Throughout the rest of the document entity references map to their corresponding name token.

All remaining contiguous sequences of characters in the document map to lists of characters beginning with a length symbol. Each entity reference must be surrounded on both sides with lists of characters, even if of zero length.

TODO: deal with external entities and parameter entities.

3.2.5 Value Translation

Element bodies and attribute values map to sequences consisting of a type symbol followed by an encoding of the value.

The transformation maps each attribute value and each simpleType in an element body to a type token and an encoding of the value. It does not rely on the Name transformation, so it must work either with text XML or with name tokens to find elements and attributes.

The type token reflects the ultimate primitive type (or enumeration) of the value according to the schema, and specifies the encoding of the value as shown in Table 1.  The actual value encountered does not affect the type token unless the simpleType of the value involves a union.

In several cases the type token indicates the width of the value encoding. This width is determined from the schema by examining the restrictions on the type. If there are no restrictions, the maximum width is used. The actual value does not affect the width specified by the type token.

If the type is restricted by enumeration then the type token is of enumeration type with the width determined by the number of possibilities. The value is encoded as a zero-based integer index that selects among the possibilities in the order they are listed in the schema.
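A minimal Python sketch of the enumeration encoding follows. It assumes that the width is the smallest of the three enumeration widths in Table 1 (8, 16 or 32 bits) that can hold every index; the function name enumeration_encoding is hypothetical.

```python
def enumeration_encoding(possibilities, value):
    """Return (width_in_bits, index) for a value restricted by enumeration.

    `possibilities` lists the enumeration values in schema order.
    """
    index = possibilities.index(value)  # zero-based index in schema order
    count = len(possibilities)
    for width in (8, 16, 32):
        if count <= 1 << width:
            return width, index
    raise ValueError("too many enumeration values")
```

Note that the width depends only on the number of possibilities in the schema, never on the particular value being encoded.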

In the case of a union the appropriate type token is selected from a list of possible primitive types according to the schema. If multiple enumerations are encountered in a union they are combined in the order found into a single enumeration by concatenating their value ranges.

Since, for instance, the string type is encoded as a list, a list of strings results in two layers of length symbols. The outermost length embraces the entire list while each of the inner lengths corresponds to its string's characters.

Fixed attributes are left with both type token and value symbol width 0. Default values are not inserted.

All remaining contiguous sequences of characters in the document map to lists of characters beginning with a length symbol.

3.2.6 Transposition for Compression

The transposition transformation rearranges data for better compression in following steps. It is useful in applications that involve a large number of samples, where each sample contains several more-or-less independent data items. A programming language equivalent of the transformation would be changing an array of structures into a set of arrays of each of the original structure's elements. This places related data items next to each other. Experience shows this can result in modest but significant improvement in the amount of compression that can be obtained from lossless compression algorithms such as the compression transforms specified here.

The transformation works on sequences of leaf elements (elements with no body or body of simpleType), all with the same element name, the same number of attributes and, for body or attributes of list type, the same number of items in each list. An exception is that at the end of the sequence the attributes and numbers of items in lists can fall off monotonically. By default the minimum length of such a sequence of elements is 8. Transposition does not cross block boundaries.

Sequences are identified in terms of token symbols and their associated value encodings. (This transform produces no mapping unless both Name and Value translation are selected; this also means the attributes are already mapped into the same order in every element.) Each eligible sequence of N elements, with maximum body or attribute list length of M, maps to a new transpose element containing M new elements with the original name, but with all attributes and the body retyped to lists (if they were not a list already) of maximum length N. Each attribute and the body of the first new element has as value a list of the N (first) values of the corresponding attribute or body of the original element. In the second and subsequent new elements only those attributes or the body that were originally lists are present, and list the up to N second or subsequent values from those lists.

As a comprehensive example, consider the following fragment of XML. Note that the last sample element is not eligible because it reintroduces an attribute and the length of one of its lists increased.

<testrun>
   <sample a="11 12 13" b="11 12" c="1" d="1" e="1">1</sample>
   <sample a="21 22 23" b="21 22" c="2" d="2" e="2">2</sample>
   <sample a="31 32 33" b="31 32" c="3" d="3">3</sample>
   <sample a="41 42 43" b="41 42" c="4" d="4">4</sample>
   <sample a="51 52 53" b="51 52" c="5" d="5">5</sample>
   <sample a="61 62 63" b="61 62" c="6" d="6">6</sample>
   <sample a="71 72 73" b="71" c="7" d="7">7</sample>
   <sample a="81 82 83" b="81" c="8" d="8">8</sample>
   <sample a="91 92 93 94" b="91" c="9" d="9" e="9">9</sample>
</testrun>

The fragment maps to the equivalent of this XML:

<testrun>
   <bx:transpose>
       <sample a="11 21 31 41 51 61 71 81"
               b="11 21 31 41 51 61 71 81"
               c="1 2 3 4 5 6 7 8"
               d="1 2 3 4 5 6 7 8"
               e="1 2">
           1 2 3 4 5 6 7 8
       </sample>
       <sample a="12 22 32 42 52 62 72 82"
               b="12 22 32 42 52 62"/>
       <sample a="13 23 33 43 53 63 73 83"/>
   </bx:transpose>
   <sample a="91 92 93 94" b="91" c="9" d="9" e="9">9</sample>
</testrun>
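The mapping in this example can be reproduced with a short Python sketch. It models each element as a dictionary from attribute name to a list of items, with the hypothetical key "#body" standing in for the element body, and it assumes the run has already been found eligible. The function names transpose and untranspose are likewise hypothetical.

```python
BODY = "#body"  # hypothetical key standing in for the element body


def transpose(elements):
    """Map N eligible elements to M new elements holding lists of values."""
    names = []
    for element in elements:
        for name in element:
            if name not in names:
                names.append(name)
    depth = max(len(items) for e in elements for items in e.values())
    result = []
    for k in range(depth):
        new = {}
        for name in names:
            # k-th item of this attribute from every element that has one.
            column = [e[name][k] for e in elements
                      if name in e and len(e[name]) > k]
            if column:
                new[name] = column
        result.append(new)
    return result


def untranspose(transposed):
    """Reverse transpose(); relies on the monotonic fall-off rule."""
    first = transposed[0]
    count = max(len(column) for column in first.values())
    elements = []
    for i in range(count):
        element = {}
        for name in first:
            items = [new[name][i] for new in transposed
                     if name in new and len(new[name]) > i]
            if items:
                element[name] = items
        elements.append(element)
    return elements


# The eight eligible sample elements from the fragment above.
samples = []
for i in range(1, 9):
    element = {"a": [f"{i}1", f"{i}2", f"{i}3"],
               "b": [f"{i}1", f"{i}2"] if i <= 6 else [f"{i}1"],
               "c": [str(i)], "d": [str(i)]}
    if i <= 2:
        element["e"] = [str(i)]
    element[BODY] = [str(i)]
    samples.append(element)

transposed = transpose(samples)
```

The reverse mapping works only because attributes and list lengths fall off monotonically: a shortened column always belongs to the first elements of the run, so no per-element bookkeeping is needed.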

3.2.7 Octet Translation

The Octet translation transformation provides an alternative to the standard output mechanism. It is less suited to applications that want static formatted data and more suited to streaming in that it uses variable length sequences that depend on the actual values encountered. In addition it carries enough type information that a document might be reconstructed without reference to a schema.

Past the XML declaration all symbol sequences (except block symbols) map to sequences of octets, preserving redundant type and length information. Each symbol or group of symbols translates to an introductory octet followed by zero or more additional octets. Table 2 lists the values of the introductory octet and what they mean. Unless otherwise noted, a range of values for the introductory octet means that it carries the most significant bits of the item value. Further data octets carry the rest of the value in network byte order. The definition of following octets is recursive, so one introductory octet might be followed by shorter sequences containing their own introductory octets.

Characters not preceded by token symbols map to a short sequence beginning with an introductory octet in the range 0-144.

A schema symbol and its associated list of registered schema numbers maps to a sequence beginning with an introductory octet in the range 145-147.

Length symbols are discarded. Lists of zero length are discarded.

Block length symbols are retained within the stream of octet symbols but set to 0 width.

A name token maps to an introductory octet with value 149-151 for the first occurrence, followed by an integer sequence and a string sequence listing the name; or, for any subsequent occurrence, value 152-154 followed by an integer sequence. The introductory octet indicates whether it is an element, attribute or entity name, and for the first occurrence of an entity name there is also a string carrying the replacement text. The end-of-attributes and end-of-element tokens have their own introductory octets, 155 and 156.

A type token and its following value map to sequences with different introductory octet ranges depending on the type. The string type maps to a sequence consisting of a length integer (providing the number of characters) followed by a sequence as described above for each character. This string sequence is in turn reused inside sequences for name tokens. Similarly the integer sequence is used for integer-valued decimal types as well as inside many of the other sequences.

If a type token is for a list it and its values map to a sequence beginning with an introductory octet with value 157 followed by an integer sequence that provides the number of items in the list, then followed by each item translated as for an individual item.

The length of an octet sequence depends only on its value, being the shortest length that will encode the value. Any length information carried from the schema by type tokens is discarded.

Introductory octet values

Hex

Symbols replaced

Indicated type

Following octets

0-127

00 - 7F

character in the range hex 00-7F

character

none

128

80

character in the range hex 0080-FFFF

character

2 data

129-144

81 - 90

character in the range hex 100000-10FFFF

character

2 data

145

91

schema and list of length 1 pair

schema

2 integer

146

92

schema and list of length 2 pair

schema

4 integer

147

93

schema and longer list of length n pairs

schema

integer=n, 2*n integer

148

94

(unused)

none

none

149

95

1st element name token

element

integer, string

150

96

1st attribute name token

attribute

integer, string

151

97

entity definition name token and list of characters

entity declaration

integer, string, string

152

98

element name token

element

integer

153

99

attribute name token

attribute

integer

154

9A

entity name token

entity reference

integer

155

9B

name token

end of attributes

none

156

9C

name token

end of element

none

157

9D

list of length n

list

integer=n, n value items

158

9E

nil type token

nil

none

159

9F

boolean type token, value=false

boolean

none

160

A0

boolean type token, value=true

boolean

none

161

A1

float type token and value symbol

float

4 data

162

A2

double type token and value symbol

double

8 data

163

A3

duration type token and value symbols

duration

2 integer

164

A4

dateTime type token and value symbols, no timezone

dateTime

2 integer

165

A5

dateTime type token and value symbols with timezone

dateTime

3 integer

166

A6

time type token and value symbols

time

2 integer

167

A7

date type token and value symbol

date

integer

168

A8

gYearMonth type token and value symbol

gYearMonth

integer

169

A9

gYear type token and value symbol

gYear

integer

170

AA

gMonthDay type token and value symbol

gMonthDay

integer

171

AB

gDay type token and value symbol

gDay

integer

172

AC

gMonth type token and value symbol

gMonth

integer

173

AD

binary type token and octet list of length n

binary data

integer=n, n data

174

AE

anyURI type token and string

anyURI

string

175

AF

QName type token and name token

QName

integer

176

B0

QName type token and string

QName

string

177

B1

string or anySimpleType type token and list of n characters or (string)

string

integer=n, n character

178

B2

fractional decimal type token and value symbol

decimal

2 integer

200

C8

decimal type token and value symbol or (integer)

decimal

8 data (signed 2's complement integer)

201

C9

decimal type token and value symbol or (integer)

decimal

7 data (signed 2's complement integer)

202

CA

decimal type token and value symbol or (integer)

decimal

6 data (signed 2's complement integer)

203

CB

decimal type token and value symbol or (integer)

decimal

5 data (signed 2's complement integer)

204

CC

decimal type token and value symbol or (integer)

decimal

4 data (signed 2's complement integer)

205

CD

decimal type token and value symbol or (integer)

decimal

3 data (signed 2's complement integer)

206

CE

decimal type token and value symbol or (integer)

decimal

2 data (unsigned integer)

207

CF

decimal type token and value symbol or (integer)

decimal

2 data (signed 2's complement integer)

208-223

D0 - DF

decimal type token and value symbol or (integer)

decimal

1 data (signed 2's complement integer, including 4 bits from intro octet)

224-255

E0 - FF

decimal type token and value symbol or (integer)

decimal

none (intro octet represents range 0-31)

Table 2
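As an illustration of how two of the codes in Table 2 combine, the following Python sketch renders a short string the way the Octets column of section 4.1.3 does: the B1 intro octet, the character count as a single E0-FF octet, then the characters. The function names are hypothetical, the sketch assumes one octet per character, and it only handles counts from 0 to 31; longer strings would need one of the wider integer codes.

```python
def short_integer_octet(n):
    """Intro octet E0..FF standing for the integer 0..31 with no data octets."""
    if not 0 <= n <= 31:
        raise ValueError("only 0..31 fit in the intro octet itself")
    return 0xE0 + n


def string_octets(text):
    """B1, then the character count as a short integer, then the characters."""
    data = text.encode("ascii")  # assumption: one octet per character
    return bytes([0xB1, short_integer_octet(len(data))]) + data


def hex_dump(octets):
    """Format octets as space-separated upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in octets)


print(hex_dump(string_octets("CA")))           # B1 E2 43 41
print(hex_dump(string_octets("Alice Smith")))
```

Both results agree with the corresponding Octets cells in the combined example of section 4.1.3.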

3.2.8 Block Sort for Compression

The Block Sort transformation provides the same essential mechanism as that used in the bzip2 utility. It maps a block of symbols into a new sequence based on sorting. While this does not reduce their size, it significantly improves the chances for good compression in later steps.

Since the transformation destroys the original order of the symbols, it is no longer possible to infer the width of each symbol, so it must be followed by a transformation that encodes symbol width. It also benefits from run-length encoding (or better yet, simple dictionary compression) before the sort, and it is of little use without dictionary compression following the sort. For these reasons the transformation cannot be used without both Ziv-Lempel and Huffman coding: if Block Sort is selected without both of them, it becomes the identity mapping.

The transformation remaps each block independently.

The block is first transformed using the Ziv-Lempel transformation below. For this step the encoder need only search for runs, that is, duplicated sequences immediately following the sequence they duplicate. The count symbols from the Ziv-Lempel transformation map to the beginning of the new block. Following the count symbols, the original symbols in the block are sorted as follows.

Every symbol in the block is associated with a (reverse order) sequence beginning with the preceding symbol and going backwards, wrapping around from the beginning to the end of the block to finish back with the symbol itself. The new mapping orders the symbols by sorting their associated sequences by (unsigned integer) symbol value in lexical order such that symbols earlier in each sequence have greater priority. For the purpose of sorting, each symbol whose value is not a non-negative integer (i.e. float, double and negative integers) is treated as an unsigned binary integer of the symbol's width.

A new count symbol is prepended to the sequence (after the block symbol and Ziv-Lempel count symbols and before the new mapping). The value is the 0-based index of the symbol in the new mapping that was the first symbol in the original mapping.

Oddly enough, this transformation is completely reversible. A very short example will show how it works. Consider the sequence "bcab". The following table shows the reversed and wrapped sequence for each symbol, followed by the same sequences sorted; the last symbol of each sorted sequence is the output symbol:

Original   Reverse Wrapped   Sequences   Output    Decoding   Reconstruction
Symbols    Sequences         Sorted      Symbols   Columns

b          bacb              acbb        b         ab         ...ab
c          bbac              bacb        b*        bb
a          cbba              bbac        c         bc         .bc
b          acbb              cbba        a         ca         ..ca

The new mapping is "bbca" and the 0-based index of the original first symbol (starred) is 1. To decode, first sort all the individual symbols into one column, then list the new mapping in a second column. The pair of symbols in each row is in the original order; what remains is to work out how the pairs connect. Start with the original first symbol in the second column of the indexed row, the starred "b". Note that it is the second "b" in the column. Now find the second "b" in the first column. That row gives the first pair, "bc". Take the following "c" and repeat the process to find the next pair. Repeat until the entire sequence is recovered.
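The sort and the decoding procedure just described can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the block is assumed to be non-empty, symbols are compared directly rather than as unsigned integers of a given width, and the preliminary Ziv-Lempel run step and its count symbols are left out.

```python
def block_sort_encode(block):
    """Return (new mapping, 0-based index of the original first symbol)."""
    n = len(block)

    def context(i):
        # Preceding symbol first, going backwards and wrapping around,
        # finishing with the symbol at position i itself.
        return [block[(i - k) % n] for k in range(1, n + 1)]

    order = sorted(range(n), key=context)
    return [block[i] for i in order], order.index(0)


def block_sort_decode(mapping, index):
    """Rebuild the original block from the new mapping and the index."""
    n = len(mapping)
    # First decoding column: the symbols sorted. A stable sort pairs the
    # k-th occurrence of a symbol in the second column with the k-th
    # occurrence of the same symbol in the first column.
    first_column = sorted(range(n), key=lambda row: mapping[row])
    next_row = [0] * n
    for position, row in enumerate(first_column):
        next_row[row] = position

    result, row = [], index
    for _ in range(n):
        result.append(mapping[row])
        row = next_row[row]  # the row whose first column holds this symbol
    return result


mapping, index = block_sort_encode("bcab")
print("".join(mapping), index)                      # bbca 1
print("".join(block_sort_decode(mapping, index)))   # bcab
```

The encoder compares whole wrapped sequences, which is simple but slow for large blocks; a practical implementation would use a suffix-sorting technique instead.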

3.2.9 Ziv-Lempel Compression

The Ziv-Lempel compression transformation remaps the symbols within a block to a new, hopefully smaller, set of symbols. It is an LZ77 algorithm related to the dictionary algorithm used in gzip.

The transformation makes use of count symbols. It segments the original symbols within each block into sequences and remaps pairs of sequences (the second of which is a duplicate of any sequence earlier in the block) into a triple of count symbols followed, later, by the first sequence of the pair. (Sometimes the first sequence is nil). All the triples of count symbols are mapped to the front of the block, all the remaining original sequences of symbols follow.

Each triple of count symbols consists of, first, a count of unduplicated symbols (the number of symbols in the first sequence of the original pair); second, a count of duplicated symbols (the number of symbols in the second sequence of the original pair, which duplicates a sequence earlier in the block); third, the offset (in symbols) from the earlier duplicated sequence to the second sequence of the pair.

When a duplicate sequence is immediately followed by another, the second is treated as the second sequence of a new pair, where the first sequence is nil. In that case the first count symbol of its triple has value 0. In the same way, if the last sequence in the block is not a duplicate it is treated as the first sequence of a pair where the second sequence is nil, and in its triple the second count symbol has value 0.

There is always a final triple where the second count symbol has value 0, whether or not the block ends with a duplicate sequence. In this final triple the third symbol is not present, so it actually consists of only two symbols.

Here is an example of a block of letters segmented into unduplicated and duplicated segments, with nil segments depicted with a dash, and the count triples and symbols produced by the mapping:

abcdeeeeabcdf

abcde eee    - abcd    f -

 5 3 1       0 4 8    1 0   abcde f

In finding duplicate sequences, symbol values may be considered equal if they have the same unsigned integer value, as for block sort. Any more strict definition of equality is also acceptable.
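The mapping above can be sketched in Python as follows. This is a minimal illustration, not a definitive encoder: the function names, the greedy longest-match search and the min_match threshold are all assumptions, since the text does not say how an encoder should choose its duplicate sequences. Symbols are compared with plain equality.

```python
def lz_encode(block, min_match=2):
    """Return (count symbols, unduplicated symbols) for one block."""
    counts, literals = [], []
    n, pos, pending = len(block), 0, 0  # pending: start of unduplicated run
    while pos < n:
        best_len, best_off = 0, 0
        for start in range(pos):
            length = 0
            # A match may run past pos, which is how runs are encoded.
            while pos + length < n and block[start + length] == block[pos + length]:
                length += 1
            if length > 0 and length >= best_len:  # ties go to the nearest match
                best_len, best_off = length, pos - start
        if best_len >= min_match:
            counts += [pos - pending, best_len, best_off]
            literals += block[pending:pos]
            pos += best_len
            pending = pos
        else:
            pos += 1
    counts += [n - pending, 0]  # final triple: the offset is not present
    literals += block[pending:]
    return counts, literals


def lz_decode(counts, literals):
    """Rebuild the block from the count symbols and unduplicated symbols."""
    out, ci, li = [], 0, 0
    while True:
        unduplicated, duplicated = counts[ci], counts[ci + 1]
        out.extend(literals[li:li + unduplicated])
        li += unduplicated
        if duplicated == 0:  # final triple reached
            return out
        offset = counts[ci + 2]
        for _ in range(duplicated):
            out.append(out[-offset])  # one at a time, so overlaps work
        ci += 3


counts, literals = lz_encode("abcdeeeeabcdf")
print(counts, "".join(literals))  # [5, 3, 1, 0, 4, 8, 1, 0] abcdef
```

For the example block this reproduces the count symbols and remaining symbols shown above, and lz_decode restores the original block from them.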

3.2.10 Huffman Coding Compression

The Huffman Coding compression transformation remaps the symbols within a block to a stream of bits. The mapping process involves alternating steps of Huffman Coding (producing bit counts, a list of symbols, and a bit stream) and Ziv-Lempel Compression as described above (producing count triples and a list of symbols). After the penultimate step there is a list of mixed count symbols followed by a bit stream. The count symbols are Huffman coded using a predefined symbol list and set of bit counts into a bit stream, resulting in a complete Huffman coded bit stream as the final mapping.

TODO: Actually specify the algorithms and/or their input and output.

3.3 XML Schema Annotation

All of the transformations are defined to work with unmodified XML-Schema. However, the default mappings do not provide sufficient flexibility to cover all of the possible uses of binary XML. To cover more possible uses there is an annotation schema to control the use of binary XML and guide the transformations. It also provides a namespace for the transpose element used in the Transpose transformation.

The annotation schema defines fragments and attributes to be inserted in the schema for a class of target binary XML documents. The binary element is for overall control and configuration and may be inserted anywhere at the top level of the schema. The others are for annotation within the schema to control how the transformations apply to various parts of target documents.

Some control of target documents is possible within standard XML-Schema, as described in some of the transformations above, without use of this annotation schema. The methods are described again here along with the new methods available with annotation so that all schema options are described in one place.

3.3.1 Overall Configuration

Here is an example of a fragment that can appear at the top level of a schema to constrain and configure the binary format of a document.

<bx:binary>

 <bx:required bx:encoding="BX-1.0-DSNEVO"/>

 <bx:optional bx:encoding="BX-1.0-ZH"/>

 <bx:transform bx:name="schema">

   <bx:property bx:name="local" bx:value="true"/>

   <bx:property bx:name="global" bx:value="false"/>

 </bx:transform>

 <bx:property bx:name="blocksize" bx:value="65536"/>

</bx:binary>

This shows the two parts of the binary element.

The first part consists of the required and optional elements. The encoding attribute of the required element shows the minimum version and minimum set of selected transformations to be used in encoding a target document. The encoding attribute of the optional element shows the maximum version and additional allowed transformations. In either element the version is optional and the set of transformations can be empty, or can be "*" to indicate all transformations. In any case both "-" characters must be present. In the optional element the required transformations can be duplicated or not without changing the meaning. Note that the Declaration transformation can be specified here, even though it never appears in an actual XML declaration.
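The shape of the encoding attribute can be illustrated with a short Python sketch. The function names are hypothetical, the "BX" prefix is taken from the examples rather than from a stated rule, and transformation letters are treated as opaque characters. Version comparison is omitted, and "*" is only interpreted in the optional attribute, because expanding it in the required attribute needs the full list of transformations.

```python
def parse_encoding(value):
    """Split an encoding attribute into (version or None, transformations)."""
    parts = value.split("-")
    if len(parts) != 3 or parts[0] != "BX":
        raise ValueError("expected BX-<version>-<transformations>: " + value)
    _, version, transforms = parts
    return (version or None), transforms


def permits(used, required, optional):
    """Check a document's transformation letters against the two sets."""
    used_set = set(used)
    if not set(required) <= used_set:
        return False  # a required transformation is missing
    if optional == "*":
        return True   # every additional transformation is allowed
    return used_set <= set(required) | set(optional)


print(parse_encoding("BX-1.0-DSNEVO"))         # ('1.0', 'DSNEVO')
print(parse_encoding("BX--*"))                 # (None, '*')
print(permits("DSNEVOZ", "DSNEVO", "ZH"))      # True
```

Because both "-" characters are mandatory, an attribute such as "BX--" still splits into three parts, giving no version and an empty set of transformations.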

The second part consists of transform elements that provide configuration information to the transforms in the form of a property list consisting of property elements. Global configuration is provided by property elements outside a transform element.

TODO: Document some properties.

3.3.2 Embedded Annotation

The enable and disable elements control the application of individual transforms (given in the name attribute) to schema-defined types.

The index attribute may be used in element and attribute type declarations to fix the value of the corresponding name token in the Name transformation.

Without using the annotation schema it is possible to determine the width of simpleType value symbols by restrictions on length and/or minimum and maximum value.

TODO: Write the actual schema document.

4. Examples

4.1 Purchase Order

Here are examples of all of the translation transformations performed on the Purchase Order example from the XML-Schema tutorial.

4.1.1 Reference Documents

For reference, here is the example document and its schema. The example document has been modified to include an internal entity and reference to the schema.

Following is the schema annotated in such a way as to produce the same results as in this example without annotation. This shows the use of some of the annotations and also shows the default numbering of name tokens.

4.1.1.1 Purchase Order Document

<?xml version="1.0"?>

<!ENTITY california "CA">

<purchaseOrder orderDate="1999-10-20"

xmlns="http://example.com:po"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xsi:noNamespaceSchemaLocation="po.xsd">

  <shipTo country="US">

     <name>Alice Smith</name>

     <street>123 Maple Street</street>

     <city>Mill Valley</city>

     <state>&california;</state>

     <zip>90952</zip>

  </shipTo>

  <billTo country="US">

     <name>Robert Smith</name>

     <street>8 Oak Avenue</street>

     <city>Old Town</city>

     <state>PA</state>

     <zip>95819</zip>

  </billTo>

  <comment>Hurry, my lawn is going wild!</comment>

  <items>

     <item partNum="872-AA">

        <productName>Lawnmower</productName>

        <quantity>1</quantity>

        <USPrice>148.95</USPrice>

        <comment>Confirm this is electric</comment>

     </item>

     <item partNum="926-AA">

        <productName>Baby Monitor</productName>

        <quantity>1</quantity>

        <USPrice>39.98</USPrice>

        <shipDate>1999-05-21</shipDate>

     </item>

  </items>

</purchaseOrder>

4.1.1.2 Purchase Order Schema

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

 <xsd:annotation>

   <xsd:documentation xml:lang="en">

    Purchase order schema for Example.com.

    Copyright 2000 Example.com. All rights reserved.

   </xsd:documentation>

 </xsd:annotation>

 <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>

 <xsd:element name="comment" type="xsd:string"/>

 <xsd:complexType name="PurchaseOrderType">

   <xsd:sequence>

     <xsd:element name="shipTo" type="USAddress"/>

     <xsd:element name="billTo" type="USAddress"/>

     <xsd:element ref="comment" minOccurs="0"/>

     <xsd:element name="items"  type="Items"/>

   </xsd:sequence>

   <xsd:attribute name="orderDate" type="xsd:date"/>

 </xsd:complexType>

 <xsd:complexType name="USAddress">

   <xsd:sequence>

     <xsd:element name="name"   type="xsd:string"/>

     <xsd:element name="street" type="xsd:string"/>

     <xsd:element name="city"   type="xsd:string"/>

     <xsd:element name="state"  type="xsd:string"/>

     <xsd:element name="zip"    type="xsd:decimal"/>

   </xsd:sequence>

   <xsd:attribute name="country" type="xsd:NMTOKEN"

                  fixed="US"/>

 </xsd:complexType>

 <xsd:complexType name="Items">

   <xsd:sequence>

     <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">

       <xsd:complexType>

         <xsd:sequence>

           <xsd:element name="productName" type="xsd:string"/>

           <xsd:element name="quantity">

             <xsd:simpleType>

               <xsd:restriction base="xsd:positiveInteger">

                 <xsd:maxExclusive value="100"/>

               </xsd:restriction>

             </xsd:simpleType>

           </xsd:element>

           <xsd:element name="USPrice"  type="xsd:decimal"/>

           <xsd:element ref="comment"   minOccurs="0"/>

           <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>

         </xsd:sequence>

         <xsd:attribute name="partNum" type="SKU" use="required"/>

       </xsd:complexType>

     </xsd:element>

   </xsd:sequence>

 </xsd:complexType>

 <!-- Stock Keeping Unit, a code for identifying products -->

 <xsd:simpleType name="SKU">

   <xsd:restriction base="xsd:string">

     <xsd:pattern value="\d{3}-[A-Z]{2}"/>

   </xsd:restriction>

 </xsd:simpleType>

</xsd:schema>

4.1.1.3 Annotated Purchase Order Schema

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:bx="http://example.org/BX">

 <xsd:annotation>

   <xsd:documentation xml:lang="en">

    Purchase order schema for Example.com.

    Copyright 2000 Example.com. All rights reserved.

   </xsd:documentation>

 </xsd:annotation>

 <bx:binary>

   <bx:required bx:encoding="BX-1.0-DSNEV"/>

   <bx:optional bx:encoding="BX-1.0-O"/>

 </bx:binary>

 <xsd:element name="purchaseOrder" bx:index="3" type="PurchaseOrderType"/>

 <xsd:element name="comment" bx:index="4" type="xsd:string"/>

 <xsd:complexType name="PurchaseOrderType">

   <xsd:sequence>

     <xsd:element name="shipTo" bx:index="5" type="USAddress"/>

     <xsd:element name="billTo" bx:index="6" type="USAddress"/>

     <xsd:element ref="comment" minOccurs="0"/>

     <xsd:element name="items" bx:index="7" type="Items"/>

   </xsd:sequence>

   <xsd:attribute name="orderDate" bx:index="18" type="xsd:date"/>

 </xsd:complexType>

 <xsd:complexType name="USAddress">

   <xsd:sequence>

     <xsd:element name="name" bx:index="8" type="xsd:string"/>

     <xsd:element name="street" bx:index="9" type="xsd:string"/>

     <xsd:element name="city" bx:index="10" type="xsd:string"/>

     <xsd:element name="state" bx:index="11" type="xsd:string"/>

     <xsd:element name="zip" bx:index="12" type="xsd:decimal"/>

   </xsd:sequence>

   <xsd:attribute name="country" bx:index="19" type="xsd:NMTOKEN"

                  fixed="US"/>

 </xsd:complexType>

 <xsd:complexType name="Items">

   <xsd:sequence>

     <xsd:element name="item" bx:index="13" minOccurs="0" maxOccurs="unbounded">

       <xsd:complexType>

         <xsd:sequence>

           <xsd:element name="productName" bx:index="14" type="xsd:string"/>

           <xsd:element name="quantity" bx:index="15">

             <xsd:simpleType>

               <xsd:restriction base="xsd:positiveInteger">

                 <xsd:maxExclusive value="100"/>

               </xsd:restriction>

             </xsd:simpleType>

           </xsd:element>

           <xsd:element name="USPrice" bx:index="16" type="xsd:decimal"/>

           <xsd:element ref="comment"   minOccurs="0"/>

           <xsd:element name="shipDate" bx:index="17" type="xsd:date" minOccurs="0"/>

         </xsd:sequence>

         <xsd:attribute name="partNum" bx:index="20" type="SKU" use="required"/>

       </xsd:complexType>

     </xsd:element>

   </xsd:sequence>

 </xsd:complexType>

 <!-- Stock Keeping Unit, a code for identifying products -->

 <xsd:simpleType name="SKU">

   <xsd:restriction base="xsd:string">

     <xsd:pattern value="\d{3}-[A-Z]{2}"/>

   </xsd:restriction>

 </xsd:simpleType>

</xsd:schema>

4.1.2 Individual Transformations

The following sections show the mapping produced by each of the translation transformations (except Octet) selected individually, before the final output rendering.

Characters are shown naturally as above; symbols are shown in curly braces "{}" with the type first, followed by the decimal value (unless preceded by "hex") and then the width (in bits) in parentheses. Comments to help keep track of what's happening follow "//" on each line; they are not part of the transformation, nor are any additional line breaks. Ellipses (...) indicate that the rest of the document follows without modification.
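For readers who want to process the listings mechanically, the symbol notation can be read with a small Python sketch such as the one below. The regular expression and the function name are assumptions made for illustration; the notation itself is only a display convention for this document and is not part of the format.

```python
import re

SYMBOL = re.compile(
    r"\{\s*(?P<type>.+?)\s+(?P<hex>hex\s+)?(?P<value>\w+)\s+\((?P<width>\d+)\)\s*\}"
)


def parse_symbol(text):
    """Return (type, value, width in bits) for one symbol in braces."""
    match = SYMBOL.fullmatch(text.strip())
    if match is None:
        raise ValueError("not a symbol: " + text)
    raw = match["value"]
    if match["hex"]:
        value = int(raw, 16)      # value was written in hex
    elif raw.isdigit():
        value = int(raw)          # ordinary decimal value
    else:
        value = raw               # placeholder, such as the n in {block n (20)}
    return match["type"], value, int(match["width"])


print(parse_symbol("{name token 18 (7)}"))   # ('name token', 18, 7)
print(parse_symbol("{octet hex 42 (8)}"))    # ('octet', 66, 8)
```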

4.1.2.1 Declaration

{octet hex 42 (8)}

{octet hex 58 (8)}

{octet hex 10 (8)}

{octet hex 10 (8)}

{octet hex 00 (8)}

{octet hex 00 (8)}

<!ENTITY california "CA">

<purchaseOrder orderDate="1999-10-20"

xmlns="http://example.com:po"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xsi:noNamespaceSchemaLocation="po.xsd">

  <shipTo country="US">

...

4.1.2.2 Schema

We will assume the po.xsd schema is registered privately as document number 15.

<?xml version="1.0"?>

{schema 0 (0)}

{length 8 (21)}

{integer 0 (32)}

{integer 15 (32)}

<!ENTITY california "CA">

<purchaseOrder orderDate="1999-10-20"

xmlns="http://example.com:po"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

>

  <shipTo country="US">

...

4.1.2.3 Name

We will assume the noNamespaceSchemaLocation attribute is name 100 in a canonical schema for XML-Schema.

Notice that because the schema provides very strict guidance on the sequence of the content, most of the name tokens can be inferred from context and so have a width of 0.

<?xml version="1.0"?>

{block n (20)}

{length 24 (21)}<!ENTITY california "CA">

{name token 3 (0)}        // <purchaseOrder>

{name token 18 (7)}       // orderDate

{length 10 (21)}1999-10-20

{name token 121 (7)}      // xsi:noNamespaceSchemaLocation

{length 6 (21)}po.xsd

{name token 1 (7)}

{name token 5 (0)}        // <shipTo>

{name token 19 (0)}       // country

{length 2 (21)}US

{name token 1 (0)}

{name token 8 (0)}        // <name>

{name token 1 (0)}

{length 11 (21)}Alice Smith

{name token 0 (0)}        // </name>

{name token 9 (0)}        // <street>

{name token 1 (0)}

{length 16 (21)}123 Maple Street

{name token 0 (0)}        // </street>

{name token 10 (0)}       // <city>

{name token 1 (0)}

{length 11 (21)}Mill Valley

{name token 0 (0)}        // </city>

{name token 11 (0)}       // <state>

{name token 1 (0)}

{length 12 (21)}&california;

{name token 0 (0)}        // </state>

{name token 12 (0)}       // <zip>

{name token 1 (0)}

{length 5 (21)}90952

{name token 0 (0)}        // </zip>

{name token 0 (0)}        // </shipTo>

{name token 6 (0)}        // <billTo>

{name token 19 (0)}       // country

{length 2 (21)}US

{name token 1 (0)}

{name token 8 (0)}        // <name>

{name token 1 (0)}

{length 12 (21)}Robert Smith

{name token 0 (0)}        // </name>

{name token 9 (0)}        // <street>

{name token 1 (0)}

{length 12 (21)}8 Oak Avenue

{name token 0 (0)}        // </street>

{name token 10 (0)}       // <city>

{name token 1 (0)}

{length 8 (21)}Old Town

{name token 0 (0)}        // </city>

{name token 11 (0)}       // <state>

{name token 1 (0)}

{length 2 (21)}PA

{name token 0 (0)}        // </state>

{name token 12 (0)}       // <zip>

{name token 1 (0)}

{length 5 (21)}95819

{name token 0 (0)}        // </zip>

{name token 0 (0)}        // </billTo>

{name token 4 (7)}        // <comment>

{name token 1 (0)}

{length 29 (21)}Hurry, my lawn is going wild!

{name token 0 (0)}        // </comment>

{name token 7 (7)}        // <items>

{name token 1 (0)}

{name token 13 (7)}       // <item>

{name token 20 (0)}       // partNum

{length 6 (21)}872-AA

{name token 1 (0)}

{name token 14 (0)}       // <productName>

{name token 1 (0)}

{length 9 (21)}Lawnmower

{name token 0 (0)}        // </productName>

{name token 15 (0)}       // <quantity>

{name token 1 (0)}

{length 1 (21)}1

{name token 0 (0)}        // </quantity>

{name token 16 (0)}       // <USPrice>

{name token 1 (0)}

{length 6 (21)}148.95

{name token 0 (0)}        // </USPrice>

{name token 4 (7)}        // <comment>

{name token 1 (0)}

{length 24 (21)}Confirm this is electric

{name token 0 (0)}        // </comment>

{name token 0 (7)}        // </item>

{name token 13 (7)}       // <item>

{name token 20 (0)}       // partNum

{length 6 (21)}926-AA

{name token 1 (0)}

{name token 14 (0)}       // <productName>

{name token 1 (0)}

{length 12 (21)}Baby Monitor

{name token 0 (0)}        // </productName>

{name token 15 (0)}       // <quantity>

{name token 1 (0)}

{length 1 (21)}1

{name token 0 (0)}        // </quantity>

{name token 16 (0)}       // <USPrice>

{name token 1 (0)}

{length 5 (21)}39.98

{name token 0 (0)}        // </USPrice>

{name token 17 (7)}       // <shipDate>

{name token 1 (0)}

{length 10 (21)}1999-05-21

{name token 0 (0)}        // </shipDate>

{name token 0 (0)}        // </item>

{name token 0 (0)}        // </items>

{name token 0 (0)}        // </purchaseOrder>

4.1.2.4 Entity

<?xml version="1.0"?>

{block n (20)}

{name token 2 (2)}

{type token 0 (8)}

{length 2 (21)}CA

{length 305 (21)}<purchaseOrder orderDate="1999-10-20"

xmlns="http://example.com:po"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xsi:noNamespaceSchemaLocation="po.xsd">

  <shipTo country="US">

     <name>Alice Smith</name>

     <street>123 Maple Street</street>

     <city>Mill Valley</city>

     <state>

{name token 2 (2)}

{length 728 (21)}</state>

     <zip>90952</zip>

  </shipTo>

...

4.1.2.5 Value

We will assume the Unix epoch for dates.

Note that since the schema defines all types without unions, all of the type tokens are redundant and so have width 0 instead of 8.

<?xml version="1.0"?>

{block n (20)}

{length 26 (21)}<!ENTITY california "CA">

<purchaseOrder orderDate=

{type token 38 (0)}

{value 10915 (32)}

xmlns=

{type token 0 (0)}

{length 21 (21)}http://example.com:po

xmlns:xsi=

{type token 0 (0)}

{length 41 (21)}http://www.w3.org/2001/XMLSchema-instance

xsi:noNamespaceSchemaLocation=

{type token 0 (0)}

{length 6 (21)}po.xsd

>

  <shipTo country=

{type token 0 (0)}

{length 2 (0)}US

>

     <name>

{type token 0 (0)}

{length 11 (21)}Alice Smith

</name>

     <street>

{type token 0 (0)}

{length 16 (21)}123 Maple Street

</street>

     <city>

{type token 0 (0)}

{length 11 (21)}Mill Valley

</city>

     <state>

{type token 0 (0)}

{length 12 (21)}&california;

</state>

     <zip>

{type token 0 (0)}

{length 5 (21)}90952

</zip>

  </shipTo>

  <billTo country=

{type token 0 (0)}

{length 2 (0)}US

>

     <name>

{type token 0 (0)}

{length 12 (21)}Robert Smith

</name>

     <street>

{type token 0 (0)}

{length 12 (21)}8 Oak Avenue

</street>

     <city>

{type token 0 (0)}

{length 8 (21)}Old Town

</city>

     <state>

{type token 0 (0)}

{length 2 (21)}PA

</state>

     <zip>

{type token 0 (0)}

{length 5 (21)}95819

</zip>

  </billTo>

  <comment>

{type token 0 (0)}

{length 29 (21)}Hurry, my lawn is going wild!

</comment>

  <items>

     <item partNum=

{type token 0 (0)}

{length 6 (0)}872-AA

>

        <productName>

{type token 0 (0)}

{length 9 (21)}Lawnmower

</productName>

        <quantity>

{type token 10 (0)}

{value 1 (8)}

</quantity>

        <USPrice>

{type token 30 (0)}

{value 100 (64)}

{value 14895 (64)}

</USPrice>

        <comment>

{type token 0 (0)}

{length 24 (21)}Confirm this is electric

</comment>

     </item>

     <item partNum=

{type token 0 (0)}

{length 6 (0)}926-AA

>

        <productName>

{type token 0 (0)}

{length 12 (21)}Baby Monitor

</productName>

        <quantity>

{type token 10 (0)}

{value 1 (8)}

</quantity>

        <USPrice>

{type token 30 (0)}

{value 100 (64)}

{value 3998 (64)}

</USPrice>

        <shipDate>

{type token 38 (0)}

{value 10763 (32)}

</shipDate>

     </item>

  </items>

</purchaseOrder>
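In the listing above each USPrice becomes two value symbols, for example 100 and 14895 for 148.95. One reading of this, sketched below in Python, is a power-of-ten denominator followed by the value scaled to an integer. The function name is hypothetical and the sketch assumes a finite decimal written without an exponent.

```python
from decimal import Decimal


def fractional_decimal(text):
    """Return (denominator, scaled integer) for a decimal lexical value."""
    value = Decimal(text)
    places = max(0, -value.as_tuple().exponent)  # digits after the point
    denominator = 10 ** places
    return denominator, int(value * denominator)


print(fractional_decimal("148.95"))  # (100, 14895)
print(fractional_decimal("39.98"))   # (100, 3998)
```

Using Decimal rather than a binary float keeps the scaled integer exact, which matters if the transformation is to be reversible.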

4.1.3 Combined Transformations

In the following table all the above translation transformations are selected together, showing the original XML text, the resulting mapping into symbols (and the source of the symbols), the default rendering of those symbols (in hex), and the alternative rendering if the Octets transformation is also selected.

Original XML

Symbols

Source

Output

Octets

<?xml version="1.0"?>

{octet hex 42 (8)}

{octet hex 58 (8)}

{octet hex 10 (8)}

{octet hex 10 (8)}

{octet hex 00 (8)}

{octet hex 00 (8)}

Declaration

42

58

10

10

1E

00

42

58

10

10

1F

00

{schema 0 (0)}

{length 8 (21)}

{integer 0 (32)}

{integer 15 (32)}

Schema

00 00 08

00 00 00 00

00 00 00 0F

91

E0

EF

<!ENTITY california "CA">

{name token 21 (2)}

{type token 0 (8)}

{length 2 (21)}CA

Entity

00 15

00

00 00 02

43 41

97

F5

B1 EA 63 61 6C 69 66 6F 72 6E 69 61

B1 E2 43 41

<purchaseOrder

{name token 3 (0)}

Name

95

E3

B1 ED 70 75 72 63 68 61 73 65 4F 72 64 65 72

orderDate=

{name token 18 (7)}

Name

12

96

F2

B1 E9 6F 72 64 65 72 44 61 74 65

"1999-10-20"

{type token 38 (0)}

{value 10915 (32)}

Value

00 00 2A A3

A7

CF 2A A3

>

{name token 1 (7)}

Name

01

9B

<shipTo

{name token 5 (0)}

Name

95

E5

B1 E6 73 68 69 70 54 6F

>

{name token 1 (0)}

Name

01

9B

<name

{name token 8 (0)}

Name

95

E8

B1 E4 6E 61 6D 65

>

{name token 1 (7)}

Name

01

9B

Alice Smith

{type token 0 (0)}

{length 11 (21)}Alice Smith

Value

00 00 0B

41 6C 69 63 65 20 53 6D 69 74 68

B1 EB 41 6C 69 63 65 20 53 6D 69 74 68

</name>

{name token 0 (0)}

Name

9C

<street

{name token 9 (0)}

Name

95

E9

B1 E6 73 74 72 65 65 74

>

{name token 1 (0)}

Name

9B

123 Maple Street

{type token 0 (0)}

{length 16 (21)}123 Maple Street

Value

00 00 10

31 32 33 20 4D 61 70 6C 65 20 53 74 72 65 65 74

B1 F0 31 32 33 20 4D 61 70 6C 65 20 53 74 72 65 65 74

</street>

{name token 0 (0)}

Name

9C

<city

{name token 10 (0)}

Name

95

EA

B1 E4 63 69 74 79

>

{name token 1 (0)}

Name

9B

Mill Valley

{type token 0 (0)}

{length 11 (21)}Mill Valley

Value

00 00 0B

4D 69 6C 6C 20 56 61 6C 6C 65 79

B1 EB 4D 69 6C 6C 20 56 61 6C 6C 65 79

</city>

{name token 0 (0)}

Name

9C

<state

{name token 11 (0)}

Name

95

EB

B1 E5 73 74 61 74 65

>

{name token 1 (0)}

Name

9B

{length 0 (21)}

Entity

00 00 00

&california;

{name token 21 (7)}

Entity

15

9A

F5

{length 0 (21)}

Entity

00 00 00

</state>

{name token 0 (0)}

Name

9C

<zip

{name token 12 (0)}

Name

95

EC

B1 E3 7A 69 70

>

{name token 1 (0)}

Name

9B

90952

{type token 0 (0)}

{length 5 (21)}90952

Value

00 00 05

39 30 39 35 32

B1 E5 39 30 39 35 32

</zip>

{name token 0 (0)}

Name

9C

</shipTo>

{name token 0 (0)}

Name

9C

<billTo

{name token 6 (0)}

Name

95

E6

B1 E6 62 69 6C 6C 54 6F

>

{name token 1 (0)}

Name

9B

<name

{name token 8 (0)}

Name

98

E8

>

{name token 1 (0)}

Name

9B

Robert Smith

{type token 0 (0)}

{length 12 (21)}Robert Smith

Value

00 00 0C

52 6F 62 65 72 74 20 53 6D 69 74 68

B1 EC 52 6F 62 65 72 74 20 53 6D 69 74 68

</name>

{name token 0 (0)}

Name

9C

<street

{name token 9 (0)}

Name

98

E9

>

{name token 1 (0)}

Name

9B

8 Oak Avenue

{type token 0 (0)}

{length 12 (21)}8 Oak Avenue

Value

00 00 0C

38 20 4F 61 6B 20 41 76 65 6E 75 65

B1 EC 38 20 4F 61 6B 20 41 76 65 6E 75 65

</street>

{name token 0 (0)}

Name

9C

<city

{name token 10 (0)}

Name

98

EA

>

{name token 1 (0)}

Name

9B

Old Town

{type token 0 (0)}

{length 8 (21)}Old Town

Value

00 00 08

4F 6C 64 20 54 6F 77 6E

B1 E8 4F 6C 64 20 54 6F 77 6E

</city>

{name token 0 (0)}

Name

9C

<state

{name token 11 (0)}

Name

98

EB

>

{name token 1 (0)}

Name

9B

PA

{type token 0 (0)}

{length 2 (21)}PA

Value

00 00 02

50 41

B1 E2 50 41

</state>

{name token 0 (0)}

Name

9C

<zip

{name token 12 (0)}

Name

98

EC

>

{name token 1 (0)}

Name

9B

95819

{type token 0 (0)}

{length 5 (21)}95819

Value

00 00 05

39 35 38 31 39

B1 E5 39 35 38 31 39

</zip>

{name token 0 (0)}

Name

9C

</billTo>

{name token 0 (0)}

Name

9C

<comment

{name token 4 (7)}

Name

04

95

E4

B1 E7 63 6F 6D 6D 65 6E 74

>

{name token 1 (0)}

Name

9B

Hurry, my lawn is going wild!

{type token 0 (0)}

{length 29 (21)}Hurry, my lawn is going wild!

Value

00 00 1D

48 75 72 72 79 2C 20 6D 79 20 6C 61 77 6E 20 69 73 20 67 6F 69 6E 67 20 77 69 6C 64 21

B1 FD 48 75 72 72 79 2C 20 6D 79 20 6C 61 77 6E 20 69 73 20 67 6F 69 6E 67 20 77 69 6C 64 21

</comment>

{name token 0 (0)}

Name

9C

<items

{name token 7 (7)}

Name

07

95

E7

B1 E5 69 74 65 6D 73

>

{name token 1 (0)}

Name

9B

<item

{name token 13 (7)}

Name

0D

95

ED

B1 E4 69 74 65 6D

partNum=

{name token 20 (0)}

Name

96

F4

B1 E7 70 61 72 74 4E 75 6D

"872-AA"

{type token 0 (0)}

{length 6 (0)}872-AA

Value

00 00 06

38 37 32 2D 41 41

B1 E6 38 37 32 2D 41 41

>

{name token 1 (0)}

Name

9B

<productName

{name token 14 (0)}

Name

95

EE

B1 EB 70 72 6F 64 75 63 74 4E 61 6D 65

>

{name token 1 (0)}

Name

9B

Lawnmower

{type token 0 (0)}

{length 9 (21)}Lawnmower

Value

00 00 09

4C 61 77 6E 6D 6F 77 65 72

B1 E9 4C 61 77 6E 6D 6F 77 65 72

</productName>

{name token 0 (0)}

Name

9C

<quantity

{name token 15 (0)}

Name

95

EF

B1 E8 71 75 61 6E 74 69 74 79

>

{name token 1 (0)}

Name

9B

1

{type token 10 (0)}

{value 1 (8)}

Value

01

E1

</quantity>

{name token 0 (0)}

Name

9C

<USPrice

{name token 16 (0)}

Name

95

F0

B1 E7 55 53 50 72 69 63 65

>

{name token 1 (0)}

Name

9B

148.95

{type token 30 (0)}

{value 100 (64)}

{value 14895 (64)}

Value

00 00 00 00 00 00 00 64

00 00 00 00 00 00 3A 2F

B2

D0 64

CF 3A 2F

</USPrice>

{name token 0 (0)}

Name

9C

<comment

{name token 4 (7)}

Name

04

98

E4

>

{name token 1 (0)}

Name

9B

Confirm this is electric

{type token 0 (0)}

{length 24 (21)}Confirm this is electric

Value

00 00 18

43 6F 6E 66 69 72 6D 20 74 68 69 73 20 69 73 20 65 6C 65 63 74 72 69 63

B1 F8 43 6F 6E 66 69 72 6D 20 74 68 69 73 20 69 73 20 65 6C 65 63 74 72 69 63

</comment>

{name token 0 (0)}

Name

9C

</item>

{name token 0 (0)}

Name

9C

<item

{name token 13 (7)}

Name

98

ED

partNum=

{name token 20 (0)}

Name

99

F4

"926-AA"

{type token 0 (0)}

{length 6 (0)}926-AA

Value

00 00 06

39 32 36 2D 41 41

B1 E6 39 32 36 2D 41 41

>

{name token 1 (0)}

Name

9B

<productName

{name token 14 (0)}

Name

98

EE

>

{name token 1 (0)}

Name

9B

Baby Monitor

{type token 0 (0)}

{length 12 (21)}Baby Monitor

Value

00 00 0C

42 61 62 79 20 4D 6F 6E 69 74 6F 72

B1 EC 42 61 62 79 20 4D 6F 6E 69 74 6F 72

</productName>

{name token 0 (0)}

Name

9C

<quantity

{name token 15 (0)}

Name

98

EF

>

{name token 1 (0)}

Name

9B

1

{type token 10 (0)}

{value 1 (8)}

Value

01

E1

</quantity>

{name token 0 (0)}

Name

9C

<USPrice

{name token 16 (0)}

Name

98

F0

>

{name token 1 (0)}

Name

9B

39.98

{type token 30 (0)}

{value 100 (64)}

{value 3998 (64)}

Value

00 00 00 00 00 00 00 64

00 00 00 00 00 00 0F 9E

B2

D0 64

DF 9E

</USPrice>

{name token 0 (0)}

Name

9C

<shipDate

{name token 17 (7)}

Name

11

95

F1

B1 E8 73 68 69 70 44 61 74 65

>

{name token 1 (0)}

Name

9B

1999-05-21

{type token 38 (0)}

{value 10763 (32)}

Value

00 00 2A 0B

A7

CF 2A 0B

</shipDate>

{name token 0 (0)}

Name

9C

</item>

{name token 0 (0)}

Name

9C

</items>

{name token 0 (0)}

Name

9C

</purchaseOrder>

{name token 0 (0)}

Name

9C