
XML is a very successful syntax for carrying information, being a careful compromise between many conflicting goals to hit the broad center of possible uses. One significant non-goal for XML was to be terse, so XML is completely text based and tends to be verbose. A binary format for XML would describe an alternate syntax that carries the same information as XML, but in a form more suitable for the many valuable fringe uses for which standard XML is not suitable, often because it is too verbose.

Other than possibly a need for terseness these other uses do not consistently have a lot in common. To achieve sufficient generality to cover more than one corner of the possible uses, the binary format must either be extremely clever or it must be flexible. This proposal goes in the direction of flexibility by specifying a number of options. In fact, the proposal is little more than a specification of a set of optional transformations on standard XML that yield a more terse syntax, together with a simple mechanism for selecting from the options.

The transformations specified here were selected in an attempt to meet the requirements set forth in the report of the W3C XML Binary Characterization working group. Most either represent existing practice or are an obvious, simple invention to directly address a need of the format. Each transformation is reversible and as much as possible independent of the others.

2. Overview

The binary format for XML is specified in terms of a set of transformations applied to standard XML, resulting in a new syntax that carries most or all of the original textual format and all of the original data. The transformations are ordered, mostly independent, optional, and reversible. Thus one simple use model is to encode standard XML by applying a sequence of selected transformations, then later decode by applying the reverse transformations in reverse order. However, the results of many of the transformations are straightforward and can be constructed and used directly without producing or processing the textual form. These transformations might be called translations, and they form the first part of the ordered sequence; the remaining transformations are for compression.

To accurately and efficiently translate names and different data types it is necessary to use tables or some sort of magic; if tables are used they can be explicitly included in the format or implicitly derived from schema or example. To avoid excessive invention this proposal specifies the use of tables implicitly derived from unmodified XML-Schema documents. To better achieve some of the desired characteristics, a schema for annotation of XML-Schema is also specified, described in the last section. The optional annotation provides more explicit direction for building the tables and configuring the transformations. This specification does not support use of DTDs unless they are first translated to and replaced by XML-Schema.

Only some of the transformations need schema information. Early translation transformations interpret the schema; later transformations inherit this information.

Some of the compression transformations are performed on blocks rather than on the entire data set. When several of them are selected they all operate on the same blocks instead of choosing blocks independently.

Here are the specified transformations:

Declaration (translation) - A concise syntax for the XML declaration.
Schema (translation) - To specify schema concisely.
Name (translation, interprets schema) - Tokenize element and attribute names.
Entity (translation) - Recode entities and internal entity declarations.
Value (translation, blocked, interprets schema) - Recode attribute values and element body.
Transpose (compression, blocked, inherits schema) - Rearrange data for better compression.
Octet (translation, inherits schema if present) - Format symbols into 8-bit bytes, encoding and preserving types and token mapping.
Block Sort (compression, blocked) - Rearrange symbols, essential part of bzip2.
Ziv-Lempel compress (compression, blocked) - Compress symbols using a dictionary as in gzip.
Huffman coding (compression, blocked) - Final compression step.

Use of the binary format and the set of transformations to be used are specified in the encoding attribute of the XML declaration.

3. Specification

This section contains the complete specification of the "BX" binary format. It is presented as a process of encoding a presumed XML document into the "BX" format.

3.1 Basic Mechanisms

3.1.1 Encoding Declaration

The binary format is declared as an encoding in the "encoding" attribute of the XML declaration. This is consistent with XML 1.x if one takes a broad view of what it means to encode a stream of characters. The declaration indicates use of the binary format, provides a version number to indicate the specific specification in use, and indicates which of the optional transformations are selected.

The encoding name consists of the prefix "BX-" followed by a two-part version number such as "1.0" followed by the transformation selection suffix. The transformation selection suffix consists of a dash followed by the capitalized first letters of the selected transformations, in order.

BxEncName ::= 'BX-' Version '-' ['D']['S']['N']['E']['V']['T']['O']['B']['Z']['H']

An example declaration that selects all of the translation transformations except Declaration might be:

<?xml version="1.1" encoding="BX-1.0-SNEVO"?>
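
The BxEncName production above can be checked mechanically. The following is a minimal sketch in Python, assuming that Version is a two-part dotted number such as "1.0"; the function name parse_bx_encoding is hypothetical and not part of this proposal.

```python
import re

# Transformation letters in the fixed order required by BxEncName.
BX_LETTERS = "DSNEVTOBZH"

# Assumption: Version is two decimal numbers separated by a dot.
_BX_NAME = re.compile(
    r"BX-(\d+)\.(\d+)-" + "".join(f"({c})?" for c in BX_LETTERS)
)

def parse_bx_encoding(name):
    """Return ((major, minor), selected_letters), or None if not a BX name."""
    m = _BX_NAME.fullmatch(name)
    if m is None:
        return None
    version = (int(m.group(1)), int(m.group(2)))
    # Groups after the two version numbers are the optional letters.
    selected = "".join(g for g in m.groups()[2:] if g)
    return version, selected
```

Because each letter is optional but its position is fixed, a suffix with letters out of order (for example "OS") is rejected rather than silently reordered.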

3.1.2 Symbols

All the transformations are specified in terms of streams of symbols. Symbols are essentially integers. The initial symbols are the Unicode characters of the presumed input XML document. Transformations can produce different kinds of symbols. The final result of a sequence of transformations is the simple concatenation of the binary representation of all the symbols.

During intermediate processing steps individual symbols or groups of symbols might have associated type or other information. Any such information that remains at the end of the last selected transformation is lost, except for width. Each symbol has an associated width in bits that determines how much space it occupies in the final stream.

3.1.2.1 Characters

Unicode characters are symbols with width 21 bits and type "character". The range of values is that specified in XML 1.x and/or Unicode.

3.1.2.2 Octets

An octet is a symbol with width 8 bits and type "octet". The range of values is 0-255.

3.1.2.3 Length symbols

Length symbols are used to define lists. They are produced only by translation transformations; in fact, the Name, Entity and Value translations always turn all character sequences into lists. A length symbol is an integer that counts the output 8-bit bytes used by the following symbols in the list, excluding the length symbol itself. There is a global width for length symbols that depends on the maximum block size. For the default maximum block size the width is 21 bits. Length symbols are always either the global width or, if redundant, width 0.

Lists can cross block boundaries. In this case the list is segmented with a length symbol at the head of each segment. The most significant bit of the value of the length symbol indicates whether there are more segments. 0 means this is the final (or only) segment, 1 means there are more segments.
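
As an illustration of the segment flag, here is a minimal Python sketch. It assumes the default global width of 21 bits and assumes that the remaining 20 bits hold the byte count; the helper names are hypothetical.

```python
LENGTH_WIDTH = 21                     # global width for the default block size
MORE_FLAG = 1 << (LENGTH_WIDTH - 1)   # most significant bit of the value

def make_length_symbol(byte_count, more_segments):
    """Combine a byte count with the more-segments flag (assumed layout)."""
    if not 0 <= byte_count < MORE_FLAG:
        raise ValueError("byte count does not fit below the flag bit")
    return byte_count | (MORE_FLAG if more_segments else 0)

def split_length_symbol(symbol):
    """Return (byte_count, more_segments) for a length symbol value."""
    return symbol & (MORE_FLAG - 1), bool(symbol & MORE_FLAG)
```

A decoder would keep reading segments until it meets a length symbol whose flag is 0.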

3.1.2.4 Count symbols

Count symbols are produced in compression transformations. Like length symbols they often tell how many symbols follow, but their value is the number of symbols rather than the number of bytes taken up by the symbols.

3.1.2.5 Name Tokens

Name tokens represent element, attribute and entity names and delimit their respective constructs. They are symbols of type "name token". The value range for name tokens is built up from three contiguous subranges, one each for elements, attributes and entities. Once the name token symbol space is fully evaluated the global name token width is defined as the number of bits required to represent the maximum value.

There are two special name tokens. Token 0 is the end-of-element token, and token 1 is the end-of-attributes token.

There is also a pre-defined element name token, transpose, that takes the first number (2) of the element name range.

The width of an individual name token can be either the global token width or zero. A token with zero width still has a value and other properties, but is redundant and so won't appear unless it is subsequently converted to some other symbol.
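
The numbering described in this section can be sketched as follows. This is an illustrative Python fragment only: it assumes the names have already been collected in schema order, and the table layout, the key "bx:transpose" and the function name are hypothetical.

```python
def build_name_tokens(elements, attributes, entities=()):
    """Number names in three contiguous subranges and derive the global width."""
    table = {}
    next_token = 2                      # 0 and 1 are the two special tokens
    # The predefined transpose element takes the first element number (2).
    table[("element", "bx:transpose")] = next_token
    next_token += 1
    for kind, names in (("element", elements),
                        ("attribute", attributes),
                        ("entity", entities)):
        for name in names:
            table[(kind, name)] = next_token
            next_token += 1
    max_value = next_token - 1
    # Global width: number of bits required to represent the maximum value.
    return table, max_value.bit_length()
```

With two elements and two attributes the largest token is 6, so the global name token width is 3 bits.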

3.1.2.6 Type Tokens

Type tokens represent XML-Schema primitive types and their encodings. They are symbols of type "type token" and width 8 or 0, where, as for name tokens, width zero indicates that the token is not required in the output. The possible values of type tokens are given in Table 1. Each type token also has the property of being a list or not, encoded in its value as even for plain and odd for list. A type token is followed by value symbols that encode the value of an attribute or element body.

Value	Hex	Primitive Type	Encoding (Width)
0, 1	00, 01	string or anySimpleType	list of characters
2, 3	02, 03	nil	none
4, 5	04, 05	boolean	integer (1)
6, 7	06, 07	float	float (32)
8, 9	08, 09	double	double (64)
10, 11	0A, 0B	decimal	unsigned integer (8)
12, 13	0C, 0D	decimal	signed integer (8)
14, 15	0E, 0F	decimal	unsigned integer (16)
16, 17	10, 11	decimal	signed integer (16)
18, 19	12, 13	decimal	unsigned integer (32)
20, 21	14, 15	decimal	signed integer (32)
22, 23	16, 17	decimal	signed integer (64)
24, 25	18, 19	decimal	fraction (8), (32)
26, 27	1A, 1B	decimal	fraction (16), (64)
28, 29	1C, 1D	decimal	fraction (32), (64)
30, 31	1E, 1F	decimal	fraction (64), (64)
32, 33	20, 21	duration	fraction (32), (64)
34, 35	22, 23	dateTime	fraction (32), (64), tz (16)
36, 37	24, 25	time	fraction (32), (64)
38, 39	26, 27	date	integer (32)
40, 41	28, 29	gYearMonth	integer (16)
42, 43	2A, 2B	gYear	integer (16)
44, 45	2C, 2D	gMonthDay	integer (16)
46, 47	2E, 2F	gMonth	integer (8)
48, 49	30, 31	gDay	integer (8)
50, 51	32, 33	hexBinary or base64Binary	list of octets
52, 53	34, 35	anyURI	list of characters
54, 55	36, 37	QName	name token
56, 57	38, 39	QName	list of characters
58, 59	3A, 3B	enumeration	integer (8)
60, 61	3C, 3D	enumeration	integer (16)
62, 63	3E, 3F	enumeration	integer (32)

Table 1
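
Because the list property is carried in the low bit of the value, a reader can separate it from the type with one mask. A small Python sketch follows; the function name is hypothetical.

```python
def split_type_token(value):
    """Return (plain_type_value, is_list) for a Table 1 type token value."""
    if not 0 <= value <= 63:
        raise ValueError("Table 1 defines values 0 through 63")
    # Even values are plain, odd values are lists of the same type.
    return value & ~1, bool(value & 1)
```

For example, hex 15 splits into plain value 20 (decimal as signed 32-bit integer) with the list property set.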

3.1.2.7 Value symbols

Value symbols encode the numeric data values shown in Table 1.

Integer is a signed (two's-complement) or unsigned integer value of the given width.

Float is a 32-bit IEEE-754-1985 floating-point value. Double is a 64-bit IEEE-754-1985 floating-point value.

Fraction is a pair of integers whose quotient, the second divided by the first, yields the value. The first (divisor) is unsigned and non-zero; the second (dividend) is signed. When a fraction is derived from a decimal string literal the divisor will always be a power of ten, but in general divisor values are not so limited.
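
To show how a decimal string literal yields a power-of-ten divisor, here is a minimal Python sketch using the standard decimal module. The function name is hypothetical, and the choice not to reduce the fraction is an assumption made so that trailing zeros in the literal remain recoverable.

```python
from decimal import Decimal

def decimal_to_fraction(literal):
    """Return (divisor, dividend) with the divisor a power of ten."""
    d = Decimal(literal)
    if not d.is_finite():
        raise ValueError("decimal values have no NaN or infinity")
    sign, digits, exponent = d.as_tuple()
    magnitude = int("".join(map(str, digits)))
    if exponent > 0:                    # e.g. "1E3"; fold into the dividend
        magnitude *= 10 ** exponent
        exponent = 0
    divisor = 10 ** -exponent           # unsigned, never zero
    dividend = -magnitude if sign else magnitude
    return divisor, dividend
```

The literal "12.50" gives divisor 100 and dividend 1250, while "3" gives divisor 1 and dividend 3.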

The primitive types duration, dateTime and time are encoded in seconds as fractions. For dateTime the fraction is followed by the timezone, which is a signed 16-bit integer. Zero timezone means UTC, while the special value -32768 means there is no timezone specified.

TODO: is there an issue for dateTime and date with regard to the epoch? Should the date instead be represented as gYear, gMonth, gDay? If so, should duration be the same?

3.1.2.8 Output

An implicit transformation in this specification is the rendering of symbols to the final bit stream of the binary format. Two of the transformations, Octet and Huffman, make this straightforward as their output is either octets or bits. However, they are both optional and if neither is selected there is a default rendering. This rendering is also referenced in creating blocks and length symbols.

Each symbol is padded on the left (most significant bits) with zeroes so that its width is a multiple of 8. Then it is placed in the stream in network byte order, most significant 8-bit bytes first.

All properties and type information are discarded. Symbols of width zero are skipped altogether.
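
The default rendering can be sketched in a few lines of Python. The sketch assumes each symbol is given as a (value, width) pair whose value is already a non-negative bit pattern (negative integers would first be converted to two's complement); the function name is hypothetical.

```python
def render_default(symbols):
    """Render (value, width_in_bits) pairs to the default octet stream."""
    out = bytearray()
    for value, width in symbols:
        if width == 0:
            continue                    # zero-width symbols are skipped
        if not 0 <= value < (1 << width):
            raise ValueError("value does not fit the symbol width")
        nbytes = (width + 7) // 8       # pad on the left to a multiple of 8 bits
        out += value.to_bytes(nbytes, "big")   # network byte order
    return bytes(out)
```

A 21-bit character symbol therefore occupies three octets, and an octet symbol occupies one.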

3.1.3 Schema

Several transformations depend on or can make use of schema for the document. In all cases this is XML-Schema, optionally annotated with the BX Schema Annotation described in section 3.3. All such transformations are specified to operate properly with any valid XML-Schema. If no schema exists it is possible to automatically create a schema for a given document and either encapsulate it with the document (see XML Container below) or select the Octet transformation which preserves type and other needed information. If a DTD exists it might be possible to perform a canonical translation of the DTD into a schema, and use that.

The Schema transformation depends on a registration scheme for XML-Schema documents. The details are outside the scope of this specification, but we will assume a two-level mechanism similar to that used for PCI devices. A global entity such as the W3C registers parties who wish to register documents, and assigns each party a number. Each party then assigns its own numbers to its schema documents. The global entity maintains a web accessible database of the parties, accessible by number, providing for each a link to their web-accessible database of documents. Each party's database is accessible by document IRI and number, and given one provides the other and a link to the schema document itself.

The party number 0 is reserved for local, private and experimental use. For use in examples we will also assume that party number 1 is the W3C and the XML-Schema instance document is the W3C's document number 1.

3.1.4 XML Container

If it should happen that a recommendation is created for an XML container schema, then it would be possible to encapsulate required schema along with a document. In this case all translation transformations would recognize the container syntax and would, when necessary, treat encapsulated documents individually.

3.1.5 Blocks

Several transformations work on blocks of symbols. In addition, length symbols are limited to the block size, so transformations that produce length symbols also respect block boundaries. When any such transformations are selected, the document is divided into contiguous blocks at the input to the first transformation that uses blocks (or within the first transformation as an optimization) and all the transformations operate on the same blocks.

Unless configured differently by use of XML Schema Annotation, block boundaries are placed between the outermost elements possible. Maximum block size is 1 binary megabyte, 1048576 8-bit bytes, but can be configured to a smaller size.

Each block is preceded by a symbol of type "block" of width 20 bits. The value of the symbol is the length of the block in 8-bit bytes. These symbols do not participate in transformations except their value is kept up to date when the size of the block changes and some transformations alter the width to zero or back to 20.

3.2 Transformations

The transformations are specified in terms of how they map an input symbol stream to an output symbol stream. All the transformations are supposed to be reversible, so both encoding and decoding are possible as well as direct production and interpretation of translated symbol streams or serialization from and parsing to internal representations.

It is common practice in binary specifications to specify packed values in terms of bit fields, but here there are a few places where the packing is more complicated, so packed values are specified in terms of integer ranges and sums of integer products. In many cases it should be obvious that they are in fact bit fields.

3.2.1 Declaration Translation

The Declaration transformation recasts the XML declaration into six octets. The first two octets are the characters 'B' and 'X'. The third octet encodes the XML version as 16*<major version> + <minor version>. The fourth octet similarly encodes the binary format version. The fifth octet encodes the selected translation transformations except for Declaration, which is implicit when the BX prefix is present. The sixth octet similarly encodes the selected compression transformations.

The fifth octet flags are encoded as 16*<Schema> + 8*<Name> + 4*<Entity> + 2*<Value> + <Octet>.

The sixth octet flags are encoded as 8*<Transpose> + 4*<Block> + 2*<Ziv-Lempel> + <Huffman>.

Using the example above, the following are equivalent except in the use of the Declaration transformation, with the second shown in hex:

<?xml version="1.1" encoding="BX-1.0-SNEVO"?>

42 58 11 10 1F 00

All further transformations pass the declaration (in either form) through unmodified.
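
The six octets can be computed directly from the two version numbers and the selection suffix. The following Python sketch uses the flag weights given above; the function and table names are hypothetical, and it assumes each version component fits in four bits.

```python
# Flag weights for the fifth and sixth octets, as given in this section.
TRANSLATION_BITS = {"S": 16, "N": 8, "E": 4, "V": 2, "O": 1}
COMPRESSION_BITS = {"T": 8, "B": 4, "Z": 2, "H": 1}

def declaration_octets(xml_version, bx_version, selected):
    """Build the six-octet declaration from (major, minor) pairs and letters."""
    def pack(version):
        major, minor = version
        if not (0 <= major < 16 and 0 <= minor < 16):
            raise ValueError("version component does not fit in four bits")
        return 16 * major + minor

    fifth = sum(bit for letter, bit in TRANSLATION_BITS.items() if letter in selected)
    sixth = sum(bit for letter, bit in COMPRESSION_BITS.items() if letter in selected)
    return bytes([ord("B"), ord("X"), pack(xml_version), pack(bx_version), fifth, sixth])
```

Calling declaration_octets((1, 1), (1, 0), "SNEVO") reproduces the hex sequence 42 58 11 10 1F 00 shown in the example.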

TODO: Should there be a bit for the Standalone Document Declaration? If so, a document that is not standalone might be identified by using the most significant bit of the sixth octet (128*<not standalone>).

3.2.2 Schema Translation

When registered XML-Schema are referenced in the document element, the Schema transformation removes no longer needed markup (i.e. xsi:schemaLocation, xsi:noNamespaceSchemaLocation) and inserts binary reference symbols after the declaration. In any case it inserts a length symbol before any reference symbols.

The Schema transformation introduces a special schema symbol, with no value and 0 width, followed by a list (beginning with a length symbol) of pairs of 32-bit unsigned integers. Each pair is the party number followed by the document number. The schema symbol and list immediately follow the XML declaration. The rest of the document remains unchanged except to eliminate from the document element any xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes for the listed schema.

In the case of container documents, the Schema transformation is performed at the document element of each encapsulated document, with all schema added to the single list following the XML declaration.

3.2.3 Name Translation

Element and attribute names and syntax map to token symbols. Namespace declarations are eliminated.

The transformation numbers each element and attribute name found in the schema in order, implicitly building a table of names to numbers, and determining the global token properties.

All namespace declaration attributes are removed from the document element.

Within the document body, including the document element, each element name together with its introductory angle bracket and whitespace maps to the corresponding element token. If the presence of a specific element can be deduced from the schema, its token width is zero. If an element has a body the closing angle bracket and surrounding white space maps to the end-of-attributes token, and the element's end-tag maps to the end-of-element token. If there is no body the closing angle bracket and surrounding white space maps to the end-of-element token. The width is zero on the final trail of end-of-element tokens in the document.

The attributes are sorted into the order in which they appear in the schema declaration for the element, required attributes first and then optional attributes. Each attribute name and its following equals sign and surrounding white space maps to the corresponding attribute token. For required attributes the token width is zero.

All remaining contiguous sequences of characters in the document map to lists of characters beginning with a length symbol.

3.2.4 Entity Translation

Entity names and syntax map to token symbols and lists of characters.

Starting at the entity base number the transformation numbers each entity name found in the document, implicitly building a table of names to numbers, and determines or updates the global name token properties.

Note that if the Name transformation is not selected the entity base is 2.

All internal entity declarations are grouped at the head of the document, following all non-character symbols after the XML declaration. Each internal entity declaration maps to a sequence consisting of its name token followed by its replacement strings encoded as a string type token followed by a list of characters. Throughout the rest of the document entity references map to their corresponding name token.

All remaining contiguous sequences of characters in the document map to lists of characters beginning with a length symbol. Each entity reference must be surrounded on both sides with lists of characters, even if of zero length.

TODO: deal with external entities and parameter entities.

3.2.5 Value Translation

Element bodies and attribute values map to sequences consisting of a type symbol followed by an encoding of the value.

The transformation maps each attribute value and each simpleType in an element body to a type token and an encoding of the value. It does not rely on the Name transformation, so it must work either with text XML or with name tokens to find elements and attributes.

The type token reflects the ultimate primitive type (or enumeration) of the value according to the schema, and specifies the encoding of the value as shown in Table 1. The actual value encountered does not affect the type token unless the simpleType of the value involves a union.

In several cases the type token indicates the width of the value encoding. This width is determined from the schema by examining the restrictions on the type. If there are no restrictions, the maximum width is used. The actual value does not affect the width specified by the type token.

If the type is restricted by enumeration then the type token is of enumeration type with the width determined by the number of possibilities. The value is encoded as a zero-based integer index that selects among the possibilities in the order they are listed in the schema.
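
A sketch of the enumeration encoding in Python follows. It assumes the encoder picks the narrowest of the three enumeration widths in Table 1 (type tokens 58, 60 and 62) that can hold the largest index; the function name is hypothetical.

```python
def enumeration_encoding(possibilities, value):
    """Return (type_token, width_in_bits, index) for an enumerated value."""
    count = len(possibilities)
    # Assumption: choose the narrowest Table 1 enumeration width that fits.
    if count <= 1 << 8:
        token, width = 58, 8
    elif count <= 1 << 16:
        token, width = 60, 16
    elif count <= 1 << 32:
        token, width = 62, 32
    else:
        raise ValueError("too many possibilities for Table 1 enumerations")
    # Zero-based index in the order the possibilities appear in the schema.
    return token, width, possibilities.index(value)
```

The width depends only on the schema, never on the value actually encountered, so every instance of the same enumerated type occupies the same space.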

In the case of a union the appropriate type token is selected from a list of possible primitive types according to the schema. If multiple enumerations are encountered in a union they are combined in the order found into a single enumeration by concatenating their value ranges.

Since, for instance, the string type is encoded as a list, a list of strings results in two layers of length symbols. The outermost length embraces the entire list while each of the inner lengths corresponds to its string's characters.

Fixed attributes are left with both type token and value symbol width 0. Default values are not inserted.

All remaining contiguous sequences of characters in the document map to lists of characters beginning with a length symbol.

3.2.6 Transposition for Compression

The transposition transformation rearranges data for better compression in following steps. It is useful in applications that involve a large number of samples, where each sample contains several more-or-less independent data items. A programming language equivalent of the transformation would be changing an array of structures into a set of arrays of each of the original structure's elements. This places related data items next to each other. Experience shows this can result in modest but significant improvement in the amount of compression that can be obtained from lossless compression algorithms such as the compression transforms specified here.
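
The array-of-structures analogy can be made concrete with a short Python sketch. It covers only the simplest case, in which every sample has the same single-valued items, and does not model lists or the falling-off rule described below; the function names are hypothetical.

```python
def transpose_records(records):
    """Turn a list of same-keyed dicts into a dict of lists."""
    if not records:
        return {}
    keys = list(records[0])
    if any(list(r) != keys for r in records):
        raise ValueError("all records must have the same items in the same order")
    return {k: [r[k] for r in records] for k in keys}

def untranspose_records(columns):
    """Reverse transpose_records."""
    keys = list(columns)
    return [dict(zip(keys, row)) for row in zip(*columns.values())]

samples = [{"c": 1, "d": 10}, {"c": 2, "d": 20}, {"c": 3, "d": 30}]
columns = transpose_records(samples)   # {"c": [1, 2, 3], "d": [10, 20, 30]}
```

After the rearrangement all values of one item sit next to each other, which is what gives the later compression steps longer runs of similar symbols.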

The transformation works on sequences of leaf elements (elements with no body or body of simpleType), all with the same element name, the same number of attributes and, for body or attributes of list type, the same number of items in each list. An exception is that at the end of the sequence the attributes and numbers of items in lists can fall off monotonically. By default the minimum length of such a sequence of elements is 8. Transposition does not cross block boundaries.

Sequences are identified in terms of token symbols and their associated value encodings. (This transform produces no mapping unless both Name and Value translation are selected; this also means the attributes are already mapped into the same order in every element.) Each eligible sequence of N elements, with maximum body or attribute list length of M, maps to a new transpose element containing M new elements with the original name, but with all attributes and the body retyped to lists (if they were not a list already) of maximum length N. Each attribute and the body of the first new element has as value a list of the N (first) values of the corresponding attribute or body of the original element. In the second and subsequent new elements only those attributes or the body that were originally lists are present, and list the up to N second or subsequent values from those lists.

As a comprehensive example, consider the following fragment of XML. Note that the last sample element is not eligible because it reintroduces an attribute and the length of one of its lists increased.

</testrun>

The fragment maps to the equivalent of this XML:

<bx:transpose>
  <sample a="11 21 31 41 51 61 71 81"
          b="11 21 31 41 51 61 71 81"
          c="1 2 3 4 5 6 7 8"
          d="1 2 3 4 5 6 7 8"
          e="1 2">
    1 2 3 4 5 6 7 8
  </sample>
  <sample a="12 22 32 42 52 62 72 82"
          b="12 22 32 42 52 62"/>
</bx:transpose>
</testrun>

3.2.7 Octet Translation

The Octet translation transformation provides an alternative to the standard output mechanism. It is less suited to applications that want static formatted data and more suited to streaming in that it uses variable length sequences that depend on the actual values encountered. In addition it carries enough type information that a document might be reconstructed without reference to a schema.

Past the XML declaration all symbol sequences (except block symbols) map to sequences of octets, preserving redundant type and length information. Each symbol or group of symbols translates to an introductory octet followed by zero or more additional octets. Table 2 lists the values of the introductory octet and what they mean. Unless otherwise noted, a range of values for the introductory octet means that it carries the most significant bits of the item value. Further data octets carry the rest of the value in network byte order. The definition of following octets is recursive, so one introductory octet might be followed by shorter sequences containing their own introductory octets.

Characters not preceded by token symbols map to a short sequence beginning with an introductory octet in the range 0-144.

A schema symbol and its associated list of registered schema numbers maps to a sequence beginning with an introductory octet in the range 145-147.

Length symbols are discarded. Lists of zero length are discarded.

Block length symbols are retained within the stream of octet symbols but set to 0 width.

A name token maps to an introductory octet with value 149-151 for the first occurrence, followed by an integer sequence and a string sequence listing the name; or, for any subsequent occurrence, value 152-154 followed by an integer sequence. The introductory octet indicates whether it is an element, attribute or entity name, and for the first occurrence of an entity name there is also a string carrying the replacement text. The end-of-attributes and end-of-element tokens have their own introductory octets, 155 and 156. These values are those listed in Table 2.

A type token and its following value map to sequences with different introductory octet ranges depending on the type. The string type maps to a sequence consisting of a length integer (providing the number of characters) followed by a sequence as described above for each character. This string sequence is in turn reused inside sequences for name tokens. Similarly the integer sequence is used for integer-valued decimal types as well as inside many of the other sequences.

If a type token is for a list, it and its values map to a sequence beginning with an introductory octet with value 157 (see Table 2) followed by an integer sequence that provides the number of items in the list, then followed by each item translated as for an individual item.

The length of an octet sequence depends only on its value, being the shortest length that will encode the value. Any length information carried from the schema by type tokens is discarded.

Introductory octet values	Hex	Symbols replaced	Indicated type	Following octets
0-127	00 - 7F	character in the range hex 00-7F	character	none
128	80	character in the range hex 0080-FFFF	character	2 data
129-144	81 - 90	character in the range hex 010000-10FFFF	character	2 data
145	91	schema and list of length 1 pair	schema	2 integer
146	92	schema and list of length 2 pairs	schema	4 integer
147	93	schema and longer list of length n pairs	schema	integer=n, 2*n integer
148	94	(unused)	none	none
149	95	1st element name token	element	integer, string
150	96	1st attribute name token	attribute	integer, string
151	97	entity definition name token and list of characters	entity declaration	integer, string, string
152	98	element name token	element	integer
153	99	attribute name token	attribute	integer
154	9A	entity name token	entity reference	integer
155	9B	name token	end of attributes	none
156	9C	name token	end of element	none
157	9D	list of length n	list	integer=n, n value items
158	9E	nil type token	nil	none
159	9F	boolean type token, value=false	boolean	none
160	A0	boolean type token, value=true	boolean	none
161	A1	float type token and value symbol	float	4 data
162	A2	double type token and value symbol	double	8 data
163	A3	duration type token and value symbols	duration	2 integer
164	A4	dateTime type token and value symbols, no timezone	dateTime	2 integer
165	A5	dateTime type token and value symbols with timezone	dateTime	3 integer
166	A6	time type token and value symbols	time	2 integer
167	A7	date type token and value symbol	date	integer
168	A8	gYearMonth type token and value symbol	gYearMonth	integer
169	A9	gYear type token and value symbol	gYear	integer
170	AA	gMonthDay type token and value symbol	gMonthDay	integer
171	AB	gDay type token and value symbol	gDay	integer
172	AC	gMonth type token and value symbol	gMonth	integer
173	AD	binary type token and octet list of length n	binary data	integer=n, n data
174	AE	anyURI type token and string	anyURI	string
175	AF	QName type token and name token	QName	integer
176	B0	QName type token and string	QName	string
177	B1	string or anySimpleType type token and list of n characters or (string)	string	integer=n, n character
178	B2	fractional decimal type token and value symbol	decimal	2 integer
200	C8	decimal type token and value symbol or (integer)	decimal	8 data (signed 2's complement integer)
201	C9	decimal type token and value symbol or (integer)	decimal	7 data (signed 2's complement integer)
202	CA	decimal type token and value symbol or (integer)	decimal	6 data (signed 2's complement integer)
203	CB	decimal type token and value symbol or (integer)	decimal	5 data (signed 2's complement integer)
204	CC	decimal type token and value symbol or (integer)	decimal	4 data (signed 2's complement integer)
205	CD	decimal type token and value symbol or (integer)	decimal	3 data (signed 2's complement integer)
206	CE	decimal type token and value symbol or (integer)	decimal	2 data (unsigned integer)
207	CF	decimal type token and value symbol or (integer)	decimal	2 data (signed 2's complement integer)
208-223	D0 - DF	decimal type token and value symbol or (integer)	decimal	1 data (signed 2's complement integer, including 4 bits from intro octet)
224-255	E0 - FF	decimal type token and value symbol or (integer)	decimal	none (intro octet represents range 0-31)

Table 2
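
As an illustration of the first three rows of Table 2, the following Python sketch encodes and decodes a single character. It assumes, following the general rule that a range of introductory octets carries the most significant bits of the value, that octets 129-144 carry the Unicode plane number 1-16; the function names are hypothetical.

```python
def encode_character(code_point):
    """Encode one Unicode code point as an Octet-translation sequence."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point < 0x80:
        return bytes([code_point])              # intro octet 0-127, no data
    plane, low = divmod(code_point, 0x10000)
    # Assumption: intro octet 128 is plane 0, 129-144 are planes 1-16.
    return bytes([128 + plane]) + low.to_bytes(2, "big")

def decode_character(data, pos=0):
    """Return (code_point, next_position) for the sequence starting at pos."""
    intro = data[pos]
    if intro < 128:
        return intro, pos + 1
    if intro <= 144:
        low = int.from_bytes(data[pos + 1:pos + 3], "big")
        return ((intro - 128) << 16) | low, pos + 3
    raise ValueError("introductory octet does not start a character")
```

Under this assumption the character hex 1F600 encodes as the three octets 81 F6 00, and any character below hex 80 remains a single octet.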

3.2.8 Block Sort for Compression

The Block Sort transformation provides the same essential mechanism as that used in the bzip2 utility. It maps a block of symbols into a new sequence based on sorting. While this does not reduce their size, it significantly improves the chances for good compression in later steps.

Since the transformation destroys the original order of the symbols it is no longer possible to infer the width of each symbol. Thus, it must be followed by a transformation that encodes symbol width. Also, it benefits from run-length encoding (or better yet, simple dictionary compression) before the sort and it is of little use without dictionary compression following the sort. So this transformation cannot be used without Ziv-Lempel and Huffman coding. If the Block Sort transformation is selected without both of Ziv-Lempel and Huffman coding, Block Sort becomes the identity mapping.

The transformation remaps each block independently.

The block is first transformed using the Ziv-Lempel transformation below. The encoder need only search for runs, that is, duplicated sequences immediately following the sequence they duplicate. The count symbols from the Ziv-Lempel transformation map to the beginning of the new block. Following the count symbols, the original symbols in the block are sorted as follows.

Every symbol in the block is associated with a (reverse order) sequence beginning with the preceding symbol and going backwards, wrapping around from the beginning to the end of the block to finish back with the symbol itself. The new mapping orders the symbols by sorting their associated sequences by (unsigned integer) symbol value in lexical order such that symbols earlier in each sequence have greater priority. For the purpose of sorting, each symbol whose value is not a non-negative integer (i.e. float, double and negative integers) is treated as an unsigned binary integer of the symbol's width.

A new count symbol is prepended to the sequence (after the block symbol and Ziv-Lempel count symbols and before the new mapping). The value is the 0-based index of the symbol in the new mapping that was the first symbol in the original mapping.

Oddly enough, this transformation is completely reversible. A very short example will show how it works. Consider the sequence "bcab". The table below lists each symbol's reversed and wrapped sequence, then those sequences in sorted order. The final symbol of each sorted sequence is the output symbol:

Original Symbols	Reverse Wrapped Sequences	Sequences Sorted	Output Symbols	Decoding Columns	Reconstruction
b	bacb	acbb	b	ab	...ab
c	bbac	bacb	b*	bb
a	cbba	bbac	c	bc	.bc
b	acbb	cbba	a	ca	..ca

The new mapping is "bbca" and the 0-based index of the original first symbol (starred) is 1. To decode, first sort all the individual symbols into one column, then list the new mapping in a second column. Each row now holds a pair of symbols that are adjacent, in that order, in the original block (with wraparound); what remains is to work out how the pairs connect. Start with the original first symbol in the second column of the indexed row, the starred "b". Note that it is the second "b" in that column, so find the second "b" in the first column. That row holds the first pair, "bc". Take its following symbol "c" and repeat the process to find the next pair. Repeat until the entire sequence is recovered.
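
The sort and the pair-chaining decode described above can be sketched in Python as follows. This is an illustration only: it leaves out the Ziv-Lempel run pass and the count symbols, it compares symbols as plain Python values rather than as unsigned integers of a given width, it assumes a non-empty block, and the names block_sort_encode and block_sort_decode are hypothetical.

```python
def block_sort_encode(block):
    """Return (new mapping, 0-based index of the original first symbol)."""
    n = len(block)

    def key(i):
        # Reverse-order sequence: preceding symbol first, going backwards
        # with wraparound, and finishing with the symbol itself.
        return [block[(i - k) % n] for k in range(1, n + 1)]

    order = sorted(range(n), key=key)
    return [block[i] for i in order], order.index(0)


def block_sort_decode(mapped, index):
    """Rebuild the original block by chaining the (first, second) pairs."""
    n = len(mapped)
    # First column: the same symbols sorted. A stable sort pairs the k-th
    # occurrence of a symbol in the second column with its k-th occurrence
    # in the first column.
    first = sorted(range(n), key=lambda row: mapped[row])
    next_row = [0] * n
    for row_in_first, row_in_second in enumerate(first):
        next_row[row_in_second] = row_in_first
    out = []
    row = index
    for _ in range(n):
        out.append(mapped[row])
        row = next_row[row]
    return out
```

For the example above, block_sort_encode("bcab") gives the mapping b, b, c, a with index 1, and block_sort_decode applied to that mapping and index gives back b, c, a, b.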

3.2.9 Ziv-Lempel Compression

The Ziv-Lempel compression transformation remaps the symbols within a block to a new, hopefully smaller, set of symbols. It is an LZ77 algorithm related to the dictionary algorithm used in gzip.

The transformation makes use of count symbols. It segments the original symbols within each block into sequences and remaps pairs of sequences (the second of which is a duplicate of any sequence earlier in the block) into a triple of count symbols followed, later, by the first sequence of the pair. (Sometimes the first sequence is nil.) All the triples of count symbols are mapped to the front of the block; all the remaining original sequences of symbols follow.

Each triple of count symbols consists of, first, a count of unduplicated symbols (the number of symbols in the first sequence of the original pair); second, a count of duplicated symbols (the number of symbols in the second sequence of the original pair, which duplicates a sequence earlier in the block); third, the offset (in symbols) from the earlier duplicated sequence to the second sequence of the pair.

When a duplicate sequence is immediately followed by another, the second is treated as the second sequence of a new pair, where the first sequence is nil. In that case the first count symbol of its triple has value 0. In the same way, if the last sequence in the block is not a duplicate it is treated as the first sequence of a pair where the second sequence is nil, and in its triple the second count symbol has value 0.

There is always a final triple where the second count symbol has value 0, whether or not the block ends with a duplicate sequence. In this final triple the third symbol is not present, so it actually consists of only two symbols.

Here is an example of a block of letters segmented into unduplicated and duplicated segments, with nil segments depicted with a dash, and the count triples and symbols produced by the mapping:

abcdeeeeabcdf

abcde eee - abcd f -

5 3 1 0 4 8 1 0 abcde f

In finding duplicate sequences, symbol values may be considered equal if they have the same unsigned integer value, as for Block Sort. Any stricter definition of equality is also acceptable.
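
The layout of the count triples can be made concrete with a small decoder. The Python sketch below is only an illustration under stated assumptions: the name zl_decode is hypothetical, count symbols are modelled as plain integers at the front of a list, and the offset is taken to be the distance in symbols back from the start of the duplicate to the start of the earlier sequence, which is how the example above reads (offsets 1 and 8). Duplicates are copied one symbol at a time so that a run such as "eee" can overlap the sequence it duplicates.

```python
def zl_decode(mapped):
    """Expand [count symbols..., remaining original symbols...] (sketch)."""
    triples = []
    pos = 0
    while True:
        unduplicated, duplicated = mapped[pos], mapped[pos + 1]
        if duplicated == 0:
            # Final triple: the offset symbol is not present.
            triples.append((unduplicated, 0, 0))
            pos += 2
            break
        triples.append((unduplicated, duplicated, mapped[pos + 2]))
        pos += 3
    literals = mapped[pos:]  # the unduplicated sequences, in order
    out = []
    taken = 0
    for unduplicated, duplicated, offset in triples:
        out.extend(literals[taken:taken + unduplicated])
        taken += unduplicated
        for _ in range(duplicated):
            out.append(out[-offset])  # copy from `offset` symbols back
    return out
```

Applied to the example mapping, zl_decode([5, 3, 1, 0, 4, 8, 1, 0] + list("abcdef")) rebuilds the symbols of "abcdeeeeabcdf". The final triple doubles as the terminator of the count symbols, which is why it must always be present.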

3.2.10 Huffman Coding Compression

The Huffman Coding compression transformation remaps the symbols within a block to a stream of bits. The mapping process involves alternating steps of Huffman Coding (producing bit counts, a list of symbols, and a bit stream) and Ziv-Lempel Compression as described above (producing count triples and a list of symbols). After the penultimate step there is a list of mixed count symbols followed by a bit stream. The count symbols are Huffman coded using a predefined symbol list and set of bit counts into a bit stream, resulting in a complete Huffman coded bit stream as the final mapping.

TODO: Actually specify the algorithms and/or their input and output.

3.3 XML Schema Annotation

All of the transformations are defined to work with unmodified XML-Schema. However, the default mappings do not provide sufficient flexibility to cover all of the possible uses of binary XML. To cover more possible uses there is an annotation schema to control the use of binary XML and guide the transformations. It also provides a namespace for the transpose element used in the Transpose transformation.

The annotation schema defines fragments and attributes to be inserted in the schema for a class of target binary XML documents. The binary element is for overall control and configuration and may be inserted anywhere at the top level of the schema. The others are for annotation within the schema to control how the transformations apply to various parts of target documents.

Some control of target documents is possible within standard XML-Schema, as described in some of the transformations above, without use of this annotation schema. The methods are described again here along with the new methods available with annotation so that all schema options are described in one place.

3.3.1 Overall Configuration

Here is an example of a fragment that can appear at the top level of a schema to constrain and configure the binary format of a document.

<bx:binary>

<bx:required bx:encoding="BX-1.0-DSNEVO"/>

<bx:optional bx:encoding="BX-1.0-ZH"/>

<bx:transform bx:name="schema">

<bx:property bx:name="local" bx:value="true"/>

<bx:property bx:name="global" bx:value="false"/>

</bx:transform>

<bx:property bx:name="blocksize" bx:value="65536"/>

</bx:binary>

This shows the two parts of the binary element.

The first part consists of the required and optional elements. The encoding attribute of the required element shows the minimum version and minimum set of selected transformations to be used in encoding a target document. The encoding attribute of the optional element shows the maximum version and additional allowed transformations. In either element the version is optional and the set of transformations can be empty, or can be "*" to indicate all transformations. In any case both "-" characters must be present. In the optional element the required transformations can be duplicated or not without changing the meaning. Note that the Declaration transformation can be specified here, even though it never appears in an actual XML declaration.

The second part consists of transform elements that provide configuration information to the transforms in the form of a property list consisting of property elements. Global configuration is provided by property elements outside a transform element.
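
The shape of the encoding attribute described above can be sketched as a small parser. This Python fragment is illustrative only. The names parse_encoding and is_permitted are hypothetical, versions are not compared, and it assumes that the encoding selected for a document is written in the same three-part form as the attribute values shown in the fragment.

```python
def parse_encoding(value):
    """Split an encoding string into (prefix, version or None, transformations)."""
    parts = value.split("-")
    if len(parts) != 3:
        raise ValueError("both '-' characters must be present: %r" % value)
    prefix, version, transforms = parts
    return prefix, (version or None), transforms


def is_permitted(selected, required, optional):
    """Check a selected set of transformation letters against the schema."""
    sel = set(parse_encoding(selected)[2])
    req = parse_encoding(required)[2]
    opt = parse_encoding(optional)[2]
    if "*" in sel or req == "*":
        # Expanding "*" needs the full transformation list, not modelled here.
        raise ValueError("this sketch needs explicit transformation letters")
    if not set(req) <= sel:
        return False  # a required transformation is missing
    # Required transformations may or may not be repeated in `optional`.
    return opt == "*" or sel <= set(req) | set(opt)
```

With the fragment above, a document using "BX-1.0-DSNEVOZ" would be accepted by this check, while one using "BX-1.0-DSN" would not, because it omits required transformations.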

TODO: Document some properties.

3.3.2 Embedded Annotation

The enable and disable elements control the application of individual transforms (given in the name attribute) to schema-defined types.

The index attribute may be used in element and attribute type declarations to fix the value of the corresponding name token in the Name transformation.

Without using the annotation schema it is possible to determine the width of simpleType value symbols by restrictions on length and/or minimum and maximum value.

TODO: Write the actual schema document.

4. Examples

4.1 Purchase Order

Here are examples of all of the translation transformations performed on the Purchase Order example from the XML-Schema tutorial.

4.1.1 Reference Documents

For reference, here is the example document and its schema. The example document has been modified to include an internal entity and reference to the schema.

Following is the schema annotated in such a way as to produce the same results as in this example without annotation. This shows the use of some of the annotations and also shows the default numbering of name tokens.

4.1.1.1 Purchase Order Document

<?xml version="1.0"?>

<!ENTITY california "CA">

<purchaseOrder orderDate="1999-10-20"

xmlns="http://example.com:po"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xsi:noNamespaceSchemaLocation="po.xsd">

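<shipTo country="US">

<name>Alice Smith</name>

<street>123 Maple Street</street>

<city>Mill Valley</city>

<state>&california;</state>

<zip>90952</zip>

</shipTo>

<billTo country="US">

<name>Robert Smith</name>

<street>8 Oak Avenue</street>

<city>Old Town</city>

<state>PA</state>

<zip>95819</zip>

</billTo>

<comment>Hurry, my lawn is going wild!</comment>

<items>

<item partNum="872-AA">

<productName>Lawnmower</productName>

<quantity>1</quantity>

<USPrice>148.95</USPrice>

<comment>Confirm this is electric</comment>

</item>

<item partNum="926-AA">

<productName>Baby Monitor</productName>

<quantity>1</quantity>

<USPrice>39.98</USPrice>

<shipDate>1999-05-21</shipDate>

</item>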

</items>

</purchaseOrder>

4.1.1.2 Purchase Order Schema

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

<xsd:annotation>

<xsd:documentation xml:lang="en">

Purchase order schema for Example.com.

</xsd:documentation>

</xsd:annotation>

<xsd:element name="purchaseOrder" type="PurchaseOrderType"/>

<xsd:element name="comment" type="xsd:string"/>

<xsd:complexType name="PurchaseOrderType">

<xsd:sequence>

<xsd:element name="shipTo" type="USAddress"/>

<xsd:element name="billTo" type="USAddress"/>

<xsd:element ref="comment" minOccurs="0"/>

<xsd:element name="items" type="Items"/>

</xsd:sequence>

<xsd:attribute name="orderDate" type="xsd:date"/>

</xsd:complexType>

<xsd:complexType name="USAddress">

<xsd:sequence>

<xsd:element name="name" type="xsd:string"/>

<xsd:element name="street" type="xsd:string"/>

<xsd:element name="city" type="xsd:string"/>

<xsd:element name="state" type="xsd:string"/>

<xsd:element name="zip" type="xsd:decimal"/>

</xsd:sequence>

<xsd:attribute name="country" type="xsd:NMTOKEN"

fixed="US"/>

</xsd:complexType>

<xsd:complexType name="Items">

<xsd:sequence>

<xsd:element name="item" minOccurs="0" maxOccurs="unbounded">

<xsd:complexType>

<xsd:sequence>

<xsd:element name="productName" type="xsd:string"/>

<xsd:element name="quantity">

<xsd:simpleType>

<xsd:restriction base="xsd:positiveInteger">

<xsd:maxExclusive value="100"/>

</xsd:restriction>

</xsd:simpleType>

</xsd:element>

<xsd:element name="USPrice" type="xsd:decimal"/>

<xsd:element ref="comment" minOccurs="0"/>

<xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>

</xsd:sequence>

<xsd:attribute name="partNum" type="SKU" use="required"/>

</xsd:complexType>

</xsd:element>

</xsd:sequence>

</xsd:complexType>

<xsd:simpleType name="SKU">

<xsd:restriction base="xsd:string">

<xsd:pattern value="\d{3}-[A-Z]{2}"/>

</xsd:restriction>

</xsd:simpleType>

</xsd:schema>

4.1.1.3 Annotated Purchase Order Schema

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:bx="http://example.org/BX">

<xsd:annotation>

<xsd:documentation xml:lang="en">

Purchase order schema for Example.com.

</xsd:documentation>

</xsd:annotation>

<bx:binary>

<bx:required bx:encoding="BX-1.0-DSNEV"/>

<bx:optional bx:encoding="BX-1.0-O"/>

</bx:binary>

<xsd:element name="purchaseOrder" bx:index="3" type="PurchaseOrderType"/>

<xsd:element name="comment" bx:index="4" type="xsd:string"/>

<xsd:complexType name="PurchaseOrderType">

<xsd:sequence>

<xsd:element name="shipTo" bx:index="5" type="USAddress"/>

<xsd:element name="billTo" bx:index="6" type="USAddress"/>

<xsd:element ref="comment" minOccurs="0"/>

<xsd:element name="items" bx:index="7" type="Items"/>

</xsd:sequence>

<xsd:attribute name="orderDate" bx:index="18" type="xsd:date"/>

</xsd:complexType>

<xsd:complexType name="USAddress">

<xsd:sequence>

<xsd:element name="name" bx:index="8" type="xsd:string"/>

<xsd:element name="street" bx:index="9" type="xsd:string"/>

<xsd:element name="city" bx:index="10" type="xsd:string"/>

<xsd:element name="state" bx:index="11" type="xsd:string"/>

<xsd:element name="zip" bx:index="12" type="xsd:decimal"/>

</xsd:sequence>

<xsd:attribute name="country" bx:index="19" type="xsd:NMTOKEN"

fixed="US"/>

</xsd:complexType>

<xsd:complexType name="Items">

<xsd:sequence>

<xsd:element name="item" bx:index="13" minOccurs="0" maxOccurs="unbounded">

<xsd:complexType>

<xsd:sequence>

<xsd:element name="productName" bx:index="14" type="xsd:string"/>

<xsd:element name="quantity" bx:index="15">

<xsd:simpleType>

<xsd:restriction base="xsd:positiveInteger">

<xsd:maxExclusive value="100"/>

</xsd:restriction>

</xsd:simpleType>

</xsd:element>

<xsd:element name="USPrice" bx:index="16" type="xsd:decimal"/>

<xsd:element ref="comment" minOccurs="0"/>

<xsd:element name="shipDate" bx:index="17" type="xsd:date" minOccurs="0"/>

</xsd:sequence>

<xsd:attribute name="partNum" bx:index="20" type="SKU" use="required"/>

</xsd:complexType>

</xsd:element>

</xsd:sequence>

</xsd:complexType>

<xsd:simpleType name="SKU">

<xsd:restriction base="xsd:string">

<xsd:pattern value="\d{3}-[A-Z]{2}"/>

</xsd:restriction>

</xsd:simpleType>

</xsd:schema>

4.1.2 Individual Transformations

The following sections show the mapping produced by each of the translation transformations (except Octet) selected individually, before the final output rendering.

Characters are shown naturally as above. Symbols are shown in curly braces "{}" with the type first, followed by the decimal value (unless preceded by "hex") and then the width (in bits) in parentheses. Comments to help keep track of what is happening follow "//" on each line; they are not part of the transformation, nor are any additional line breaks. Ellipses (...) indicate that the rest of the document follows without modification.
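
For readers who want to process the listings mechanically, the symbol notation just described can be read with a short regular expression. The Python sketch below is a convenience for these examples only, not part of the format. The name parse_symbol is hypothetical, and symbols with a placeholder value such as {block n (20)} are deliberately rejected.

```python
import re

# {type [hex] value (width)}: the type may be more than one word.
_SYMBOL = re.compile(
    r"\{\s*([a-z ]+?)\s+(?:hex\s+([0-9A-Fa-f]+)|(\d+))\s*\((\d+)\)\s*\}"
)


def parse_symbol(line):
    """Return (type, value, width in bits) for a symbol at the start of a line."""
    match = _SYMBOL.match(line.strip())
    if match is None:
        raise ValueError("not a symbol with a numeric value: %r" % line)
    kind, hex_value, dec_value, width = match.groups()
    value = int(hex_value, 16) if hex_value is not None else int(dec_value)
    return kind, value, int(width)
```

Trailing characters and "//" comments after the closing brace are ignored, so a line such as {length 10 (21)}1999-10-20 parses as a length symbol with value 10 and width 21.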

4.1.2.1 Declaration

{octet hex 42 (8)}

{octet hex 58 (8)}

{octet hex 10 (8)}

{octet hex 00 (8)}

<!ENTITY california "CA">

<purchaseOrder orderDate="1999-10-20"

xmlns="http://example.com:po"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xsi:noNamespaceSchemaLocation="po.xsd">

...

4.1.2.2 Schema

We will assume the po.xsd schema is registered privately as document number 15.

<?xml version="1.0"?>

{schema 0 (0)}

{length 8 (21)}

{integer 0 (32)}

{integer 15 (32)}

<!ENTITY california "CA">

<purchaseOrder orderDate="1999-10-20"

xmlns="http://example.com:po"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

...

4.1.2.3 Name

We will assume the noNamespaceSchemaLocation attribute is name 100 in a canonical schema for XML-Schema.

Notice that because the schema provides very strict guidance on the sequence of the content, most of the name tokens can be inferred from context and so have a width of 0.

<?xml version="1.0"?>

{block n (20)}

{length 24 (21)}<!ENTITY california "CA">

{name token 3 (0)} // <purchaseOrder>

{name token 18 (7)} // orderDate

{length 10 (21)}1999-10-20

{name token 121 (7)} // xsi:noNamespaceSchemaLocation

{length 6 (21)}po.xsd

{name token 1 (7)}

{name token 5 (0)} // <shipTo>

{name token 19 (0)} // country

{length 2 (21)}US

{name token 1 (0)}

{name token 8 (0)} // <name>

{name token 1 (0)}

{length 11 (21)}Alice Smith

{name token 0 (0)} // </name>

{name token 9 (0)} // <street>

{name token 1 (0)}

{length 16 (21)}123 Maple Street

{name token 0 (0)} // </street>

{name token 10 (0)} // <city>

{name token 1 (0)}

{length 11 (21)}Mill Valley

{name token 0 (0)} // </city>

{name token 11 (0)} // <state>

{name token 1 (0)}

{length 12 (21)}&california;

{name token 0 (0)} // </state>

{name token 12 (0)} // <zip>

{name token 1 (0)}

{length 5 (21)}90952

{name token 0 (0)} // </zip>

{name token 0 (0)} // </shipTo>

{name token 6 (0)} // <billTo>

{name token 19 (0)} // country

{length 2 (21)}US

{name token 1 (0)}

{name token 8 (0)} // <name>

{name token 1 (0)}

{length 12 (21)}Robert Smith

{name token 0 (0)} // </name>

{name token 9 (0)} // <street>

{name token 1 (0)}

{length 12 (21)}8 Oak Avenue

{name token 0 (0)} // </street>

{name token 10 (0)} // <city>

{name token 1 (0)}

{length 8 (21)}Old Town

{name token 0 (0)} // </city>

{name token 11 (0)} // <state>

{name token 1 (0)}

{length 2 (21)}PA

{name token 0 (0)} // </state>

{name token 12 (0)} // <zip>

{name token 1 (0)}

{length 5 (21)}95819

{name token 0 (0)} // </zip>

{name token 0 (0)} // </billTo>

{name token 4 (7)} // <comment>

{name token 1 (0)}

{length 29 (21)}Hurry, my lawn is going wild!

{name token 0 (0)} // </comment>

{name token 7 (7)} // <items>

{name token 1 (0)}

{name token 13 (7)} // <item>

{name token 20 (0)} // partNum

{length 6 (21)}872-AA

{name token 1 (0)}

{name token 14 (0)} // <productName>

{name token 1 (0)}

{length 9 (21)}Lawnmower

{name token 0 (0)} // </productName>

{name token 15 (0)} // <quantity>

{name token 1 (0)}

{length 1 (21)}1

{name token 0 (0)} // </quantity>

{name token 16 (0)} // <USPrice>

{name token 1 (0)}

{length 6 (21)}148.95

{name token 0 (0)} // </USPrice>

{name token 4 (7)} // <comment>

{name token 1 (0)}

{length 24 (21)}Confirm this is electric

{name token 0 (0)} // </comment>

{name token 0 (7)} // </item>

{name token 13 (7)} // <item>

{name token 20 (0)} // partNum

{length 6 (21)}926-AA

{name token 1 (0)}

{name token 14 (0)} // <productName>

{name token 1 (0)}

{length 12 (21)}Baby Monitor

{name token 0 (0)} // </productName>

{name token 15 (0)} // <quantity>

{name token 1 (0)}

{length 1 (21)}1

{name token 0 (0)} // </quantity>

{name token 16 (0)} // <USPrice>

{name token 1 (0)}

{length 5 (21)}39.98

{name token 0 (0)} // </USPrice>

{name token 17 (7)} // <shipDate>

{name token 1 (0)}

{length 10 (21)}1999-05-21

{name token 0 (0)} // </shipDate>

{name token 0 (0)} // </item>

{name token 0 (0)} // </items>

{name token 0 (0)} // </purchaseOrder>

4.1.2.4 Entity

<?xml version="1.0"?>

{block n (20)}

{name token 2 (2)}

{type token 0 (8)}

{length 2 (21)}CA

{length 305 (21)}<purchaseOrder orderDate="1999-10-20"

xmlns="http://example.com:po"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xsi:noNamespaceSchemaLocation="po.xsd">

<shipTo country="US">

<name>Alice Smith</name>

<street>123 Maple Street</street>

<city>Mill Valley</city>

<state>

{name token 2 (2)}

{length 728 (21)}</state>

</shipTo>

...

4.1.2.5 Value

We will assume the Unix epoch for dates.

Note that since the schema defines all types without unions, all of the type tokens are redundant and so have width 0 instead of 8.

<?xml version="1.0"?>

{block n (20)}

{length 26 (21)}<!ENTITY california "CA">

<purchaseOrder orderDate=

{type token 38 (0)}

{ value 10915 (32)}

xmlns=

{type token 0 (0)}

{length 21 (21)}http://example.com:po

xmlns:xsi=

{type token 0 (0)}

{length 41 (21)}http://www.w3.org/2001/XMLSchema-instance

xsi:noNamespaceSchemaLocation=

{type token 0 (0)}

{length 6 (21)}po.xsd

<shipTo country=

{type token 0 (0)}

{length 2 (0)}US

<name>

{type token 0 (0)}

{length 11 (21)}Alice Smith

</name>

<street>

{type token 0 (0)}

{length 16 (21)}123 Maple Street

</street>

<city>

{type token 0 (0)}

{length 11 (21)}Mill Valley

</city>

<state>

{type token 0 (0)}

{length 12 (21)}&california;

</state>

<zip>

{type token 0 (0)}

{length 5 (21)}90952

</zip>

</shipTo>

<billTo country=

{type token 0 (0)}

{length 2 (0)}US

<name>

{type token 0 (0)}

{length 12 (21)}Robert Smith

</name>

<street>

{type token 0 (0)}

{length 12 (21)}8 Oak Avenue

</street>

<city>

{type token 0 (0)}

{length 8 (21)}Old Town

</city>

<state>

{type token 0 (0)}

{length 2 (21)}PA

</state>

<zip>

{type token 0 (0)}

{length 5 (21)}95819

</zip>

</billTo>

<comment>

{type token 0 (0)}

{length 29 (21)}Hurry, my lawn is going wild!

</comment>

<items>

<item partNum=

{type token 0 (0)}

{length 6 (0)}872-AA

<productName>

{type token 0 (0)}

{length 9 (21)}Lawnmower

</productName>

<quantity>

{type token 10 (0)}

{value 1 (8)}

</quantity>

<USPrice>

{type token 30 (0)}

{value 100 (64)}

{value 14895 (64)}

</USPrice>

<comment>

{type token 0 (0)}

{length 24 (21)}Confirm this is electric

</comment>

</item>

<item partNum=

{type token 0 (0)}

{length 6 (0)}926-AA

<productName>

{type token 0 (0)}

{length 12 (21)}Baby Monitor

</productName>

<quantity>

{type token 10 (0)}

{value 1 (8)}

</quantity>

<USPrice>

{type token 30 (0)}

{value 100 (64)}

{value 3998 (64)}

</USPrice>

<shipDate>

{type token 38 (0)}

{value 10763 (32)}

</shipDate>

</item>

</items>

</purchaseOrder>

4.1.3 Combined Transformations

In the following table all the above translation transformations are selected together, showing the original XML text, the resulting mapping into symbols (and the source of the symbols), the default rendering of those symbols (in hex), and the alternative rendering if the Octets transformation is also selected.

Original XML	Symbols	Source	Output	Octets
<?xml version="1.0"?>	{octet hex 42 (8)} {octet hex 58 (8)} {octet hex 10 (8)} {octet hex 10 (8)} {octet hex 00 (8)} {octet hex 00 (8)}	Declaration	42 58 10 10 1E 00	42 58 10 10 1F 00
	{schema 0 (0)} {length 8 (21)} {integer 0 (32)} {integer 15 (32)}	Schema	00 00 08 00 00 00 00 00 00 00 0F	91 E0 EF
<!ENTITY california "CA">	{name token 21 (2)} {type token 0 (8)} {length 2 (21)}CA	Entity	00 15 00 00 00 02 43 41	97 F5 B1 EA 63 61 6C 69 66 6F 72 6E 69 61 B1 E2 43 41
<purchaseOrder	{name token 3 (0)}	Name		95 E3 B1 ED 70 75 72 63 68 61 73 65 4F 72 64 65 72
orderDate=	{name token 18 (7)}	Name	12	96 F2 B1 E9 6F 72 64 65 72 44 61 74 65
"1999-10-20"	{type token 38 (0)} { value 10915 (32)}	Value	00 00 2A A3	A7 CF 2A A3
>	{name token 1 (7)}	Name	01	9B
<shipTo	{name token 5 (0)}	Name		95 E5 B1 E6 73 68 69 70 54 6F
>	{name token 1 (0)}	Name	01	9B
<name	{name token 8 (0)}	Name		95 E8 B1 E4 6E 61 6D 65
>	{name token 1 (7)}	Name	01	9B
Alice Smith	{type token 0 (0)} {length 11 (21)}Alice Smith	Value	00 00 0B 41 6C 69 63 65 20 53 6D 69 74 68	B1 EB 41 6C 69 63 65 20 53 6D 69 74 68
</name>	{name token 0 (0)}	Name		9C
<street	{name token 9 (0)}	Name		95 E9 B1 E6 73 74 72 65 65 74
>	{name token 1 (0)}	Name		9B
123 Maple Street	{type token 0 (0)} {length 16 (21)}123 Maple Street	Value	00 00 10 31 32 33 20 4D 61 70 6C 65 20 53 74 72 65 65 74	B1 F0 31 32 33 20 4D 61 70 6C 65 20 53 74 72 65 65 74
</street>	{name token 0 (0)}	Name		9C
<city	{name token 10 (0)}	Name		95 EA B1 E4 63 69 74 79
>	{name token 1 (0)}	Name		9B
Mill Valley	{type token 0 (0)} {length 11 (21)}Mill Valley	Value	00 00 0B 4D 69 6C 6C 20 56 61 6C 6C 65 79	B1 EB 4D 69 6C 6C 20 56 61 6C 6C 65 79
</city>	{name token 0 (0)}	Name		9C
<state	{name token 11 (0)}	Name		95 EB B1 E5 73 74 61 74 65
>	{name token 1 (0)}	Name		9B
	{length 0 (21)}	Entity	00 00 00
&california;	{name token 21 (7)}	Entity	15	9A F5
	{length 0 (21)}	Entity	00 00 00
</state>	{name token 0 (0)}	Name		9C
<zip	{name token 12 (0)}	Name		95 EC B1 E3 7A 69 70
>	{name token 1 (0)}	Name		9B
90952	{type token 0 (0)} {length 5 (21)}90952	Value	00 00 05 39 30 39 35 32	B1 E5 39 30 39 35 32
</zip>	{name token 0 (0)}	Name		9C
</shipTo>	{name token 0 (0)}	Name		9C
<billTo	{name token 6 (0)}	Name		95 E6 B1 E6 62 69 6C 6C 54 6F
>	{name token 1 (0)}	Name		9B
<name	{name token 8 (0)}	Name		98 E8
>	{name token 1 (0)}	Name		9B
Robert Smith	{type token 0 (0)} {length 12 (21)}Robert Smith	Value	00 00 0C 52 6F 62 65 72 74 20 53 6D 69 74 68	B1 EC 52 6F 62 65 72 74 20 53 6D 69 74 68
</name>	{name token 0 (0)}	Name		9C
<street	{name token 9 (0)}	Name		98 E9
>	{name token 1 (0)}	Name		9B
8 Oak Avenue	{type token 0 (0)} {length 12 (21)}8 Oak Avenue	Value	00 00 0C 38 20 4F 61 6B 20 41 76 65 6E 75 65	B1 EC 38 20 4F 61 6B 20 41 76 65 6E 75 65
</street>	{name token 0 (0)}	Name		9C
<city	{name token 10 (0)}	Name		98 EA
>	{name token 1 (0)}	Name		9B
Old Town	{type token 0 (0)} {length 8 (21)}Old Town	Value	00 00 08 4F 6C 64 20 54 6F 77 6E	B1 E8 4F 6C 64 20 54 6F 77 6E
</city>	{name token 0 (0)}	Name		9C
<state	{name token 11 (0)}	Name		98 EB
>	{name token 1 (0)}	Name		9B
PA	{type token 0 (0)} {length 2 (21)}PA	Value	00 00 02 50 41	B1 E2 50 41
</state>	{name token 0 (0)}	Name		9C
<zip	{name token 12 (0)}	Name		98 EC
>	{name token 1 (0)}	Name		9B
95819	{type token 0 (0)} {length 5 (21)}95819	Value	00 00 05 39 35 38 31 39	B1 E5 39 35 38 31 39
</zip>	{name token 0 (0)}	Name		9C
</billTo>	{name token 0 (0)}	Name		9C
<comment	{name token 4 (7)}	Name	04	95 E4 B1 E7 63 6F 6D 6D 65 6E 74
>	{name token 1 (0)}	Name		9B
Hurry, my lawn is going wild!	{type token 0 (0)} {length 29 (21)}Hurry, my lawn is going wild!	Value	00 00 1D 48 75 72 72 79 2C 20 6D 79 20 6C 61 77 6E 20 69 73 20 67 6F 69 6E 67 20 77 69 6C 64 21	B1 FD 48 75 72 72 79 2C 20 6D 79 20 6C 61 77 6E 20 69 73 20 67 6F 69 6E 67 20 77 69 6C 64 21
</comment>	{name token 0 (0)}	Name		9C
<items	{name token 7 (7)}	Name	07	95 E7 B1 E5 69 74 65 6D 73
>	{name token 1 (0)}	Name		9B
<item	{name token 13 (7)}	Name	0D	95 ED B1 E4 69 74 65 6D
partNum=	{name token 20 (0)}	Name		96 F4 B1 E7 70 61 72 74 4E 75 6D
"872-AA"	{type token 0 (0)} {length 6 (0)}872-AA	Value	00 00 06 38 37 32 2D 41 41	B1 E6 38 37 32 2D 41 41
>	{name token 1 (0)}	Name		9B
<productName	{name token 14 (0)}	Name		95 EE B1 EB 70 72 6F 64 75 63 74 4E 61 6D 65
>	{name token 1 (0)}	Name		9B
Lawnmower	{type token 0 (0)} {length 9 (21)}Lawnmower	Value	00 00 09 4C 61 77 6E 6D 6F 77 65 72	B1 E9 4C 61 77 6E 6D 6F 77 65 72
</productName>	{name token 0 (0)}	Name		9C
<quantity	{name token 15 (0)}	Name		95 EF B1 E8 71 75 61 6E 74 69 74 79
>	{name token 1 (0)}	Name		9B
1	{type token 10 (0)} {value 1 (8)}	Value	01	E1
</quantity>	{name token 0 (0)}	Name		9C
<USPrice	{name token 16 (0)}	Name		95 F0 B1 E7 55 53 50 72 69 63 65
>	{name token 1 (0)}	Name		9B
148.95	{type token 30 (0)} {value 100 (64)} {value 14895 (64)}	Value	00 00 00 00 00 00 00 64 00 00 00 00 00 00 3A 2F	B2 D0 64 CF 3A 2F
</USPrice>	{name token 0 (0)}	Name		9C
<comment	{name token 4 (7)}	Name	04	98 E4
>	{name token 1 (0)}	Name		9B
Confirm this is electric	{type token 0 (0)} {length 24 (21)}Confirm this is electric	Value	00 00 18 43 6F 6E 66 69 72 6D 20 74 68 69 73 20 69 73 20 65 6C 65 63 74 72 69 63	B1 F8 43 6F 6E 66 69 72 6D 20 74 68 69 73 20 69 73 20 65 6C 65 63 74 72 69 63
</comment>	{name token 0 (0)}	Name		9C
</item>	{name token 0 (0)}	Name		9C
<item	{name token 13 (7)}	Name	0D	98 ED
partNum=	{name token 20 (0)}	Name		99 F4
"926-AA"	{type token 0 (0)} {length 6 (0)}926-AA	Value	00 00 06 39 32 36 2D 41 41	B1 E6 39 32 36 2D 41 41
>	{name token 1 (0)}	Name		9B
<productName	{name token 14 (0)}	Name		98 EE
>	{name token 1 (0)}	Name		9B
Baby Monitor	{type token 0 (0)} {length 12 (21)}Baby Monitor	Value	00 00 0C 42 61 62 79 20 4D 6F 6E 69 74 6F 72	B1 EC 42 61 62 79 20 4D 6F 6E 69 74 6F 72
</productName>	{name token 0 (0)}	Name		9C
<quantity	{name token 15 (0)}	Name		98 EF
>	{name token 1 (0)}	Name		9B
1	{type token 10 (0)} {value 1 (8)}	Value	01	E1
</quantity>	{name token 0 (0)}	Name		9C
<USPrice	{name token 16 (0)}	Name		98 F0
>	{name token 1 (0)}	Name		9B
39.98	{type token 30 (0)} {value 100 (64)} {value 3998 (64)}	Value	00 00 00 00 00 00 00 64 00 00 00 00 00 00 0F 9E	B2 D0 64 CF 0F 9E
</USPrice>	{name token 0 (0)}	Name		9C
<shipDate	{name token 17 (7)}	Name	11	95 F1 B1 E8 73 68 69 70 44 61 74 65
>	{name token 1 (0)}	Name		9B
1999-05-21	{type token 38 (0)} {value 10763 (32)}	Value	00 00 2A 0B	A7 CF 2A 0B
</shipDate>	{name token 0 (0)}	Name		9C
</item>	{name token 0 (0)}	Name		9C
</items>	{name token 0 (0)}	Name		9C
</purchaseOrder>	{name token 0 (0)}	Name		9C