"BX" Proposal for XML Binary Format
Paul R. Pierce
Abstract
A proposal for a binary format for XML, specified in terms of a set of independent reversible transformations between standard XML and a binary form.
Revision
February 5, 2006
XML is a very successful syntax for carrying information, being a careful compromise between many conflicting goals to hit the broad center of possible uses. One significant non-goal for XML was to be terse, so XML is completely text based and tends to be verbose. A binary format for XML would describe an alternate syntax that carries information in just the same way as XML, but in a form better suited to the many valuable fringe uses for which standard XML is unsuitable, often because it is too verbose.
Other than possibly a need for terseness these other uses do not consistently have a lot in common. To achieve sufficient generality to cover more than one corner of the possible uses, the binary format must either be extremely clever or it must be flexible. This proposal goes in the direction of flexibility by specifying a number of options. In fact, the proposal is little more than a specification of a set of optional transformations on standard XML that yield a more terse syntax, together with a simple mechanism for selecting from the options.
The transformations specified here were selected in an attempt to meet the requirements set forth in the report of the W3C XML Binary Characterization working group. Most either represent existing practice or are an obvious, simple invention to directly address a need of the format. Each transformation is reversible and as much as possible independent of the others.
The binary format for XML is specified in terms of a set of transformations applied to standard XML, resulting in a new syntax that carries most or all of the original textual format and all of the original data. The transformations are ordered, mostly independent, optional, and reversible. Thus one simple use model is to encode standard XML by applying a sequence of selected transformations, then later decode by applying the reverse transformations in reverse order. However, the results of many of the transformations are straightforward and can be constructed and used directly without producing or processing the textual form. These transformations might be called translations, and they form the first part of the ordered sequence; the remaining transformations are for compression.
To accurately and efficiently translate names and different data types it is necessary to use tables or some sort of magic; if tables are used they can be explicitly included in the format or implicitly derived from schema or example. To avoid excessive invention this proposal specifies the use of tables implicitly derived from unmodified XML-Schema documents. To better achieve some of the desired characteristics, there is also specified a schema for annotation of XML-Schema, described in the last section. The optional annotation provides more explicit direction for building the tables and configuring the transformations. This specification does not support use of DTDs unless they are first translated to and replaced by XML-Schema.
Only some of the transformations need schema information. Early translation transformations interpret the schema; later transformations inherit this information.
Some of the compression transformations are performed on blocks rather than the entire data set. When several of them are selected they all operate on the same blocks instead of choosing blocks independently.
Here are the specified transformations:
Use of the binary format and the set of transformations to be used are specified in the encoding attribute of the XML declaration.
This section contains the complete specification of the "BX" binary format. It is presented as a process of encoding a presumed XML document into the "BX" format.
The binary format is declared as an encoding in the "encoding" attribute of the XML declaration. This is consistent with XML 1.x if one takes a broad view of what it means to encode a stream of characters. The declaration indicates use of the binary format, provides a version number to indicate the specific specification in use, and indicates which of the optional transformations are selected.
The encoding name consists of the prefix "BX-" followed by a two-part version number such as "1.0" followed by the transformation selection suffix. The transformation selection suffix consists of a dash followed by the capitalized first letters of the selected transformations, in order.
BxEncName ::= 'BX-' Version '-' ['D']['S']['N']['E']['V']['T']['O']['B']['Z']['H']
An example declaration that selects all of the translation transformations except Declaration might be:
<?xml version="1.1" encoding="BX-1.0-SNEVO"?>
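The BxEncName production can be checked with a regular expression. The following is a minimal sketch, not part of the proposal: the function name `parse_bx_encoding` and its return shape are illustrative, and it assumes that each part of the version number is a run of decimal digits, which the text implies but does not state.

```python
import re

# Optional letters, in the fixed order required by the BxEncName production.
_BX_NAME = re.compile(r"BX-(\d+)\.(\d+)-(D?S?N?E?V?T?O?B?Z?H?)")

def parse_bx_encoding(name):
    """Return ((major, minor), letters) for a BX encoding name, else None.

    Hypothetical helper: the version syntax (digits '.' digits) is an
    assumption based on the "1.0" example in the text.
    """
    match = _BX_NAME.fullmatch(name)
    if match is None:
        return None
    major, minor, letters = match.groups()
    return (int(major), int(minor)), letters

print(parse_bx_encoding("BX-1.0-SNEVO"))
```

Because the letters must appear in order, a name such as "BX-1.0-OS" is rejected rather than silently reordered.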
All the transformations are specified in terms of streams of symbols. Symbols are essentially integers. The initial symbols are the Unicode characters of the presumed input XML document. Transformations can produce different kinds of symbols. The final result of a sequence of transformations is the simple concatenation of the binary representation of all the symbols.
During intermediate processing steps individual symbols or groups of symbols might have associated type or other information. Any such information that remains at the end of the last selected transformation is lost, except for width. Each symbol has an associated width in bits that determines how much space it occupies in the final stream.
Unicode characters are symbols with width 21 bits and type "character". The range of values is that specified in XML 1.x and/or Unicode.
An octet is a symbol with width 8 bits and type "octet". The range of values is 0-255.
Length symbols are used to define lists. They are only produced by translation transformations; in fact, the Name, Entity and Value translations always turn all character sequences into lists. They are integers that provide a count of the number of output 8-bit bytes used by the following symbols in the list, excluding the length symbol itself. There is a global width for length symbols that depends on the maximum block size. For the default maximum block size the width is 21 bits. Length symbols are always either the global width or, if redundant, width 0.
Lists can cross block boundaries. In this case the list is segmented with a length symbol at the head of each segment. The most significant bit of the value of the length symbol indicates whether there are more segments. 0 means this is the final (or only) segment, 1 means there are more segments.
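As an illustration of the segment flag, the sketch below packs and unpacks a length symbol value. It assumes that the "most significant bit" is the top bit of the global width (bit 20 for the default 21-bit width) and that the byte count occupies the remaining bits; the function names are hypothetical.

```python
LENGTH_WIDTH = 21  # default global width for length symbols, in bits

def pack_length(byte_count, more_segments, width=LENGTH_WIDTH):
    """Combine a byte count with the 'more segments' flag (assumed layout)."""
    flag_bit = 1 << (width - 1)
    if not 0 <= byte_count < flag_bit:
        raise ValueError("byte count does not fit below the flag bit")
    return (flag_bit if more_segments else 0) | byte_count

def unpack_length(value, width=LENGTH_WIDTH):
    """Split a length symbol value into (byte_count, more_segments)."""
    flag_bit = 1 << (width - 1)
    return value & (flag_bit - 1), bool(value & flag_bit)

print(unpack_length(pack_length(300, True)))
```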
Count symbols are produced in compression transformations. Like length symbols they often tell how many symbols follow, but their value is the number of symbols rather than the number of bytes taken up by the symbols.
Name tokens represent element, attribute and entity names and delimit their respective constructs. They are symbols of type "name token". The value range for name tokens is built up from three contiguous subranges, one each for elements, attributes and entities. Once the name token symbol space is fully evaluated the global name token width is defined as the number of bits required to represent the maximum value.
There are two special name tokens. Token 0 is the end-of-element token, and token 1 is the end-of-attributes token.
There is also a pre-defined element name token, transpose, that takes the first number (2) of the element name range.
The width of an individual name token can be either the global token width or zero. A token with zero width still has a value and other properties, but is redundant and so won't appear unless it is subsequently converted to some other symbol.
Type tokens represent XML-Schema primitive types and their encodings. They are symbols of type "type token" and width 8 or 0; as with name tokens, width zero indicates that the token's presence is not required in the output. The possible values of type tokens are given in Table 1. Each type token also has a property of whether or not it is a list, encoded in its value as even for plain and odd for list. A type token is followed by value symbols that encode the value of an attribute or element body.
| Value | Hex | Primitive Type | Encoding (Width) |
| --- | --- | --- | --- |
| 0, 1 | 00, 01 | string or anySimpleType | list of characters |
| 2, 3 | 02, 03 | nil | none |
| 4, 5 | 04, 05 | boolean | integer (1) |
| 6, 7 | 06, 07 | float | float (32) |
| 8, 9 | 08, 09 | double | double (64) |
| 10, 11 | 0A, 0B | decimal | unsigned integer (8) |
| 12, 13 | 0C, 0D | decimal | signed integer (8) |
| 14, 15 | 0E, 0F | decimal | unsigned integer (16) |
| 16, 17 | 10, 11 | decimal | signed integer (16) |
| 18, 19 | 12, 13 | decimal | unsigned integer (32) |
| 20, 21 | 14, 15 | decimal | signed integer (32) |
| 22, 23 | 16, 17 | decimal | signed integer (64) |
| 24, 25 | 18, 19 | decimal | fraction (8), (32) |
| 26, 27 | 1A, 1B | decimal | fraction (16), (64) |
| 28, 29 | 1C, 1D | decimal | fraction (32), (64) |
| 30, 31 | 1E, 1F | decimal | fraction (64), (64) |
| 32, 33 | 20, 21 | duration | fraction (32), (64) |
| 34, 35 | 22, 23 | dateTime | fraction (32), (64), tz (16) |
| 36, 37 | 24, 25 | time | fraction (32), (64) |
| 38, 39 | 26, 27 | date | integer (32) |
| 40, 41 | 28, 29 | gYearMonth | integer (16) |
| 42, 43 | 2A, 2B | gYear | integer (16) |
| 44, 45 | 2C, 2D | gMonthDay | integer (16) |
| 46, 47 | 2E, 2F | gMonth | integer (8) |
| 48, 49 | 30, 31 | gDay | integer (8) |
| 50, 51 | 32, 33 | hexBinary or base64Binary | list of octets |
| 52, 53 | 34, 35 | anyURI | list of characters |
| 54, 55 | 36, 37 | QName | name token |
| 56, 57 | 38, 39 | QName | list of characters |
| 58, 59 | 3A, 3B | enumeration | integer (8) |
| 60, 61 | 3C, 3D | enumeration | integer (16) |
| 62, 63 | 3E, 3F | enumeration | integer (32) |
Table 1
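Since the list property is carried in the low bit of the token value, a decoder can separate it from the type with a mask. The sketch below shows this on an excerpt of Table 1; the names `TABLE_1_EXCERPT`, `split_type_token` and `describe_type_token` are illustrative and only a few rows are copied.

```python
# A few rows of Table 1, keyed by the even ("plain") token value.
TABLE_1_EXCERPT = {
    0: ("string or anySimpleType", "list of characters"),
    4: ("boolean", "integer (1)"),
    6: ("float", "float (32)"),
    8: ("double", "double (64)"),
    34: ("dateTime", "fraction (32), (64), tz (16)"),
}

def split_type_token(value):
    """Return (plain_value, is_list): even values are plain, odd are lists."""
    if not 0 <= value <= 63:
        raise ValueError("type token values in Table 1 run from 0 to 63")
    return value & ~1, bool(value & 1)

def describe_type_token(value):
    """Look up the primitive type and encoding for a token in the excerpt."""
    plain, is_list = split_type_token(value)
    primitive, encoding = TABLE_1_EXCERPT[plain]
    return primitive, encoding, is_list

print(describe_type_token(7))
```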
Value symbols encode the numeric data values shown in Table 1.
Integer is a signed or unsigned two's-complement integer value of the given width.
Float is a 32-bit IEEE-754-1985 floating-point value. Double is a 64-bit IEEE-754-1985 floating-point value.
Fraction is a pair of integers; dividing the second by the first yields the value. The first (divisor) is unsigned and non-zero, the second (dividend) is signed. When a fraction is derived from a decimal string literal, the divisor will always be a power of ten, but in general divisor values are not so limited.
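To make the divisor/dividend convention concrete, here is a minimal sketch that derives a fraction from a decimal string literal. The function name is hypothetical, and the choice of the smallest power of ten that preserves the digits as written is an assumption; the text only requires that the divisor be a power of ten. Selecting the encoding widths from Table 1 is not shown.

```python
from decimal import Decimal

def decimal_to_fraction(literal):
    """Return (divisor, dividend) such that dividend / divisor == literal.

    Assumes a plain decimal literal (no exponent, NaN or infinity).
    """
    value = Decimal(literal)
    exponent = value.as_tuple().exponent   # e.g. -2 for "12.34"
    divisor = 10 ** max(0, -exponent)      # unsigned, non-zero power of ten
    dividend = int(value * divisor)        # signed
    return divisor, dividend

print(decimal_to_fraction("12.34"))
```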
The primitive types duration, dateTime and time are encoded in seconds as fractions. For dateTime the fraction is followed by the timezone, which is a signed 16-bit integer. Zero timezone means UTC, while the special value -32768 means there is no timezone specified.
TODO: is there an issue for dateTime and date with regard to the epoch? Should the date instead be represented as gYear, gMonth, gDay? If so, should duration be the same?
An implicit transformation in this specification is the rendering of symbols to the final bit stream of the binary format. Two of the transformations, Octet and Huffman, make this straightforward as their output is either octets or bits. However, they are both optional and if neither is selected there is a default rendering. This rendering is also referenced in creating blocks and length symbols.
Each symbol is padded on the left (most significant bits) with zeroes so that its width is a multiple of 8. Then it is placed in the stream in network byte order, most significant 8-bit bytes first.
All properties and type information are discarded. Symbols of width zero are skipped altogether.
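The default rendering can be sketched in a few lines. Here a symbol is modeled as a `(value, width)` pair, which is an illustrative representation rather than anything the proposal defines, and values are assumed to be already reduced to non-negative bit patterns of the stated width.

```python
def render(symbols):
    """Render (value, width_in_bits) pairs to bytes using the default rule."""
    out = bytearray()
    for value, width in symbols:
        if width == 0:
            continue                       # zero-width symbols are skipped
        nbytes = (width + 7) // 8          # pad on the left to a multiple of 8
        out += value.to_bytes(nbytes, "big")  # network byte order
    return bytes(out)

# A 21-bit character 'A', a redundant zero-width token, and an 8-bit octet.
print(render([(0x41, 21), (7, 0), (0x1F, 8)]).hex())
```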
Several transformations depend on or can make use of schema for the document. In all cases this is XML-Schema, optionally annotated with the BX Schema Annotation described in section 3.3. All such transformations are specified to operate properly with any valid XML-Schema. If no schema exists it is possible to automatically create a schema for a given document and either encapsulate it with the document (see XML Container below) or select the Octet transformation which preserves type and other needed information. If a DTD exists it might be possible to perform a canonical translation of the DTD into a schema, and use that.
The Schema transformation depends on a registration scheme for XML-Schema documents. The details are outside the scope of this specification, but we will assume a two-level mechanism similar to that used for PCI devices. A global entity such as the W3C registers parties who wish to register documents, and assigns each party a number. Each party then assigns its own numbers to its schema documents. The global entity maintains a web accessible database of the parties, accessible by number, providing for each a link to their web-accessible database of documents. Each party's database is accessible by document IRI and number, and given one provides the other and a link to the schema document itself.
The party number 0 is reserved for local, private and experimental use. For use in examples we will also assume that party number 1 is the W3C and the XML-Schema instance document is the W3C's document number 1.
If it should happen that a recommendation is created for an XML container schema, then it would be possible to encapsulate required schema along with a document. In this case all translation transformations would recognize the container syntax and would, when necessary, treat encapsulated documents individually.
Several transformations work on blocks of symbols. In addition, length symbols are limited to the block size, so transformations that produce length symbols also respect block boundaries. When any such transformations are selected, the document is divided into contiguous blocks at the input to the first transformation that uses blocks (or within the first transformation as an optimization) and all the transformations operate on the same blocks.
Unless configured differently by use of XML Schema Annotation, block boundaries are placed between the outermost elements possible. Maximum block size is 1 binary megabyte, 1048576 8-bit bytes, but can be configured to a smaller size.
Each block is preceded by a symbol of type "block" of width 20 bits. The value of the symbol is the length of the block in 8-bit bytes. These symbols do not participate in transformations except their value is kept up to date when the size of the block changes and some transformations alter the width to zero or back to 20.
The transformations are specified in terms of how they map an input symbol stream to an output symbol stream. All the transformations are supposed to be reversible, so both encoding and decoding are possible as well as direct production and interpretation of translated symbol streams or serialization from and parsing to internal representations.
It is common practice in binary specifications to specify packed values in terms of bit fields, but here there are a few places where the packing is more complicated, so packed values are specified in terms of integer ranges and sums of integer products. In many cases it should be obvious that they are in fact bit fields.
The Declaration transformation recasts the XML declaration into six octets. The first two octets are the characters 'B' and 'X'. The third octet encodes the XML version as 16*<major version> + <minor version>. The fourth octet similarly encodes the binary format version. The fifth octet encodes the selected translation transformations except for Declaration, which is implicit when the BX prefix is present. The sixth octet similarly encodes the selected compression transformations.
The fifth octet flags are encoded as 16*<Schema> + 8*<Name> + 4*<Entity> + 2*<Value> + <Octet>.
The sixth octet flags are encoded as 8*<Transpose> + 4*<Block> + 2*<Ziv-Lempel> + <Huffman>.
Using the example above, the following are equivalent except in the use of the Declaration transformation, with the second shown in hex:
<?xml version="1.1" encoding="BX-1.0-SNEVO"?>
42 58 11 10 1F 00
All further transformations pass the declaration (in either form) through unmodified.
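The six-octet form follows directly from the two flag formulas. The sketch below reproduces the example; the function name and the use of the selection letters from the encoding name as input are illustrative choices.

```python
def declaration_octets(xml_version, bx_version, selected):
    """Build the six-octet declaration from version strings and letters."""
    def version_octet(text):
        major, minor = (int(part) for part in text.split("."))
        return 16 * major + minor

    def flag(letter):
        return 1 if letter in selected else 0

    # Fifth octet: translation transformations (Declaration is implicit).
    fifth = (16 * flag("S") + 8 * flag("N") + 4 * flag("E")
             + 2 * flag("V") + flag("O"))
    # Sixth octet: compression transformations.
    sixth = 8 * flag("T") + 4 * flag("B") + 2 * flag("Z") + flag("H")
    return bytes([ord("B"), ord("X"), version_octet(xml_version),
                  version_octet(bx_version), fifth, sixth])

print(declaration_octets("1.1", "1.0", "SNEVO").hex(" ").upper())
```

For the example declaration this yields the octets 42 58 11 10 1F 00.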
TODO: Should there be a bit for the Standalone Document Declaration? If so, a document that is not standalone might be identified by using the most significant bit of the sixth octet (128*<not standalone>).
When registered XML-Schema are referenced in the document element, the Schema transformation removes no longer needed markup (i.e. xsi:schemaLocation, xsi:noNamespaceSchemaLocation) and inserts binary reference symbols after the declaration. In any case it inserts a length symbol before any reference symbols.
The Schema transformation introduces a special schema symbol, with no value and 0 width, followed by a list (beginning with a length symbol) of pairs of 32-bit unsigned integers. Each pair is the party number followed by the document number. The schema symbol and list immediately follows the XML declaration. The rest of the document remains unchanged except to eliminate from the document element any xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes for the listed schema.
In the case of container documents, the Schema transformation is performed at the document element of each encapsulated document, with all schema added to the single list following the XML declaration.
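As a rough illustration, the schema reference list might look as follows under the default rendering. This sketch assumes the default 21-bit length symbol (padded to three octets) and big-endian 32-bit integers; the zero-width schema symbol contributes no output. The function name is hypothetical.

```python
import struct

def schema_reference_bytes(references):
    """Render (party_number, document_number) pairs as a length-prefixed list."""
    body = b"".join(struct.pack(">II", party, document)
                    for party, document in references)
    # Length symbol: byte count of the list body, 21 bits padded to 3 octets.
    return len(body).to_bytes(3, "big") + body

# The example registration from the text: party 1 (W3C), document 1.
print(schema_reference_bytes([(1, 1)]).hex())
```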
Element and attribute names and syntax map to token symbols. Namespace declarations are eliminated.
The transformation numbers each element and attribute name found in the schema in order, implicitly building a table of names to numbers, and determining the global token properties.
All namespace declaration attributes are removed from the document element.
Within the document body, including the document element, each element name together with its introductory angle bracket and whitespace maps to the corresponding element token. If the presence of a specific element can be deduced from the schema, its token width is zero. If an element has a body the closing angle bracket and surrounding white space maps to the end-of-attributes token, and the element's end-tag maps to the end-of-element token. If there is no body the closing angle bracket and surrounding white space maps to the end-of-element token. The width is zero on the final trail of end-of-element tokens in the document.
The attributes are sorted into the order in which they appear in the schema declaration for the element, required attributes first and then optional attributes. Each attribute name and its following equals sign and surrounding white space maps to the corresponding attribute token. For required attributes the token width is zero.
All remaining contiguous sequences of characters in the document map to lists of characters beginning with a length symbol.
Entity names and syntax map to token symbols and lists of characters.
Starting at the entity base number the transformation numbers each entity name found in the document, implicitly building a table of names to numbers, and determines or updates the global name token properties.
Note that if the Name transformation is not selected the entity base is 2.
All internal entity declarations are grouped at the head of the document, following all non-character symbols after the XML declaration. Each internal entity declaration maps to a sequence consisting of its name token followed by its replacement strings encoded as a string type token followed by a list of characters. Throughout the rest of the document entity references map to their corresponding name token.
All remaining contiguous sequences of characters in the document map to lists of characters beginning with a length symbol. Each entity reference must be surrounded on both sides with lists of characters, even if of zero length.
TODO: deal with external entities and parameter entities.
Element bodies and attribute values map to sequences consisting of a type symbol followed by an encoding of the value.
The transformation maps each attribute value and each simpleType in an element body to a type token and an encoding of the value. It does not rely on the Name transformation, so it must work either with text XML or with name tokens to find elements and attributes.
The type token reflects the ultimate primitive type (or enumeration) of the value according to the schema, and specifies the encoding of the value as shown in Table 1. The actual value encountered does not affect the type token unless the simpleType of the value involves a union.
In several cases the type token indicates the width of the value encoding. This width is determined from the schema by examining the restrictions on the type. If there are no restrictions, the maximum width is used. The actual value does not affect the width specified by the type token.
If the type is restricted by enumeration then the type token is of enumeration type with the width determined by the number of possibilities. The value is encoded as a zero-based integer index that selects among the possibilities in the order they are listed in the schema.
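A minimal sketch of the enumeration case, using the three enumeration rows of Table 1 (token values 58, 60 and 62). It assumes that the narrowest width able to index every possibility is chosen, which the text suggests but does not spell out; the function name is illustrative.

```python
def encode_enumeration(possibilities, value):
    """Return (type_token, width_in_bits, index) for an enumerated value.

    `possibilities` lists the enumeration values in schema order.
    """
    count = len(possibilities)
    for token, width in ((58, 8), (60, 16), (62, 32)):
        if count <= 1 << width:
            # Zero-based index in the order listed in the schema.
            return token, width, possibilities.index(value)
    raise ValueError("too many enumeration values")

print(encode_enumeration(["red", "green", "blue"], "blue"))
```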
In the case of a union the appropriate type token is selected from a list of possible primitive types according to the schema. If multiple enumerations are encountered in a union they are combined in the order found into a single enumeration by concatenating their value ranges.
Since, for instance, the string type is encoded as a list, a list of strings results in two layers of length symbols. The outermost length embraces the entire list while each of the inner lengths corresponds to its string's characters.
Fixed attributes are left with both type token and value symbol width 0. Default values are not inserted.
All remaining contiguous sequences of characters in the document map to lists of characters beginning with a length symbol.
3.2.6 Transposition for Compression
The transposition transformation rearranges data for better compression in following steps. It is useful in applications that involve a large number of samples, where each sample contains several more-or-less independent data items. A programming language equivalent of the transformation would be changing an array of structures into a set of arrays of each of the original structure's elements. This places related data items next to each other. Experience shows this can result in modest but significant improvement in the amount of compression that can be obtained from lossless compression algorithms such as the compression transforms specified here.
The transformation works on sequences of leaf elements (elements with no body or body of simpleType), all with the same element name, the same number of attributes and, for body or attributes of list type, the same number of items in each list. An exception is that at the end of the sequence the attributes and numbers of items in lists can fall off monotonically. By default the minimum length of such a sequence of elements is 8. Transposition does not cross block boundaries.
Sequences are identified in terms of token symbols and their associated value encodings. (This transform produces no mapping unless both Name and Value translation are selected; this also means the attributes are already mapped into the same order in every element.) Each eligible sequence of N elements, with maximum body or attribute list length of M, maps to a new transpose element containing M new elements with the original name, but with all attributes and the body retyped to lists (if they were not a list already) of maximum length N. Each attribute and the body of the first new element has as value a list of the N (first) values of the corresponding attribute or body of the original element. In the second and subsequent new elements only those attributes or the body that were originally lists are present, and list the up to N second or subsequent values from those lists.
As a comprehensive example, consider the following fragment of XML. Note that the last sample element is not eligible because it reintroduces an attribute and the length of one of its lists increased.
<testrun>
  <sample a="11 12 13" b="11 12" c="1" d="1" e="1">1</sample>
  <sample a="21 22 23" b="21 22" c="2" d="2" e="2">2</sample>
  <sample a="31 32 33" b="31 32" c="3" d="3">3</sample>
  <sample a="41 42 43" b="41 42" c="4" d="4">4</sample>
  <sample a="51 52 53" b="51 52" c="5" d="5">5</sample>
  <sample a="61 62 63" b="61 62" c="6" d="6">6</sample>
  <sample a="71 72 73" b="71" c="7" d="7">7</sample>
  <sample a="81 82 83" b="81" c="8" d="8">8</sample>
  <sample a="91 92 93 94" b="91" c="9" d="9" e="9">9</sample>
</testrun>
The fragment maps to the equivalent of this XML:
<testrun>
  <bx:transpose>
    <sample a="11 21 31 41 51 61 71 81" b="11 21 31 41 51 61 71 81" c="1 2 3 4 5 6 7 8" d="1 2 3 4 5 6 7 8" e="1 2"> 1 2 3 4 5 6 7 8 </sample>
    <sample a="12 22 32 42 52 62 72 82" b="12 22 32 42 52 62"/>
    <sample a="13 23 33 43 53 63 73 83"/>
  </bx:transpose>
  <sample a="91 92 93 94" b="91" c="9" d="9" e="9">9</sample>
</testrun>
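The core of the mapping, turning an array of structures into a set of arrays, can be sketched on the first eight sample elements of the example. Each element is modeled as a dict from attribute name to a list of items, with the body stored under the made-up key "#body". The sketch does not check eligibility (same name, monotonic fall-off, minimum sequence length) and works on strings rather than token symbols.

```python
def transpose(samples):
    """Map N elements with lists of up to M items to M elements of N-item lists."""
    depth = max(len(items) for sample in samples for items in sample.values())
    rows = []
    for position in range(depth):
        row = {}
        for sample in samples:
            for key, items in sample.items():
                if position < len(items):
                    row.setdefault(key, []).append(items[position])
        rows.append(row)
    return rows

# The eight eligible <sample> elements from the example above.
samples = []
for n in range(1, 9):
    sample = {"a": [f"{n}1", f"{n}2", f"{n}3"],
              "b": [f"{n}1", f"{n}2"] if n <= 6 else [f"{n}1"],
              "c": [str(n)], "d": [str(n)]}
    if n <= 2:
        sample["e"] = [str(n)]
    sample["#body"] = [str(n)]
    samples.append(sample)

for row in transpose(samples):
    print({key: " ".join(items) for key, items in row.items()})
```

The three rows printed correspond to the three new sample elements inside bx:transpose.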
The Octet translation transformation provides an alternative to the standard output mechanism. It is less suited to applications that want static formatted data and more suited to streaming in that it uses variable length sequences that depend on the actual values encountered. In addition it carries enough type information that a document might be reconstructed without reference to a schema.
Past the XML declaration all symbol sequences (except block symbols) map to sequences of octets, preserving redundant type and length information. Each symbol or group of symbols translates to an introductory octet followed by zero or more additional octets. Table 2 lists the values of the introductory octet and what they mean. Unless otherwise noted, a range of values for the introductory octet means that it carries the most significant bits of the item value. Further data octets carry the rest of the value in network byte order. The definition of following octets is recursive, so one introductory octet might be followed by shorter sequences containing their own introductory octets.
Characters not preceded by token symbols map to a short sequence beginning with an introductory octet in the range 0-144.
A schema symbol and its associated list of registered schema numbers maps to a sequence beginning with an introductory octet in the range 145-147.
Length symbols are discarded. Lists of zero length are discarded.
Block length symbols are retained within the stream of octet symbols but set to 0 width.
A name token maps to an introductory octet with value 148-150 for the first occurrence, followed by an integer sequence and a string sequence listing the name; or for any subsequent occurrence value 151-153 followed by an integer sequence. The introductory octet indicates whether it is an element, attribute or entity name, and for the first occurrence of an entity name there is also a string carrying the replacement text. The end-of-attributes and end-of-element tokens have their own introductory octets, 154 and 155.
A type token and its following value map to sequences with different introductory octet ranges depending on the type. The string type maps to a sequence consisting of a length integer (providing the number of characters) followed by a sequence as described above for each character. This string sequence is in turn reused inside sequences for name tokens. Similarly the integer sequence is used for integer-valued decimal types as well as inside many of the other sequences.
If a type token is for a list it and its values map to a sequence beginning with an introductory octet with value 156 followed by an integer sequence that provides the number of items in the list, then followed by each item translated as for an individual item.
The length of an octet sequence depends only on its value, being the shortest length that will encode the value. Any length information carried from the schema by type tokens is discarded.
| Introductory octet values | Hex | Symbols replaced | Indicated type | Following octets |
| --- | --- | --- | --- | --- |
| 0-127 | 00 - 7F | character in the range hex 00-7F | character | none |
| 128 | 80 | character in the range hex 0080-FFFF | character | 2 data |
| 129-144 | 81 - 90 | character in the range hex 100000-10FFFF | character | 2 data |
| 145 | 91 | schema and list of length 1 pair | schema | 2 integer |
| 146 | 92 | schema and list of length 2 pair | schema | 4 integer |
| 147 | 93 | schema and longer list of length n pairs | schema | integer=n, 2*n integer |
| 148 | 94 | (unused) | none | none |
| 149 | 95 | 1st element name token | element | integer, string |
| 150 | 96 | 1st attribute name token | attribute | integer, string |
| 151 | 97 | entity definition name token and list of characters | entity declaration | integer, string, string |
| 152 | 98 | element name token | element | integer |
| 153 | 99 | attribute name token | attribute | integer |
| 154 | 9A | entity name token | entity reference | integer |
| 155 | 9B | name token | end of attributes | none |
| 156 | 9C | name token | end of element | none |
| 157 | 9D | list of length n | list | integer=n, n value items |
| 158 | 9E | nil type token | nil | none |
| 159 | 9F | boolean type token, value=false | boolean | none |
| 160 | A0 | boolean type token, value=true | boolean | none |
| 161 | A1 | float type token and value symbol | float | 4 data |
| 162 | A2 | double type token and value symbol | double | 8 data |
| 163 | A3 | duration type token and value symbols | duration | 2 integer |
| 164 | A4 | dateTime type token and value symbols, no timezone | dateTime | 2 integer |
| 165 | A5 | dateTime type token and value symbols with timezone | dateTime | 3 integer |
| 166 | A6 | time type token and value symbols | time | 2 integer |
| 167 | A7 | date type token and value symbol | date | integer |
| 168 | A8 | gYearMonth type token and value symbol | gYearMonth | integer |
| 169 | A9 | gYear type token and value symbol | gYear | integer |
| 170 | AA | gMonthDay type token and value symbol | gMonthDay | integer |
| 171 | AB | gDay type token and value symbol | gDay | integer |
| 172 | AC | gMonth type token and value symbol | gMonth | integer |
| 173 | AD | binary type token and octet list of length n | binary data | integer=n, n data |
| 174 | AE | anyURI type token and string | anyURI | string |
| 175 | AF | QName type token and name token | QName | integer |
| 176 | B0 | QName type token and string | QName | string |
| 177 | B1 | string or anySimpleType type token and list of n characters or (string) | string | integer=n, n character |
| 178 | B2 | fractional decimal type token and value symbol | decimal | 2 integer |
| 200 | C8 | decimal type token and value symbol or (integer) | decimal | 8 data (signed 2's complement integer) |
| 201 | C9 | decimal type token and value symbol or (integer) | decimal | 7 data (signed 2's complement integer) |
| 202 | CA | decimal type token and value symbol or (integer) | decimal | 6 data (signed 2's complement integer) |
| 203 | CB | decimal type token and value symbol or (integer) | decimal | 5 data (signed 2's complement integer) |
| 204 | CC | decimal type token and value symbol or (integer) | decimal | 4 data (signed 2's complement integer) |
| 205 | CD | decimal type token and value symbol or (integer) | decimal | 3 data (signed 2's complement integer) |
| 206 | CE | decimal type token and value symbol or (integer) | decimal | 2 data (unsigned integer) |
| 207 | CF | decimal type token and value symbol or (integer) | decimal | 2 data (signed 2's complement integer) |
| 208-223 | D0 - DF | decimal type token and value symbol or (integer) | decimal | 1 data (signed 2's complement integer, including 4 bits from intro octet) |
| 224-255 | E0 - FF | decimal type token and value symbol or (integer) | decimal | none (intro octet represents range 0-32) |
Table 2
3.2.8 Block Sort for Compression
The Block Sort transformation provides the same essential mechanism as that used in the bzip2 utility. It maps a block of symbols into a new sequence based on sorting. While this does not reduce their size, it significantly improves the chances for good compression in later steps.
Since the transformation destroys the original order of the symbols it is no longer possible to infer the width of each symbol. Thus, it must be followed by a transformation that encodes symbol width. Also, it benefits from run-length encoding (or better yet, simple dictionary compression) before the sort and it is of little use without dictionary compression following the sort. So this transformation cannot be used without Ziv-Lempel and Huffman coding. If the Block Sort transformation is selected without both of Ziv-Lempel and Huffman coding, Block Sort becomes the identity mapping.
The transformation remaps each block independently.
The block is first transformed using the Ziv-Lempel transformation below. For this pass the encoder need only search for runs, that is, duplicated sequences that immediately follow the sequence they duplicate. The count symbols from the Ziv-Lempel transformation map to the beginning of the new block. Following the count symbols, the original symbols in the block are sorted as follows.
Every symbol in the block is associated with a (reverse order) sequence beginning with the preceding symbol and going backwards, wrapping around from the beginning to the end of the block to finish back with the symbol itself. The new mapping orders the symbols by sorting their associated sequences by (unsigned integer) symbol value in lexical order such that symbols earlier in each sequence have greater priority. For the purpose of sorting, each symbol whose value is not a non-negative integer (i.e. float, double and negative integers) is treated as an unsigned binary integer of the symbol's width.
A new count symbol is prepended to the sequence (after the block symbol and Ziv-Lempel count symbols and before the new mapping). The value is the 0-based index of the symbol in the new mapping that was the first symbol in the original mapping.
Oddly enough, this transformation is completely reversible. A very short example will show how it works. Consider the sequence "bcab". Its symbols correspond to the following reversed and wrapped sequences, which are then sorted; the symbol each sequence belongs to is its last character:
Original Symbols | Reverse Wrapped Sequences | Sequences Sorted | Output Symbols | Decoding Columns | Reconstruction |
b | bacb | acbb | b | ab | ...ab |
c | bbac | bacb | b* | bb | |
a | cbba | bbac | c | bc | .bc |
b | acbb | cbba | a | ca | ..ca |
The new mapping is "bbca" and the 0-based index of the original first symbol (starred) is 1. To decode, first sort all the individual symbols into one column, then list the new mapping in a second column. The pair of symbols in each row is in the original order; all that remains is to work out how the pairs connect. Start with the original first symbol in the second column of the indexed row, the starred "b". Note that it is the second "b" in that column. Now find the second "b" in the first column; the row containing it is the first pair. Take the "c" that follows it in that row and repeat the process to find the next pair. Repeat until the entire sequence is recovered.
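A minimal Python sketch of the sort and its inverse as described above, checked against the "bcab" example. It leaves out the Ziv-Lempel run pre-pass and the count symbols, treats a block as a plain list of comparable values, and the function names `block_sort` and `block_unsort` are made up for illustration.

```python
def block_sort(block):
    """Return (new mapping, 0-based index of the original first symbol)."""
    n = len(block)

    def key(i):
        # Preceding symbol first, walking backwards with wrap-around,
        # ending with the symbol block[i] itself.
        return [block[(i - k) % n] for k in range(1, n + 1)]

    order = sorted(range(n), key=key)
    return [block[i] for i in order], order.index(0)


def block_unsort(mapped, index):
    """Invert block_sort using the two-column pairing described above."""
    n = len(mapped)
    # rows_by_symbol[j] is the row of the second column whose symbol lands
    # at row j of the (sorted) first column; a stable sort keeps the k-th
    # occurrence in one column matched to the k-th occurrence in the other.
    rows_by_symbol = sorted(range(n), key=lambda r: mapped[r])
    next_row = [0] * n
    for j, r in enumerate(rows_by_symbol):
        next_row[r] = j
    out, row = [], index
    for _ in range(n):
        out.append(mapped[row])
        row = next_row[row]
    return out


mapped, index = block_sort(list("bcab"))
print("".join(mapped), index)                  # bbca 1
print("".join(block_unsort(mapped, index)))    # bcab
```

Comparing whole wrapped sequences keeps the sketch close to the prose but costs quadratic memory; a practical encoder would use a suffix-sorting technique instead.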
3.2.9 Ziv-Lempel Compression
The Ziv-Lempel compression transformation remaps the symbols within a block to a new, hopefully smaller, set of symbols. It is an LZ77 algorithm related to the dictionary algorithm used in gzip.
The transformation makes use of count symbols. It segments the original symbols within each block into sequences and remaps pairs of sequences (the second of which is a duplicate of any sequence earlier in the block) into a triple of count symbols followed, later, by the first sequence of the pair. (Sometimes the first sequence is nil.) All the triples of count symbols are mapped to the front of the block; all the remaining original sequences of symbols follow.
Each triple of count symbols consists of, first, a count of unduplicated symbols (the number of symbols in the first sequence of the original pair); second, a count of duplicated symbols (the number of symbols in the second sequence of the original pair, which duplicates a sequence earlier in the block); third, the offset (in symbols) from the earlier duplicated sequence to the second sequence of the pair.
When a duplicate sequence is immediately followed by another, the second is treated as the second sequence of a new pair, where the first sequence is nil. In that case the first count symbol of its triple has value 0. In the same way, if the last sequence in the block is not a duplicate it is treated as the first sequence of a pair where the second sequence is nil, and in its triple the second count symbol has value 0.
There is always a final triple where the second count symbol has value 0, whether or not the block ends with a duplicate sequence. In this final triple the third symbol is not present, so it actually consists of only two symbols.
Here is an example of a block of letters segmented into unduplicated and duplicated segments, with nil segments depicted with a dash, and the count triples and symbols produced by the mapping:
Original block: abcdeeeeabcdf
Segments: abcde eee - abcd f -
Mapped output: 5 3 1 0 4 8 1 0 abcde f
In finding duplicate sequences, symbol values may be considered equal if they have the same unsigned integer value, as for Block Sort. Any stricter definition of equality is also acceptable.
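The triple layout described above can be sketched in Python as follows. The decoder follows directly from the description; the encoder is only one possible strategy. Its greedy longest-match search and the `min_match` threshold are assumptions of this sketch, since the text fixes the output layout but not how an encoder chooses its duplicates. The names `lz_encode` and `lz_decode` are hypothetical.

```python
def lz_encode(block, min_match=2):
    """Return (count symbols, remaining unduplicated symbols)."""
    counts, literals = [], []
    pending = 0                      # unduplicated symbols since last triple
    i, n = 0, len(block)
    while i < n:
        best_len, best_off = 0, 0
        for j in range(i):           # candidate earlier start positions
            length = 0
            # The duplicate may overlap the position being encoded (runs).
            while i + length < n and block[j + length] == block[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:
            counts += [pending, best_len, best_off]
            pending = 0
            i += best_len
        else:
            literals.append(block[i])
            pending += 1
            i += 1
    counts += [pending, 0]           # final triple: no offset symbol
    return counts, literals


def lz_decode(counts, literals):
    out, c, used = [], 0, 0
    while True:
        n_lit, n_dup = counts[c], counts[c + 1]
        c += 2
        out.extend(literals[used:used + n_lit])
        used += n_lit
        if n_dup == 0:               # final triple reached
            return out
        offset = counts[c]
        c += 1
        for _ in range(n_dup):       # symbol by symbol, so overlaps work
            out.append(out[-offset])


counts, literals = lz_encode(list("abcdeeeeabcdf"))
print(counts, "".join(literals))     # [5, 3, 1, 0, 4, 8, 1, 0] abcdef
```

On the example block this reproduces the counts and symbols shown above. Copying one symbol at a time in the decoder is what allows a run such as "eee" to be expressed with offset 1.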
3.2.10 Huffman Coding Compression
The Huffman Coding compression transformation remaps the symbols within a block to a stream of bits. The mapping process involves alternating steps of Huffman Coding (producing bit counts, a list of symbols, and a bit stream) and Ziv-Lempel Compression as described above (producing count triples and a list of symbols). After the penultimate step there is a list of mixed count symbols followed by a bit stream. The count symbols are Huffman coded using a predefined symbol list and set of bit counts into a bit stream, resulting in a complete Huffman coded bit stream as the final mapping.
TODO: Actually specify the algorithms and/or their input and output.
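The proposal leaves the Huffman step unspecified for now. Purely as background, the sketch below shows the textbook construction that turns a list of symbols into per-symbol bit counts (code lengths), which is the kind of "bit counts" output mentioned above. It is standard Huffman coding, not an algorithm defined by this proposal; the name `huffman_bit_counts` and the tie-breaking order are arbitrary choices.

```python
import heapq
from collections import Counter


def huffman_bit_counts(symbols):
    """Return {symbol: code length in bits} for a textbook Huffman code."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # A lone symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries: (weight, tie-breaker, symbols in this subtree)
    heap = [(count, i, [sym]) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            lengths[sym] += 1        # merged subtrees sit one level deeper
        heapq.heappush(heap, (w1 + w2, tie, s1 + s2))
        tie += 1
    return lengths


print(huffman_bit_counts("aaaabbc"))   # {'a': 1, 'b': 2, 'c': 2}
```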
All of the transformations are defined to work with unmodified XML-Schema. However, the default mappings do not provide sufficient flexibility to cover all of the possible uses of binary XML. To cover more possible uses there is an annotation schema to control the use of binary XML and guide the transformations. It also provides a namespace for the transpose element used in the Transpose transformation.
The annotation schema defines fragments and attributes to be inserted in the schema for a class of target binary XML documents. The binary element is for overall control and configuration and may be inserted anywhere at the top level of the schema. The others are for annotation within the schema to control how the transformations apply to various parts of target documents.
Some control of target documents is possible within standard XML-Schema, as described in some of the transformations above, without use of this annotation schema. The methods are described again here along with the new methods available with annotation so that all schema options are described in one place.
Here is an example of a fragment that can appear at the top level of a schema to constrain and configure the binary format of a document.
<bx:binary>
  <bx:required bx:encoding="BX-1.0-DSNEVO"/>
  <bx:optional bx:encoding="BX-1.0-ZH"/>
  <bx:transform bx:name="schema">
    <bx:property bx:name="local" bx:value="true"/>
    <bx:property bx:name="global" bx:value="false"/>
  </bx:transform>
  <bx:property bx:name="blocksize" bx:value="65536"/>
</bx:binary>
This shows the two parts of the binary element.
The first part consists of the required and optional elements. The encoding attribute of the required element shows the minimum version and minimum set of selected transformations to be used in encoding a target document. The encoding attribute of the optional element shows the maximum version and additional allowed transformations. In either element the version is optional and the set of transformations can be empty, or can be "*" to indicate all transformations. In any case both "-" characters must be present. In the optional element the required transformations can be duplicated or not without changing the meaning. Note that the Declaration transformation can be specified here, even though it never appears in an actual XML declaration.
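A small Python sketch of how a tool might read these encoding attributes and check a selection against them. It assumes the value always begins with the literal "BX", that each character of the last field names one transformation, and that the version contains no "-"; the function names `parse_encoding` and `is_permitted` are hypothetical, and version comparison is left out.

```python
def parse_encoding(value):
    """Split 'BX-<version>-<transformations>' into (version, transformations).

    version is None when omitted; transformations is a frozenset of
    one-character codes, or the string "*" meaning all transformations.
    """
    parts = value.split("-")
    if len(parts) != 3 or parts[0] != "BX":
        raise ValueError(f"malformed encoding: {value!r}")
    version = parts[1] or None
    transforms = "*" if parts[2] == "*" else frozenset(parts[2])
    return version, transforms


def is_permitted(selected, required, optional):
    """True if `selected` contains every required transformation and
    nothing beyond required plus optional."""
    if required == "*":
        raise NotImplementedError("needs the full list of transformations")
    if not required <= selected:
        return False
    return optional == "*" or selected <= (required | optional)


_, required = parse_encoding("BX-1.0-DSNEVO")
_, optional = parse_encoding("BX-1.0-ZH")
print(is_permitted(frozenset("DSNEVOZ"), required, optional))   # True
print(is_permitted(frozenset("DSNEV"), required, optional))     # False
```

Because the optional set is combined with the required set before the comparison, repeating the required transformations in the optional element makes no difference, as the text states.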
The second part consists of transform elements that provide configuration information to the transforms in the form of a property list consisting of property elements. Global configuration is provided by property elements outside a transform element.
TODO: Document some properties.
The enable and disable elements control the application of individual transforms (given in the name attribute) to schema-defined types.
The index attribute may be used in element and attribute type declarations to fix the value of the corresponding name token in the Name transformation.
Without using the annotation schema it is possible to determine the width of simpleType value symbols by restrictions on length and/or minimum and maximum value.
TODO: Write the actual schema document.
Here are examples of all of the translation transformations performed on the Purchase Order example from the XML-Schema tutorial.
For reference, here is the example document and its schema. The example document has been modified to include an internal entity and reference to the schema.
Following is the schema annotated in such a way as to produce the same results as in this example without annotation. This shows the use of some of the annotations and also shows the default numbering of name tokens.
4.1.1.1 Purchase Order Document
<?xml version="1.0"?>
<!ENTITY california "CA">
<purchaseOrder orderDate="1999-10-20"
    xmlns="http://example.com:po"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="po.xsd">
  <shipTo country="US">
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <city>Mill Valley</city>
    <state>&california;</state>
    <zip>90952</zip>
  </shipTo>
  <billTo country="US">
    <name>Robert Smith</name>
    <street>8 Oak Avenue</street>
    <city>Old Town</city>
    <state>PA</state>
    <zip>95819</zip>
  </billTo>
  <comment>Hurry, my lawn is going wild!</comment>
  <items>
    <item partNum="872-AA">
      <productName>Lawnmower</productName>
      <quantity>1</quantity>
      <USPrice>148.95</USPrice>
      <comment>Confirm this is electric</comment>
    </item>
    <item partNum="926-AA">
      <productName>Baby Monitor</productName>
      <quantity>1</quantity>
      <USPrice>39.98</USPrice>
      <shipDate>1999-05-21</shipDate>
    </item>
  </items>
</purchaseOrder>
4.1.1.2 Purchase Order Schema
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      Purchase order schema for Example.com.
      Copyright 2000 Example.com. All rights reserved.
    </xsd:documentation>
  </xsd:annotation>
  <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
  <xsd:element name="comment" type="xsd:string"/>
  <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence>
      <xsd:element name="shipTo" type="USAddress"/>
      <xsd:element name="billTo" type="USAddress"/>
      <xsd:element ref="comment" minOccurs="0"/>
      <xsd:element name="items" type="Items"/>
    </xsd:sequence>
    <xsd:attribute name="orderDate" type="xsd:date"/>
  </xsd:complexType>
  <xsd:complexType name="USAddress">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
      <xsd:element name="state" type="xsd:string"/>
      <xsd:element name="zip" type="xsd:decimal"/>
    </xsd:sequence>
    <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
  </xsd:complexType>
  <xsd:complexType name="Items">
    <xsd:sequence>
      <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="productName" type="xsd:string"/>
            <xsd:element name="quantity">
              <xsd:simpleType>
                <xsd:restriction base="xsd:positiveInteger">
                  <xsd:maxExclusive value="100"/>
                </xsd:restriction>
              </xsd:simpleType>
            </xsd:element>
            <xsd:element name="USPrice" type="xsd:decimal"/>
            <xsd:element ref="comment" minOccurs="0"/>
            <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
          </xsd:sequence>
          <xsd:attribute name="partNum" type="SKU" use="required"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
  <!-- Stock Keeping Unit, a code for identifying products -->
  <xsd:simpleType name="SKU">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-[A-Z]{2}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
4.1.1.3 Annotated Purchase Order Schema
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:bx="http://example.org/BX">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      Purchase order schema for Example.com.
      Copyright 2000 Example.com. All rights reserved.
    </xsd:documentation>
  </xsd:annotation>
  <bx:binary>
    <bx:required bx:encoding="BX-1.0-DSNEV"/>
    <bx:optional bx:encoding="BX-1.0-O"/>
  </bx:binary>
  <xsd:element name="purchaseOrder" bx:index="3" type="PurchaseOrderType"/>
  <xsd:element name="comment" bx:index="4" type="xsd:string"/>
  <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence>
      <xsd:element name="shipTo" bx:index="5" type="USAddress"/>
      <xsd:element name="billTo" bx:index="6" type="USAddress"/>
      <xsd:element ref="comment" minOccurs="0"/>
      <xsd:element name="items" bx:index="7" type="Items"/>
    </xsd:sequence>
    <xsd:attribute name="orderDate" bx:index="18" type="xsd:date"/>
  </xsd:complexType>
  <xsd:complexType name="USAddress">
    <xsd:sequence>
      <xsd:element name="name" bx:index="8" type="xsd:string"/>
      <xsd:element name="street" bx:index="9" type="xsd:string"/>
      <xsd:element name="city" bx:index="10" type="xsd:string"/>
      <xsd:element name="state" bx:index="11" type="xsd:string"/>
      <xsd:element name="zip" bx:index="12" type="xsd:decimal"/>
    </xsd:sequence>
    <xsd:attribute name="country" bx:index="19" type="xsd:NMTOKEN" fixed="US"/>
  </xsd:complexType>
  <xsd:complexType name="Items">
    <xsd:sequence>
      <xsd:element name="item" bx:index="13" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="productName" bx:index="14" type="xsd:string"/>
            <xsd:element name="quantity" bx:index="15">
              <xsd:simpleType>
                <xsd:restriction base="xsd:positiveInteger">
                  <xsd:maxExclusive value="100"/>
                </xsd:restriction>
              </xsd:simpleType>
            </xsd:element>
            <xsd:element name="USPrice" bx:index="16" type="xsd:decimal"/>
            <xsd:element ref="comment" minOccurs="0"/>
            <xsd:element name="shipDate" bx:index="17" type="xsd:date" minOccurs="0"/>
          </xsd:sequence>
          <xsd:attribute name="partNum" bx:index="20" type="SKU" use="required"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
  <!-- Stock Keeping Unit, a code for identifying products -->
  <xsd:simpleType name="SKU">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-[A-Z]{2}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
4.1.2 Individual Transformations
The following sections show the mapping produced by each of the translation transformations (except Octet) selected individually, before the final output rendering.
Characters are shown naturally as above. Symbols are shown in curly braces "{}" with the type first, followed by the decimal value (unless preceded by "hex") and then the width (in bits) in parentheses. Comments to help keep track of what's happening follow "//" on each line; they are not part of the transformation, nor are any additional line breaks. Ellipses (...) indicate that the rest of the document follows without modification.
{octet hex 42 (8)} {octet hex 58 (8)} {octet hex 10 (8)} {octet hex 10 (8)} {octet hex 00 (8)} {octet hex 00 (8)} <!ENTITY california "CA"> <purchaseOrder orderDate="1999-10-20" xmlns="http://example.com:po" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="po.xsd"> <shipTo country="US"> ... |
We will assume the po.xsd schema is registered privately as document number 15.
<?xml version="1.0"?> {schema 0 (0)} {length 8 (21)} {integer 0 (32)} {integer 15 (32)} <!ENTITY california "CA"> <purchaseOrder orderDate="1999-10-20" xmlns="http://example.com:po" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" > <shipTo country="US"> ... |
We will assume the noNamespaceSchemaLocation attribute is name 100 in a canonical schema for XML-Schema.
Notice that because the schema provides very strict guidance on the sequence of the content, most of the name tokens can be inferred from context and so have a width of 0.
<?xml version="1.0"?>
{block n (20)}
{length 24 (21)}<!ENTITY california "CA">
{name token 3 (0)} // <purchaseOrder>
{name token 18 (7)} // orderDate
{length 10 (21)}1999-10-20
{name token 121 (7)} // xsi:noNamespaceSchemaLocation
{length 6 (21)}po.xsd
{name token 1 (7)}
{name token 5 (0)} // <shipTo>
{name token 19 (0)} // country
{length 2 (21)}US
{name token 1 (0)}
{name token 8 (0)} // <name>
{name token 1 (0)}
{length 11 (21)}Alice Smith
{name token 0 (0)} // </name>
{name token 9 (0)} // <street>
{name token 1 (0)}
{length 16 (21)}123 Maple Street
{name token 0 (0)} // </street>
{name token 10 (0)} // <city>
{name token 1 (0)}
{length 11 (21)}Mill Valley
{name token 0 (0)} // </city>
{name token 11 (0)} // <state>
{name token 1 (0)}
{length 12 (21)}&california;
{name token 0 (0)} // </state>
{name token 12 (0)} // <zip>
{name token 1 (0)}
{length 5 (21)}90952
{name token 0 (0)} // </zip>
{name token 0 (0)} // </shipTo>
{name token 6 (0)} // <billTo>
{name token 19 (0)} // country
{length 2 (21)}US
{name token 1 (0)}
{name token 8 (0)} // <name>
{name token 1 (0)}
{length 12 (21)}Robert Smith
{name token 0 (0)} // </name>
{name token 9 (0)} // <street>
{name token 1 (0)}
{length 12 (21)}8 Oak Avenue
{name token 0 (0)} // </street>
{name token 10 (0)} // <city>
{name token 1 (0)}
{length 8 (21)}Old Town
{name token 0 (0)} // </city>
{name token 11 (0)} // <state>
{name token 1 (0)}
{length 2 (21)}PA
{name token 0 (0)} // </state>
{name token 12 (0)} // <zip>
{name token 1 (0)}
{length 5 (21)}95819
{name token 0 (0)} // </zip>
{name token 0 (0)} // </billTo>
{name token 4 (7)} // <comment>
{name token 1 (0)}
{length 29 (21)}Hurry, my lawn is going wild!
{name token 0 (0)} // </comment>
{name token 7 (7)} // <items>
{name token 1 (0)}
{name token 13 (7)} // <item>
{name token 20 (0)} // partNum
{length 6 (21)}872-AA
{name token 1 (0)}
{name token 14 (0)} // <productName>
{name token 1 (0)}
{length 9 (21)}Lawnmower
{name token 0 (0)} // </productName>
{name token 15 (0)} // <quantity>
{name token 1 (0)}
{length 1 (21)}1
{name token 0 (0)} // </quantity>
{name token 16 (0)} // <USPrice>
{name token 1 (0)}
{length 6 (21)}148.95
{name token 0 (0)} // </USPrice>
{name token 4 (7)} // <comment>
{name token 1 (0)}
{length 24 (21)}Confirm this is electric
{name token 0 (0)} // </comment>
{name token 0 (7)} // </item>
{name token 13 (7)} // <item>
{name token 20 (0)} // partNum
{length 6 (21)}926-AA
{name token 1 (0)}
{name token 14 (0)} // <productName>
{name token 1 (0)}
{length 12 (21)}Baby Monitor
{name token 0 (0)} // </productName>
{name token 15 (0)} // <quantity>
{name token 1 (0)}
{length 1 (21)}1
{name token 0 (0)} // </quantity>
{name token 16 (0)} // <USPrice>
{name token 1 (0)}
{length 5 (21)}39.98
{name token 0 (0)} // </USPrice>
{name token 17 (7)} // <shipDate>
{name token 1 (0)}
{length 10 (21)}1999-05-21
{name token 0 (0)} // </shipDate>
{name token 0 (0)} // </item>
{name token 0 (0)} // </items>
{name token 0 (0)} // </purchaseOrder>
<?xml version="1.0"?> {block n (20)} {name token 2 (2)} {type token 0 (8)} {length 2 (21)}CA {length 305 (21)}<purchaseOrder orderDate="1999-10-20" xmlns="http://example.com:po" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="po.xsd"> <shipTo country="US"> <name>Alice Smith</name> <street>123 Maple Street</street> <city>Mill Valley</city> <state> {name token 2 (2)} {length 728 (21)}</state> <zip>90952</zip> </shipTo> ... |
We will assume the Unix epoch for dates.
Note that since the schema defines all types without unions, all of the type tokens are redundant and so have width 0 instead of 8.
<?xml version="1.0"?>
{block n (20)}
{length 26 (21)}<!ENTITY california "CA">
<purchaseOrder orderDate= {type token 38 (0)} {value 10915 (32)}
  xmlns= {type token 0 (0)} {length 21 (21)}http://example.com:po
  xmlns:xsi= {type token 0 (0)} {length 41 (21)}http://www.w3.org/2001/XMLSchema-instance
  xsi:noNamespaceSchemaLocation= {type token 0 (0)} {length 6 (21)}po.xsd >
<shipTo country= {type token 0 (0)} {length 2 (0)}US >
<name> {type token 0 (0)} {length 11 (21)}Alice Smith </name>
<street> {type token 0 (0)} {length 16 (21)}123 Maple Street </street>
<city> {type token 0 (0)} {length 11 (21)}Mill Valley </city>
<state> {type token 0 (0)} {length 12 (21)}&california; </state>
<zip> {type token 0 (0)} {length 5 (21)}90952 </zip>
</shipTo>
<billTo country= {type token 0 (0)} {length 2 (0)}US >
<name> {type token 0 (0)} {length 12 (21)}Robert Smith </name>
<street> {type token 0 (0)} {length 12 (21)}8 Oak Avenue </street>
<city> {type token 0 (0)} {length 8 (21)}Old Town </city>
<state> {type token 0 (0)} {length 2 (21)}PA </state>
<zip> {type token 0 (0)} {length 5 (21)}95819 </zip>
</billTo>
<comment> {type token 0 (0)} {length 29 (21)}Hurry, my lawn is going wild! </comment>
<items>
<item partNum= {type token 0 (0)} {length 6 (0)}872-AA >
<productName> {type token 0 (0)} {length 9 (21)}Lawnmower </productName>
<quantity> {type token 10 (0)} {value 1 (8)} </quantity>
<USPrice> {type token 30 (0)} {value 100 (64)} {value 14895 (64)} </USPrice>
<comment> {type token 0 (0)} {length 24 (21)}Confirm this is electric </comment>
</item>
<item partNum= {type token 0 (0)} {length 6 (0)}926-AA >
<productName> {type token 0 (0)} {length 12 (21)}Baby Monitor </productName>
<quantity> {type token 10 (0)} {value 1 (8)} </quantity>
<USPrice> {type token 30 (0)} {value 100 (64)} {value 3998 (64)} </USPrice>
<shipDate> {type token 38 (0)} {value 10763 (32)} </shipDate>
</item>
</items>
</purchaseOrder>
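In the listing above, the USPrice 148.95 appears as the pair of value symbols 100 and 14895, and 39.98 as 100 and 3998. Reading that pair as a power-of-ten scale followed by the scaled integer is an inference from the listing, not a rule stated here; under that assumption, the split can be sketched in Python as follows. The name `split_decimal` is hypothetical.

```python
from decimal import Decimal


def split_decimal(text):
    """Split a decimal lexical value into (scale, scaled integer)."""
    d = Decimal(text)
    exponent = d.as_tuple().exponent
    if exponent >= 0:
        # No fractional digits: scale of 1
        return 1, int(d)
    scale = 10 ** -exponent          # one factor of ten per fractional digit
    return scale, int(d * scale)


print(split_decimal("148.95"))   # (100, 14895)
print(split_decimal("39.98"))    # (100, 3998)
```

Using `Decimal` rather than `float` keeps the arithmetic exact, so the original text can be recovered by dividing the integer by the scale.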
4.1.3 Combined Transformations
In the following table all the above translation transformations are selected together, showing the original XML text, the resulting mapping into symbols (and the source of the symbols), the default rendering of those symbols (in hex), and the alternative rendering if the Octets transformation is also selected.
Original XML |
Symbols |
Source |
Output |
Octets |
<?xml version="1.0"?> |
{octet hex 42 (8)} {octet hex 58 (8)} {octet hex 10 (8)} {octet hex 10 (8)} {octet hex 00 (8)} {octet hex 00 (8)} |
Declaration |
42 58 10 10 1E 00 |
42 58 10 10 1F 00 |
{schema 0 (0)} {length 8 (21)} {integer 0 (32)} {integer 15 (32)} |
Schema |
00 00 08 00 00 00 00 00 00 00 0F |
91 E0 EF |
|
<!ENTITY california "CA"> |
{name token 21 (2)} {type token 0 (8)} {length 2 (21)}CA |
Entity |
00 15 00 00 00 02 43 41 |
97 F5 B1 EA 63 61 6C 69 66 6F 72 6E 69 61 B1 E2 43 41 |
<purchaseOrder |
{name token 3 (0)} |
Name |
95 E3 B1 ED 70 75 72 63 68 61 73 65 4F 72 64 65 72 |
|
orderDate= |
{name token 18 (7)} |
Name |
12 |
96 F2 B1 E9 6F 72 64 65 72 44 61 74 65 |
"1999-10-20" |
{type token 38 (0)} {value 10915 (32)} |
Value |
00 00 2A A3 |
A7 CF 2A A3 |
> |
{name token 1 (7)} |
Name |
01 |
9B |
<shipTo |
{name token 5 (0)} |
Name |
95 E5 B1 E6 73 68 69 70 54 6F |
|
> |
{name token 1 (0)} |
Name |
01 |
9B |
<name |
{name token 8 (0)} |
Name |
95 E8 B1 E4 6E 61 6D 65 |
|
> |
{name token 1 (7)} |
Name |
01 |
9B |
Alice Smith |
{type token 0 (0)} {length 11 (21)}Alice Smith |
Value |
00 00 0B 41 6C 69 63 65 20 53 6D 69 74 68 |
B1 EB 41 6C 69 63 65 20 53 6D 69 74 68 |
</name> |
{name token 0 (0)} |
Name |
9C |
|
<street |
{name token 9 (0)} |
Name |
95 E9 B1 E5 73 74 72 65 65 74 |
|
> |
{name token 1 (0)} |
Name |
9B |
|
123 Maple Street |
{type token 0 (0)} {length 16 (21)}123 Maple Street |
Value |
00 00 10 31 32 33 20 4D 61 70 6C 65 20 53 74 72 65 65 74 |
B1 F0 31 32 33 20 4D 61 70 6C 65 20 53 74 72 65 65 74 |
</street> |
{name token 0 (0)} |
Name |
9C |
|
<city |
{name token 10 (0)} |
Name |
95 EA B1 E4 63 69 74 79 |
|
> |
{name token 1 (0)} |
Name |
9B |
|
Mill Valley |
{type token 0 (0)} {length 11 (21)}Mill Valley |
Value |
00 00 0B 4D 69 6C 6C 20 56 61 6C 6C 65 79 |
B1 EB 4D 69 6C 6C 20 56 61 6C 6C 65 79 |
</city> |
{name token 0 (0)} |
Name |
9C |
|
<state |
{name token 11 (0)} |
Name |
95 EB B1 E5 73 74 61 74 65 |
|
> |
{name token 1 (0)} |
Name |
9B |
|
{length 0 (21)} |
Entity |
00 00 00 |
||
&california; |
{name token 21 (7)} |
Entity |
15 |
9A F5 |
{length 0 (21)} |
Entity |
00 00 00 |
||
</state> |
{name token 0 (0)} |
Name |
9C |
|
<zip |
{name token 12 (0)} |
Name |
95 EC B1 E3 7A 69 70 |
|
> |
{name token 1 (0)} |
Name |
9B |
|
90952 |
{type token 0 (0)} {length 5 (21)}90952 |
Value |
00 00 05 39 30 39 35 32 |
B1 E5 39 30 39 35 32 |
</zip> |
{name token 0 (0)} |
Name |
9C |
|
</shipTo> |
{name token 0 (0)} |
Name |
9C |
|
<billTo |
{name token 6 (0)} |
Name |
95 E6 B1 E6 62 69 6C 6C 54 6F |
|
> |
{name token 1 (0)} |
Name |
9B |
|
<name |
{name token 8 (0)} |
Name |
98 E8 |
|
> |
{name token 1 (0)} |
Name |
9B |
|
Robert Smith |
{type token 0 (0)} {length 12 (21)}Robert Smith |
Value |
00 00 0C 52 6F 62 65 72 74 20 53 6D 69 74 68 |
B1 EC 52 6F 62 65 72 74 20 53 6D 69 74 68 |
</name> |
{name token 0 (0)} |
Name |
9C |
|
<street |
{name token 9 (0)} |
Name |
98 E9 |
|
> |
{name token 1 (0)} |
Name |
9B |
|
8 Oak Avenue |
{type token 0 (0)} {length 12 (21)}8 Oak Avenue |
Value |
00 00 0C 38 20 4F 61 6B 20 41 76 65 6E 75 65 |
B1 EC 38 20 4F 61 6B 20 41 76 65 6E 75 65 |
</street> |
{name token 0 (0)} |
Name |
9C |
|
<city |
{name token 10 (0)} |
Name |
98 EA |
|
> |
{name token 1 (0)} |
Name |
9B |
|
Old Town |
{type token 0 (0)} {length 8 (21)}Old Town |
Value |
00 00 08 4F 6C 64 20 54 6F 77 6E |
B1 E8 4F 6C 64 20 54 6F 77 6E |
</city> |
{name token 0 (0)} |
Name |
9C |
|
<state |
{name token 11 (0)} |
Name |
98 EB |
|
> |
{name token 1 (0)} |
Name |
9B |
|
PA |
{type token 0 (0)} {length 2 (21)}PA |
Value |
00 00 02 50 41 |
B1 E2 50 41 |
</state> |
{name token 0 (0)} |
Name |
9C |
|
<zip |
{name token 12 (0)} |
Name |
98 EC |
|
> |
{name token 1 (0)} |
Name |
9B |
|
95819 |
{type token 0 (0)} {length 5 (21)}95819 |
Value |
00 00 05 39 35 38 31 39 |
B1 E5 39 35 38 31 39 |
</zip> |
{name token 0 (0)} |
Name |
9C |
|
</billTo> |
{name token 0 (0)} |
Name |
9C |
|
<comment |
{name token 4 (7)} |
Name |
04 |
95 E4 B1 E7 63 6F 6D 6D 65 6E 74 |
> |
{name token 1 (0)} |
Name |
9B |
|
Hurry, my lawn is going wild! |
{type token 0 (0)} {length 29 (21)}Hurry, my lawn is going wild! |
Value |
00 00 1D 48 75 72 72 79 2C 20 6D 79 20 6C 61 77 6E 20 69 73 20 67 6F 69 6E 67 20 77 69 6C 64 21 |
B1 FD 48 75 72 72 79 2C 20 6D 79 20 6C 61 77 6E 20 69 73 20 67 6F 69 6E 67 20 77 69 6C 64 21 |
</comment> |
{name token 0 (0)} |
Name |
9C |
|
<items |
{name token 7 (7)} |
Name |
07 |
95 E7 B1 E5 69 74 65 6D 73 |
> |
{name token 1 (0)} |
Name |
9B |
|
<item |
{name token 13 (7)} |
Name |
0D |
95 ED B1 E4 69 74 65 6D |
partNum= |
{name token 20 (0)} |
Name |
96 F4 B1 E7 70 61 72 74 4E 75 6D |
|
"872-AA" |
{type token 0 (0)} {length 6 (0)}872-AA |
Value |
00 00 06 38 37 32 2D 41 41 |
B1 E6 38 37 32 2D 41 41 |
> |
{name token 1 (0)} |
Name |
9B |
|
<productName |
{name token 14 (0)} |
Name |
95 EE B1 ED 70 72 6F 64 75 63 74 4E 61 6D 65 |
|
> |
{name token 1 (0)} |
Name |
9B |
|
Lawnmower |
{type token 0 (0)} {length 9 (21)}Lawnmower |
Value |
00 00 09 4C 61 77 6E 6D 6F 77 65 72 |
B1 E9 4C 61 77 6E 6D 6F 77 65 72 |
</productName> |
{name token 0 (0)} |
Name |
9C |
|
<quantity |
{name token 15 (0)} |
Name |
95 EF B1 E8 71 75 61 6E 74 69 74 79 |
|
> |
{name token 1 (0)} |
Name |
9B |
|
1 |
{type token 10 (0)} {value 1 (8)} |
Value |
01 |
E1 |
</quantity> |
{name token 0 (0)} |
Name |
9C |
|
<USPrice |
{name token 16 (0)} |
Name |
95 F0 B1 E7 55 53 50 72 69 63 65 |
|
> |
{name token 1 (0)} |
Name |
9B |
|
148.95 |
{type token 30 (0)} {value 100 (64)} {value 14895 (64)} |
Value |
00 00 00 00 00 00 00 64 00 00 00 00 00 00 3A 2F |
B2 D0 64 CF 3A 2F |
</USPrice> |
{name token 0 (0)} |
Name |
9C |
|
<comment |
{name token 4 (7)} |
Name |
04 |
98 E4 |
> |
{name token 1 (0)} |
Name |
9B |
|
Confirm this is electric |
{type token 0 (0)} {length 24 (21)}Confirm this is electric |
Value |
00 00 18 43 6F 6E 66 69 72 6D 20 74 68 69 73 20 69 73 20 65 6C 65 63 74 72 69 63 |
B1 F8 43 6F 6E 66 69 72 6D 20 74 68 69 73 20 69 73 20 65 6C 65 63 74 72 69 63 |
</comment> |
{name token 0 (0)} |
Name |
9C |
|
</item> |
{name token 0 (0)} |
Name |
9C |
|
<item |
{name token 13 (7)} |
Name |
98 ED |
|
partNum= |
{name token 20 (0)} |
Name |
99 F4 |
|
"926-AA" |
{type token 0 (0)} {length 6 (0)}926-AA |
Value |
00 00 06 39 32 36 2D 41 41 |
B1 E6 39 32 36 2D 41 41 |
> |
{name token 1 (0)} |
Name |
9B |
|
<productName |
{name token 14 (0)} |
Name |
98 EE |
|
> |
{name token 1 (0)} |
Name |
9B |
|
Baby Monitor |
{type token 0 (0)} {length 12 (21)}Baby Monitor |
Value |
00 00 0C 42 61 62 79 20 4D 6F 6E 69 74 6F 72 |
B1 EC 42 61 62 79 20 4D 6F 6E 69 74 6F 72 |
</productName> |
{name token 0 (0)} |
Name |
9C |
|
<quantity |
{name token 15 (0)} |
Name |
98 EF |
|
> |
{name token 1 (0)} |
Name |
9B |
|
1 |
{type token 10 (0)} {value 1 (8)} |
Value |
01 |
E1 |
</quantity> |
{name token 0 (0)} |
Name |
9C |
|
<USPrice |
{name token 16 (0)} |
Name |
98 F0 |
|
> |
{name token 1 (0)} |
Name |
9B |
|
39.98 |
{type token 30 (0)} {value 100 (64)} {value 3998 (64)} |
Value |
00 00 00 00 00 00 00 64 00 00 00 00 00 00 0F 9E |
B2 D0 64 CF 0F 9E |
</USPrice> |
{name token 0 (0)} |
Name |
9C |
|
<shipDate |
{name token 17 (7)} |
Name |
11 |
95 F1 B1 E8 73 68 69 70 44 61 74 65 |
> |
{name token 1 (0)} |
Name |
9B |
|
1999-05-21 |
{type token 38 (0)} {value 10763 (32)} |
Value |
00 00 2A 0B |
A7 CF 2A 0B |
</shipDate> |
{name token 0 (0)} |
Name |
9C |
|
</item> |
{name token 0 (0)} |
Name |
9C |
|
</items> |
{name token 0 (0)} |
Name |
9C |
|
</purchaseOrder> |
{name token 0 (0)} |
Name |
9C |
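The Output column of the table above appears to pad each symbol to a whole number of octets, most significant octet first, and to emit nothing for a symbol of width 0. The following Python sketch reproduces a few of its rows under that reading. The rule is inferred from the table rather than quoted from the specification, UTF-8 is assumed for character data, and the names `render_symbol` and `render_string` are hypothetical.

```python
def render_symbol(value, width):
    """Render one symbol of `width` bits as whole octets, big-endian."""
    if width == 0:
        return b""                   # value is implied, nothing is written
    return value.to_bytes((width + 7) // 8, "big")


def render_string(text, length_width=21):
    """Render a length symbol followed by the characters it counts."""
    data = text.encode("utf-8")
    return render_symbol(len(data), length_width) + data


print(render_symbol(18, 7).hex(" "))          # 12
print(render_symbol(15, 32).hex(" "))         # 00 00 00 0f
print(render_string("Alice Smith").hex(" "))  # 00 00 0b 41 6c 69 63 65 20 53 6d 69 74 68
```

The three printed values correspond to the rows for the orderDate name token, the schema document number, and the first name value.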