RE: Related Work: XSD-based Data Format Description Language

> -----Original Message-----
> From: public-xml-binary-request@w3.org 
> [mailto:public-xml-binary-request@w3.org] On Behalf Of 
> mike.beckerle@ascentialsoftware.com
> Sent: Monday, August 09, 2004 10:59
> To: public-xml-binary@w3.org
> Subject: Related Work: XSD-based Data Format Description Language
> 
> 
> 
> I wanted to call the attention of this group to the existence 
> of another synergistic standards effort taking place within 
> the Global Grid Forum (GGF). The project is called Data 
> Format Description Language (DFDL). 
> 
> It is likely that there are people working on binary XML that 
> also share interests with those working on the GGF DFDL. We 
> certainly hear requests for clarification of what we're doing 
> in the GGF DFDL that distinguishes our project from the 
> binary-XML, ASN.1, XDR, and so forth, and I'm providing this 
> posting because I'm assuming you will encounter the symmetric issue.
> 
> In common between binary-XML and GGF-DFDL are requirements to 
> save space and expended computation/energy by using binary 
> formats for density. 
> 
> The big difference is that binary-XML is a prescriptive 
> approach, that is, it specifies a universal format that data 
> must be put in. Binary-XML shares this category with ASN.1 
> and XDR. The DFDL approach is descriptive. That is, the data 
> has some format. You describe in DFDL the format the data is 
> in. A good example of why we need this is that 
> high-performance programs often want to arrange for data 
> structures to be aligned and directly mappable into memory 
> layouts or randomly accessible on disk. DFDL allows data to 
> meet these requirements while still being universally 
> described for interchange with other programs. We also intend 
> to accommodate a broad array of legacy data formats.


Mike,

You may be aware that, within the ASN.1 family of standards, there is a
standard notation available for precisely the same purpose (if I understand
correctly what you describe above):  the "Encoding Control Notation" or ECN
(ISO/IEC 8825-3, ITU-T Rec. X.692).

ECN is designed to be used along with ASN.1.  You specify your abstract data
structures (types) in ASN.1 notation and specify the details of the
encodings in ECN notation.

Normally (without ECN), the ASN.1 abstract type definitions are encoded
using one of the standard encoding rules of ASN.1:  BER, PER, XER, and so
on.

With ECN, instead, you write an ECN module and associate it with the ASN.1
module.  Once you do that, the ASN.1 abstract type definitions are encoded
using the ECN specification, instead of the standard encoding rules.

ECN has many features that enable the specification of a large number of
details of the encodings, including how to determine the presence of optional
fields, how to determine the selected alternative of a CHOICE, how to
determine the end of a repetition, and other details such as field
pre-padding, post-padding, alignment, bit-order, variable-length encodings
for integers, Huffman-style encodings for enumerations and choice indexes,
and so on.
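
To make one of these details concrete, here is a minimal Python sketch of a
variable-length integer encoding.  It is not ECN notation and is not taken
from X.692; the base-128 scheme and the function names encode_varint and
decode_varint are assumptions chosen purely for illustration.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer in 7-bit groups, low-order group first."""
    if value < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            # More groups follow: set the continuation bit.
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode one integer produced by encode_varint from the start of data."""
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value
        shift += 7
    raise ValueError("truncated variable-length integer")
```

Small values occupy a single octet and larger ones grow as needed, which is
the kind of trade-off a frequency analysis of the field would inform.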

ECN has been designed to enable:

- re-specifying legacy protocols (previously specified using narrative text
and tabular descriptions) in a fully-formal notation;

- tweaking the standard encodings (e.g., PER encodings) to achieve some
optimizations (for example, encoding an integer in a variable-length field
based on frequency analysis);

- designing totally new, optimized encodings in certain applications of
ASN.1 in which the standard encodings are not satisfactory.

The ECN standard can be downloaded free of charge from the ITU-T website:

http://www.itu.int/ITU-T/studygroups/com17/languages/X.692-0203.pdf

It may be worthwhile for your working group to take a look at ECN, in case
they have not done so yet.

Alessandro Triglia
OSS Nokalva


> 
> What makes the binary-XML and GGF-DFDL approaches very 
> synergistic is that GGF-DFDL has chosen the XML Schema 
> Description Language as its core. The idea is that you 
> describe the information content of the data using an XSD. 
> You then add standard annotations to this which provide the 
> format/layout information. 
> 
> Here's a brief example to clarify. Here's some binary data 
> displayed in a hex dump:
> 
>    0000 0005 0077 9e8c 
>    169a 54dd 0a1b 4a3f 
>    ce29 46f6
> 
> Here's the same information content in XML:
> 
>    <w>5</w>
>    <x>7839372</x>
>    <y>8.6E-200</y>
>    <z>-7.1E8</z>
> 
> Here's the DFDL (XSD + annotations) which describe the binary data:
> 
>    <xs:complexType name="example1">
>    	<xs:annotation>
>    		<xs:appinfo>
>    			<binaryProperties>
>    				<byteOrder>bigEndian</byteOrder>
>    			</binaryProperties>
>    		</xs:appinfo>
>    	</xs:annotation>
>    	<xs:sequence>
>    		<xs:element name="w" type="dfdl:binaryInt"/>
>    		<xs:element name="x" type="dfdl:binaryInt"/>
>    		<xs:element name="y" type="dfdl:binaryDouble"/>
>    		<xs:element name="z" type="dfdl:binaryFloat"/>
>    	</xs:sequence>
>    </xs:complexType>
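
As a rough cross-check of the binary example, the following Python sketch
decodes the hex dump by hand using the layout the DFDL describes: big-endian,
two integers, a double, and a float.  The field widths (32-bit integers,
64-bit IEEE double, 32-bit IEEE float) are an assumption inferred from the
20-byte dump, and the standard struct module merely stands in for a DFDL
processor, which this is not.

```python
import struct

HEX_DUMP = "0000 0005 0077 9e8c 169a 54dd 0a1b 4a3f ce29 46f6"

# ">" selects big-endian with no padding;
# i = 32-bit signed int, d = 64-bit double, f = 32-bit float.
LAYOUT = struct.Struct(">iidf")


def decode_example1(hex_dump: str) -> dict:
    """Decode the w, x, y, z fields from a whitespace-separated hex dump."""
    raw = bytes.fromhex(hex_dump.replace(" ", ""))
    w, x, y, z = LAYOUT.unpack(raw)
    return {"w": w, "x": x, "y": y, "z": z}


record = decode_example1(HEX_DUMP)
print(record)
```

The decoded integers should match the XML rendering exactly, and the two
floating-point fields should match it to the precision shown there.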
> 
> Here's the same information content in a text format:
> 
>    5, 7839372, 8.6E-200, -7.1E8
> 
> Here's the DFDL for describing it:
> 
>    <xs:complexType name="example1">
> 	<xs:annotation>
> 		<xs:appinfo>
> 			<characterProperties>
> 				<characterSet>UTF-8</characterSet>
> 			</characterProperties>
> 			<numericTextProperties>
> 				<decimalSeparator>.</decimalSeparator>
> 			</numericTextProperties>
> 			<groupProperties>
> 				<fieldSeparator>,</fieldSeparator>
> 			</groupProperties>
> 		</xs:appinfo>
> 	</xs:annotation>
> 	<xs:sequence>
> 		<xs:element name="w" type="dfdl:textInt"/>
> 		<xs:element name="x" type="dfdl:textInt"/>
> 		<xs:element name="y" type="dfdl:textDouble"/>
> 		<xs:element name="z" type="dfdl:textFloat"/>
> 	</xs:sequence>
>    </xs:complexType>
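
The text description can be approximated by hand in the same spirit.  This
Python sketch hard-codes the properties from the annotation (character set,
field separator, decimal separator); the function name parse_text_example1
and the stripping of whitespace after each separator are assumptions, since
the draft syntax does not say how padding is treated.

```python
FIELD_SEPARATOR = ","    # from <fieldSeparator>
DECIMAL_SEPARATOR = "."  # from <decimalSeparator>
CHARACTER_SET = "UTF-8"  # from <characterSet>


def parse_text_example1(data: bytes) -> dict:
    """Parse the four text fields w, x, y, z from one encoded record."""
    text = data.decode(CHARACTER_SET)
    fields = [field.strip() for field in text.split(FIELD_SEPARATOR)]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}")
    w, x, y, z = fields

    def to_float(token: str) -> float:
        # Normalize the declared decimal separator to the one float() expects.
        return float(token.replace(DECIMAL_SEPARATOR, "."))

    return {"w": int(w), "x": int(x), "y": to_float(y), "z": to_float(z)}


record = parse_text_example1(b"5, 7839372, 8.6E-200, -7.1E8")
print(record)
```

Changing the two separator constants is enough to read a variant such as
semicolon-separated fields with a comma as the decimal mark.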
> 
> The above syntax is still in development, but gives the 
> flavor of what we're studying. Those interested in more 
> information about the GGF DFDL-WG can find it at 
> http://forge.gridforum.org/projects/dfdl-wg, or www.ggf.org. 
> The most relevant information is found at 
> http://forge.gridforum.org/docman2/ViewCategory.php?group_id=113&category_id=753

> Thanks for your attention
> 
> Mike Beckerle
> Co-Chair DFDL Working Group, Global Grid Forum
> Ascential Software
> 50 Washington St. 
> Westborough, MA 01581
> 508-366-3888

Received on Monday, 9 August 2004 16:43:20 UTC