Related Work: XSD-based Data Format Description Language

I would like to draw this group's attention to the existence of another,
synergistic standards effort taking place within the Global Grid Forum
(GGF). The project is called the Data Format Description Language (DFDL).

It is likely that some of the people working on binary XML share interests
with those working on the GGF DFDL. We certainly hear requests to clarify
what distinguishes the GGF DFDL from binary-XML, ASN.1, XDR, and so forth,
and I'm providing this posting because I assume you will encounter the
symmetric question.

What binary-XML and GGF-DFDL have in common is the requirement to save
space, computation, and energy by using dense binary formats.

The big difference is that binary-XML is a prescriptive approach: it
specifies a universal format that data must be put into. Binary-XML shares
this category with ASN.1 and XDR. The DFDL approach is descriptive: the data
already has some format, and you describe that format in DFDL. A good
example of why we need this is that high-performance programs often want
their data structures to be aligned and directly mappable into memory
layouts, or randomly accessible on disk. DFDL allows data to meet these
requirements while still being universally described for interchange with
other programs. We also intend to accommodate a broad array of legacy data
formats.

What makes the binary-XML and GGF-DFDL approaches very synergistic is that
GGF-DFDL has chosen the XML Schema Definition Language (XSD) as its core.
The idea is that you describe the information content of the data using an
XSD. You then add standard annotations to the schema, which provide the
format/layout information.

Here's a brief example to clarify. Here's some binary data displayed in a
hex dump:

   0000 0005 0077 9e8c 
   169a 54dd 0a1b 4a3f 
   ce29 46f6

Here's the same information content in XML:

   <w>5</w>
   <x>7839372</x>
   <y>8.6E-200</y>
   <z>-7.1E8</z>

Here's the DFDL (XSD + annotations) that describes the binary data:

   <xs:complexType name="example1">
   	<xs:annotation>
   		<xs:appinfo>
   			<binaryProperties>
   				<byteOrder>bigEndian</byteOrder>
   			</binaryProperties>
   		</xs:appinfo>
   	</xs:annotation>
   	<xs:sequence>
   		<xs:element name="w" type="dfdl:binaryInt"/>
   		<xs:element name="x" type="dfdl:binaryInt"/>
   		<xs:element name="y" type="dfdl:binaryDouble"/>
   		<xs:element name="z" type="dfdl:binaryFloat"/>
   	</xs:sequence>
   </xs:complexType>
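
To make the descriptive idea concrete, here is a minimal Python sketch (not
DFDL tooling) that decodes the hex dump above from a field list mirroring
the schema. The mapping from DFDL type names to field widths is my own
assumption, inferred from the 20 bytes in the dump (two 4-byte integers, an
8-byte double, a 4-byte float); the names STRUCT_CODES, BYTE_ORDERS, FIELDS
and decode_binary are hypothetical and exist only for this illustration.

```python
import struct

# The 20 bytes shown in the hex dump above.
DATA = bytes.fromhex("0000 0005 0077 9e8c 169a 54dd 0a1b 4a3f ce29 46f6")

# Assumed mapping from the DFDL type names in the example to struct codes.
# The widths are inferred from the hex dump, not taken from a DFDL draft.
STRUCT_CODES = {
    "dfdl:binaryInt": "i",     # 4-byte signed integer
    "dfdl:binaryDouble": "d",  # 8-byte IEEE 754 double
    "dfdl:binaryFloat": "f",   # 4-byte IEEE 754 float
}

# Only the byte order used in the example is handled here.
BYTE_ORDERS = {"bigEndian": ">"}

# Field names and types, in the order given by the xs:sequence.
FIELDS = [
    ("w", "dfdl:binaryInt"),
    ("x", "dfdl:binaryInt"),
    ("y", "dfdl:binaryDouble"),
    ("z", "dfdl:binaryFloat"),
]


def decode_binary(data, fields, byte_order="bigEndian"):
    """Decode packed binary data according to a list of (name, type) pairs."""
    # ">" also selects standard sizes with no alignment padding.
    fmt = BYTE_ORDERS[byte_order] + "".join(STRUCT_CODES[t] for _, t in fields)
    values = struct.unpack(fmt, data)
    return dict(zip((name for name, _ in fields), values))


record = decode_binary(DATA, FIELDS)
print(record)
```

The point of the sketch is that the reader is driven entirely by the
description: changing the field list or the byte order changes how the same
bytes are interpreted, without changing the data.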

Here's the same information content in a text format:

   5, 7839372, 8.6E-200, -7.1E8

Here's the DFDL for describing it:

   <xs:complexType name="example1">
   	<xs:annotation>
   		<xs:appinfo>
   			<characterProperties>
   				<characterSet>UTF-8</characterSet>
   			</characterProperties>
   			<numericTextProperties>
   				<decimalSeparator>.</decimalSeparator>
   			</numericTextProperties>
   			<groupProperties>
   				<fieldSeparator>,</fieldSeparator>
   			</groupProperties>
   		</xs:appinfo>
   	</xs:annotation>
   	<xs:sequence>
   		<xs:element name="w" type="dfdl:textInt"/>
   		<xs:element name="x" type="dfdl:textInt"/>
   		<xs:element name="y" type="dfdl:textDouble"/>
   		<xs:element name="z" type="dfdl:textFloat"/>
   	</xs:sequence>
   </xs:complexType>
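
As a rough illustration of how the text properties above could drive a
reader, here is a small Python sketch, again not DFDL tooling. The names
CONVERTERS, TEXT_FIELDS and decode_text are hypothetical, and I am assuming
that whitespace around a field is insignificant, since the sample line has a
space after each comma.

```python
# Assumed converters for the DFDL text type names in the example.
# Python's float is a double, so textFloat precision is not modelled here.
CONVERTERS = {
    "dfdl:textInt": int,
    "dfdl:textDouble": float,
    "dfdl:textFloat": float,
}

# Field names and types, in the order given by the xs:sequence.
TEXT_FIELDS = [
    ("w", "dfdl:textInt"),
    ("x", "dfdl:textInt"),
    ("y", "dfdl:textDouble"),
    ("z", "dfdl:textFloat"),
]


def decode_text(raw, fields, character_set="UTF-8",
                field_separator=",", decimal_separator="."):
    """Decode one delimited text record according to (name, type) pairs."""
    text = raw.decode(character_set)
    # Assumption: whitespace around each field is not significant.
    tokens = [token.strip() for token in text.split(field_separator)]
    if len(tokens) != len(fields):
        raise ValueError("expected %d fields, got %d" % (len(fields), len(tokens)))
    record = {}
    for (name, type_name), token in zip(fields, tokens):
        # int() and float() expect "." as the decimal separator.
        if decimal_separator != ".":
            token = token.replace(decimal_separator, ".")
        record[name] = CONVERTERS[type_name](token)
    return record


text_record = decode_text(b"5, 7839372, 8.6E-200, -7.1E8", TEXT_FIELDS)
print(text_record)
```

The same four values come out of both the text record and the binary
record; only the description of the layout differs.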

The above syntax is still in development, but it gives the flavor of what
we're studying.

Those interested in more information about the GGF DFDL-WG can find it at
http://forge.gridforum.org/projects/dfdl-wg, or www.ggf.org. The most
relevant information is found at
http://forge.gridforum.org/docman2/ViewCategory.php?group_id=113&category_id=753

Thanks for your attention

Mike Beckerle
Co-Chair DFDL Working Group, Global Grid Forum
Ascential Software
50 Washington St. 
Westborough, MA 01581
508-366-3888

Received on Monday, 9 August 2004 15:03:31 UTC