RE: use cases: binary XML for scientifc computing from Cutler, Roger (RogerCutler) on 2004-11-22 (public-xml-binary@w3.org from November 2004)

From: Cutler, Roger (RogerCutler) <RogerCutler@chevrontexaco.com>
Date: Mon, 22 Nov 2004 17:47:42 -0600
To: mike.beckerle@ascentialsoftware.com, aslom@cs.indiana.edu, sdw@lig.net
cc: whoschek@lbl.gov, xml-dev@lists.xml.org, public-xml-binary@w3.org, kchiu@cs.binghamton.edu, mgovinda@cs.binghamton.edu
Message-ID: <71C38086EA230D43941DD0A3BAFF8CA90591E4@bocnte2k3.hou150.chevrontexaco.net>
Seems to me that DFDL fits the floating point data usage case I
contributed very well -- except maybe for the desire for somebody else
to handle the floating point conversion issues between platforms.  That
is, "just get on with it" can be a bit painful when your IO libraries
expect different float structures than those of the source of the data.
My contacts really don't like writing that low level code over and over,
with of course the potential for getting it a bit wrong somehow each time.
 
-----Original Message-----
From: mike.beckerle@ascentialsoftware.com
[mailto:mike.beckerle@ascentialsoftware.com] 
Sent: Monday, November 22, 2004 5:31 PM
To: Cutler, Roger (RogerCutler); aslom@cs.indiana.edu; sdw@lig.net
Cc: whoschek@lbl.gov; xml-dev@lists.xml.org; public-xml-binary@w3.org;
kchiu@cs.binghamton.edu; mgovinda@cs.binghamton.edu
Subject: RE: use cases: binary XML for scientifc computing


I do believe that GGF DFDL is relevant to the discussion here.
https://forge.gridforum.org/projects/dfdl-wg/ is the site, and 
https://forge.gridforum.org/docman2/ViewProperties.php?group_id=113&category_id=803&document_content_id=2973
(or http://tinyurl.com/435j7 in case email clobbered the long URL) is
the most recent presentation. Around slide 7 is where you'll find content.
 
Here's a snippet to give you the "DFDL" idea:
 
E.g., a description of a million element array of little endian double
floats would be this XSD:
 
<sequence>
   <element name="data" type="double" minOccurs="1000000"
            maxOccurs="1000000">
       <annotation><appinfo source="http://dfdl.org">
            <representation repType="binary" byteOrder="littleEndian"/>
       </appinfo></annotation>
   </element>
</sequence>
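
A rough sketch of what that description implies on the wire, assuming the
"double" is an 8-byte IEEE 754 value and the elements are packed back to
back with no padding (the snippet above does not spell that out). The
helper names pack_doubles_le and unpack_doubles_le are made up for
illustration and are not part of DFDL:

```python
import struct

def pack_doubles_le(values):
    """Serialize floats as consecutive little-endian IEEE 754 doubles."""
    return struct.pack("<%dd" % len(values), *values)

def unpack_doubles_le(buf):
    """Decode little-endian doubles, whatever the host byte order is."""
    count, remainder = divmod(len(buf), 8)
    if remainder:
        raise ValueError("buffer length is not a multiple of 8 bytes")
    return list(struct.unpack("<%dd" % count, buf))

samples = [1.0, -2.5, 3.141592653589793]
raw = pack_doubles_le(samples)
print(len(raw))                           # 24 (3 doubles * 8 bytes)
print(unpack_doubles_le(raw) == samples)  # True
```

The explicit "<" in the format string is what stands in for
byteOrder="littleEndian" here; a reader on a big endian host gets the same
values without any hand-written swapping.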
 
Several people and companies have been exploring this notion, so we're
trying to standardize it.
 
I feel DFDL differs from binaryXML in being descriptive of format rather
than prescriptive of format. This matters why?
 
1) Legacy data formats - Much of the complexity of DFDL comes from the
need to handle quite complex legacy formats which are tricky to
describe.
 
2) New data formats, but you need random-access I/O capabilities or the
ability to memory map the files into some exact memory layout with all
the alignments and inter-item offsets exactly specified.
 
3) You don't want to have to use any particular XML-oriented library to
write out your data. So long as the data format is describable in DFDL
you can do your I/O with ordinary I/O operations. In other words, minimal
investment has to be made up front in worrying about data format and data
interchange issues. DFDL lets you "just get on with it".
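
To make points 2 and 3 a little more concrete, here is a minimal sketch
that assumes a file holding nothing but packed little-endian doubles (the
layout, the ITEM_SIZE constant and the read_double_at helper are
assumptions for illustration only). The file is written with ordinary
I/O and one element is then fetched by offset through a memory map,
without parsing anything before it:

```python
import mmap
import os
import struct
import tempfile

ITEM_SIZE = 8  # bytes per IEEE 754 double (assumed fixed-width layout)

def read_double_at(path, index):
    """Fetch element `index` from a file of packed little-endian doubles."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The offset is computable because every item has the same size.
            return struct.unpack_from("<d", mm, index * ITEM_SIZE)[0]

# Write the data with plain file I/O, no XML library involved.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(struct.pack("<4d", 0.5, 1.5, 2.5, 3.5))

print(read_double_at(path, 2))  # 2.5
os.remove(path)
```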
 
If none of these 3 apply, then either XML or binaryXML *should* be the
right thing depending on your data size and performance needs. If you
are just after efficiency and density then DFDL may be less effective
for you since a DFDL-described data file isn't necessarily nicely
self-contained like an XML or binaryXML file should be. (Though we do
have a placeholder on the issue of how to associate DFDL descriptors
tightly to binary data so they can't get separated.)
 
 
...mikeb
Mike Beckerle
co-chair DFDL WG, GGF
 
 
 
 

  _____  

	From: Cutler, Roger (RogerCutler)
[mailto:RogerCutler@chevrontexaco.com] 
	Sent: Monday, November 22, 2004 5:54 PM
	To: Aleksander Slominski; Stephen D. Williams
	Cc: Wolfgang Hoschek; xml-dev@lists.xml.org;
public-xml-binary@w3.org; Kenneth Chiu; Madhusudhan Govindaraju
	Subject: RE: use cases: binary XML for scientifc computing
	
	
	If you are going to be looking at how this stuff fits in with
grid computing, perhaps it would be worthwhile also to make some
comments about DFDL?  I posted this suggestion previously (11/1) and
nobody seems to have picked up on it, so maybe the thought is not
appropriate for some reason, but at first glance DFDL does seem related
to me.
	 
	-----Original Message-----
	From: public-xml-binary-request@w3.org
[mailto:public-xml-binary-request@w3.org] On Behalf Of Aleksander
Slominski
	Sent: Monday, November 22, 2004 4:41 PM
	To: Stephen D. Williams
	Cc: Wolfgang Hoschek; xml-dev@lists.xml.org;
public-xml-binary@w3.org; Kenneth Chiu; Madhusudhan Govindaraju
	Subject: use cases: binary XML for scientifc computing
	
	
	Stephen D. Williams wrote: 




			what are use cases for nux: what do you plan to
use it for? 
			
			are use cases related to XML Binary
Characterization <http://www.w3.org/TR/xbc-use-cases/> ? 
			
			i am a bit disappointed that scientific
requirements are completely omitted from XBC use cases - the closest i
could find is http://www.w3.org/TR/xbc-use-cases/#FPenergy but it skips
over the whole issue of how to transfer an array of doubles without
changing endianness ... 
			


		I have proposed to the group recently that I create one
or more use cases that cover supercomputing, grid processing, and sensor
networks. 
		

	great to hear this. i think we worked in all those areas - it
seems XML became very popular and now, with convergence on Grid Web
Services, having an efficient binary XML format that can be used between
"optimized" peers seems to be very important ...
	

		Your observation seems to validate that point.  I would
be happy to incorporate anything you could provide.  My company builds
and maintains Linux supercomputers and I have present and past
experience with grid-like processing, so I have some resources and
contacts. 
		
		

			we did a lot of work in the past related to XML
performance (at Indiana University and Binghamton) and are very
concerned that whatever binary XML is characterized/standardized in
W3C will not be of much use for scientific computing and grids ... 
			


		Could you provide links or details to any of this work?


	we worked on SOAP parsing and optimization for scientific
computing:
	
	Madhusudhan Govindaraju, Aleksander Slominski, Venkatesh
Choppella, Randall Bramley, and Dennis Gannon. Requirements for and
evaluation of RMI protocols for scientific computing
<http://www.extreme.indiana.edu/xgws/papers/sc00_paper/> . In
Proceedings of SC00 Conference, Dallas TX, Nov 2000. Available on CD-ROM
from IEEE
	Kenneth Chiu, Madhusudhan Govindaraju, and Randall Bramley.
Investigating the limits of SOAP performance for scientific computing
<http://www.computer.org/proceedings/hpdc/1686/16860246abs.htm> . In The
11-th IEEE International Symposium on High Performance Distributed
Computing HPDC-11 2002 (HPDC'02), Jul 2002.
	Madhusudhan Govindaraju, Aleksander Slominski, Kenneth Chiu, Pu
Liu, Robert van Engelen, and Michael J. Lewis. Toward Characterizing the
Performance of SOAP Toolkits
<http://www.extreme.indiana.edu/xgws/papers/soap_perf_char_grid2004.pdf>
. In 5th IEEE/ACM International Workshop on Grid Computing, November
2004
	Kenneth Chiu and Wei Lu. A Compiler-Based Approach to
Schema-Specific XML Parsing
<http://wam.inrialpes.fr/www-workshop2004/ChiuLu.pdf> . In First
International Workshop on High Performance XML Processing (Satellite of
WWW2004), May 2004.
	Kenneth Chiu. XBS: A Streaming Binary Serializer for High
Performance Computing. In Proceedings of the High Performance Computing
Symposium 2004. Society for Computer Simulation International, 2004
	
	however we never got enough forward momentum to come up with a
proposal for binary XML, but we are still willing to work to get use
cases described.
	

		How do you think that XML, especially a binary
characterized XML, should relate to HDF5? 
		

	HDF5 looks to me like a separate problem, as it defines its own
schema for its own representation, so mapping HDF5 to the XML Infoset
is a big task.
	
	we are more interested in how to transfer scientific data
(mostly arrays of primitive types or simple structs with primitive types
that can be perfectly well expressed in XML Infoset but are also
extremely inefficient including dreaded IEEE float conversion to string
and back) and make it consistent with XML messaging (such as SOAP).
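
A small sketch of the float-to-string round trip mentioned above, compared
with a raw binary encoding of the same values. The space-separated text
layout and the function names are assumptions made for illustration; they
are not taken from SOAP or from any binary XML proposal:

```python
import struct

def encode_text(values):
    # repr() yields the shortest decimal string that round-trips a float.
    return " ".join(repr(v) for v in values).encode("ascii")

def decode_text(data):
    # Every value has to be parsed from decimal back to binary.
    return [float(token) for token in data.split()]

def encode_binary(values):
    # Fixed 8 bytes per value, little-endian IEEE 754.
    return struct.pack("<%dd" % len(values), *values)

def decode_binary(data):
    return list(struct.unpack("<%dd" % (len(data) // 8), data))

values = [0.1, 1.0 / 3.0, 2.0 ** 0.5]
text = encode_text(values)
binary = encode_binary(values)

print(len(text), len(binary))         # text size vs. binary size in bytes
print(decode_text(text) == values)    # True
print(decode_binary(binary) == values)  # True
```

Both encodings preserve the values here; the difference is that the text
form needs a decimal conversion per element in each direction and is
larger for values that need full precision.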
	
	thanks,
	
	alek
	
	-- 
	The best way to predict the future is to invent it - Alan Kay
Received on Monday, 22 November 2004 23:49:20 UTC