RE: use cases: binary XML for scientifc computing

From: <mike.beckerle@ascentialsoftware.com>
Date: Mon, 22 Nov 2004 18:31:25 -0500
Message-ID: <0F2A05A54977F248982FF81EA9578CDB0F65E2A4@ASC-MSG-02.ascential.com>
To: RogerCutler@chevrontexaco.com, aslom@cs.indiana.edu, sdw@lig.net
Cc: whoschek@lbl.gov, xml-dev@lists.xml.org, public-xml-binary@w3.org, kchiu@cs.binghamton.edu, mgovinda@cs.binghamton.edu
I do believe that GGF DFDL is relevant to the discussion here.
https://forge.gridforum.org/projects/dfdl-wg/ is the site, and
https://forge.gridforum.org/docman2/ViewProperties.php?group_id=113&category_id=803&document_content_id=2973
(or http://tinyurl.com/435j7 in case email clobbered the long URL) is the
most recent presentation. Around slide 7 is where you'll find content.
 
Here's a snippet to give you the "DFDL" idea:
 
E.g., a million-element array of little-endian double-precision floats
would be described by this XSD:
 
<sequence>
   <element name="data" type="double" minOccurs="1000000" maxOccurs="1000000">
       <annotation><appinfo source="http://dfdl.org">
            <representation repType="binary" byteOrder="littleEndian"/>
       </appinfo></annotation>
   </element>
</sequence>
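
To make the description above concrete, here is a minimal Python sketch of the
byte layout it implies. It is not a DFDL processor and uses no DFDL tool's
API. It assumes only what the annotation states: each element is an 8-byte
IEEE 754 double stored little-endian, with no header or padding between
elements. The helper names pack_doubles_le and unpack_doubles_le are
hypothetical, and a three-element list stands in for the million elements.

```python
import struct

# Assumption: per the annotation above, each element is an IEEE 754
# double (8 bytes), stored little-endian, with no header or padding.
def pack_doubles_le(values):
    """Serialize a sequence of floats as raw little-endian doubles."""
    return struct.pack("<%dd" % len(values), *values)

def unpack_doubles_le(raw):
    """Inverse of pack_doubles_le: raw bytes back to a list of floats."""
    count, remainder = divmod(len(raw), 8)
    if remainder:
        raise ValueError("byte length is not a multiple of 8")
    return list(struct.unpack("<%dd" % count, raw))

sample = [0.0, 1.5, -2.25]       # stand-in for the 1,000,000 elements
raw = pack_doubles_le(sample)    # 3 elements * 8 bytes = 24 bytes
restored = unpack_doubles_le(raw)
```

The point of the sketch is that the data itself carries no markup at all; the
XSD sits beside it and says how to interpret the bytes.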
 
Several people and companies have been exploring this notion, so we're
trying to standardize it.
 
I feel DFDL differs from binaryXML in being descriptive of a format rather
than prescriptive of one. Why does this matter? It matters in three cases:
 
1) Legacy data formats - Much of the complexity of DFDL comes from the need
to handle quite complex legacy formats which are tricky to describe.
 
2) New data formats where you need random-access I/O capabilities or the
ability to memory-map the files into some exact memory layout, with all the
alignments and inter-item offsets exactly specified.
 
3) You don't want to have to use any particular XML-oriented library to
write out your data. So long as the data format is describable in DFDL, you
can do your I/O with ordinary I/O operations. In other words, minimal
investment has to be made up front in worrying about data format and data
interchange issues. DFDL lets you "just get on with it".
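
Points 2 and 3 can be illustrated with a short Python sketch, assuming a file
of raw little-endian doubles like the one described earlier. The data is
written with ordinary file I/O, then a single element is fetched through a
memory map at a computed offset. The file name data.bin and the helpers
write_doubles and read_double_at are hypothetical, and no XML or DFDL library
is involved.

```python
import mmap
import os
import struct
import tempfile

ITEM_SIZE = 8  # bytes per little-endian double (assumed from the description)

def write_doubles(path, values):
    """Write floats as raw little-endian doubles using plain file I/O."""
    with open(path, "wb") as f:
        for v in values:
            f.write(struct.pack("<d", v))

def read_double_at(path, index):
    """Fetch element `index` through a memory map at a computed offset."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return struct.unpack_from("<d", view, index * ITEM_SIZE)[0]

path = os.path.join(tempfile.mkdtemp(), "data.bin")  # hypothetical file name
write_doubles(path, [i * 0.5 for i in range(1000)])
value = read_double_at(path, 742)  # element 742 lives at byte offset 742 * 8
```

Because every offset follows from the description (index times item size),
random access needs no parsing of the preceding elements.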
 
If none of these three apply, then either XML or binaryXML *should* be the
right thing, depending on your data size and performance needs. If you are just
after efficiency and density then DFDL may be less effective for you since a
DFDL-described data file isn't necessarily nicely self-contained like an XML
or binaryXML file should be. (Though we do have a placeholder on the issue
of how to associate DFDL descriptors tightly to binary data so they can't
get separated.)
 
 
...mikeb
Mike Beckerle
co-chair DFDL WG, GGF
 
 
 
 

  _____  

From: Cutler, Roger (RogerCutler) [mailto:RogerCutler@chevrontexaco.com] 
Sent: Monday, November 22, 2004 5:54 PM
To: Aleksander Slominski; Stephen D. Williams
Cc: Wolfgang Hoschek; xml-dev@lists.xml.org; public-xml-binary@w3.org;
Kenneth Chiu; Madhusudhan Govindaraju
Subject: RE: use cases: binary XML for scientifc computing


If you are going to be looking at how this stuff fits in with grid
computing, perhaps it would be worthwhile also to make some comments about
DFDL?  I posted this suggestion previously (11/1) and nobody seems to have
picked up on it, so maybe the thought is not appropriate for some reason,
but at first glance DFDL does seem related to me.
 
-----Original Message-----
From: public-xml-binary-request@w3.org
[mailto:public-xml-binary-request@w3.org] On Behalf Of Aleksander Slominski
Sent: Monday, November 22, 2004 4:41 PM
To: Stephen D. Williams
Cc: Wolfgang Hoschek; xml-dev@lists.xml.org; public-xml-binary@w3.org;
Kenneth Chiu; Madhusudhan Govindaraju
Subject: use cases: binary XML for scientifc computing


Stephen D. Williams wrote: 




what are use cases for nux: what do you plan to use it for? 

are use cases related to XML Binary Characterization
<http://www.w3.org/TR/xbc-use-cases/>?


i am a bit disappointed that scientific requirements are completely omitted
from XBC use cases - the closest i could find is
http://www.w3.org/TR/xbc-use-cases/#FPenergy but it skips over the whole
issue of how to transfer an array of doubles without changing endianness ...



I have proposed to the group recently that I create one or more use cases
that cover supercomputing, grid processing, and sensor networks. 


great to hear this. i think we worked in all those areas - it seems XML
became very popular, and now with convergence on Grid Web Services, having an
efficient binary XML format that can be used between "optimized" peers seems
to be very important ...


Your observation seems to validate that point.  I would be happy to
incorporate anything you could provide.  My company builds and maintains
Linux supercomputers and I have present and past experience with grid-like
processing, so I have some resources and contacts. 



we did a lot of work in the past related to XML performance (at Indiana
University and Binghamton) and are very concerned that whatever binary XML is
characterized/standardized in W3C will not be of much use for scientific
computing and grids ...



Could you provide links or details to any of this work?  

we worked on SOAP parsing and optimization for scientific computing:

XML_RMI_1: Madhusudhan Govindaraju, Aleksander Slominski, Venkatash
Choppella, Randall Bramley, and Dennis Gannon. Requirements for and
evaluation of RMI protocols for scientific computing. In Proceedings of SC00
Conference, Dallas TX, Nov 2000. Available on CD-ROM from IEEE.
<http://www.extreme.indiana.edu/xgws/papers/sc00_paper/>

SoapPerf: Kenneth Chiu, Madhusudhan Govindaraju, and Randall Bramley.
Investigating the limits of SOAP performance for scientific computing. In
The 11th IEEE International Symposium on High Performance Distributed
Computing HPDC-11 2002 (HPDC'02), Jul 2002.
<http://www.computer.org/proceedings/hpdc/1686/16860246abs.htm>

aslom_soapPerfChar_grid2004: Madhusudhan Govindaraju, Aleksander Slominski,
Kenneth Chiu, Pu Liu, Robert van Engelen, and Michael J. Lewis. Toward
Characterizing the Performance of SOAP Toolkits. In 5th IEEE/ACM
International Workshop on Grid Computing, November 2004.
<http://www.extreme.indiana.edu/xgws/papers/soap_perf_char_grid2004.pdf>

chiu_ssp: Kenneth Chiu and Wei Lu. A Compiler-Based Approach to
Schema-Specific XML Parsing. In First International Workshop on High
Performance XML Processing (Satellite of WWW2004), May 2004.
<http://wam.inrialpes.fr/www-workshop2004/ChiuLu.pdf>

chiu_xbs: Kenneth Chiu. XBS: A Streaming Binary Serializer for High
Performance Computing. In Proceedings of the High Performance Computing
Symposium 2004. Society for Computer Simulation International, 2004.

however we never got enough forward momentum to come up with a proposal for
binary XML, but we are still willing to work to get use cases described.


How do you think that XML, especially a binary characterized XML, should
relate to HDF5?


HDF5 looks to me like a separate problem, as it defines its own schema for
its own representation, so mapping HDF5 to the XML Infoset is a big task.

we are more interested in how to transfer scientific data (mostly arrays of
primitive types or simple structs of primitive types, which can be perfectly
well expressed in XML Infoset but are also extremely inefficient, including
the dreaded IEEE float conversion to string and back) and make it consistent
with XML messaging (such as SOAP).
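
The float-to-string cost mentioned above can be seen in a small Python
sketch. It sends the same three doubles two ways: as decimal text (as they
would appear in XML character data, ignoring all markup) and as raw 8-byte
little-endian values. The variable names and sample values are illustrative
only. Python's repr is used because it emits the shortest string that
round-trips exactly; other languages generally need up to 17 significant
digits for the same guarantee.

```python
import struct

values = [1.0 / 3.0, 2.0 / 3.0, 6.02214076e23]

# Text route: each double becomes a decimal string (as in XML character
# data) and has to be parsed back into a double by the receiver.
as_text = [repr(v) for v in values]
text_bytes = sum(len(s.encode("ascii")) for s in as_text)
from_text = [float(s) for s in as_text]

# Binary route: 8 bytes per value, no decimal conversion at either end.
as_binary = struct.pack("<%dd" % len(values), *values)
from_binary = list(struct.unpack("<%dd" % len(values), as_binary))
```

Both routes recover the original values here, but the text route is larger
even before any element tags are added, and it pays for a binary-to-decimal
and decimal-to-binary conversion on every element.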

thanks,

alek

-- 

The best way to predict the future is to invent it - Alan Kay
Received on Monday, 22 November 2004 23:32:36 GMT
