Re: use cases: binary XML for scientific computing from Stephen D. Williams on 2004-11-23 (public-xml-binary@w3.org from November 2004)

From: Stephen D. Williams <sdw@lig.net>
Date: Mon, 22 Nov 2004 20:02:50 -0500
To: mike.beckerle@ascentialsoftware.com
Cc: RogerCutler@chevrontexaco.com, aslom@cs.indiana.edu, whoschek@lbl.gov, xml-dev@lists.xml.org, public-xml-binary@w3.org, kchiu@cs.binghamton.edu, mgovinda@cs.binghamton.edu
Message-ID: <41A28C3A.2010303@lig.net>
This is interesting and relevant to the discussion of binary payloads of 
scalar data.

In my opinion, the first level for "binary XML" (i.e., a new, more
efficient XML-like format) is to improve the efficiency of the
structure.  The second level can involve exactly the ideas apparent
below in the description of DFDL.  It is debatable whether a format spec
should include definitions of the binary data, standard types, and
built-in type notation in a self-contained way, but if that is not in the
spec for "binary XML", then it would be layered on top, just as the
schema specs are layered on XML now.

A format could contain all standard scalar formats and have an
efficient, MIME-like way to note their types, while any schema
language remains a separate specification.  A format could support
text, labeled scalar, and opaque binary formatted data, with the choice
and placement of metadata controlled by the application.  The DFDL work
could support both the labeled, self-described method and the opaque
method: the former supports self-contained instances, while the latter
is normally more efficient because its metadata is kept out of band.
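To make the labeled vs. opaque distinction concrete, here is a minimal Python sketch for an array of doubles. It assumes a hypothetical "label" made of a one-byte type code plus a four-byte element count; these codes and function names are illustrative only and are not part of DFDL or of any proposed binary XML format.

```python
import struct

# Hypothetical type codes for the "labeled scalar" form; illustrative only,
# not taken from DFDL or any published binary XML proposal.
TYPE_CODES = {"float64": 1, "int32": 2}
LABEL = struct.Struct("<BI")  # 1-byte type code + 4-byte count, no padding


def encode_labeled(values):
    """Self-described: the type code and count travel with the data."""
    header = LABEL.pack(TYPE_CODES["float64"], len(values))
    return header + struct.pack("<%dd" % len(values), *values)


def decode_labeled(buf):
    type_code, count = LABEL.unpack_from(buf, 0)
    if type_code != TYPE_CODES["float64"]:
        raise ValueError("unsupported type code %d" % type_code)
    return list(struct.unpack_from("<%dd" % count, buf, LABEL.size))


def encode_opaque(values):
    """Opaque: raw little-endian doubles only; type and count must be
    supplied out of band (e.g. by a separate DFDL-style descriptor)."""
    return struct.pack("<%dd" % len(values), *values)


def decode_opaque(buf, count):
    # The caller provides `count` from out-of-band metadata.
    return list(struct.unpack_from("<%dd" % count, buf, 0))
```

In this sketch the labeled form costs a fixed 5 bytes per array and can be decoded on its own, while the opaque form is just the payload and is useless without the externally held description.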

sdw

mike.beckerle@ascentialsoftware.com wrote:

> I do believe that GGF DFDL is relevant to the discussion here. 
> https://forge.gridforum.org/projects/dfdl-wg/ is the site, and
> https://forge.gridforum.org/docman2/ViewProperties.php?group_id=113&category_id=803&document_content_id=2973 
> (or http://tinyurl.com/435j7 in case email clobbered the long URL) is the
> most recent presentation. The substantive content starts around slide 7.
>  
> Here's a snippet to give you the "DFDL" idea:
>  
> For example, a million-element array of little-endian double-precision
> floats would be described by this XSD:
>  
> <sequence>
>    <element name="data" type="double" minOccurs="1000000" maxOccurs="1000000">
>        <annotation><appinfo source="http://dfdl.org">
>             <representation repType="binary" byteOrder="littleEndian"/>
>        </appinfo></annotation>
>    </element>
> </sequence>
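Because a description like this fixes the layout completely, ordinary file I/O is enough to produce and consume the data. A minimal Python sketch, assuming the file holds nothing but the array (1,000,000 consecutive 8-byte little-endian IEEE 754 doubles, no header); the function names are hypothetical:

```python
import array
import sys

N = 1_000_000  # matches minOccurs/maxOccurs in the XSD above


def write_doubles_le(path, values):
    """Write values as consecutive little-endian doubles with no header
    (assumed layout; the XSD itself only describes the sequence)."""
    a = array.array("d", values)
    if sys.byteorder != "little":
        a.byteswap()  # convert native big-endian to little-endian
    with open(path, "wb") as f:
        a.tofile(f)


def read_doubles_le(path, count):
    """Read `count` little-endian doubles; fromfile raises EOFError
    if the file holds fewer items than requested."""
    a = array.array("d")
    with open(path, "rb") as f:
        a.fromfile(f, count)
    if sys.byteorder != "little":
        a.byteswap()
    return a
```

Usage would be `write_doubles_le("data.bin", values)` with `len(values) == N`, then `read_doubles_le("data.bin", N)`; no XML library is involved at either end.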
>  
> Several people and companies have been exploring this notion, so we're 
> trying to standardize it.
>  
> I feel DFDL differs from binaryXML in being descriptive of format
> rather than prescriptive of format. Why does this matter?
>  
> 1) Legacy data formats - Much of the complexity of DFDL comes from the 
> need to handle quite complex legacy formats which are tricky to describe.
>  
> 2) New data formats where you need random-access I/O capabilities or
> the ability to memory-map the files into an exact memory layout, with
> all alignments and inter-item offsets exactly specified.
>  
> 3) You don't want to depend on any particular XML-oriented library to
> write out your data. As long as the data format is describable in DFDL,
> you can do your I/O with ordinary I/O operations. In other words,
> minimal up-front investment is needed in data format and data
> interchange issues. DFDL lets you "just get on with it".
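Point 2 depends on offsets being computable from the description alone. For the fixed-size array described above, and assuming the doubles start at byte 0 of the file, element i lives at byte offset i * 8, so a reader can memory-map the file and fetch a single element without parsing anything else. A minimal Python sketch; `read_element` is a hypothetical helper:

```python
import mmap
import struct

ITEM = struct.Struct("<d")  # one little-endian IEEE 754 double (8 bytes)


def read_element(path, index):
    """Return element `index` by computing its byte offset (index * 8)
    and unpacking it directly from a read-only memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            offset = index * ITEM.size
            if index < 0 or offset + ITEM.size > len(mm):
                raise IndexError(index)
            return ITEM.unpack_from(mm, offset)[0]
```

A real program would keep the map open across many lookups rather than remapping per call; the point is only that the offset arithmetic follows directly from the declared type and byte order.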
>  
> If none of these 3 apply, then either XML or binaryXML *should* be the
> right thing, depending on your data size and performance needs. If you
> are just after efficiency and density, then DFDL may be less effective
> for you, since a DFDL-described data file isn't necessarily nicely
> self-contained like an XML or binaryXML file should be. (We do, however,
> have a placeholder issue on how to associate DFDL descriptors tightly
> with binary data so they can't get separated.)
>  
>  
> ...mikeb
> Mike Beckerle
> co-chair DFDL WG, GGF
>  
>
>     ------------------------------------------------------------------------
>     *From:* Cutler, Roger (RogerCutler)
>     [mailto:RogerCutler@chevrontexaco.com]
>     *Sent:* Monday, November 22, 2004 5:54 PM
>     *To:* Aleksander Slominski; Stephen D. Williams
>     *Cc:* Wolfgang Hoschek; xml-dev@lists.xml.org;
>     public-xml-binary@w3.org; Kenneth Chiu; Madhusudhan Govindaraju
>     *Subject:* RE: use cases: binary XML for scientific computing
>
>     If you are going to be looking at how this stuff fits in with grid
>     computing, perhaps it would be worthwhile also to make some
>     comments about DFDL?  I posted this suggestion previously (11/1)
>     and nobody seems to have picked up on it, so maybe the thought is
>     not appropriate for some reason, but at first glance DFDL does
>     seem related to me.
>      
>     -----Original Message-----
>     *From:* public-xml-binary-request@w3.org
>     [mailto:public-xml-binary-request@w3.org] *On Behalf Of
>     *Aleksander Slominski
>     *Sent:* Monday, November 22, 2004 4:41 PM
>     *To:* Stephen D. Williams
>     *Cc:* Wolfgang Hoschek; xml-dev@lists.xml.org;
>     public-xml-binary@w3.org; Kenneth Chiu; Madhusudhan Govindaraju
>     *Subject:* use cases: binary XML for scientific computing
>
>     Stephen D. Williams wrote:
>
>>
>>
>>>
>>>     what are use cases for nux: what do you plan to use it for?
>>>
>>>     are use cases related to XML Binary Characterization
>>>     <http://www.w3.org/TR/xbc-use-cases/>?
>>>
>>>     i am a bit disappointed that scientific requirements are
>>>     completely omitted from the XBC use cases - the closest i could find
>>>     is http://www.w3.org/TR/xbc-use-cases/#FPenergy but it skips
>>>     over the whole issue of how to transfer an array of doubles without
>>>     changing endianness ...
>>
>>
>>     I have proposed to the group recently that I create one or more
>>     use cases that cover supercomputing, grid processing, and sensor
>>     networks.
>
>     great to hear this. i think we have worked in all those areas. XML
>     has become very popular, and now, with the convergence on Grid Web
>     Services, an efficient binary XML format that can be used between
>     "optimized" peers seems to be very important ...
>
>>     Your observation seems to validate that point.  I would be happy
>>     to incorporate anything you could provide.  My company builds and
>>     maintains Linux supercomputers and I have present and past
>>     experience with grid-like processing, so I have some resources
>>     and contacts.
>>
>>>     we did a lot of work in the past on XML performance (at
>>>     Indiana University and Binghamton) and are very concerned that
>>>     whatever binary XML is characterized/standardized in the W3C
>>>     will be of little use for scientific computing and grids ...
>>
>>
>>     Could you provide links or details on any of this work?
>
>     we worked on SOAP parsing and optimization for scientific computing:
>
>     Madhusudhan Govindaraju, Aleksander Slominski, Venkatesh
>     Choppella, Randall Bramley, and Dennis Gannon. Requirements for
>     and evaluation of RMI protocols for scientific computing
>     <http://www.extreme.indiana.edu/xgws/papers/sc00_paper/>. In
>     Proceedings of the SC00 Conference, Dallas, TX, Nov 2000. Available
>     on CD-ROM from IEEE.
>
>     Kenneth Chiu, Madhusudhan Govindaraju, and Randall Bramley.
>     Investigating the limits of SOAP performance for scientific
>     computing
>     <http://www.computer.org/proceedings/hpdc/1686/16860246abs.htm>.
>     In the 11th IEEE International Symposium on High Performance
>     Distributed Computing, HPDC-11 2002 (HPDC'02), Jul 2002.
>
>     Madhusudhan Govindaraju, Aleksander Slominski, Kenneth Chiu,
>     Pu Liu, Robert van Engelen, and Michael J. Lewis. Toward
>     Characterizing the Performance of SOAP Toolkits
>     <http://www.extreme.indiana.edu/xgws/papers/soap_perf_char_grid2004.pdf>.
>     In 5th IEEE/ACM International Workshop on Grid Computing, Nov 2004.
>
>     Kenneth Chiu and Wei Lu. A Compiler-Based Approach to
>     Schema-Specific XML Parsing
>     <http://wam.inrialpes.fr/www-workshop2004/ChiuLu.pdf>. In First
>     International Workshop on High Performance XML Processing
>     (Satellite of WWW2004), May 2004.
>
>     Kenneth Chiu. XBS: A Streaming Binary Serializer for High
>     Performance Computing. In Proceedings of the High Performance
>     Computing Symposium 2004. Society for Computer Simulation
>     International, 2004.
>
>     however, we never gained enough momentum to come up with a
>     proposal for binary XML, but we are still willing to help get
>     use cases described.
>
>>     How do you think XML, especially a binary characterized XML,
>>     should relate to HDF5?
>
>     HDF5 looks to me like a separate problem: it defines its own
>     schema for its own representation, so mapping HDF5 to the XML
>     Infoset is a big task in itself.
>
>     we are more interested in how to transfer scientific data (mostly
>     arrays of primitive types, or simple structs of primitive types,
>     which can be expressed perfectly well in the XML Infoset but are
>     extremely inefficient as text, including the dreaded conversion of
>     IEEE floats to strings and back) in a way that is consistent with
>     XML messaging (such as SOAP).
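The cost of the float-to-string round trip can be illustrated with a minimal Python sketch comparing a decimal-text encoding of doubles (roughly what an XML text serialization must do) with raw little-endian IEEE 754 bytes. The function names are hypothetical, and the sketch uses Python's shortest round-tripping `repr` rather than any particular XML toolkit's formatter:

```python
import struct


def text_encode(values):
    """Decimal text, space-separated; repr() yields the shortest string
    that parses back to the identical double (Python 3.1+)."""
    return " ".join(repr(v) for v in values).encode("ascii")


def text_decode(data):
    # Every value must be parsed from decimal back to binary.
    return [float(tok) for tok in data.decode("ascii").split()]


def binary_encode(values):
    """Raw little-endian IEEE 754 doubles: 8 bytes each, no conversion."""
    return struct.pack("<%dd" % len(values), *values)


def binary_decode(data):
    return list(struct.unpack("<%dd" % (len(data) // 8), data))
```

Both forms round-trip exactly here, but the text form needs formatting and parsing work for every value and, for typical full-precision doubles, takes more than the fixed 8 bytes per value of the binary form.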
>
>     thanks,
>
>     alek
>
>-- 
>The best way to predict the future is to invent it - Alan Kay
>    
>


-- 
swilliams@hpti.com http://www.hpti.com Per: sdw@lig.net http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw
Received on Tuesday, 23 November 2004 01:04:40 UTC