- From: Stephen D. Williams <sdw@lig.net>
- Date: Mon, 22 Nov 2004 20:02:50 -0500
- To: mike.beckerle@ascentialsoftware.com
- Cc: RogerCutler@chevrontexaco.com, aslom@cs.indiana.edu, whoschek@lbl.gov, xml-dev@lists.xml.org, public-xml-binary@w3.org, kchiu@cs.binghamton.edu, mgovinda@cs.binghamton.edu
This is interesting and relevant to the discussion of binary payloads of scalar data.

In my opinion, the first level for "binary XML" (i.e., a new, more efficient XML-like format) is to improve the efficiency of the structure. The second level can involve exactly the ideas apparent below in the description of DFDL. It is debatable whether a format spec would include definition of the binary data, standard types, and built-in type notation in a self-contained way, but if it's not in the spec for "binary XML", then it would be layered on top, just as the schema specs are now layered.

A format could contain all standard scalar formats and have an efficient MIME-like way to note the types, but any schema language should be a separate specification. A format could support text, labeled scalar, and opaque binary formatted data, with the choice and placement of metadata controlled by the application. The DFDL work could support both labeled self-described and opaque methods, with the former supporting self-contained instances and the latter normally being more efficient because its metadata is out of band.

sdw

mike.beckerle@ascentialsoftware.com wrote:

> I do believe that GGF DFDL is relevant to the discussion here.
> https://forge.gridforum.org/projects/dfdl-wg/ is the site, and
> https://forge.gridforum.org/docman2/ViewProperties.php?group_id=113&category_id=803&document_content_id=2973
> (or http://tinyurl.com/435j7 in case email clobbered the long URL) is the
> most recent presentation. Around slide 7 is where you'll find content.
>
> Here's a snippet to give you the "DFDL" idea:
>
> E.g., a description of a million element array of little endian double
> floats would be this XSD:
>
> <sequence>
>   <element name="data" type="double" minOccurs="1000000"
>            maxOccurs="1000000">
>     <annotation><appinfo source="http://dfdl.org">
>       <representation repType="binary" byteOrder="littleEndian"/>
>     </appinfo></annotation>
>   </element>
> </sequence>
>
> Several people and companies have been exploring this notion, so we're
> trying to standardize it.
>
> I feel DFDL differs from binaryXML in being descriptive of format
> rather than prescriptive of format. This matters why?
>
> 1) Legacy data formats - Much of the complexity of DFDL comes from the
> need to handle quite complex legacy formats which are tricky to describe.
>
> 2) New data formats, but you need random-access I/O capabilities or
> the ability to memory map the files into some exact memory layout with
> all the alignments and inter-item offsets exactly specified.
>
> 3) you don't want to bother to have to use any particular XML-oriented
> library to write out your data. So long as the data format is
> describable in DFDL you can do your I/O with ordinary I/O operations.
> In other words minimal investment has to be made up front in worrying
> about data format and data interchange issues. DFDL lets you "just get
> on with it".
>
> If none of these 3 apply, then either XML or binaryXML *should* be the
> right thing depending on your data size and performance needs. If you
> are just after efficiency and density then DFDL may be less effective
> for you since a DFDL-described data file isn't necessarily nicely
> self-contained like an XML or binaryXML file should be. (Though we do
> have a placeholder on the issue of how to associate DFDL descriptors
> tightly to binary data so they can't get separated.)
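To make points 2 and 3 above concrete, here is a minimal sketch in Python of writing and reading an array of little-endian doubles with ordinary I/O primitives, which is the layout the XSD snippet describes. It is only an illustration under stated assumptions: the function names are hypothetical, the array has four elements rather than a million, and it uses the standard `struct` module directly rather than any DFDL processor.

```python
import struct

def pack_doubles_le(values):
    """Serialize floats as consecutive 8-byte IEEE 754 doubles, little-endian."""
    return struct.pack(f"<{len(values)}d", *values)

def unpack_doubles_le(buf):
    """Decode a buffer that holds nothing but little-endian doubles."""
    count = len(buf) // 8
    return list(struct.unpack(f"<{count}d", buf[:count * 8]))

def read_element(buf, index):
    """Random access: element i starts at byte offset 8 * i."""
    return struct.unpack_from("<d", buf, 8 * index)[0]

# Hypothetical sample data standing in for the million-element array.
values = [0.1, -2.5, 1e300, 3.141592653589793]
raw = pack_doubles_le(values)
decoded = unpack_doubles_le(raw)
```

Because every element occupies exactly 8 bytes, a single element can be fetched by offset without parsing anything before it, and the values come back bit-for-bit identical with no conversion to text and back.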
>
> ...mikeb
> Mike Beckerle
> co-chair DFDL WG, GGF
>
> ------------------------------------------------------------------------
> *From:* Cutler, Roger (RogerCutler) [mailto:RogerCutler@chevrontexaco.com]
> *Sent:* Monday, November 22, 2004 5:54 PM
> *To:* Aleksander Slominski; Stephen D. Williams
> *Cc:* Wolfgang Hoschek; xml-dev@lists.xml.org; public-xml-binary@w3.org;
> Kenneth Chiu; Madhusudhan Govindaraju
> *Subject:* RE: use cases: binary XML for scientifc computing
>
> If you are going to be looking at how this stuff fits in with grid
> computing, perhaps it would be worthwhile also to make some
> comments about DFDL? I posted this suggestion previously (11/1)
> and nobody seems to have picked up on it, so maybe the thought is
> not appropriate for some reason, but at first glance DFDL does
> seem related to me.
>
> -----Original Message-----
> *From:* public-xml-binary-request@w3.org
> [mailto:public-xml-binary-request@w3.org] *On Behalf Of* Aleksander Slominski
> *Sent:* Monday, November 22, 2004 4:41 PM
> *To:* Stephen D. Williams
> *Cc:* Wolfgang Hoschek; xml-dev@lists.xml.org; public-xml-binary@w3.org;
> Kenneth Chiu; Madhusudhan Govindaraju
> *Subject:* use cases: binary XML for scientifc computing
>
> Stephen D. Williams wrote:
>
>>> what are use cases for nux: what do you plan to use it for?
>>>
>>> are use cases related to XML Binary Characterization
>>> <http://www.w3.org/TR/xbc-use-cases/>?
>>>
>>> i am a bit disappointed that scientific requirements are
>>> completely omitted from XBC use cases - the closest i could find
>>> is http://www.w3.org/TR/xbc-use-cases/#FPenergy but it skips
>>> over the whole issue of how to transfer an array of doubles without
>>> changing endianness ...
>>
>> I have proposed to the group recently that I create one or more
>> use cases that cover supercomputing, grid processing, and sensor
>> networks.
>
> great to hear this.
> i think we worked in all those areas - it seems
> XML became very popular, and now with convergence on Grid Web
> Services, having an efficient binary XML format that can be used
> between "optimized" peers seems to be very important ...
>
>> Your observation seems to validate that point. I would be happy
>> to incorporate anything you could provide. My company builds and
>> maintains Linux supercomputers and I have present and past
>> experience with grid-like processing, so I have some resources
>> and contacts.
>>
>>> we did a lot of work in the past related to XML performance (at
>>> Indiana University and Binghamton) and are very concerned that
>>> whatever binary XML is characterized/standardized in W3C
>>> will not be of much use for scientific computing and grids ...
>>
>> Could you provide links or details to any of this work?
>
> we worked on SOAP parsing and optimization for scientific computing:
>
> Madhusudhan Govindaraju, Aleksander Slominski, Venkatash
> Choppella, Randall Bramley, and Dennis Gannon. Requirements for
> and evaluation of RMI protocols for scientific computing
> <http://www.extreme.indiana.edu/xgws/papers/sc00_paper/>. In
> Proceedings of SC00 Conference, Dallas TX, Nov 2000. Available on
> CD-ROM from IEEE.
>
> Kenneth Chiu, Madhusudhan Govindaraju, and Randall Bramley.
> Investigating the limits of SOAP performance for scientific
> computing
> <http://www.computer.org/proceedings/hpdc/1686/16860246abs.htm>.
> In The 11-th IEEE International Symposium on High Performance
> Distributed Computing HPDC-11 2002 (HPDC'02), Jul 2002.
>
> Madhusudhan Govindaraju, Aleksander Slominski, Kenneth Chiu,
> Pu Liu, Robert van Engelen, and Michael J. Lewis. Toward
> Characterizing the Performance of SOAP Toolkits
> <http://www.extreme.indiana.edu/xgws/papers/soap_perf_char_grid2004.pdf>.
> In 5th IEEE/ACM International Workshop on Grid Computing, November
> 2004.
>
> Kenneth Chiu and Wei Lu.
> A Compiler-Based Approach to Schema-Specific XML Parsing
> <http://wam.inrialpes.fr/www-workshop2004/ChiuLu.pdf>. In First
> International Workshop on High Performance XML Processing (Satellite
> of WWW2004), May 2004.
>
> Kenneth Chiu. XBS: A Streaming Binary Serializer for High
> Performance Computing. In Proceedings of the High Performance
> Computing Symposium 2004. Society for Computer Simulation
> International, 2004.
>
> however we never got enough forward momentum to come up with a
> proposal for binary XML, but we are still willing to work to get
> use cases described.
>
>> How do you think that XML, especially a binary characterized XML,
>> should relate to HDF5?
>
> HDF5 looks to me like a separate problem, as it defines its own
> schema for its own representation, so mapping HDF5 to the XML
> Infoset is a big task.
>
> we are more interested in how to transfer scientific data (mostly
> arrays of primitive types or simple structs with primitive types
> that can be perfectly well expressed in XML Infoset but are also
> extremely inefficient, including the dreaded IEEE float conversion to
> string and back) and make it consistent with XML messaging (such
> as SOAP).
>
> thanks,
>
> alek
>
> --
> The best way to predict the future is to invent it - Alan Kay

--
swilliams@hpti.com http://www.hpti.com Per: sdw@lig.net http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw
Received on Tuesday, 23 November 2004 01:04:40 UTC