- From: <mike.beckerle@ascentialsoftware.com>
- Date: Mon, 22 Nov 2004 18:31:25 -0500
- To: RogerCutler@chevrontexaco.com, aslom@cs.indiana.edu, sdw@lig.net
- Cc: whoschek@lbl.gov, xml-dev@lists.xml.org, public-xml-binary@w3.org, kchiu@cs.binghamton.edu, mgovinda@cs.binghamton.edu
- Message-ID: <0F2A05A54977F248982FF81EA9578CDB0F65E2A4@ASC-MSG-02.ascential.com>
I do believe that GGF DFDL is relevant to the discussion here.

https://forge.gridforum.org/projects/dfdl-wg/ is the site, and https://forge.gridforum.org/docman2/ViewProperties.php?group_id=113&category_id=803&document_content_id=2973 (or http://tinyurl.com/435j7 in case email clobbered the long URL) is the most recent presentation. Around slide 7 is where you'll find content.

Here's a snippet to give you the "DFDL" idea. E.g., a description of a million-element array of little-endian double floats would be this XSD:

    <sequence>
      <element name="data" type="double" minOccurs="1000000" maxOccurs="1000000">
        <annotation><appinfo source="http://dfdl.org">
          <representation repType="binary" byteOrder="littleEndian"/>
        </appinfo></annotation>
      </element>
    </sequence>

Several people and companies have been exploring this notion, so we're trying to standardize it.

I feel DFDL differs from binaryXML in being descriptive of format rather than prescriptive of format. This matters why?

1) Legacy data formats - Much of the complexity of DFDL comes from the need to handle quite complex legacy formats which are tricky to describe.

2) New data formats where you need random-access I/O capabilities, or the ability to memory map the files into some exact memory layout with all the alignments and inter-item offsets exactly specified.

3) You don't want to have to use any particular XML-oriented library to write out your data. So long as the data format is describable in DFDL you can do your I/O with ordinary I/O operations. In other words, minimal investment has to be made up front in worrying about data format and data interchange issues. DFDL lets you "just get on with it".
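Points 2 and 3 can be illustrated with a minimal sketch, assuming Python and its standard `struct` module (neither appears in this thread; the file name and the helpers `write_doubles` and `read_double_at` are hypothetical). It writes an array of little-endian doubles with ordinary file I/O, in the layout the XSD snippet describes, and then reads one element back by seeking to its byte offset. No DFDL processor is involved; the point is only that the writer needs nothing beyond plain I/O.

```python
import os
import struct
import tempfile

DOUBLE_SIZE = 8  # bytes in an IEEE 754 binary64 value


def write_doubles(path, values):
    """Write values as packed little-endian doubles, with no header or framing."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}d", *values))


def read_double_at(path, index):
    """Random access: seek straight to element `index` and decode it."""
    with open(path, "rb") as f:
        f.seek(index * DOUBLE_SIZE)
        (value,) = struct.unpack("<d", f.read(DOUBLE_SIZE))
    return value


if __name__ == "__main__":
    # 1000 elements instead of a million, to keep the demo small.
    data = [i * 0.5 for i in range(1000)]
    path = os.path.join(tempfile.mkdtemp(), "data.bin")  # hypothetical file name
    write_doubles(path, data)
    print(os.path.getsize(path))     # 8000 bytes: 1000 elements * 8 bytes
    print(read_double_at(path, 10))  # 5.0
```

Because the file carries no framing of its own, a reader has to learn the element type, count, and byte order from somewhere else, which is the role the XSD annotation plays in the snippet above.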
If none of these 3 apply, then either XML or binaryXML *should* be the right thing depending on your data size and performance needs. If you are just after efficiency and density then DFDL may be less effective for you since a DFDL-described data file isn't necessarily nicely self-contained like an XML or binaryXML file should be. (Though we do have a placeholder on the issue of how to associate DFDL descriptors tightly to binary data so they can't get separated.)

...mikeb

Mike Beckerle
co-chair DFDL WG, GGF

_____

From: Cutler, Roger (RogerCutler) [mailto:RogerCutler@chevrontexaco.com]
Sent: Monday, November 22, 2004 5:54 PM
To: Aleksander Slominski; Stephen D. Williams
Cc: Wolfgang Hoschek; xml-dev@lists.xml.org; public-xml-binary@w3.org; Kenneth Chiu; Madhusudhan Govindaraju
Subject: RE: use cases: binary XML for scientifc computing

If you are going to be looking at how this stuff fits in with grid computing, perhaps it would be worthwhile also to make some comments about DFDL? I posted this suggestion previously (11/1) and nobody seems to have picked up on it, so maybe the thought is not appropriate for some reason, but at first glance DFDL does seem related to me.

-----Original Message-----
From: public-xml-binary-request@w3.org [mailto:public-xml-binary-request@w3.org] On Behalf Of Aleksander Slominski
Sent: Monday, November 22, 2004 4:41 PM
To: Stephen D. Williams
Cc: Wolfgang Hoschek; xml-dev@lists.xml.org; public-xml-binary@w3.org; Kenneth Chiu; Madhusudhan Govindaraju
Subject: use cases: binary XML for scientifc computing

Stephen D. Williams wrote:

what are use cases for nux: what do you plan to use it for? are use cases related to XML Binary Characterization <http://www.w3.org/TR/xbc-use-cases/>?
i am a bit disappointed that scientific requirements are completely omitted from XBC use cases - the closest i could find is http://www.w3.org/TR/xbc-use-cases/#FPenergy but it skips over the whole issue of how to transfer an array of doubles without changing endianness ...

I have proposed to the group recently that I create one or more use cases that cover supercomputing, grid processing, and sensor networks.

great to hear this. i think we worked in all those areas - it seems XML became very popular and now, with convergence on Grid Web Services, having an efficient binary XML format that can be used between "optimized" peers seems to be very important ...

Your observation seems to validate that point. I would be happy to incorporate anything you could provide. My company builds and maintains Linux supercomputers and I have present and past experience with grid-like processing, so I have some resources and contacts.

we did a lot of work in the past related to XML performance (at Indiana University and Binghamton) and are very concerned that whatever binary XML will be characterized/standardized in W3C will not be of much use for scientific computing and grids ...

Could you provide links or details to any of this work?

we worked on SOAP parsing and optimization for scientific computing:

[XML_RMI_1] Madhusudhan Govindaraju, Aleksander Slominski, Venkatash Choppella, Randall Bramley, and Dennis Gannon. Requirements for and evaluation of RMI protocols for scientific computing. In Proceedings of SC00 Conference, Dallas TX, Nov 2000. Available on CD-ROM from IEEE. <http://www.extreme.indiana.edu/xgws/papers/sc00_paper/>

[SoapPerf] Kenneth Chiu, Madhusudhan Govindaraju, and Randall Bramley. Investigating the limits of SOAP performance for scientific computing. In The 11-th IEEE International Symposium on High Performance Distributed Computing HPDC-11 2002 (HPDC'02), Jul 2002. <http://www.computer.org/proceedings/hpdc/1686/16860246abs.htm>

[aslom_soapPerfChar_grid2004] Madhusudhan Govindaraju, Aleksander Slominski, Kenneth Chiu, Pu Liu, Robert van Engelen, and Michael J. Lewis. Toward Characterizing the Performance of SOAP Toolkits. In 5th IEEE/ACM International Workshop on Grid Computing, November 2004. <http://www.extreme.indiana.edu/xgws/papers/soap_perf_char_grid2004.pdf>

[chiu_ssp] Kenneth Chiu and Wei Lu. A Compiler-Based Approach to Schema-Specific XML Parsing. In First International Workshop on High Performance XML Processing (Satellite of WWW2004), May 2004. <http://wam.inrialpes.fr/www-workshop2004/ChiuLu.pdf>

[chiu_xbs] Kenneth Chiu. XBS: A Streaming Binary Serializer for High Performance Computing. In Proceedings of the High Performance Computing Symposium 2004. Society for Computer Simulation International, 2004.

however we never got enough forward momentum to come up with a proposal for binary XML, but we are still willing to work to get use cases described.

How do you think that XML, especially a binary characterized XML, should relate to HDF5?

HDF5 looks to me like a separate problem, as it defines its own schema for its own representation, so mapping HDF5 to XML Infoset is a big task. we are more interested in how to transfer scientific data (mostly arrays of primitive types or simple structs with primitive types, which can be perfectly well expressed in XML Infoset but are also extremely inefficient, including the dreaded IEEE float conversion to string and back) and make it consistent with XML messaging (such as SOAP).

thanks,

alek

--
The best way to predict the future is to invent it - Alan Kay
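
The cost of converting IEEE floats to strings and back, raised in the last message, can be seen in a small sketch, assuming Python and its standard `struct` module (not something used in the thread; all function names here are hypothetical). It encodes the same doubles once as decimal text, as they would appear in XML character data, and once as packed little-endian IEEE 754 values, then compares sizes and checks that both forms round-trip.

```python
import struct


def encode_text(values):
    """Encode doubles as space-separated decimal strings (XML-style character data)."""
    return " ".join(repr(v) for v in values).encode("ascii")


def decode_text(payload):
    """Parse the decimal strings back into floats."""
    return [float(tok) for tok in payload.decode("ascii").split()]


def encode_binary(values):
    """Encode doubles as packed little-endian IEEE 754 binary64 values."""
    return struct.pack(f"<{len(values)}d", *values)


def decode_binary(payload):
    """Unpack little-endian doubles; the count follows from the payload size."""
    return list(struct.unpack(f"<{len(payload) // 8}d", payload))


if __name__ == "__main__":
    values = [i / 7.0 for i in range(1, 1001)]
    text, binary = encode_text(values), encode_binary(values)
    print(len(binary))                      # 8000 bytes, a fixed 8 per value
    print(len(text) > len(binary))          # True for these values
    print(decode_text(text) == values)      # True
    print(decode_binary(binary) == values)  # True
```

The text form round-trips here only because `repr` emits enough digits to recover each double exactly; a formatter that truncates digits would lose precision, and both sides still pay for formatting and parsing. The `<` prefix pins the byte order, so the binary form does not depend on the endianness of the sending or receiving machine.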
Received on Monday, 22 November 2004 23:32:36 UTC