Re: use cases: binary XML for scientifc computing from Stephen D. Williams on 2004-11-23 (public-xml-binary@w3.org from November 2004)

From: Stephen D. Williams <sdw@lig.net>
Date: Tue, 23 Nov 2004 13:17:28 -0500
To: "Cutler, Roger (RogerCutler)" <RogerCutler@chevrontexaco.com>
Cc: mike.beckerle@ascentialsoftware.com, aslom@cs.indiana.edu, whoschek@lbl.gov, public-xml-binary@w3.org, kchiu@cs.binghamton.edu, mgovinda@cs.binghamton.edu
Message-ID: <41A37EB8.90103@lig.net>
It's not different, it IS what I meant by option 3.  If the proponents 
of a format decide that it is necessary to support binary scalars using 
option 3, then a fairly complete initial set of scalar types would have 
to be chosen and supported by any library supporting that format in a 
conforming way.

This seems to translate to two issues: which scalar formats make up the 
initial set, and how big and slow would the necessary conversion code be 
on various platforms?  It doesn't seem that the conversion code would 
need to be very large, unless the types were especially exotic.
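
As a rough illustration of how small that conversion code can be for the 
easy cases, here is a minimal Python sketch of "reader makes right".  The 
one-byte tags and the function names are invented for this example and 
are not part of any proposed format; only 32-bit ints and IEEE doubles 
are covered.

```python
import struct
import sys

# Hypothetical one-byte tags: each names a byte order and a scalar type.
# These values are assumptions for illustration, not from any spec.
TAGS = {
    0x01: ("<", "i"),  # little-endian 32-bit signed int
    0x02: (">", "i"),  # big-endian 32-bit signed int
    0x03: ("<", "d"),  # little-endian IEEE 754 double
    0x04: (">", "d"),  # big-endian IEEE 754 double
}


def encode_native(value, code):
    """Sender side: write the scalar in the sender's own byte order,
    prefixed with the tag that describes it.  `code` is "i" or "d"."""
    order = "<" if sys.byteorder == "little" else ">"
    tag = next(t for t, (o, c) in TAGS.items() if o == order and c == code)
    return bytes([tag]) + struct.pack(order + code, value)


def decode_to_local(buf):
    """Reader side: inspect the tag and convert to a local value,
    whatever byte order the sender happened to use."""
    order, code = TAGS[buf[0]]
    fmt = order + code
    (value,) = struct.unpack(fmt, buf[1:1 + struct.calcsize(fmt)])
    return value
```

A homogeneous pair of machines pays essentially nothing here, since the 
reader's unpack is a straight copy; only dissimilar communicators incur a 
byte swap.  Decimal, "numeric", and non-IEEE floats would need real 
conversion routines rather than a table entry.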

Ints are mostly easy; floating point, decimal, "numeric", and all of the 
variations are more interesting.  IEEE floats/doubles are a baseline, 
but there are good reasons for other options for some uses.  It doesn't 
make sense to include every data compression trick from audio, video, 
and imaging problems (or perhaps any of them), but it could make sense 
to have some commonly needed encodings for bulk data, such as run-length 
and array differential encoding.
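
To make those two bulk encodings concrete, here is a minimal Python 
sketch.  The function names are made up for this example, and it works on 
plain lists of integers rather than on any particular wire format.

```python
def delta_encode(values):
    """Array differential encoding: keep the first value, then store
    each element as its difference from the previous one."""
    out = []
    prev = 0
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def rle_encode(values):
    """Run-length encoding: collapse repeats into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def rle_decode(runs):
    """Invert rle_encode by expanding each (value, count) pair."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out
```

The two compose naturally: a slowly changing series such as 
[100, 101, 102, 103, 103, 103] becomes [100, 1, 1, 1, 0, 0] after 
differencing, which run-length encoding then reduces to three pairs.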

There is a lot of existing work covering standardized binary interchange 
in various ways, but I'm not sure it is as complete as it should be for 
these purposes.  As I said, it is my opinion that completing the detail 
needed for binary typing is separate from solving the structure problem.

sdw

Cutler, Roger (RogerCutler) wrote:

> I was kind of hoping for 3 but for the parser to take care of it.  
> That is, for Microsoft, IBM, Sun or whoever to do the grunt work, 
> based on complying with some spec, rather than repeating the 
> programming task N times at N places that use the data.  I think 
> that's a little different from any of your options, isn't it?
>
> ------------------------------------------------------------------------
> *From:* Stephen D. Williams [mailto:sdw@lig.net]
> *Sent:* Monday, November 22, 2004 8:38 PM
> *To:* Cutler, Roger (RogerCutler)
> *Cc:* mike.beckerle@ascentialsoftware.com; aslom@cs.indiana.edu; 
> whoschek@lbl.gov; xml-dev@lists.xml.org; public-xml-binary@w3.org; 
> kchiu@cs.binghamton.edu; mgovinda@cs.binghamton.edu
> *Subject:* Re: use cases: binary XML for scientifc computing
>
> This is the whole problem of binary scalars: there are several 
> existing formats and more are obviously possible in the future.
> The arguments related to binary scalars include:
>
>    1. It's an open-ended mess, just use character representation
>    2. Choosing one standard method ("network byte order") is the way
>       to go
>    3. Choose the best 'local' method, which is great for homogeneous
>       systems, and rely on 'reader makes right', which is doable by
>       dissimilar communicators.
>    4. A new custom binary format is appropriate to the application
>       (such as Oracle's internal Number format which has interesting
>       properties).
>
> Option 3 seems to have the most backing among those who are willing to 
> work past option 1.  This would require that a full implementation of 
> a format, or a layer above it, be able to convert from any "generally 
> accepted" scalar to the local version.  Converting to any "generally 
> accepted" format could be optional, but useful.
>
> This problem of one application, directly or indirectly, choosing a 
> particular format that differs from that of the reading application 
> also occurs at the character encoding level.  The solution of being 
> able to convert at the receiver, and optionally at the sender, seems 
> reasonable. 
>
> The remaining problem then is the ability to integrate newly invented 
> scalar representations, but this seems to be a minor issue currently.
>
> sdw
>
> Cutler, Roger (RogerCutler) wrote:
>
>> Seems to me that DFDL fits the floating point data usage case I 
>> contributed very well -- except maybe for the desire for somebody 
>> else to handle the floating point conversion issues between 
>> platforms.  That is, "just get on with it" when your IO libraries 
>> expect different float structures than that of the source of the data 
>> can be a bit painful.  My contacts really don't like writing that 
>> low-level code over and over, with of course the potential for 
>> getting it a bit wrong somehow each time.
>>  
>> -----Original Message-----
>> *From:* mike.beckerle@ascentialsoftware.com 
>> [mailto:mike.beckerle@ascentialsoftware.com]
>> *Sent:* Monday, November 22, 2004 5:31 PM
>> *To:* Cutler, Roger (RogerCutler); aslom@cs.indiana.edu; sdw@lig.net
>> *Cc:* whoschek@lbl.gov; xml-dev@lists.xml.org; 
>> public-xml-binary@w3.org; kchiu@cs.binghamton.edu; 
>> mgovinda@cs.binghamton.edu
>> *Subject:* RE: use cases: binary XML for scientifc computing
>>
>> I do believe that GGF DFDL is relevant to the discussion here. 
>> https://forge.gridforum.org/projects/dfdl-wg/ is the site, and
>> https://forge.gridforum.org/docman2/ViewProperties.php?group_id=113&category_id=803&document_content_id=2973 (or 
>> http://tinyurl.com/435j7 in case email clobbered the long URL) is the 
>> most recent presentation. Around slide 7 is where you'll find content.
>>  
>> Here's a snippet to give you the "DFDL" idea:
>
> ...
>


-- 
swilliams@hpti.com http://www.hpti.com Per: sdw@lig.net http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw
Received on Tuesday, 23 November 2004 18:16:07 UTC