
RE: Dave Orchard individual review of XBC for the TAG

From: Rice, Ed (HP.com) <ed.rice@hp.com>
Date: Tue, 26 Apr 2005 04:40:52 -0700
Message-ID: <7D6953BFA3975C44BD80BA89292FD60E0290E2D5@cacexc08.americas.cpqcorp.net>
To: "David Orchard" <dorchard@bea.com>, <www-tag@w3.org>
I wanted to add a few additional items to Dave Orchard's response, at the
end (items 9 through 12 below).




<David Orchard's previous comments>


1. Process suggestion

I suggest that a W3C working group, perhaps a rechartered xml binary
characterizations wg, continue the work of providing further information
for a "go/no-go" recommendation.  I do not believe that the TAG should
endorse chartering a working group to produce a Rec track deliverable
for unknown numbers of binary formats at this time.  Further progress
down the path of benchmarking and use case validation is necessary to
justify a Rec track deliverable.  The comments from here on could be
used to assist in writing the charter for an XML Binary Characterizations
"the sequel" WG.


2. Benchmarks

My position remains the same as articulated in BEA's position paper [1]
for the binary interchange workshop, particularly in the "How To Measure
Candidate Solutions" section, bullet 4.c ("Measurable
benefit(properties) from benchmarks") in the "Recommendations" section,
and expressed further in the workshop.  My position also remains the
same on the importance of the architectural properties of
self-description and extensibility, also articulated in the BEA paper.


The Working Group did not provide benchmarks that indicate a high
likelihood that a single format will sufficiently alter the mix of
properties of text xml to be worth standardization at the W3C.  


This may very well be a charter or timing question.  The charter is
somewhat vague on whether benchmarks and property assessment are
required, and even if they were, the WG's charter has expired.


3. Properties

Related to the benchmarks, there is no framework for evaluating the
properties of interest.  A survey of properties, such as compactness,
space efficiency, etc., is well described.  These are good starting
points for the properties of a benchmark.  But the critical piece of
information that I am looking for in a binary XML go/no-go decision is a
hard-nosed approach to these properties and their trade-offs to support
binary XML.  For example, is one new trade-off point a 3x increase in
compactness for an arbitrary set of documents with a 2x increase in
processor time?  Is there a 10x increase in compactness with a 3x
increase in processor time combination point as well?


The documents do not provide the property trade-off points. 
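
A minimal sketch of how such a trade-off point could be expressed once
measurements exist.  The TradeOff type, the trade_off and meets helpers,
and the sample numbers are hypothetical illustrations, not taken from the
XBC documents; the sketch only assumes that a candidate format is
measured against text XML on the same documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TradeOff:
    """A candidate format's position relative to text XML (hypothetical type)."""
    compactness: float     # text XML size / candidate size (higher is better)
    processor_time: float  # candidate time / text XML time (lower is better)


def trade_off(xml_bytes, xml_micros, cand_bytes, cand_micros):
    """Turn raw size and time measurements into a trade-off point."""
    return TradeOff(xml_bytes / cand_bytes, cand_micros / xml_micros)


def meets(point, min_compactness, max_processor_time):
    """True if the point is at least as good as the named threshold."""
    return (point.compactness >= min_compactness
            and point.processor_time <= max_processor_time)


# Hypothetical measurements: one third the size, twice the processing time.
point = trade_off(xml_bytes=30_000, xml_micros=10_000,
                  cand_bytes=10_000, cand_micros=20_000)
print(point, meets(point, 3, 2), meets(point, 10, 3))
```

With these made-up numbers the candidate sits exactly on the "3x
compactness for 2x processor time" point and falls short of the "10x for
3x" point.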


4. Parser implementation

Further related to the benchmarks, the central question of whether
improving parser implementation will provide sufficient increase to
mitigate the perceived performance problems of XML is not addressed.  I
noted in the Sun benchmarks provided to the workshop that a large
percentage of time was spent on "binding" and had nothing to do with the
actual on the wire transmission.  It's quite conceivable that parser
implementations will continue to improve to meet this need.  


There is no evaluation of what is likely to happen in time based upon
historical evidence.  For example, it may be that every 18 months, the
processor speed has doubled AND the efficiency of an XML parser
implementation has doubled, providing a 4x increase in processing speed.
Looking out 18 months, to when a Recommendation might be produced, how
does the parser implementation affect the property trade-offs?
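
The arithmetic behind the 4x figure can be sketched as follows.  The
function name and its parameters are hypothetical, and the 18-month
doubling period is the illustrative assumption from the paragraph above,
not a measured result.

```python
def projected_speedup(months, doubling_period_months=18, factors=2):
    """Combined speedup after `months`, assuming `factors` independent
    improvements (e.g. processor speed and parser efficiency) that each
    double once per doubling period."""
    return 2 ** (factors * months / doubling_period_months)


# One period out: 2x (processor) times 2x (parser) gives 4x overall.
print(projected_speedup(18))
# Two periods out under the same assumption.
print(projected_speedup(36))
```

Under that assumption, a text XML workload measured today would run in a
quarter of the time by the point a Recommendation might be produced,
which shifts the baseline any binary format has to beat.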


5. Relating property evaluation to threshold for WG formation.

Given a holistic approach to properties including the historical and
predicted changes, what new property points justify a binary WG?  And
which set of applications would this be sufficient for?  For example, is
it that 1/3 of all messages would use binary if the 10/3
compactness/processor time trade-off were met?  Would it be 1/10 of
messages?  9/10?  Is 1/10, 1/3, 9/10 the threshold for standardizing a
binary format?  Is even such a spread necessary?   Is it that 1/3 of the
messages are suffering incredible pain that they would gladly take the
3/2 trade-off and be ecstatic with 10/3, and is this good enough for a
new WG?


The use case analysis seems to be what the "Generality" property was
trying to achieve, but the properties should be reserved for a technical
analysis of each solution.  I believe that "generality" as satisfying
use cases/scenarios is different than the technical trade-offs.  


6. Feasibility of Binary XML and evaluation of XML

I was surprised to find that XML was rated as PREVENTs for processing
efficiency, small footprint, forwards compatibility, considering that
all these properties are relative to XML.  I didn't understand this, and
it seems to cast XML in a bad light compared to itself.


I do not believe that generality is a property, and even if it is, it's
self-evident that XML has achieved the Generality property as that
property is currently loosely defined.  If anything, XML should be the
only format that has the "Generality" "property".  I believe that
"generality" should not be retained in this feasibility section.  


7. How many formats?

Because there is a lack of thresholds for formats, there is no
indication of how many binary formats will be standardized.  For
example, we could be in a situation where 2/3 of messages could be
satisfied by 1 format that achieves the 3/2 ratio.  We could also be in
a situation where 2/3 of messages could use some binary but none are
satisfied by the 3/2 yet there are 3 different solutions that yield 10/3
that collectively meet the 2/3 of messages.
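
The two situations above can be sketched as a small coverage
calculation.  Everything here is hypothetical: the formats_needed
function, the format names, and the message shares are invented to
mirror the 2/3 example, and the brute-force search is only meant for a
handful of candidate formats.

```python
from fractions import Fraction
from itertools import combinations


def formats_needed(classes, target):
    """Return the smallest tuple of formats whose satisfied message
    classes add up to at least `target` share, or None if impossible.

    `classes` is a list of (share, formats) pairs, where `formats` is the
    set of format names that meet the required threshold for that class.
    """
    names = sorted(set().union(*(ok for _, ok in classes)))
    for k in range(1, len(names) + 1):
        for chosen in combinations(names, k):
            covered = sum(share for share, ok in classes if ok & set(chosen))
            if covered >= target:
                return chosen
    return None


two_thirds = Fraction(2, 3)

# Situation 1: a single format at the 3/2 point satisfies 2/3 of messages.
one_format = [(two_thirds, {"A"}), (Fraction(1, 3), set())]

# Situation 2: three formats at the 10/3 point each satisfy a different 2/9.
three_formats = [(Fraction(2, 9), {"X"}), (Fraction(2, 9), {"Y"}),
                 (Fraction(2, 9), {"Z"}), (Fraction(1, 3), set())]

print(formats_needed(one_format, two_thirds))
print(formats_needed(three_formats, two_thirds))
```

The same target share leads to one standardized format in the first
situation and three in the second, which is why the thresholds need to
be stated before the number of formats can be predicted.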


8. Format evaluation and selection process

The selection process for formats and how organizations will submit
formats is not specified.  There are a wide variety of formats
available.  Certainly most of the major vendors have at least one binary
format that is used within their software.  For example, BEA provides a
TokenStream format for BEA's XQuery engine [2].  It is possible that BEA
would be quite happy if its TokenStream binary format were adopted and
it is possible that BEA would submit TokenStream.  It seems inevitable
that other vendors will submit a variety of their format(s) - such as a
Microsoft binary Indigo format [3], [4].  How would a BEA or other
vendor know the process, including evaluation methodology and selection
criteria, that all the submitted format(s) will be subjected to?


(Ed's additions)


9. The Fragmentable requirement [5] requires that partial files can be
processed, yet at the same time the Schema Extensions and Deviations
section [6] refers to embedded schemas in the same file.  In a binary
file with random update, I think it's highly unlikely that a partial
transmission would allow for any ability to utilize the binary file
format.


10. The webarch document refers to human readability: "Textual formats
are usually more portable and interoperable. Textual formats also have
the considerable advantage that they can be directly read by human
beings" [7].  That advantage would be lost with a binary XML format.


11.  I was also disappointed to see that partial document security
wasn't really addressed.  For example, a binary document would contain
header/routing information as well as one or more 'payloads' of data.
It seems to me that we're missing an opportunity to allow the key binary
data to be encrypted and signed by one authority but routed by multiple
authorities.  This wasn't addressed in the document.


12.  I'm also concerned about the overhead in creating and maintaining
random access content on the small memory footprint/small processor
systems described in the document.  Are we really looking at
uncompressing (or decrypting) the data stream, loading the content into
memory for random access, and performing functions against the binary
data as a method to minimize footprint?



[2] http://www.dbis.ethz.ch/research/publications/vldbj.pdf



[5] http://www.w3.org/TR/xbc-properties/#fragmentable 

[6] http://www.w3.org/TR/xbc-properties/#schema-extensions-deviations

[7] http://www.w3.org/TR/2004/REC-webarch-20041215/




Received on Tuesday, 26 April 2005 11:41:04 UTC
