Dave Orchard individual review of XBC for the TAG

I have reviewed the output documents of the XML Binary Characterization
Working Group.  The work went quite a way down the investigation path
and the outputs are good, but I found the evidence for forming a new
working group to produce a W3C Recommendation for one or more binary
XML formats uncompelling.  I do not believe the deliverables provide
sufficient motivation.  This review is written as an elected TAG member,
and not as a representative of a W3C member company that has a publicly
available RF binary format.

 

1. Process suggestion

I suggest that a W3C working group, perhaps a rechartered XML Binary
Characterization WG, continue the work of providing further information
for a "go/no-go" recommendation.  I do not believe that the TAG should
endorse chartering a working group to produce a Rec track deliverable
for unknown numbers of binary formats at this time.  Further progress
down the path of benchmarking and use case validation is necessary to
justify a Rec track deliverable.   The comments from here on could be
used to assist in writing the charter for an XML Binary Characterization
"the sequel" WG.

 

2. Benchmarks

My position remains the same as articulated in BEA's position paper [1]
for the binary interchange workshop, particularly in the "How To Measure
Candidate Solutions" section, bullet 4.c ("Measurable
benefit(properties) from benchmarks") in the "Recommendations" section,
and expressed further in the workshop.  My position also remains the
same on the importance of the architectural properties of
self-description and extensibility, also articulated in the BEA paper.

 

The Working Group did not provide benchmarks that indicate a high
likelihood that a single format will sufficiently alter the mix of
properties of text xml to be worth standardization at the W3C.  

 

This may very well be a charter or timing question.  The charter is
somewhat vague on whether benchmarks and property assessment are
required, and even if they were, the WG's charter has expired.

 

3. Properties

Related to the benchmarks, there is no framework for evaluating the
properties of interest.  A survey of properties, such as compactness,
space efficiency, etc., is well described.  These are good starting
points for the properties of a benchmark.  But the critical piece of
information that I am looking for in a binary XML go/no-go decision is a
hard-nosed approach to these properties and their trade-offs to support
binary XML.  For example, is one new trade-off point a 3x increase in
compactness for an arbitrary set of documents with a 2x increase in
processor time?  Is there a 10x increase in compactness with a 3x
increase in processor time combination point as well?

 

The documents do not provide the property trade-off points. 
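
As an illustration only, the two hypothetical points in this section can
be written down and compared mechanically.  The sketch below is in
Python; the names TradeOffPoint, dominates and pareto_front are my own
for this example, and the numbers are the hypothetical 3/2 and 10/3
points from the text, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TradeOffPoint:
    """A hypothetical property trade-off point, relative to text XML."""
    name: str
    compactness_gain: float  # size reduction factor (higher is better)
    cpu_time_factor: float   # processor time factor (lower is better)


def dominates(a: TradeOffPoint, b: TradeOffPoint) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = (a.compactness_gain >= b.compactness_gain
                and a.cpu_time_factor <= b.cpu_time_factor)
    strictly_better = (a.compactness_gain > b.compactness_gain
                       or a.cpu_time_factor < b.cpu_time_factor)
    return no_worse and strictly_better


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative values taken from the examples in the text (assumptions).
TEXT_XML = TradeOffPoint("text XML", 1.0, 1.0)
POINT_3_2 = TradeOffPoint("3/2", 3.0, 2.0)
POINT_10_3 = TradeOffPoint("10/3", 10.0, 3.0)

front = pareto_front([TEXT_XML, POINT_3_2, POINT_10_3])
print([p.name for p in front])
```

None of these three points dominates another: each gain in compactness
is paid for in processor time.  Choosing among them therefore needs an
explicit weighting of the properties, which is the piece I find missing.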

 

4. Parser implementation

Further related to the benchmarks, the central question of whether
improving parser implementations will provide a sufficient speedup to
mitigate the perceived performance problems of XML is not addressed.  I
noted in the Sun benchmarks provided to the workshop that a large
percentage of time was spent on "binding" and had nothing to do with the
actual on-the-wire transmission.  It's quite conceivable that parser
implementations will continue to improve to meet this need.

 

There is no evaluation of what is likely to happen over time based upon
historical evidence.  For example, it may be that every 18 months, the
processor speed has doubled AND the efficiency of an XML parser
implementation has doubled, providing a 4x increase in processing speed.
Looking out 18 months, to when a Recommendation might be produced, how
does the parser implementation affect the property trade-offs?
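
The arithmetic behind the 4x figure can be sketched as follows, assuming
that the two doublings are independent and compound multiplicatively.
The function name and its parameters are hypothetical, and the 18 month
doubling period is the illustrative figure from the paragraph above, not
a measured trend.

```python
def projected_speedup(months: float,
                      period_months: float = 18.0,
                      hardware_factor: float = 2.0,
                      parser_factor: float = 2.0) -> float:
    """Combined text XML processing speedup after `months`.

    Assumes hardware and parser implementation gains are independent
    and both recur once per `period_months` (illustrative assumption).
    """
    periods = months / period_months
    return (hardware_factor * parser_factor) ** periods


print(projected_speedup(18))  # 4.0
print(projected_speedup(36))  # 16.0
```

Under these assumptions, any benchmark comparison against text XML would
need to state which point on this curve the text XML baseline is taken
from.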

 

5. Relating property evaluation to threshold for WG formation

Given a holistic approach to properties including the historical and
predicted changes, what new property points justify a binary WG?  And
which set of applications would this be sufficient for?  For example, is
it that 1/3 of all messages would use binary if the 10/3
compactness/processor time trade-off were met?  Would it be 1/10 of
messages?  9/10?  Is 1/10, 1/3, or 9/10 the threshold for standardizing
a binary format?  Is even such a spread necessary?  Is it that 1/3 of
the messages are suffering such incredible pain that they would gladly
take the 3/2 trade-off and be ecstatic with 10/3, and is this good
enough for a new WG?

 

The use case analysis seems to be what the "Generality" property was
trying to achieve, but the properties should be reserved for a technical
analysis of each solution.  I believe that "generality", in the sense of
satisfying use cases/scenarios, is different from the technical
trade-offs.

 

6. Feasibility of Binary XML and evaluation of XML

I was surprised to find that XML was rated as PREVENTs for processing
efficiency, small footprint, and forwards compatibility, considering
that all these properties are relative to XML.  I didn't understand
this, and it seems to cast XML in a bad light compared to itself.

 

I do not believe that generality is a property, and even if it is, it's
self-evident that XML has achieved the Generality property as that
property is currently loosely defined.  If anything, XML should be the
only format that has the "Generality" "property".  I believe that
"generality" should not be retained in this feasibility section.

 

7. How many formats?

Because there is a lack of thresholds for formats, there is no
indication of how many binary formats will be standardized.  For
example, we could be in a situation where 2/3 of messages could be
satisfied by 1 format that achieves the 3/2 ratio.  We could also be in
a situation where 2/3 of messages could use some binary format but none
are satisfied by the 3/2 ratio, yet there are 3 different solutions
yielding 10/3 that collectively meet the 2/3 of messages.
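
A minimal sketch of the two situations, assuming each format covers a
disjoint share of messages.  The function formats_needed, the format
names, and the equal 2/9 split across the three 10/3 solutions are all
assumptions made for illustration; none of them come from the XBC
documents.

```python
from fractions import Fraction


def formats_needed(share_by_format, threshold):
    """Smallest number of formats whose combined share meets `threshold`.

    Assumes the shares are disjoint, so taking the largest shares first
    gives the minimum count.  Returns None if the threshold is never met.
    """
    total = Fraction(0)
    shares = sorted(share_by_format.values(), reverse=True)
    for count, share in enumerate(shares, start=1):
        total += share
        if total >= threshold:
            return count
    return None


THRESHOLD = Fraction(2, 3)  # hypothetical share of messages to satisfy

# Situation 1: one format at the 3/2 ratio satisfies 2/3 of messages.
one_format = {"format A (3/2)": Fraction(2, 3)}

# Situation 2: three 10/3 solutions, assumed to cover 2/9 of messages each.
three_formats = {
    "format X (10/3)": Fraction(2, 9),
    "format Y (10/3)": Fraction(2, 9),
    "format Z (10/3)": Fraction(2, 9),
}

print(formats_needed(one_format, THRESHOLD))     # 1
print(formats_needed(three_formats, THRESHOLD))  # 3
```

Without agreed thresholds and measured shares, neither input to such a
calculation is available, so the number of formats stays open.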

 

8. Format evaluation and selection process

The selection process for formats, and how organizations will submit
formats, is not specified.  There is a wide variety of formats
available.  Certainly most of the major vendors have at least one binary
format that is used within their software.  For example, BEA provides a
TokenStream format for BEA's XQuery engine [2].  It is possible that BEA
would be quite happy if its TokenStream binary format were adopted, and
it is possible that BEA would submit TokenStream.  It seems inevitable
that other vendors will submit a variety of their format(s) - such as a
Microsoft binary Indigo format [3], [4].  How would BEA or any other
vendor know the process, including evaluation methodology and selection
criteria, that all the submitted format(s) will be subjected to?

 

Cheers,

Dave

 

[1] http://www.w3.org/2003/08/binary-interchange-workshop/26-bea-BinaryXMLWS.pdf

[2] http://www.dbis.ethz.ch/research/publications/vldbj.pdf

[3] http://winfx.msdn.microsoft.com/library/default.asp?url=/library/en-us/indigo_con/html/1243a070-6e5d-4cbc-919c-90727f96eae3.asp

[4] http://hyperthink.net/blog/PermaLink,guid,7e62d706-84eb-4ad0-9250-90c2656f9a01.aspx

 

 

 

Received on Monday, 11 April 2005 19:35:44 UTC