RE: Dave Orchard individual review of XBC for the TAG

From: Rice, Ed (HP.com) <ed.rice@hp.com>
Date: Tue, 3 May 2005 10:45:34 -0700
Message-ID: <7D6953BFA3975C44BD80BA89292FD60E0290E3F2@cacexc08.americas.cpqcorp.net>
To: "Rice, Ed (HP.com)" <ed.rice@hp.com>, "David Orchard" <dorchard@bea.com>, <www-tag@w3.org>
Some revisions as discussed:


1.	The TAG does not feel that the WG has made its case for the
value proposition of an XML binary standard.  
2.	Benchmarks

	a.	My position remains the same as articulated in BEA's
position paper [1] for the binary interchange workshop, particularly in
the "How To Measure Candidate Solutions" section, bullet 4.c
("Measurable benefit(properties) from benchmarks") in the
"Recommendations" section, and expressed further in the workshop.  My
position also remains the same on the importance of the architectural
properties of self-description and extensibility, also articulated in
the BEA paper.  
	b.	The Working Group did not provide benchmarks that
indicate a high likelihood that a single format will sufficiently alter
the mix of properties of text xml to be worth standardization at the
	c.	This may very well be a charter or timing question.  The
charter is somewhat vague on whether benchmarks and property assessment
are required, and even if they were, the WG's charter has expired.  

3.	Properties

	a.	Related to the benchmarks, there is no framework for
evaluating the properties of interest.  A survey of properties, such as
compactness, space efficiency, etc., is well described.  These are good
starting points for the properties of a benchmark.  But the critical
piece of information that I am looking for in a binary XML go/no-go
decision is a hard-nosed approach to these properties and their
trade-offs to support binary XML.  For example, is one new trade-off
point a 3x increase in compactness for an arbitrary set of documents
with a 2x increase in processor time?  Is there a 10x increase in
compactness with a 3x increase in processor time combination point as
well? 
	b.	The documents do not provide the property trade-off
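
As a rough illustration of the kind of trade-off point asked about in
3.a, the sketch below compares text XML with the two example points (3x
compactness at 2x processor time, 10x compactness at 3x processor time).
The cost model (transfer time plus processing time), the names
TradeOffPoint and total_time, and all document sizes, bandwidths and
baseline timings are assumptions made up for this example, not figures
from the XBC documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TradeOffPoint:
    """Position of a format relative to text XML (XML itself is 1.0 / 1.0)."""
    name: str
    compactness_gain: float  # 3.0 means the encoded document is 3x smaller
    cpu_time_factor: float   # 2.0 means processing takes 2x as long


def total_time(doc_bytes, bandwidth_bytes_per_s, xml_cpu_seconds, point):
    """Hypothetical end-to-end cost: time on the wire plus processing time."""
    transfer = doc_bytes / point.compactness_gain / bandwidth_bytes_per_s
    cpu = xml_cpu_seconds * point.cpu_time_factor
    return transfer + cpu


# The two example points from the text, plus text XML as the baseline.
XML = TradeOffPoint("text XML", 1.0, 1.0)
POINT_3_2 = TradeOffPoint("3x compact / 2x time", 3.0, 2.0)
POINT_10_3 = TradeOffPoint("10x compact / 3x time", 10.0, 3.0)

if __name__ == "__main__":
    # Made-up scenario values: a 1 MB document, 1 s of baseline XML processing.
    for label, bandwidth in (("slow link", 100_000), ("fast link", 100_000_000)):
        for point in (XML, POINT_3_2, POINT_10_3):
            seconds = total_time(1_000_000, bandwidth, 1.0, point)
            print(f"{label:9s} {point.name:22s} {seconds:8.3f} s")
```

With these made-up numbers the slow link favors the 10x/3x point while
the fast link favors plain text XML, which is why a framework stating
which points matter, and for which applications, is needed.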

4.	Parser implementation

	a.	Further related to the benchmarks, the central question
of whether improving parser implementation will provide sufficient
increase to mitigate the perceived performance problems of XML is not
addressed.  I noted in the Sun benchmarks provided to the workshop that
a large percentage of time was spent on "binding" and had nothing to do
with the actual on the wire transmission.  It's quite conceivable that
parser implementations will continue to improve to meet this need.  
	b.	There is no evaluation of what is likely to happen over
time based upon historical evidence.  For example, it may be that every
18 months, the processor speed has doubled AND the efficiency of an XML
parser implementation has doubled, providing a 4x increase in processing
speed.  Looking out 18 months, to when a Recommendation might be
produced, how does the parser implementation affect the property
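
The arithmetic behind the 4x figure in 4.b can be sketched as follows,
assuming the two improvements are independent and compound
multiplicatively.  The 18-month doubling periods are the hypothetical
ones from the example above, not measured rates, and the function name
combined_speedup is invented for this sketch.

```python
def combined_speedup(cpu_doubling_months, parser_doubling_months, horizon_months):
    """Speedup of XML processing after horizon_months, assuming processor
    speed and parser efficiency each double on their own fixed period and
    the two gains multiply."""
    cpu_gain = 2 ** (horizon_months / cpu_doubling_months)
    parser_gain = 2 ** (horizon_months / parser_doubling_months)
    return cpu_gain * parser_gain


if __name__ == "__main__":
    # Both doubling every 18 months, looking 18 months out (the example in 4.b).
    print(combined_speedup(18, 18, 18))
    # The same assumptions over a 36-month horizon.
    print(combined_speedup(18, 18, 36))
```

Under these assumptions text XML processing is 4x faster by the time a
Recommendation could appear, so any trade-off point would have to be
judged against that moving baseline rather than against today's parsers.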

5.	Relating property evaluation to threshold for WG formation.

	a.	Given a holistic approach to properties including the
historical and predicted changes, what new property points justify a
binary WG?  And which set of applications would this be sufficient for?
For example, is it that 1/3 of all messages would use binary if the 10/3
compactness/processor time trade-off was met?  Would it be 1/10 of
messages?  9/10?  Is 1/10, 1/3, 9/10 the threshold for standardizing a
binary format?  Is even such a spread necessary?   Is it that 1/3 of the
messages are suffering incredible pain that they would gladly take the
3/2 trade-off and be ecstatic with 10/3, and is this good enough for a
new WG?
	b.	The use case analysis seems to be what the "Generality"
property was trying to achieve, but the properties should be reserved
for a technical analysis of each solution.  I believe that "generality"
as satisfying use cases/scenarios is different than the technical

6.	Feasibility of Binary XML and evaluation of XML

	a.	I was surprised to find that XML was rated as PREVENTs
for processing efficiency, small footprint, and forwards compatibility,
considering that all these properties are relative to XML.  I didn't
understand this, and it seems to cast XML in a bad light compared to
	b.	I do not believe that generality is a property, and
even if so, it's self-evident that XML has achieved the Generality
property as that property is currently loosely defined.  If anything,
XML should be the only format that has the "Generality" "property".  I
believe that "generality" should not be retained in this feasibility

7.	How many formats?

	a.	Because there is a lack of thresholds for formats, there
is no indication of how many binary formats will be standardized.  For
example, we could be in a situation where 2/3 of messages could be
satisfied by 1 format that achieves the 3/2 ratio.  We could also be in
a situation where 2/3 of messages could use some binary format, but none
are satisfied by the 3/2 ratio, yet there are 3 different solutions that
yield 10/3 and collectively cover those 2/3 of messages.  

8.	Format evaluation and selection process

	a.	The selection process for formats and how organizations
will submit formats is not specified.  There are a wide variety of
formats available.  Certainly most of the major vendors have at least
one binary format that is used within their software.  For example, BEA
provides a TokenStream format for BEA's XQuery engine [2].  It is
possible that BEA would be quite happy if its TokenStream binary
format were adopted and it is possible that BEA would submit
TokenStream.  It seems inevitable that other vendors will submit a
variety of their format(s) - such as a Microsoft binary Indigo format
[3], [4].  How would a BEA or other vendor know the process, including
evaluation methodology and selection criteria, that all the submitted
format(s) will be subjected to?

9.	The Fragmentable requirement [5] requires that partial files
can be processed, yet at the same time the Schema Extensions and
Deviations section [6] refers to embedded schemas in the same file.  In
a binary file with random update, I think it's highly unlikely that a
partial transmission would allow for any ability to utilize the binary
file format.
10.	The webarch document refers to human readability: "Textual
formats are usually more portable and interoperable. Textual formats
also have the considerable advantage that they can be directly read by
human beings" [7], which would be lost with a binary XML format. 
11.	I was also disappointed to see that partial document security
wasn't really addressed.  For example, a binary document would contain
header/routing information as well as one or more 'payloads' of data.
It seems to me that we're missing an opportunity to allow the key binary
data to be encrypted and signed by one authority but routed by multiple
authorities.  This wasn't addressed in the document.
12.	I'm also concerned about the overhead in creating and
maintaining random access content on the small memory footprint/small
processor systems described in the document.  We're looking at
uncompressing (or decrypting) the data stream, loading the content into
memory for random access and performing functions against the binary
data as a method to minimize footprint?



[2] http://www.dbis.ethz.ch/research/publications/vldbj.pdf



[5] http://www.w3.org/TR/xbc-properties/#fragmentable 

[6] http://www.w3.org/TR/xbc-properties/#schema-extensions-deviations

[7] http://www.w3.org/TR/2004/REC-webarch-20041215/






Received on Tuesday, 3 May 2005 17:46:11 UTC