Review of EXI from David Orchard on 2007-10-25 (www-tag@w3.org from October 2007)

From: David Orchard <dorchard@bea.com>
Date: Wed, 24 Oct 2007 22:12:42 -0700
To: "W3C-TAG Group WG" <www-tag@w3.org>
Message-ID: <BEBB9CBE66B372469E93FFDE3EDC493EF071CA@repbex01.amer.bea.com>
Draft TAG response to EXI.  I believe there could be significant
negative response so I encourage the TAG members to suggest
modifications as they see fit.
 
The TAG does not consider its concerns raised in May 2005[1] to be
addressed.  We are unable to ascertain whether our concerns have been
addressed or not from the measurements document[2] produced by the
working group.  It appears that the Working Group believes that
non-Working Group members must prove the WG wrong in its decisions on
EXI technologies, rather than convincing the world that there is indeed
a conclusive case.
 
The measurements document contains 127 pages of data and analysis, but
we did not see any material in a summary or abstract that addressed our
concerns.  In the almost 3 months since publication of the document on
July 25th 2007, the EXI working group has had 28 messages on its
mailing list, none of which are comments on the measurements document.
During the Oct 4th TAG telcon, the TAG spent approximately 30 minutes
looking through the document but was unable to glean answers to our
questions.  We believe that the measurements document is too detailed
and academic to prove the need for any flavours of binary XML or the
selection of a particular technology, and that it is likely that only a
handful of people will be able to grasp even the data embodied in the
document.  
 
The document states "Based on the measurements described here, the
working group has selected Efficient XML ([EffXML]
<http://www.w3.org/TR/2007/WD-exi-measurements-20070725/#ref-EfficientXML>)
to be the basis for the proposed encoding specification to be
prepared as a candidate W3C Recommendation."  There is no high level
supporting evidence for such a conclusion.  
 
It appears that the working group has presupposed the conclusion that
binary XML is necessary, and that no concerns raised by the TAG or
member companies about the technology and methodologies themselves, or
about the integration with the rest of XML (including DSig), need to be
addressed directly.  Some examples:
-  A main author of Efficient XML did press outreach [3] that resulted
in an article called "W3C is working on a solution to bandwidth-hogging,
clunky XML".
- IBM's withdrawal [4] from the Working Group reflects its detailed and
specific disagreement about the benefits of EXI, which remains unaddressed.
Just recently a working group member said "The same goes for IBM, if
they really thought the result tells something important, they would
have spent some more time establishing a case out the result against
EXI. "[5]
- the evolution of "the XML stack", including DSig and Encryption, is
mostly ignored [6].  For example, of all the use cases that EffXML
solves, which cases require a revision of DSig and/or Encryption before
deployment?  It may very well be that the failure of the incompatible
evolution of XML 1.1 will be repeated in an incompatible evolution of
EXI without a supporting evolution of related technologies.
- the TAG concerns were never directly addressed.
 
We will go through our documented longstanding concerns in order.
 
"The drawbacks are likely to include reduced interoperability with XML
1.0 and XML 1.1 software, and an inability to leverage the benefits of
text-based formats.  "
 
The Working Group has rejected the use of an encoding in the XML
Declaration because of the 18 or so characters it adds.  We fail to
understand why this is an issue.  XML 1.0 allows different encodings and
it allows default encodings to be defined out of band.  Constrained
environments with out-of-band knowledge of binary encoding could assume
binary encoding on incoming messages, failing over to text XML.  Less
tightly coupled environments, such as the Web, could use the XML
declaration encoding.  This at least would preserve part of the XML
stack and technologies.
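
As a rough illustration of the dispatch described above, here is a
minimal Python sketch.  The encoding label "x-binary-xml", the function
names, and the rule "anything that does not start with '<' is binary"
are all assumptions made for the example; none of them come from the EXI
drafts or from XML 1.0.

```python
import re

# Hypothetical label for a binary encoding; no such name is registered.
BINARY_ENCODING_LABEL = "x-binary-xml"

# Matches an ASCII-compatible XML declaration that carries an encoding.
_DECL_RE = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)


def declared_encoding(payload: bytes):
    """Return the lower-cased encoding named in the XML declaration, or None."""
    match = _DECL_RE.match(payload)
    return match.group(1).decode("ascii").lower() if match else None


def select_decoding(payload: bytes, out_of_band_binary: bool = False) -> str:
    """Pick "binary" or "text" handling for an incoming message.

    Loosely coupled receivers rely on the XML declaration alone.
    Receivers with out-of-band knowledge assume binary, but fail over to
    text XML when the payload visibly starts like a text document.
    """
    if declared_encoding(payload) == BINARY_ENCODING_LABEL:
        return "binary"
    # Simplification: skip a UTF-8 BOM and whitespace, then look for '<'.
    looks_like_text = payload.lstrip(b"\xef\xbb\xbf \t\r\n").startswith(b"<")
    if out_of_band_binary and not looks_like_text:
        return "binary"
    return "text"
```

A Web receiver would call select_decoding(payload) and see "binary" only
when the declaration says so; a constrained receiver would pass
out_of_band_binary=True and still accept ordinary text XML.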
 
Continuing further, "In particular, we suggest that a quantitative
analysis is necessary.
For at least a few key use cases, concrete targets should be set for the
size and/or speed gains that would be needed to justify the disruption
introduced by a new format.  For example, a target might be that "in
typical web services scenarios, median speed gains on the order of 3x in
combined parsing and deserialization are deemed sufficient to justify a
new format."  We further suggest that representative binary technologies
be benchmarked and analyzed to a sufficient degree that such speed or
size improvements can be reasonably reliably predicted before we commit
to a Recommendation.  No doubt, any given set of goals or benchmarks
will suffer from some degree of imprecision, but if the gains are
sufficiently compelling to justify a new format, then they should be
relatively easy to demonstrate.  In short, actual measurements should be
a prerequisite to preparing a Recommendation."
 
The Working Group did not establish concrete targets for size and/or
speed gains that would justify disruption prior to running measurements.
Actual measurements have been done but the absence of clear targets for
size and/or speed gains even after the measurements means that the
selection of any technology appears unjustified.
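
To make the idea of a target fixed in advance concrete, a minimal Python
sketch follows.  The 3x figure is the example target quoted from the TAG
message; the function names and the use of paired per-test-case timings
are assumptions for illustration, not the Working Group's methodology.

```python
from statistics import median

# Example target from the TAG message: median gains on the order of 3x.
TARGET_MEDIAN_SPEEDUP = 3.0


def median_speedup(text_times, binary_times):
    """Median of per-test-case speedups (text time / binary time)."""
    if not text_times or len(text_times) != len(binary_times):
        raise ValueError("need two non-empty timing lists of equal length")
    return median(t / b for t, b in zip(text_times, binary_times))


def meets_target(text_times, binary_times, target=TARGET_MEDIAN_SPEEDUP):
    """True when the measured median speedup reaches the preset target."""
    return median_speedup(text_times, binary_times) >= target
```

The point of the sketch is the ordering: the target constant is chosen
before any timings are collected, so the measurements can confirm or
refute it.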
 
Continuing in the TAG message, "In doing such measurements, we believe
it is essential that comparisons be done to the best possible text-based
XML 1.x implementations, which are not necessarily those that are most
widely deployed.  Stated differently: if XML 1.x is inherently capable
of meeting the needs of users, then our efforts should go into tuning
our XML implementations, not designing new
formats.  Benchmark environments should be as representative as possible
of fully optimized implementations, not just of the XML parser, but of
the surrounding application or middleware stack.  We note that different
application-level optimizations may be necessary to maximize the
performance of the Binary or text cases respectively.  Care should
especially be taken to ensure that the performance of particular APIs
such as DOM or SAX does not obscure the performance possible with either
option (e.g. both SAX and DOM can easily result in high overhead string
conversions when UTF-8 is used.)"
 
The Working Group's call for implementations [7] specifically called for
only XML parsers, not the surrounding application or middleware stacks,
JDKs or Java Virtual Machines.  The benchmarks have not been run against
the best possible text-based XML 1.x implementations.
 
The measurements document acknowledges the issue of stack integration:
"Stack integration considers the full XML processing system, not just
the parser. By selectively combining the components of the processing
stack through abstract APIs, the system can directly produce application
data from the bytes that were read. Two prominent examples of this
technique are [Screamer
<http://www.w3.org/TR/2007/WD-exi-measurements-20070725/#ref-screamer>]
and [EngelenGSOAP
<http://www.w3.org/TR/2007/WD-exi-measurements-20070725/#ref-Engelen-gsoap>].
Both of these can also be called schema-derived as they compile a
schema into code. However, neither simply generates a generic parser,
but rather a full stack for converting between application data and
serialized XML. This gives a significant improvement compared to just
applying the pre-compilation to the parsing layer. "  But neither of
these prominent examples appears in the test data.  
 
Further, there were no "real-world end to end" use cases tested, such as
a Web service application, a mobile application, etc.  Thus we do not
know the effect of any particular technology on overall application
performance.
 
The measurements document states "To begin with, the XBC
Characterization Measurement Methodologies Note
<http://www.w3.org/TR/xbc-measurement/>  defines thresholds for whether
a candidate format achieves sufficient compactness " [8].  The XBC
Characterization Measurement Methodologies Note[8] states "Because XML
documents exist with a wide variety of sizes, structures, schemas, and
regularity, it is not possible to define a single size threshold or
percentage compactness that an XML format must achieve to be considered
sufficiently compact for a general purpose W3C standard. "
 
We attempted to determine the differences between Efficient XML and Gzip
but found the methodology confusing.  The measurements document
specifies that "In the Document
<http://www.w3.org/TR/2007/WD-exi-measurements-20070725/#document>  and
Both <http://www.w3.org/TR/2007/WD-exi-measurements-20070725/#both>
classes, candidates are compared against gzipped XML, while in the
Neither
<http://www.w3.org/TR/2007/WD-exi-measurements-20070725/#neither>  and
Schema <http://www.w3.org/TR/2007/WD-exi-measurements-20070725/#schema>
cases, the comparison was to plain XML".  Examining the Document and Both
compactness graphs, Gzip appears to offer improvements over XML that
track the other implementations, with the noteworthy point that
Efficient XML's improvements over Gzip are substantial in a significant
part of the Both chart but similar in the Document chart.  Examining the
Processing Efficiency graphs, it appears as though XML is clearly
superior in Java Encoding in Document and Both.  Gzip appears further
inferior, and yet all solutions vary wildly around XML in Decoding
Document and Both.  A worrying statement is "An interesting point to
note in the decoding results is the shapes of the graphs for each
individual candidate. Namely, these appear similar to each other,
containing similar peaks and troughs. Even more interestingly, this is
also the case with Xals, indicating that there is some feature of the
JAXP parser that is implemented suboptimally and triggered by a subset
of the test documents."   The measurements document states "For
instance, preliminary measurements in the EXI framework indicate that
the default parser shipped with Java improved noticeably from version
5.0 to version 6, showing 2-3-fold improvement for some cases. ", and
the measurements used JDK 1.5.0_05-b05 for Java based parsing and  JDK
1.6.0_02-ea-b01 for native.   Perhaps an improved JDK, Java Virtual
Machine, or virtualized JVM would further improve results.  
This leads us to wonder whether a combination of Gzip with improved
technologies such as parsers, JDKs, VMs, or even stack integration
technology (that is schema-aware and hence covered under Both and
Schema) would suffice for the community.
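
For readers who want to reproduce a Gzip baseline on their own
documents, a minimal Python sketch is given here.  It assumes
compactness is expressed as encoded size divided by original size, which
may not match the normalization used in the measurements document, and
the sample document is invented for the example.

```python
import gzip


def compactness(original: bytes, encoded: bytes) -> float:
    """Encoded size as a fraction of the original size (smaller is better)."""
    return len(encoded) / len(original)


def gzip_baseline(xml_bytes: bytes) -> float:
    """Compactness obtained by gzipping the plain-text XML as-is."""
    return compactness(xml_bytes, gzip.compress(xml_bytes))


# Invented, highly repetitive sample; real documents will compress less well.
sample = (
    b"<invoices>"
    + b"<invoice><id>1</id><total>10.00</total></invoice>" * 200
    + b"</invoices>"
)
print(f"gzip compactness: {gzip_baseline(sample):.3f}")
```

Any candidate format can then be compared against this baseline by
passing its encoded bytes to compactness() alongside the same original
document.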
 
Examining the data sets used, there are a number of military
applications (ASMTF, AVCL, JTLM)  and yet comparatively few generic "Web
service" applications.  The Google test suite lists Web services for
small devices and Web services routing; the Invoice test suite lists
Intra/InterBusiness Communication which immediately limits its scope to
"A large business communicates via XML with a number of remote
businesses, some of which can be small business partners. These remote
or small businesses often have access only to slow transmission lines
and have limited hardware and technical expertise. "; and there is a
WSDL test suite.    This seems to avoid the "common" Web service case of
the many Web APIs provided by hosted solutions like Google, Yahoo,
Amazon, eBay, Salesforce, Facebook, MSN, etc.  Examining the test data
shows that the Google test suite used 5 different test cases (0, 7, 15,
24, 30), which include 1 SOAP fault (case #24).  There are 2 AVCL, 5
Invoice, 8 Location Sightings, 6 JTLM, 5 ASMTF, and 2 WSDL test cases as
well.  There appears to be broad-based coverage of each, though the
rationale for the various weightings isn't documented.  For example,
why 4 Google "success cases" and 2 WSDL cases?  Surely there are more
than 2 times as many SOAP messages as WSDL messages being sent around
the internet.
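
The weighting implied by those counts can be tallied with a few lines of
Python.  The counts are the ones listed above; the dictionary and
function names are just for illustration.

```python
# Test cases per suite, as counted in the paragraph above.
TEST_CASE_COUNTS = {
    "Google": 5,
    "AVCL": 2,
    "Invoice": 5,
    "Location Sightings": 8,
    "JTLM": 6,
    "ASMTF": 5,
    "WSDL": 2,
}


def shares(counts):
    """Fraction of all test cases contributed by each suite."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


for name, share in shares(TEST_CASE_COUNTS).items():
    print(f"{name:20s} {share:6.1%}")
```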
 
In conclusion, W3C TAG's concerns have not been addressed and the W3C
TAG does not support an incompatible change to the XML Stack.        
 
Cheers,
Dave
 
[1] http://lists.w3.org/Archives/Public/www-tag/2005May/0044
[2] http://www.w3.org/TR/2007/WD-exi-measurements-20070725/
[3] http://www.sdtimes.com/article/latestnews-20071015-10.html
[4] http://lists.w3.org/Archives/Member/member-exi-wg/2007Mar/0014.html
[5] http://lists.w3.org/Archives/Member/member-exi-wg/2007Sep/0010.html
[6] http://lists.w3.org/Archives/Member/member-exi-wg/2007Sep/att-0005/00-part
[7] http://lists.w3.org/Archives/Public/public-exi/2006Mar/0004.html
[8] http://www.w3.org/TR/xbc-measurement/
Received on Thursday, 25 October 2007 05:13:06 UTC