- From: David Orchard <dorchard@bea.com>
- Date: Mon, 11 Apr 2005 12:35:40 -0700
- To: <www-tag@w3.org>
- Message-ID: <32D5845A745BFB429CBDBADA57CD41AF0ED70400@ussjex01.amer.bea.com>
I have reviewed the output documents of the XML Binary Characterization Working Group. The work outputs went quite a way down the investigation path and are good outputs, but I found the evidence for forming a new working group to produce a W3C Recommendation for one or more binary XML formats uncompelling. I do not believe the deliverables provide sufficient motivation.

This review is written as an elected TAG member, and not as a representative of a W3C member company that has a publicly available RF binary format.

1. Process suggestion

I suggest that a W3C working group, perhaps a rechartered XML Binary Characterization WG, continue the work of providing further information for a "go/no-go" recommendation. I do not believe that the TAG should endorse chartering a working group to produce a Rec track deliverable for an unknown number of binary formats at this time. Further progress down the path of benchmarking and use case validation is necessary to justify a Rec track deliverable. The comments from here on could be used to assist in writing the charter for an XML Binary Characterization "the sequel" WG.

2. Benchmarks

My position remains the same as articulated in BEA's position paper [1] for the binary interchange workshop, particularly in the "How To Measure Candidate Solutions" section and in bullet 4.c ("Measurable benefit(properties) from benchmarks") of the "Recommendations" section, and as expressed further at the workshop. My position also remains the same on the importance of the architectural properties of self-description and extensibility, also articulated in the BEA paper.

The Working Group did not provide benchmarks that indicate a high likelihood that a single format will sufficiently alter the mix of properties of text XML to be worth standardization at the W3C. This may very well be a charter or timing question. The charter is somewhat vague on whether benchmarks and property assessment are required, and even if they were, the WG's charter has expired.

3. Properties

Related to the benchmarks, there is no framework for evaluating the properties of interest. A survey of properties, such as compactness, space efficiency, etc., is well described. These are good starting points for the properties of a benchmark. But the critical piece of information that I am looking for in a binary XML go/no-go decision is a hard-nosed approach to these properties and their trade-offs to support binary XML. For example, is one new trade-off point a 3x increase in compactness for an arbitrary set of documents with a 2x increase in processor time? Is there a combination point of a 10x increase in compactness with a 3x increase in processor time as well? The documents do not provide the property trade-off points.

4. Parser implementation

Further related to the benchmarks, the central question of whether improving parser implementations will provide a sufficient increase to mitigate the perceived performance problems of XML is not addressed. I noted in the Sun benchmarks provided to the workshop that a large percentage of time was spent on "binding" and had nothing to do with the actual on-the-wire transmission. It's quite conceivable that parser implementations will continue to improve to meet this need.

There is no evaluation of what is likely to happen over time based upon historical evidence. For example, it may be that every 18 months the processor speed has doubled AND the efficiency of an XML parser implementation has doubled, providing a 4x increase in processing speed. Looking out 18 months, to when a Recommendation might be produced, how does the parser implementation affect the property trade-offs?

5. Relating property evaluation to threshold for WG formation

Given a holistic approach to properties, including the historical and predicted changes, what new property points justify a binary WG? And which set of applications would this be sufficient for?
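The 18-month arithmetic in point 4 can be written down as a minimal sketch. It assumes that processor speed and parser efficiency each double on a fixed period and that the two gains simply multiply; the function names and the default periods are hypothetical and only restate the example figures from this message, not measured data.

```python
# Illustrative sketch only: the doubling periods are the hypothetical
# figures from the example in point 4, not benchmark results.

def projected_speedup(months, cpu_doubling_months=18.0, parser_doubling_months=18.0):
    """Combined speedup of text XML processing after `months`.

    Assumes processor speed and parser efficiency each double on their
    own fixed period, and that the two gains multiply.
    """
    cpu_gain = 2.0 ** (months / cpu_doubling_months)
    parser_gain = 2.0 ** (months / parser_doubling_months)
    return cpu_gain * parser_gain


def text_xml_time_relative_to_today(months, **doubling_periods):
    """Processing time of text XML after `months`, as a fraction of today's."""
    return 1.0 / projected_speedup(months, **doubling_periods)


if __name__ == "__main__":
    for horizon in (0, 18, 36):
        print(horizon, projected_speedup(horizon),
              text_xml_time_relative_to_today(horizon))
```

Under these assumptions the 18-month horizon gives the 4x figure used above. Changing either doubling period shows how sensitive any comparison against a fixed binary trade-off point is to the assumed rate of parser improvement.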
For example, would 1/3 of all messages use binary if the 10/3 compactness/processor time trade-off were met? Would it be 1/10 of messages? 9/10? Is 1/10, 1/3, or 9/10 the threshold for standardizing a binary format? Is even such a spread necessary? Is it that 1/3 of the messages are suffering such incredible pain that they would gladly take the 3/2 trade-off and be ecstatic with 10/3, and is this good enough for a new WG?

The use case analysis seems to be what the "Generality" property was trying to achieve, but the properties should be reserved for a technical analysis of each solution. I believe that "generality", in the sense of satisfying use cases/scenarios, is different from the technical trade-offs.

6. Feasibility of Binary XML and evaluation of XML

I was surprised to find that XML was rated as PREVENTs for processing efficiency, small footprint, and forwards compatibility, considering that all these properties are relative to XML. I didn't understand this, and it seems to cast XML in a bad light compared to itself.

I do not believe that generality is a property, and even if it is, it's self-evident that XML has achieved the Generality property as that property is currently loosely defined. If anything, XML should be the only format that has the "Generality" "property". I believe that "generality" should not be retained in this feasibility section.

7. How many formats?

Because there is a lack of thresholds for formats, there is no indication of how many binary formats will be standardized. For example, we could be in a situation where 2/3 of messages could be satisfied by 1 format that achieves the 3/2 ratio. We could also be in a situation where 2/3 of messages could use some binary format but none are satisfied by the 3/2 ratio, yet there are 3 different solutions that yield 10/3 and collectively cover those 2/3 of messages.

8. Format evaluation and selection process

The selection process for formats and how organizations will submit formats is not specified.
There is a wide variety of formats available. Certainly most of the major vendors have at least one binary format that is used within their software. For example, BEA provides a TokenStream format for BEA's XQuery engine [2]. It is possible that BEA would be quite happy if its TokenStream binary format were adopted, and it is possible that BEA would submit TokenStream. It seems inevitable that other vendors will submit a variety of their format(s), such as a Microsoft binary Indigo format [3], [4]. How would BEA or another vendor know the process, including evaluation methodology and selection criteria, that all the submitted format(s) will be subjected to?

Cheers,
Dave

[1] http://www.w3.org/2003/08/binary-interchange-workshop/26-bea-BinaryXMLWS.pdf
[2] http://www.dbis.ethz.ch/research/publications/vldbj.pdf
[3] http://winfx.msdn.microsoft.com/library/default.asp?url=/library/en-us/indigo_con/html/1243a070-6e5d-4cbc-919c-90727f96eae3.asp
[4] http://hyperthink.net/blog/PermaLink,guid,7e62d706-84eb-4ad0-9250-90c2656f9a01.aspx
Received on Monday, 11 April 2005 19:35:44 UTC