- From: Rice, Ed (HP.com) <ed.rice@hp.com>
- Date: Tue, 3 May 2005 10:45:34 -0700
- To: "Rice, Ed (HP.com)" <ed.rice@hp.com>, "David Orchard" <dorchard@bea.com>, <www-tag@w3.org>
- Message-ID: <7D6953BFA3975C44BD80BA89292FD60E0290E3F2@cacexc08.americas.cpqcorp.net>
Some revisions as discussed:

1. The TAG does not feel that the WG has made its case for the value proposition of a binary XML standard.

2. Benchmarks

   a. My position remains the same as articulated in BEA's position paper [1] for the binary interchange workshop, particularly in the "How To Measure Candidate Solutions" section, bullet 4.c ("Measurable benefit (properties) from benchmarks") in the "Recommendations" section, and as expressed further at the workshop. My position also remains the same on the importance of the architectural properties of self-description and extensibility, also articulated in the BEA paper.

   b. The Working Group did not provide benchmarks indicating a high likelihood that a single format will sufficiently alter the mix of properties of text XML to be worth standardization at the W3C.

   c. This may very well be a charter or timing question. The charter is somewhat vague about whether benchmarks and a property assessment are required, and even if they were, the WG's charter has expired.

3. Properties

   a. Related to the benchmarks, there is no framework for evaluating the properties of interest. A survey of properties is well described, covering compactness, space efficiency, etc. These are good starting points for the properties of a benchmark. But the critical piece of information I am looking for in a binary XML go/no-go decision is a hard-nosed treatment of these properties and the trade-offs that would justify binary XML. For example, is one new trade-off point a 3x increase in compactness for an arbitrary set of documents at a 2x increase in processor time? Is there also a combination point of 10x compactness at 3x processor time?

   b. The documents do not provide these property trade-off points.

4. Parser implementation

   a. Further related to the benchmarks, the central question of whether improved parser implementations will provide a sufficient increase to mitigate the perceived performance problems of XML is not addressed. I noted in the Sun benchmarks provided to the workshop that a large percentage of time was spent on "binding" and had nothing to do with the actual on-the-wire transmission. It is quite conceivable that parser implementations will continue to improve to meet this need.

   b. There is no evaluation of what is likely to happen over time based on historical evidence. For example, it may be that every 18 months processor speed doubles AND the efficiency of XML parser implementations doubles, providing a 4x increase in processing speed. Looking out 18 months, to when a Recommendation might be produced, how does parser implementation affect the property trade-offs?

5. Relating property evaluation to the threshold for WG formation

   a. Given a holistic approach to properties, including historical and predicted changes, what new property points justify a binary WG? And for which set of applications would this be sufficient? For example, is it that 1/3 of all messages would use binary if the 10/3 compactness/processor-time trade-off were met? Would it be 1/10 of messages? 9/10? Is 1/10, 1/3, or 9/10 the threshold for standardizing a binary format? Is even such a spread necessary? Is it that 1/3 of messages are suffering such incredible pain that they would gladly take the 3/2 trade-off and be ecstatic with 10/3, and is this good enough for a new WG?

   b. The use case analysis seems to be what the "Generality" property was trying to achieve, but the properties should be reserved for a technical analysis of each solution. I believe that "generality" as satisfying use cases/scenarios is different from the technical trade-offs.

6. Feasibility of binary XML and evaluation of XML

   a. I was surprised to find that XML was rated as PREVENTs for processing efficiency, small footprint, and forwards compatibility, considering that all these properties are relative to XML. I did not understand this, and it seems to cast XML in a bad light compared to itself.

   b. I do not believe that generality is a property, and even if it were, it is self-evident that XML has achieved the Generality property as that property is currently loosely defined. If anything, XML should be the only format that has the "Generality" "property". I believe that "generality" should not be retained in this feasibility section.

7. How many formats?

   a. Because there is a lack of thresholds for formats, there is no indication of how many binary formats will be standardized. For example, we could be in a situation where 2/3 of messages could be satisfied by one format that achieves the 3/2 ratio. We could also be in a situation where 2/3 of messages could use some binary format, but none are satisfied by the 3/2 ratio, yet there are three different solutions yielding 10/3 that collectively cover the 2/3 of messages.

8. Format evaluation and selection process

   a. The selection process for formats and how organizations will submit formats is not specified. There is a wide variety of formats available. Certainly most of the major vendors have at least one binary format in use within their software. For example, BEA provides a TokenStream format for BEA's XQuery engine [2]. It is possible that BEA would be quite happy if its TokenStream binary format were adopted, and it is possible that BEA would submit TokenStream. It seems inevitable that other vendors will submit a variety of their formats - such as a Microsoft binary Indigo format [3], [4]. How would BEA or another vendor know the process, including the evaluation methodology and selection criteria, to which all submitted formats will be subjected?

9. The Fragmentable requirement [5] requires that partial files be processable, yet at the same time the Schema Extensions and Deviations section [6] refers to schemas embedded in the same file. For a binary file with random update, I think it is highly unlikely that a partial transmission would allow any ability to utilize the binary file format.

10. The webarch document refers to human readability - "Textual formats are usually more portable and interoperable. Textual formats also have the considerable advantage that they can be directly read by human beings" [7] - which would be lost with a binary XML format.

11. I was also disappointed to see that partial document security wasn't really addressed. For example, a binary document would contain header/routing information as well as one or more 'payloads' of data. It seems to me that we are missing an opportunity to allow the key binary data to be encrypted and signed by one authority but routed by multiple authorities. This wasn't addressed in the document.

12. I am also concerned about the overhead of creating and maintaining random-access content on the small-memory-footprint/small-processor systems described in the document. Are we looking at uncompressing (or decrypting) the data stream, loading the content into memory for random access, and performing functions against the binary data, all as a method of minimizing footprint?

[1] http://www.w3.org/2003/08/binary-interchange-workshop/26-bea-BinaryXMLWS.pdf
[2] http://www.dbis.ethz.ch/research/publications/vldbj.pdf
[3] http://winfx.msdn.microsoft.com/library/default.asp?url=/library/en-us/indigo_con/html/1243a070-6e5d-4cbc-919c-90727f96eae3.asp
[4] http://hyperthink.net/blog/PermaLink,guid,7e62d706-84eb-4ad0-9250-90c2656f9a01.aspx
[5] http://www.w3.org/TR/xbc-properties/#fragmentable
[6] http://www.w3.org/TR/xbc-properties/#schema-extensions-deviations
[7] http://www.w3.org/TR/2004/REC-webarch-20041215/
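The trade-off arithmetic behind points 3-5 can be made concrete with a minimal sketch. All figures below (1 MB document, 1 MB/s link, 0.5 s parse time, and the 3x/2x and 4x ratios) are the hypothetical illustrative numbers from the message, not benchmark data:

```python
# Illustrative sketch of the compactness-vs-processor-time trade-off.
# Total cost = wire-transfer time + processing time.

def total_cost(doc_size_mb, bandwidth_mbps, base_parse_s,
               compactness_gain, processor_penalty):
    """Time to move and process one document.

    compactness_gain:  how many times smaller the format is (1.0 = text XML)
    processor_penalty: how many times more CPU the format costs (1.0 = text XML)
    """
    transfer = (doc_size_mb / compactness_gain) / bandwidth_mbps
    processing = base_parse_s * processor_penalty
    return transfer + processing

# Text XML baseline: 1 MB document, 1 MB/s link, 0.5 s to parse.
text_xml = total_cost(1.0, 1.0, 0.5, 1.0, 1.0)        # 1.5 s

# The hypothetical 3/2 point from 3.a: 3x smaller, 2x processor time.
binary_3_2 = total_cost(1.0, 1.0, 0.5, 3.0, 2.0)      # ~1.33 s

# Point 4.b: 18 months out, CPUs 2x faster and parsers 2x more efficient
# give text XML a 4x parse speedup, narrowing the binary format's margin.
text_xml_future = total_cost(1.0, 1.0, 0.5 / 4, 1.0, 1.0)   # 1.125 s
```

On these assumed numbers the binary format's advantage over text XML disappears once the projected parser improvements land, which is exactly the kind of holistic evaluation the message argues the WG documents should have provided.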
Received on Tuesday, 3 May 2005 17:46:11 UTC