Re: Review of XBC specs from Dmitry Lenkov on 2005-04-06 (public-xml-core-wg@w3.org from April 2005)

From: Dmitry Lenkov <dmitry.lenkov@oracle.com>
Date: Tue, 05 Apr 2005 17:34:49 -0700
To: public-xml-core-wg@w3.org
Message-ID: <42532EA9.2010701@oracle.com>
I forgot to mention that I support the recommendation in spec #4. My support is based partly on the analysis in the four specs, incomplete as they are, and partly on my personal experience with binary formats.

Dmitry

Dmitry Lenkov wrote:

> My review is attached.
> Regarding Lew's message, I have no disagreements (and I join in the
> apologies for being late), but I do have a couple of questions. I am
> not sure I understand:
>
>  - Deltas vs Fragments
>  - Adhoc Schema processing and data typing
>
>
> Dmitry
> --------------------------------------------
>
>------------------------------------------------------------------------
>
>	Review of XBC specs
>	-------------------
>
>The debate over the use of binary serializations of XML documents has been going on for a long time. Even in the SGML era, a binary SGML was considered. While many people are still biased toward textual XML and question the need for binary XML serializations, the experience of the last 5-7 years shows that numerous application domains have transitioned from textual XML to binary XML serializations. In addition, the number of applications that use only binary XML serializations, and adopt XML this way, is also growing.
>
>The goal of XBC was to analyze this experience of recent years and, based on it, answer one of the most important questions concerning Binary XML: whether a single solution can operate efficiently (see the discussion of specs #3 and #4) across a vast and uneven set of requirements.
>
>It produced four documents:
>1. XML Binary Characterization Use Cases
>2. XML Binary Characterization Properties
>3. XML Binary Characterization Measurement Methodologies
>4. XML Binary Characterization: a summary of the analysis, the answer to the above question, and recommendations.
>
>1. XML Binary Characterization Use Cases
>
>This document describes use cases for evaluating the potential benefits of a binary serialization format for XML. It is a well-done document presenting a reasonably complete set of use cases. While it can be argued that more use cases could have been considered, in my opinion (and most of XBC agree) adding more use cases would affect the final analysis only minimally and so would bring negligible benefit.
>
>Each use case enumerates the properties (see the discussion of spec #2) that the binary serialization must support for that use case, and grades them as "must have", "should have", or "nice to have". These grades were assigned after discussion in one or more XBC meetings. For some use cases, and perhaps for more of them, one can question how complete the set of required properties is and how accurately the grades were assigned. Any comments and suggestions in this respect would be very useful for the next WG, if there is one.
>
>2. XML Binary Characterization Properties
>
>This document identifies and defines desirable properties of an XML format, based on requirements derived from the use cases collected in spec #1. An XML format is a format that is capable of representing the information in an XML document. The textual representation of XML documents is considered to be one of the formats (binary or non-binary) to be analyzed (see spec #4).
>
>Important points about this document:
>
>a/ Properties are divided into algorithmic properties and format properties. The format properties characterize XML formats (binary and non-binary) regardless of how actual implementations support them. The algorithmic properties relate to the actual implementations handling XML formats; with respect to these properties, an XML format either prevents them or does not prevent them. At the end of the document, additional considerations are added. These are not properties of XML formats or of the implementations handling those formats; rather, they relate to the policies and development practices of organizations supporting XML formats (binary or non-binary). Unfortunately, all three categories are mixed up in one pile in spec #4, which dilutes the value of the analysis there.
>
>b/ Integratable into XML stack. This is an important property. It potentially opens the door to multiple changes in multiple W3C specs. It is also difficult to measure, particularly during the initial phases when candidate formats are considered by the next Binary XML WG, if there is one.
>
>c/ Partially valid and partially well-formed documents. The Robustness and Schema Instance Change Resilience properties require processing (understanding) partially valid and partially well-formed documents. This might have a significant impact on how XML processors will have to behave.
>
>d/ Data models, Canonical XML, etc. Several properties (Robustness, Schema Instance Change Resilience, Schema Extensions and Deviations, Self Contained) discuss requirements on preserving certain information without defining what this information is. Initially, XBC discussed the Infoset, the PSVI, the Query Data Model, and Canonical XML. Later it was decided to drop this discussion due to time limits.
>
>e/ XML and Namespaces 1.x. The considerations and analysis in all XBC specs are limited to XML and Namespaces 1.x; they do not cover other specs such as XLink, XInclude, xml:id, etc.
>
>The Core WG might want to advise the future Binary XML WG, if there is one, on these items.
>
>3. XML Binary Characterization Measurement Methodologies
>
>This document contains a set of well-intentioned advice on how certain properties can be measured. It is not clear who will apply it, or when, where, and how.
>
>4. XML Binary Characterization
>
>This document does the following:
>
>a/ Mixes format properties, algorithmic properties, and additional considerations into one pile and calls all of them properties.
>
>b/ Classifies properties as W3C Required, MUST Support, MUST NOT Prevent, or not included. This classification is based on analyses of the use cases and of the properties important for those use cases, on emotional speeches and discussions, and on periods of silence during telcons when decisions were made quickly. While classifying properties as more important or less important is useful, excluding some properties is really questionable. It might lead to a situation in which some vendors have to provide proprietary extensions, which would make the standard less valuable.
>
>c/ Defines the measurement for every property as binary: Yes / No or Prevent / Does Not Prevent. I agree with this except for the Compactness and Generality properties, which require more elaborate measurement schemes. Otherwise, this choice of measurements helps in understanding how to group properties when evaluating particular formats.
>
>d/ Provides a table in which sample formats are evaluated against the properties in the final list. In my view, it has the following problems:
>- Yes or No for the Compactness property has no clear meaning, or at least requires extended explanation
>- Yes or No for Roundtrip Support, Self-Contained, and Schema Extensions and Deviations has no clear meaning without an indication of which data model is assumed
>- XML cannot be rated Yes for Fragmentable
>- Most formats, in particular proprietary ones, look so good that I wonder!
Received on Wednesday, 6 April 2005 00:34:54 UTC