- From: <noah_mendelsohn@us.ibm.com>
- Date: Sun, 20 Feb 2005 12:30:37 -0500
- To: "David Orchard" <dorchard@bea.com>, www-tag@w3.org
Background
----------

Dave Orchard is leading the TAG's effort on extensibility and versioning, and with help from co-editor Norm Walsh, Dave has been writing an extensive two-part draft finding. Copies of a revised draft were posted to this list in November, just before the TAG's Cambridge F2F [1]. Few TAG members read the revisions in time for the meeting, but Dave did walk us through them. Dan Connolly submitted some comments later [2] which generated a bit of discussion [3,4].

At the meeting, I indicated that I thought the drafts would benefit from more focus on framing the broader issues relating to versioning, XML and the Web, perhaps at the expense of some details relating to XML Schema 1.0 and particular XML versioning idioms. Such broader issues might include: 1) how versioning and extensibility choices affect the utility and stability of XML-based Web technologies, 2) investigation of a somewhat broader range of XML use cases, and 3) deeper exploration of the general characteristics that we might want from any particular solutions.

The TAG assigned me an action to make more detailed suggestions, and to help Dave moving forward. This note is in fulfillment of the first part of that assignment, i.e. to set out some of the directions I'd like to see explored. I hope to work informally with Dave on whether and how to integrate these ideas. I'm sure we'll have lots of opportunity to talk at the plenary. I should say that overall I like a lot of what he and Norm have written, and I hope these will be viewed as constructive suggestions.

Overview of Comments, Suggestions, Concerns
-------------------------------------------

I. Pros and cons of extensibility

The "first rule" introduced in the draft is a Good Practice Note (GPN) that says [5]:

"Allow Extensibility rule: Languages SHOULD be designed for extensibility."

Other GPNs advocate specific idioms for doing this.
In my opinion, this somewhat jumps to a conclusion regarding one of the most difficult and important tradeoffs relating to extensibility: when do the benefits outweigh the costs? I think it's fair to say that some of the most successful Web technologies have succeeded as much from the ways that they are inflexible as from the ways that they are extensible. XML, which is arguably a success, had as one of its original goals: "The number of optional features in XML is to be kept to the absolute minimum, ideally zero." [6]. Except for the ability to define your own element and attribute names and choose character encodings, XML is remarkably inflexible and not particularly extensible. Sometimes that's frustrating: we couldn't use XML Schema in place of DTDs in the internal subset, and it's proving very hard to roll out the new content conventions for XML 1.1. Users rightly value the very high compatibility that results from XML's inflexibility.

Although the draft correctly cites HTML's open content and "must ignore" tag rules as a success, there have also been serious interoperability problems as various vendors exploited that flexibility to introduce their own flavors of HTML. I suspect that similar tradeoffs will apply as XML vocabularies are designed for other purposes: extensibility tends to stand in opposition to interoperability, and both are important.

I think the finding would be much stronger if it explored such tradeoffs, and gave some more nuanced guidance as to when things should be locked down and when they should be extensible. In fact, such analysis could be one of the essential contributions of the finding. Yes, the answer is often to provide for certain forms of extensibility, but we shouldn't recommend that blindly. I think this is a subtle question that's particularly appropriate to the scope and mission of the TAG.

II. Relationship to namespaces

The recent semi-permathread on immutability of namespaces suggests that the community would welcome a lucid analysis of the relationship of namespaces to vocabularies, languages and to versioning of both. Part 2 of the drafts does discuss various strategies, but the permathread suggests that the community is looking for >principles< relating to the immutability or lack thereof of a namespace, principles relating the use of namespaces to the deployment of language versions and schemas, and perhaps principles explaining what role if any namespaces should play in determining how an application should interpret dialects of the vocabularies that it processes.

III. Dealing with partial understanding

The draft introduces definitions like "forwards-compatible" [7]:

"A language change is forwards compatible if older processors can process all instances of the newer language."

It also suggests that [8]:

"Forwards compatibility can only be achieved by providing a substitution mechanism for Version 2 instances or Version 1 extensions to V1 without knowledge of V2. A V1 consumer must be able to transform any instances, such as V1 + extensions, to a V1 instance in order to process the instance."

The finding would be stronger if it stepped up to the fact that processing is a matter of degree. In an extensible system, it's common that even an early version of an application will have partial ability to process features introduced later. Consider a new element introduced into a vocabulary. Can it be completely ignored, i.e. safely eliminated by a substitution? Well, I suspect that if there is a signature on the document then the new element is signed along with the others, even if not otherwise processed. If you save the document on disk, do you not save the elements you didn't understand in detail? Maybe; it depends why you're saving. If you're a SOAP intermediary, do you relay the misunderstood elements?
SOAP gives you an attribute [9] that allows you to request such relay of content that was not otherwise understood, and SOAP specifically allows content from such elements to be used as input to other processing (e.g. digital signatures, logging, etc.). If you have a function to print an XML document, do you print content from the new element? Perhaps not, but you might also have default printing rules or heuristics that you could use. The version 4 word processor mentioned in [7] may indeed successfully read version 5 documents, but may produce sub-optimal or incorrect output from some of them. All of these are examples of systems in which partial understanding leads to useful processing. Furthermore, if two different applications are deployed based on version 1 of a language, those applications may differ in their ability to deal with constructs that are introduced later.

I think the drafts jump a bit too quickly to proposals like "a substitution mechanism" and "mustIgnore", and thus obscure important issues relating to partial understanding. Indeed, I'm not convinced that simple substitution mechanisms are the right framework for dealing with partial interoperation.

By accurately modeling a more variable notion of compatibility, it also becomes possible to explore a question that the schema WG has been considering in detail: how can a schema language help an application to sort out its different levels of understanding of particular content (e.g. what the application should store, what it should print, which content should be processed with what conventions)?
Various options have been suggested, including: (a) because W3C XML schemas uniquely attribute each element in an instance to a particle in a schema content model, you can tell which elements were validated by wildcards -- that might suggest content you can tolerate but don't fully understand; (b) validate various subsets of the document (different substitutions) against multiple schemas or in various forms of fallback mode when content is not found to be fully valid. The point is that, to explore such questions, you have to be very careful with assumptions about what it means for an application to "process" an instance, and how such assumptions relate to schema validity.

Thus, I think the finding should more carefully deal with partial understanding of language constructs, and the relationship to schemas.

IV. Need general guidelines for XML and Schema solutions

I think it's healthy to set up goals and success criteria separately from proposed solutions. The draft does some of this, insofar as it makes the case that flexible extensibility is a goal. I think there are some more detailed goals that should be set out or considered before getting into particular XML and Schema idioms. Some that occurred to me are in the white paper I wrote last year [10,11], including:

* The same vocabulary may be versioned or fixed repeatedly. Accordingly, any general approach should be convenient to use even after 20 or 30 such revisions. Both instances and schemas of the later versions should be easy to create and use.

* The versioning mechanisms should (in most cases) not presume particular instance constructions such as <extension> elements.

* In some but not in all cases, some degree of forward and/or backward compatibility is required: i.e. it should be possible but not essential to write early schemas that will somehow accept content that is not fully defined until later, and schemas for later versions will often but not always validate earlier forms of the vocabulary.
  (The draft does cover this one, I think.)

* Conversely, breaking changes should not in all cases be forbidden. For example, it may be that an early construct is deprecated at some later time, and perhaps completely disallowed eventually. Likewise, later versions may introduce constructs that are rejected outright by earlier ones.

* It should be possible to check for or force various sorts of forward or backward compatibility when desired (this is the notion of partial recognition and processing, mentioned in III above).

* Schemas for versions of a vocabulary may but need not form a sequence or tree, in which later versions somehow directly reference particular schema documents for earlier versions. This flexibility allows for possible redefinition of the same vocabulary by multiple organizations or in more than one schema (e.g. there's a debug schema and a production schema, neither based explicitly on the other).

* A consequence of the point above is that the schema for version x is not necessarily expressed as a delta on or by direct reference to the schema for version x-1, if in fact the versions form a sequence at all. Such incremental definition schemes are convenient, but do not necessarily scale to the case where the same vocabulary is revised 20 or 30 times. In such a case one would need up to 30 schema documents to assemble the effective schema. Thus, such incremental schemes should be allowed where useful, but not presumed in all cases.

* No unnecessary assumptions should be made regarding the relationships between vocabularies and XML Namespaces. Often, a vocabulary will be expressed primarily as a single XML namespace. Often, to maintain forward and backward compatibility, that same namespace will be used in subsequent versions as well. Nothing in the overall XML mechanisms to support versioning (e.g.
  schema language constructs) should prohibit the use or coordinated evolution of multiple namespaces to define one or more languages, the addition of new namespaces in subsequent versions of a language, etc. (Here I admit I'm staking out a personal position on the Namespaces question raised in II above).

The above is NOT necessarily the right list, but I think the finding would make a contribution if it set out such principles separately from any proposed solutions. If we do retain a Part 2 that discusses particular extensibility idioms, then they should each be rated against explicit goals such as the examples listed above.

V. The relationship between syntax and semantics

Though it mentions other options in passing, the finding deals primarily with examples in which the syntax of the XML more or less directly models the evolving semantics of the underlying data or application. For example, a given parent element may allow for elements or attributes to be introduced to express features of the language as it evolves. This is indeed a common idiom, and it's appropriate that the drafts explore it.

Nonetheless, such approaches do not cover the full spectrum of common mechanisms for versioning XML vocabularies. Perhaps, as in SOAP encoding or RDF, the XML is a serialization for a higher level model, versioning of which is not well expressed at the element and attribute level. We should go into more detail about the implications for XML and schemas, I think.

Sometimes new versions of a language specify coordinated updates to the use of or constraints on the contents of elements or attributes scattered throughout a document. Perhaps an attribute changes the meaning of a legacy element (e.g. currency="peso"). Perhaps the specification of a SOAP header requires that it be used with other headers (which may be interspersed with other headers).
In all these cases, it becomes difficult to tell the versioning story entirely in terms of XML elements and attributes, and it's often problematic to do a useful job of expressing the pertinent constraints in XML Schema languages. In such systems, the extensibility of semantics is only indirectly related to the syntactic structure of the XML. If the finding is to achieve its goal of exploring the versioning of XML vocabularies, then it's important either to deal with such approaches, or to make the case that they are not important. I think they will be common and are important. (BTW: I suspect that "mustIgnore" at the XML level does not cover such higher level versioning particularly well.)

Summary
-------

Taken together, the above represent a proposal to focus the finding less on the details of particular XML constructions, and more on the general versioning and evolution strategies that are likely to be essential to the Web's and XML's continued success. Indeed, there's some question as to whether the most useful finding would continue to focus only on XML, or also might introduce some general principles applicable to many media types, and then apply those to XML (or RDF, etc.) in particular. I do recognize that issue XMLVersioning-41 [12] is currently scoped specifically to XML.

In general, following the precedent of the Architecture Document [13], we should explore high-level tradeoffs and principles, somewhat in preference to making detailed recommendations on syntactic mechanisms. While there's lots of good work in the drafts on XML Schema specifics, especially in Part 2, I think those are only the purview of the TAG insofar as they are necessary to motivate the broader themes and principles, or are truly central to the Web's success.
Other details of ensuring that W3C XML Schema is usable to support versioning scenarios are explicitly in the charter of the XML Schema WG [14]; indeed, I'm delighted that the TAG and Schema WG are now working more closely together. I think the general balance should be that the Schema WG handles the schema-language-specific parts of the problem, with help from the TAG, and the TAG discusses the broader architectural issues, with help from (among others) the Schema WG.

There remains a question of whether the TAG will choose to do a formal finding in this area at all. I am cautiously optimistic that we can and should, but I do feel that our focus should be more on broader themes, perhaps including those discussed above. I certainly think it's worth continued effort in the coming weeks to see whether we can do something that the community would value.

My recent rereading of the drafts has reminded me once again what a careful and diligent job Dave has done to take us to this point, and speaking for myself it is much appreciated! This start will prove to be very valuable, regardless of how we proceed, or whether any of the suggestions made above are adopted. I look forward to helping Dave and Norm in any way that I can to improve the drafts.

Thank you all for your patience with this long note.

Noah

[1] http://lists.w3.org/Archives/Public/www-tag/2004Nov/0071.html
[2] http://lists.w3.org/Archives/Public/www-tag/2005Jan/0018.html
[3] http://lists.w3.org/Archives/Public/www-tag/2005Jan/0019.html
[4] http://lists.w3.org/Archives/Public/www-tag/2005Jan/0020.html
[5] http://lists.w3.org/Archives/Public/www-tag/2004Nov/att-0071/versioning-part1.html#identify
[6] http://www.w3.org/TR/1998/REC-xml-19980210#sec-origin-goals
[7] http://lists.w3.org/Archives/Public/www-tag/2004Nov/att-0071/versioning-part1.html#terminology
[8] http://lists.w3.org/Archives/Public/www-tag/2004Nov/att-0071/versioning-part1.html#div250901096
[9] http://www.w3.org/TR/soap12-part1/#soaprelay
[10] http://lists.w3.org/Archives/Public/www-tag/2004Aug/0010.html
[11] http://lists.w3.org/Archives/Public/www-tag/2004Aug/att-0010/NRMVersioningProposal.html
[12] http://www.w3.org/2001/tag/issues.html?type=1#XMLVersioning-41
[13] http://www.w3.org/TR/webarch/
[14] http://www.w3.org/2003/09/xmlap/xml-schema-wg-charter.html#Deliverables

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Received on Sunday, 20 February 2005 17:55:09 UTC