- From: <noah_mendelsohn@us.ibm.com>
- Date: Mon, 21 Feb 2005 19:30:28 -0500
- To: "David Orchard" <dorchard@bea.com>
- Cc: www-tag@w3.org
Dave,

This is an excellent note that deserves a careful reply. Unfortunately I'm off for two days of travel, and so will try to respond in more detail later in the week. Failing that, we can surely talk during the plenary.

In the meantime, I want to thank you for the very careful and thoughtful explanation. Given that we've spoken before, little if any of what you've written is a surprise, but it is still very useful to see it set out in such an integrated way.

FWIW, I think many of what appear to be disagreements between us are ones of emphasis more than substance. For example, given that we agree that extensibility is an interesting tradeoff, we seem to have different initial intuitions regarding the best emphasis in a TAG finding. I expect that with some discussion we will find common ground on such questions.

Mostly, I wanted to thank you for your note, and apologize for not replying in more detail until later.

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

"David Orchard" <dorchard@bea.com>
02/21/05 02:46 PM

To: <noah_mendelsohn@us.ibm.com>, <www-tag@w3.org>
cc:
Subject: RE: [XMLVersioning-41] Comments and Suggestions on Draft Extensibility Finding

Noah,

I think it's great that you are taking such an active interest in the area of extensibility and versioning, and bringing that to the work of the TAG.

To try to summarize my lengthy response within 1 sentence: I somewhat disagree with #1, agree with #2 and #3, mostly agree with #5, partially agree with #4 pending my comments within, and I propose the next update of the finding should at least focus on the issue brought to the TAG around namespaces and languages.

I agree that the architectural issues of language design and e/v are worthy of TAG work. This was one of the planks in my campaign platform. I believe that this includes the relationships between language design, namespaces, components, etc.
Indeed, this is why I proposed UML models for the extensibility/versioning finding and for the Web architecture, though I didn't detect overwhelming TAG interest for these at the time. You may notice that these models, and the textual description in the finding, show relationships between languages, terms, namespaces, etc. I am quite keen to work on a finding that helps describe principles related to namespaces (your point #2) and I think the architectural models proposed in the finding can be a starting point. I am interested in exploring language design beyond "simply" xml and xml schema, but I retain the worry that the more abstract the discussion, the smaller the audience or the less useful particular audiences will find the material. The finding already is almost too general for my tastes as I believe that XML Schema is the most popular choice of schema language for xml design. Moving from XML Schema to XML in general was necessary for the finding to proceed previously, but I'm not yet sure that I can support another layer of abstraction or indirection. This is the natural tendency between the abstract and specific that I articulated in the last TAG telcon. For example, it would be necessary to give non-xml examples, which would "bore" any xml schema reader. I say this with certainty because a common comment on the current finding is that it is too abstract and readers would like to see the XML Schema examples up front. When I have presented at conferences on the topic, the audience has very much appreciated the focus on XML Schema. I have certainly thought of more generalization than just XML schema. I have written up at least one of these in a discussion of Protocol extensibility and versioning, I remember numerous discussions about URIs and various rules, and I've also written about XML's design from an e/v perspective. 
I previously decided that more generalization, while perhaps useful to a wider audience, would take away from the target audience of XML language authors. Further, there is the issue of how much time and energy to devote to various aspects. I still believe that focusing on XML Schema and solving the "problems" of how to extend and version multi-ns languages in a distributed environment is the single most important aspect of this work. I'm always a little concerned about "boiling the ocean", which is a trite way of talking about scope increases. I have expressed my reservations on this part, and I expect that we will have many more discussions on the scoping side.

On to Pros and Cons, your first point. I agree that there are pros and cons to extensibility and versioning. That is system design/architecture 100. However, I believe that the XML community is struggling with extensibility and versioning because the "e/v pendulum" swung too far from the HTML style of extensibility to the "draconian error-handling" side. Arguably, we promised the larger community a web style of loose coupling which we haven't achieved. I regularly hear from Web services customers the comment that they were told they should use Web services rather than older distributed object technologies because they get benefits of the Web wrt coupling, and they've been less than happy with what they've found. To a great extent, my goals have been to push the e/v pendulum back towards the middle. Perhaps I have achieved some of that goal given the interest in e/v that is occurring now that did not occur 2 years ago. Yet as it stands, I do not believe that we are building XML based systems that are more loosely coupled now than we were 2 years ago.

In general, there are always trade-offs in the "ilities" of systems. Related to this are the technology choices in the systems.
Again, I believe that the underlying designs of XML and XML Schema have pushed the pendulum away from what I call "distributed touchless extensibility". Hence I am also leery of moving the extensibility and versioning finding away from a "pro-e/v" message until I see XML based systems being regularly built that are too extensible or too versionable.

I agree with providing a more detailed description of general guidelines and the capability of using these to evaluate various language designs. I was leery of doing this in the finding because it could have seemed too much to "critique" various languages - and in fact I removed lengthy sections on what are desirable characteristics and designs from the findings.

I should supply a bit of history and rationale for you. I started on this work a number of years ago - the first internal draft was in April 2003 - because of regular customer and working group requests that are roughly of the form "best practices for non-brittle XML Schemas". I started by working on an article, eventually published in XML.com, that focused on a very common case of extensibility and versioning, which is where one piece of software changes and another doesn't. This work led me to believe that there are implicit constraints in the web architecture. These are the constraints of extensibility and versioning, and avoiding the "myth of single administrator". The community has gradually increased its understanding of REST constraints, but the e/v issues are not part of REST. As such, I believed that they were not as widely understood or deployed as were needed by customers and specification writers. Having started from the point of answering an apparently simple question and progressing to believing that there are fundamental constraints embodied in the web architecture, it made sense to have a finding on e/v in the web architecture.

The finding has always had the delicate task of how far to discuss XML Schema.
In very early versions, there was very specific information on how to use XML Schema. There were also very specific comments on why the XML Schema constructs were less than desirable, and it proposed a number of suggestions for what could have been done in XML schema and XML, or perhaps could be. Currently, the older XML.com article is no longer available as there is a redirect from the older article at http://www.xml.com/pub/a/2003/12/03/versioning.html to the update. I attach the relevant text towards the end of my additions. As an aside, I'm pleased that the main suggestions published in 2003 - multiple namespaces, default extensibility, revised extensibility model for ns, etc. - are contained within your note [11].

To conclude, I think we have a lot to talk about and you've made some excellent suggestions. I would like to be focused on the most pressing issues from our community and this seems to be both: 1) the relationship between namespaces and languages and 2) the ongoing XML Schema NG work.

Cheers,
Dave

>>> begin 12/03/versioning.html extract >>>>

Why is this hard?

We've shown that using XML and W3C XML Schema to achieve loose coupling via compatible changes that fully utilize yet do not require new schema definitions is hard. Following these extensibility rules leads to W3C XML Schema documents that are more cumbersome and at the same time less expressive than one might like. The structural limitations introduced by W3C XML Schema's handling of extensibility are a consequence of W3C XML Schema's design and are not an inherent limitation of schema-based structures.

With respect to W3C XML Schema, it would be useful to be able to add elements into arbitrary places, such as before other elements, but the determinism constraint restricts this. A less restrictive type of deterministic model could be employed, such as the "greedy" algorithm defined in the URI specification [4].
This would allow optional elements before wildcards and remove the need for the Extension type we introduced. This still does not allow wildcards before elements, as the wildcard would match the elements instead. Further, this still does not allow wildcards and type extension of the type to co-exist. A "priority" wildcard model, in which an element that could be matched by either a wildcard or an element declaration matches the element declaration if possible, would allow wildcards before and after element declarations. Additionally, a wildcard that only allowed elements that had not been defined - effectively other namespaces plus anything not defined in the target namespace - is another useful model. These changes would also allow cleaner mixing of inheritance and wildcards.

But that still means that the author has to sprinkle wildcards throughout their types. A type-level any element combined with the aforementioned wildcard changes is needed. One potential solution is that the sequence declaration could have an attribute specifying that extensions be allowed in any place, with commensurate attributes specifying namespaces, elements, and validation rules.

The problem with even this last approach is that with a specific schema it is sometimes necessary to apply the same schema in a strict or relaxed fashion in different parts of a system. A long-standing rule for the Internet is the Robustness Principle, articulated in the Internet Protocol [3], as "In general, an implementation must be conservative in its sending behavior, and liberal in its receiving behavior". In schema validation terms, a sender can apply a schema in a strict way while a receiver can apply a schema in a relaxed way. In this case, the degree of strictness is not an attribute of the schema, but of how it is used.

A solution that appears to solve these problems is defining a form of schema validation that permits an open content model that is used when schemas are versioned.
We call this model validation 'by projection', and it works by ignoring, rather than rejecting, component names that appear in a message that are not explicitly defined by the schema. We plan to explore this relaxed validation model in the future.

A final comment on XML Schema extensibility is that there is still the unmet need for the ability to define schemas that validate known extensions while retaining extensibility. An author will want to create a schema based upon an extensible schema but mix in other known schemas in particular wildcards while retaining the wildcard extensibility. We encounter this difficulty in areas like describing SOAP header blocks. The topic of composing schemas from many schemas is difficult yet pressing.

Leaving the topic of wildcard extensibility, the use of type extension over the web might be more palatable if the instance document could express a base type if the receiver does not understand the extension type, as in xsi:basetype="". The receiver could then fall back to using the base type if it did not understand the base type's extension.

Another area for architectural improvement is that XML - or even XML Schema - could have provided a mustUnderstand model. As things stand, each vocabulary that provides a mustUnderstand model re-invents the mU wheel. XML could have provided an xml:mustUnderstand attribute and model that each language could use. Tim Berners-Lee articulated the need for this in XML in his design note on mandatory extensions in Feb 2000 [18], but neither XML 1.0 nor 1.1 included this model.
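
A minimal sketch of the 'by projection' idea, added for illustration and not taken from the original article. It assumes a hypothetical namespace (http://example.org/name), a hypothetical set of V1 element names, and two made-up helpers, project and accepts. A real implementation would presumably work from the schema's component definitions rather than a hard-coded set of names.

```python
import xml.etree.ElementTree as ET

# Hypothetical V1 vocabulary: the element names a V1 receiver knows about.
NS = "http://example.org/name"  # assumed namespace, for illustration only
KNOWN_V1 = {f"{{{NS}}}name", f"{{{NS}}}first", f"{{{NS}}}last"}


def project(element, known):
    """Return a copy of `element` keeping only descendants whose names are known."""
    copy = ET.Element(element.tag, dict(element.attrib))
    copy.text = element.text
    for child in element:
        if child.tag in known:
            copy.append(project(child, known))
    return copy


def accepts(xml_text, known, relaxed):
    """Strict mode rejects any unknown element; relaxed mode projects it away first."""
    root = ET.fromstring(xml_text)
    if root.tag not in known:
        return False
    if relaxed:
        root = project(root, known)
    return all(node.tag in known for node in root.iter())


# A "V2" instance carrying an extension element from another (assumed) namespace.
V2_INSTANCE = (
    f'<n:name xmlns:n="{NS}" xmlns:x="http://example.org/ext">'
    "<n:first>A</n:first><x:middle>B</x:middle><n:last>C</n:last>"
    "</n:name>"
)

if __name__ == "__main__":
    print("strict (sender side):   ", accepts(V2_INSTANCE, KNOWN_V1, relaxed=False))
    print("relaxed (receiver side):", accepts(V2_INSTANCE, KNOWN_V1, relaxed=True))
```

Run against the sample instance, the strict check rejects the unknown x:middle element, while the relaxed check projects it away and accepts what remains. The same set of names is applied strictly by a sender and loosely by a receiver, so the degree of strictness lives in how the schema is used rather than in the schema itself.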
<<< End insert <<<

> -----Original Message-----
> From: noah_mendelsohn@us.ibm.com [mailto:noah_mendelsohn@us.ibm.com]
> Sent: Sunday, February 20, 2005 9:31 AM
> To: David Orchard; www-tag@w3.org
> Subject: [XMLVersioning-41] Comments and Suggestions on Draft
> Extensibility Finding
>
> Background
> ----------
>
> Dave Orchard is leading the TAG's effort on extensibility and versioning,
> and with help from co-editor Norm Walsh, Dave has been writing an
> extensive two part draft finding. Copies of a revised draft were posted
> to this list in November, just before the TAG's Cambridge F2F [1]. Few
> TAG members read the revisions in time for the meeting, but Dave
> did walk us through them. Dan Connolly submitted some comments later [2]
> which generated a bit of discussion [3,4].
>
> At the meeting, I indicated that I thought the drafts would benefit from
> more focus on framing the broader issues relating to versioning, XML and
> the Web, perhaps at the expense of some details relating to XML Schema 1.0
> and particular XML versioning idioms. Such broader issues might include:
> 1) how versioning and extensibility choices affect the utility and
> stability of XML-based Web technologies and 2) investigation of a somewhat
> broader range of XML use cases, and 3) deeper exploration of the general
> characteristics that we might want from any particular solutions.
>
> The TAG assigned me an action to make more detailed suggestions, and to
> help Dave moving forward. This note is in fulfillment of the first part of
> that assignment, i.e. to set out some of the directions I'd like to see
> explored. I hope to work informally with Dave on whether and how to
> integrate these ideas. I'm sure we'll have lots of opportunity to talk at
> the plenary. I should say that overall I like a lot of what he and Norm
> have written, and I hope these will be viewed as constructive suggestions.
>
> Overview of Comments, Suggestions, Concerns
> -------------------------------------------
>
> I. Pros and cons of extensibility
>
> The "first rule" introduced in the draft is a Good Practice Note (GPN)
> that says [5]: "Allow Extensibility rule: Languages SHOULD be designed
> for extensibility." Other GPNs advocate specific idioms for doing this.
> In my opinion, this somewhat jumps to a conclusion regarding one of the
> most difficult and important tradeoffs relating to extensibility: when do
> the benefits outweigh the costs?
>
> I think it's fair to say that some of the most successful Web technologies
> have succeeded as much from the ways that they are inflexible as from the
> ways that they are extensible. XML, which is arguably a success, had as
> one of its original goals: "The number of optional features in XML is to
> be kept to the absolute minimum, ideally zero."[6]. Except for the ability
> to define your own element and attribute names and choose character
> encodings, XML is remarkably inflexible and not particularly extensible.
> Sometimes that's frustrating: we couldn't use XML Schema in place of DTDs
> in the internal subset, and it's proving very hard to roll out the new
> content conventions for XML 1.1. Users rightly value the very high
> compatibility that results from XML's inflexibility. Although the draft
> correctly cites HTML's open content and "must ignore" tag rules as a
> success, there have also been serious interoperability problems as various
> vendors exploited that flexibility to introduce their own flavors of HTML.
>
> I suspect that similar tradeoffs will apply as XML vocabularies are
> designed for other purposes: extensibility tends to stand in opposition
> to interoperability, and both are important. I think the finding would be
> much stronger if it explored such tradeoffs, and gave some more nuanced
> guidance as to when things should be locked down and when they should be
> extensible.
> In fact, such analysis could be one of the essential
> contributions of the finding. Yes, the answer is often to provide for
> certain forms of extensibility, but we shouldn't recommend that blindly. I
> think this is a subtle question that's particularly appropriate to the
> scope and mission of the TAG.
>
> II. Relationship to namespaces
>
> The recent semi-permathread on immutability of namespaces suggests that
> the community would welcome a lucid analysis of the relationship of
> namespaces to vocabularies, languages and to versioning of both. Part 2
> of the drafts does discuss various strategies, but the permathread
> suggests that the community is looking for >principles< relating to the
> immutability or lack thereof of a namespace, principles relating the use
> of namespaces to the deployment of language versions and schemas, and
> perhaps principles explaining what role if any namespaces should play in
> determining how an application should interpret dialects of the
> vocabularies that it processes.
>
> III. Dealing with partial understanding
>
> The draft introduces definitions like "forwards-compatible" [7]:
>
> "A language change is forwards compatible if older processors can process
> all instances of the newer language."
>
> It also suggests that [8]:
>
> "Forwards compatibility can only be achieved by providing a substitution
> mechanism for Version 2 instances or Version 1 extensions to V1 without
> knowledge of V2. A V1 consumer must be able to transform any instances,
> such as V1 + extensions, to a V1 instance in order to process the
> instance."
>
> The finding would be stronger if it stepped up to the fact that processing
> is a matter of degree. In an extensible system, it's common that even an
> early version of an application will have partial ability to process
> features introduced later. Consider a new element introduced into a
> vocabulary. Can it be completely ignored, I.e. safely eliminated by a
> substitution?
> Well, I suspect that if there is a signature on the
> document then the new element is signed along with the others, even if not
> otherwise processed. If you save the document on disk, do you not save
> the elements you didn't understand in detail? Maybe; it depends why
> you're saving. If you're a SOAP intermediary, do you relay the
> misunderstood elements? SOAP gives you an attribute [9] that allows you
> to request such relay of content that was not otherwise understood, and
> SOAP specifically allows content from such elements to be used as input to
> other processing (e.g. digital signatures, logging, etc.). If you have a
> function to print an XML document, do you print content from the new
> element? Perhaps not, but you might also have default printing rules or
> heuristics that you could use. The version 4 word processor mentioned in
> [7] may indeed successfully read version 5 documents, but may produce
> sub-optimal or incorrect output from some of them. All of these are
> examples of systems in which partial understanding leads to useful
> processing. Furthermore, if two different applications are deployed based
> on version 1 of a language, those applications may differ in their ability
> to deal with constructs that are introduced later.
>
> I think the drafts jump a bit too quickly to proposals like "a
> substitution mechanism" and "mustIgnore", and thus obscure important
> issues relating to partial understanding. Indeed, I'm not convinced that
> simple substitution mechanisms are the right framework for dealing with
> partial interoperation.
>
> By accurately modeling a more variable notion of compatibility, it also
> becomes possible to explore a question that the schema WG has been
> considering in detail: how can a schema language help an application to
> sort out its different levels of understanding of particular content (e.g.
> what the application should store, what it should print, which content
> should be processed with what conventions)? Various options have been
> suggested, including: (a) because W3C XML schemas uniquely attribute each
> element in an instance to a particle in a schema content model, you can
> tell which elements were validated by wildcards -- that might suggest
> content you can tolerate but don't fully understand; (b) validate various
> subsets of the document (different substitutions) against multiple schemas
> or in various forms of fallback mode when content is not found to be fully
> valid. The point is that, to explore such questions, you have to be very
> careful with assumptions about what it means for an application to
> "process" an instance, and how such assumptions relate to schema validity.
>
> Thus, I think the finding should more carefully deal with partial
> understanding of language constructs, and the relationship to schemas.
>
> IV. Need general guidelines for XML and Schema solutions
>
> I think it's healthy to set up goals and success criteria separately from
> proposed solutions. The draft does some of this, insofar as it makes the
> case that flexible extensibility is a goal. I think there are some more
> detailed goals that should be set out or considered before getting into
> particular XML and Schema idioms. Some that occurred to me are in the
> white paper I wrote last year [10,11], including:
>
> * The same vocabulary may be versioned or fixed repeatedly. Accordingly,
> any general approach should be convenient to use even after 20 or 30 such
> revisions. Both instances and schemas of the later versions should be
> easy to create and use.
>
> * The versioning mechanisms should (in most cases) not presume particular
> instance constructions such as <extension> elements.
>
> * In some but not in all cases, some degree of forward and/or backward
> compatibility is required: i.e.
> it should be possible but not
> essential to write early schemas that will somehow accept content that is
> not fully defined until later, and schemas for later versions will often
> but not always validate earlier forms of the vocabulary. (The draft does
> cover this one, I think.)
>
> * Conversely, breaking changes should not in all cases be forbidden. For
> example, it may be that an early construct is deprecated at some later
> time, and perhaps completely disallowed eventually. Likewise, later
> versions may introduce constructs that are rejected outright by earlier
> ones.
>
> * It should be possible to check for or force various sorts of forward or
> backward compatibility when desired (this is the notion of partial
> recognition and processing, mentioned in III above).
>
> * Schemas for versions of a vocabulary may but need not form a sequence or
> tree, in which later versions somehow directly reference particular schema
> documents for earlier versions. This flexibility allows for possible
> redefinition of the same vocabulary by multiple organizations or in more
> than one schema (e.g. there's a debug schema and a production schema,
> neither based explicitly on the other).
>
> * A consequence of the point above is that the schema for version x is not
> necessarily expressed as a delta on or by direct reference to the schema
> for version x-1, if in fact the versions form a sequence at all. Such
> incremental definition schemes are convenient, but do not necessarily
> scale to the case where the same vocabulary is revised 20 or 30 times. In
> such a case one would need up to 30 schema documents to assemble the
> effective schema. Thus, such incremental schemes should be allowed where
> useful, but not presumed in all cases.
>
> * No unnecessary assumptions should be made regarding the relationships
> between vocabularies and XML Namespaces. Often, a vocabulary will be
> expressed primarily as a single XML namespace.
> Often, to maintain forward
> and backward compatibility, that same namespace will be used in subsequent
> versions as well. Nothing in the overall XML mechanisms to support
> versioning (e.g. schema language constructs) should prohibit the use or
> coordinated evolution of multiple namespaces to define one or more
> languages, the addition of new namespaces in subsequent versions of a
> language, etc. (Here I admit I'm staking out a personal position on the
> Namespaces question raised in II above).
>
> The above is NOT necessarily the right list, but I think the finding would
> make a contribution if it set out such principles separately from any
> proposed solutions. If we do retain a Part 2 that discusses particular
> extensibility idioms, then they should each be rated against explicit
> goals such as the examples listed above.
>
> V. The relationship between syntax and semantics
>
> Though it mentions other options in passing, the finding deals primarily
> with examples in which the syntax of the XML more or less directly models
> the evolving semantics of the underlying data or application. For
> example, a given parent element may allow for elements or attributes to be
> introduced to express features of the language as it evolves. This is
> indeed a common idiom, and it's appropriate that the drafts explore it.
>
> Nonetheless, such approaches do not cover the full spectrum of common
> mechanisms for versioning XML vocabularies. Perhaps, as in SOAP encoding
> or RDF, the XML is a serialization for a higher level model, versioning of
> which is not well expressed at the element and attribute level. We should
> go into more detail about the implications for XML and schemas, I think.
> Sometimes new versions of a language specify coordinated updates to the
> use of or constraints on the contents of elements or attributes scattered
> throughout a document. Perhaps an attribute changes the meaning of a
> legacy element (e.g. currency="peso").
> Perhaps the specification of a
> SOAP header requires that it be used with other headers (which may be
> interspersed with other headers). In all these cases, it becomes
> difficult to tell the versioning story entirely in terms of XML elements
> and attributes, and it's often problematic to do a useful job of
> expressing the pertinent constraints in XML Schema languages.
>
> In such systems, the extensibility of semantics is only indirectly related
> to the syntactic structure of the XML. If the finding is to achieve its
> goal of exploring the versioning of XML vocabularies, then it's
> important either to deal with such approaches, or to make the case that
> they are not important. I think they will be common and are important.
> (BTW: I suspect that "mustIgnore" at the XML level does not cover such
> higher level versioning particularly well.)
>
> Summary
> -------
>
> Taken together, the above represent a proposal to focus the finding less
> on the details of particular XML constructions, and more on the general
> versioning and evolution strategies that are likely to be essential to the
> Web's and XML's continued success. Indeed, there's some question as to
> whether the most useful finding would continue to focus only on XML, or
> also might introduce some general principles applicable to many media
> types, and then apply those to XML (or RDF, etc.) in particular. I do
> recognize that issue XMLVersioning-41 [12] is currently scoped
> specifically to XML.
>
> In general, following the precedent of the Architecture Document [13], we
> should explore high-level tradeoffs and principles, somewhat in preference
> to making detailed recommendations on syntactic mechanisms. While there's
> lots of good work in the drafts on XML Schema specifics, especially in
> Part 2, I think those are only the purview of the TAG insofar as they are
> necessary to motivate the broader themes and principles, or are truly
> central to the Web's success.
>
> Other details of ensuring that W3C XML Schema is usable to support
> versioning scenarios are explicitly in the charter of the XML Schema WG
> [14]; indeed, I'm delighted that the TAG and Schema WG are now working
> more closely together. I think the general balance should be that the
> Schema WG handles the schema-language-specific parts of the problem, with
> help from the TAG, and the TAG discusses the broader architectural issues,
> with help from (among others) the Schema WG.
>
> There remains a question of whether the TAG will choose to do a formal
> finding in this area at all. I am cautiously optimistic that we can and
> should, but I do feel that our focus should be more on broader themes,
> perhaps including those discussed above. I certainly think it's worth
> continued effort in the coming weeks to see whether we can do something
> that the community would value.
>
> My recent rereading of the drafts has reminded me once again what a
> careful and diligent job Dave has done to take us to this point, and
> speaking for myself it is much appreciated! This start will prove to be
> very valuable, regardless of how we proceed, or whether any of the
> suggestions made above are adopted. I look forward to helping Dave and
> Norm in any way that I can to improve the drafts.
>
> Thank you all for your patience with this long note.
>
> Noah
>
>
> [1] http://lists.w3.org/Archives/Public/www-tag/2004Nov/0071.html
> [2] http://lists.w3.org/Archives/Public/www-tag/2005Jan/0018.html
> [3] http://lists.w3.org/Archives/Public/www-tag/2005Jan/0019.html
> [4] http://lists.w3.org/Archives/Public/www-tag/2005Jan/0020.html
> [5] http://lists.w3.org/Archives/Public/www-tag/2004Nov/att-0071/versioning-part1.html#identify
> [6] http://www.w3.org/TR/1998/REC-xml-19980210#sec-origin-goals
> [7] http://lists.w3.org/Archives/Public/www-tag/2004Nov/att-0071/versioning-part1.html#terminology
> [8] http://lists.w3.org/Archives/Public/www-tag/2004Nov/att-0071/versioning-part1.html#div250901096
> [9] http://www.w3.org/TR/soap12-part1/#soaprelay
> [10] http://lists.w3.org/Archives/Public/www-tag/2004Aug/0010.html
> [11] http://lists.w3.org/Archives/Public/www-tag/2004Aug/att-0010/NRMVersioningProposal.html
> [12] http://www.w3.org/2001/tag/issues.html?type=1#XMLVersioning-41
> [13] http://www.w3.org/TR/webarch/
> [14] http://www.w3.org/2003/09/xmlap/xml-schema-wg-charter.html#Deliverables
>
>
> --------------------------------------
> Noah Mendelsohn
> IBM Corporation
> One Rogers Street
> Cambridge, MA 02142
> 1-617-693-4036
> --------------------------------------
Received on Tuesday, 22 February 2005 00:36:27 UTC