RE: [XMLVersioning-41] Comments and Suggestions on Draft Extensibility Finding

Noah,

I think it's great that you are taking such an active interest in the
area of extensibility and versioning, and bringing that to the work of
the TAG.  To try to summarize my lengthy response in one sentence: I
somewhat disagree with #1, agree with #2, #3, mostly agree with #5,
partially agree with #4 pending my comments within, and I propose the
next update of the finding should at least focus on the issue brought to
the TAG around namespaces and languages.  

I agree that the architectural issues of language design and e/v are
worthy of TAG work.  This was one of the planks in my campaign platform.
I believe that this includes the relationships between language design,
namespaces, components, etc.  Indeed, this is why I proposed UML models
for the extensibility/versioning finding and for the Web architecture,
though I didn't detect overwhelming TAG interest for these at the time.
You may notice that these models, and the textual description in the
finding, show relationships between languages, terms, namespaces, etc.
I am quite keen to work on a finding that helps describe principles
related to namespaces (your point #2) and I think the architectural
models proposed in the finding can be a starting point.

I am interested in exploring language design beyond "simply" xml and xml
schema, but I retain the worry that the more abstract the discussion,
the smaller the audience or the less useful particular audiences will
find the material.  The finding already is almost too general for my
tastes as I believe that XML Schema is the most popular choice of schema
language for xml design.  Moving from XML Schema to XML in general was
necessary for the finding to proceed previously, but I'm not yet sure
that I can support another layer of abstraction or indirection.  This is
the natural tension between the abstract and the specific that I
articulated in the last TAG telcon.  For example, it would be necessary
to give non-xml examples, which would "bore" any xml schema reader.  I
say this with certainty because a common comment on the current finding
is that it is too abstract and readers would like to see the XML Schema
examples up front.  When I have presented at conferences on the topic,
the audience has very much appreciated the focus on XML Schema.  I have
certainly thought of more generalization than just XML schema.  I have
written up at least one of these in a discussion of Protocol
extensibility and versioning, I remember numerous discussions about URIs
and various rules, and I've also written about XML's design from an e/v
perspective.  I previously decided that more generalization, while
perhaps useful to a wider audience, would take away from the target
audience of XML language authors.  

Further, there is the issue of how much time and energy to devote to
various aspects.  I still believe that focusing on XML Schema and
solving the "problems" of how to extend and version multi-ns languages
in a distributed environment is the single most important aspect of this
work.  I'm always a little concerned about "boiling the ocean" which is
a trite way of talking about scope increases. 

I have expressed my reservations on this part, and I expect that we will
have many more discussions on the scoping side.  

On to Pros and Cons, your first point.  I agree that there are pros and
cons to extensibility and versioning.  That is system
design/architecture 100.  However, I believe that the XML community is
struggling with extensibility and versioning because the "e/v pendulum"
swung too far from the HTML style of extensibility to the "draconian
error-handling" side.  Arguably, we promised the larger community a web
style of loose coupling which we haven't achieved.  I regularly hear
from Web services customers the comment that they were told they should
use Web services rather than older distributed object technologies
because they get benefits of the Web wrt coupling, and they've been less
than happy with what they've found.  To a great extent, my goals have
been to push the e/v pendulum back towards the middle.  Perhaps I have
achieved some of that goal given the interest in e/v that is occurring
now that did not occur 2 years ago.  Yet as it stands, I do not believe
that we are building XML based systems that are more loosely coupled now
than we were 2 years ago.  

In general, there are always trade-offs in the "ilities" of systems.
Related to this are the technology choices in the systems.  Again, I
believe that the underlying designs of XML and XML Schema have pushed
the pendulum away from what I call "distributed touchless
extensibility".  Hence I am also leery of moving the extensibility and
versioning finding away from a "pro-e/v" message until I see XML based
systems being regularly built that are too extensible or too
versionable.

I agree with providing a more detailed description of general guidelines
and the capability of using these to evaluate various language designs.
I was leery of doing this in the finding because it could have seemed
too much to "critique" various languages - and in fact I removed lengthy
sections on what are desirable characteristics and designs from the
findings.

I should supply a bit of history and rationale for you.  I started on
this work a number of years ago - the first internal draft was in April
2003 - because of regular customer and working group requests that are
roughly of the form "best practices for non-brittle XML Schemas".  I
started by working on an article, eventually published in XML.com, that
focused on a very common case of extensibility and versioning, which is
where one piece of software changes and another doesn't.  

This work led me to believe that there are implicit constraints in the
web architecture.  These are the constraints of extensibility and
versioning, and avoiding the "myth of single administrator".  The
community has gradually increased its understanding of REST
constraints, but the e/v issues are not part of REST.  As such, I
believed that they were not as widely understood or deployed as were
needed by customers and specification writers.  

Having started from the point of answering an apparently simple question
and progressing to believing that there are fundamental constraints
embodied in the web architecture, it made sense to have a finding on e/v
in the web architecture.  The finding has always had the delicate task
of how far to discuss XML Schema.  In very early versions, there was
very specific information on how to use XML Schema.  There were also
very specific comments on why the XML Schema constructs were less than
desirable, and it proposed a number of suggestions for what could have
been done in XML schema and XML, or perhaps could be.  Currently, the
older XML.com article is no longer available as there is a redirect from
the older article at http://www.xml.com/pub/a/2003/12/03/versioning.html
to the update.  I attach the relevant text towards the end of my
additions.  As an aside, I'm pleased that the main suggestions
published in 2003 - multiple namespaces, default extensibility, revised
extensibility model for ns, etc. - are contained within your note [11].

To conclude, I think we have a lot to talk about and you've made some
excellent suggestions.  I would like to be focused on the most pressing
issues from our community, and these seem to be: 1) the relationship
between namespaces and languages and 2) the ongoing XML Schema NG work.


Cheers,
Dave

>>> begin 12/03/versioning.html extract >>>>
Why is this hard?
We've shown that using XML and W3C XML Schema to achieve loose coupling
via compatible changes that fully utilize yet do not require new schema
definitions is hard.  Following these extensibility rules leads to W3C
XML Schema documents that are more cumbersome and at the same time less
expressive than one might like.  The structural limitations introduced
by W3C XML Schema's handling of extensibility are a consequence of W3C
XML Schema's design and are not an inherent limitation of schema-based
structures.


With respect to W3C XML Schema, it would be useful to be able to add
elements into arbitrary places, such as before other elements, but the
determinism constraint prevents this.  A less restrictive type of
deterministic model could be employed, such as the "greedy" algorithm
defined in the URI specification [4].  This would allow optional
elements before wildcards and remove the need for the Extension type
we introduced.  This still does not allow wildcards before elements, as
the wildcard would match the elements instead.  Further, this still does
not allow wildcards and type extension of the type to co-exist.  A
"priority" wildcard model, where an element that could be matched by
either a wildcard or an element declaration would match the element
declaration if possible, would allow wildcards before and after element
declarations.  Additionally, a wildcard that only allowed elements that
had not been defined - effectively other namespaces plus anything not
defined in the target namespace - is another useful model.  These
changes would also allow cleaner mixing of inheritance and wildcards.
But that still means that the author has to sprinkle wildcards
throughout their types.  A type-level any element combined with the
aforementioned wildcard changes is needed.  One potential solution is
that the sequence declaration could have an attribute specifying that
extensions be allowed in any place, then commensurate attributes
specifying namespaces, elements, and validation rules.
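
As a rough illustration of the wildcard that only admits elements that
have not been defined, here is a minimal Python sketch.  It is not W3C
XML Schema behaviour.  The particle tuples, the match_sequence function
and the element names are all hypothetical, and every declared element
is assumed to be required exactly once.

```python
def match_sequence(children, model):
    """Assign each child name to the index of the particle that accepts it.

    model is a list of particles: ("element", name) must occur exactly once;
    ("any",) is a wildcard that takes zero or more names, but only names
    that are NOT declared anywhere in the model.
    Returns a list of particle indexes, or None if the children do not match.
    """
    declared = {p[1] for p in model if p[0] == "element"}
    assigned = []
    i = 0
    for name in children:
        while True:
            if i >= len(model):
                return None  # ran out of particles
            particle = model[i]
            if particle[0] == "element":
                if particle[1] != name:
                    return None  # required element missing or out of order
                assigned.append(i)
                i += 1
                break
            # Wildcard: yields to declared names instead of swallowing them.
            if name in declared:
                i += 1
                continue
            assigned.append(i)
            break
    # Any particles left over must be wildcards (which may be empty).
    if any(p[0] == "element" for p in model[i:]):
        return None
    return assigned


# Hypothetical content model: wildcards before, between and after elements.
MODEL = [("any",), ("element", "first"), ("any",), ("element", "last"), ("any",)]
print(match_sequence(["x:title", "first", "x:middle", "last", "x:suffix"], MODEL))
print(match_sequence(["first", "last"], MODEL))
print(match_sequence(["last", "first"], MODEL))
```

Because the wildcard never competes with a declared name, each child has
only one particle it can be attributed to, which is why wildcards can
sit both before and after the element declarations in this sketch.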

The problem with even this last approach is that with a specific schema
it is sometimes necessary to apply the same schema in a strict or
relaxed fashion in different parts of a system.  A long-standing rule
for the Internet is the Robustness Principle, articulated in the
Internet Protocol [3], as "In general, an implementation must be
conservative in its sending behavior, and liberal in its receiving
behavior".  In schema validation terms, a sender can apply a schema in a
strict way while a receiver can apply a schema in a relaxed way.  In
this case, the degree of strictness is not an attribute of the schema,
but of how it is used. A solution that appears to solve these problems
is defining a form of schema validation that permits an open content
model that is used when schemas are versioned.  We call this model
validation 'by projection', and it works by ignoring, rather than
rejecting, component names that appear in a message that are not
explicitly defined by the schema.  We plan to explore this relaxed
validation model in the future.
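
One way to picture validation 'by projection' is the following Python
sketch.  It only shows the "ignore rather than reject" step; a real
validator would then check the projected tree against the schema.  The
project function, the V1_TAGS set and the sample message are
hypothetical, and the known names are simplified to a flat set of tags.

```python
import xml.etree.ElementTree as ET


def project(element, known_tags):
    """Return a copy of element that keeps only children named in known_tags.

    Unknown children are dropped (ignored) rather than causing rejection.
    Attributes are copied as-is, and text that follows a child element
    (its "tail") is not preserved in this sketch.
    """
    projected = ET.Element(element.tag, dict(element.attrib))
    projected.text = element.text
    for child in element:
        if child.tag in known_tags:
            projected.append(project(child, known_tags))
    return projected


# Hypothetical version 1 vocabulary and a message from a newer sender.
V1_TAGS = {"first", "last"}
message = ET.fromstring(
    "<name><first>A</first><middle>B</middle><last>C</last></name>"
)
v1_view = project(message, V1_TAGS)
print(ET.tostring(v1_view, encoding="unicode"))
```

A sender could validate the original message strictly, while a receiver
validates only the projected view, so the degree of strictness stays a
property of how the schema is used rather than of the schema itself.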

A final comment on XML Schema extensibility is that there is still the
unmet need for the ability to define schemas that validate known
extensions while retaining extensibility.  An author will want to create
a schema based upon an extensible schema but mix in other known schemas
in particular wildcards while retaining the wildcard extensibility.  We
encounter this difficulty in areas like describing SOAP header blocks.
The topic of composing schemas from many schemas is difficult yet
pressing.

Leaving the topic of wildcard extensibility, the use of type extension
over the web might be more palatable if the instance document could
express a base type to use if the receiver does not understand the
extension type, as in xsi:basetype="".  The receiver could then fall
back to using the base type if it did not understand the extension.
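
A receiver-side sketch of that fallback might look like the Python
below.  Note that xsi:basetype is only the attribute floated in the
paragraph above and is not defined by W3C XML Schema.  The
effective_type function and the type names are hypothetical, and type
names are compared as plain strings rather than resolved as QNames.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
TYPE_ATTR = f"{{{XSI}}}type"
# Hypothetical attribute: not part of W3C XML Schema.
BASETYPE_ATTR = f"{{{XSI}}}basetype"


def effective_type(element, understood_types):
    """Pick the type a receiver should process the element as.

    Prefer the declared xsi:type; if the receiver does not understand it,
    fall back to the hypothetical xsi:basetype; otherwise return None.
    """
    declared = element.get(TYPE_ATTR)
    if declared in understood_types:
        return declared
    fallback = element.get(BASETYPE_ATTR)
    if fallback in understood_types:
        return fallback
    return None


# Hypothetical instance: an extended type that names its base type.
doc = ET.fromstring(
    '<address xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:type="USAddress" xsi:basetype="Address"/>'
)
print(effective_type(doc, {"Address"}))
print(effective_type(doc, {"Address", "USAddress"}))
```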

Another area for architectural improvement is that XML - or even XML
Schema - could have provided a mustUnderstand model.  As things stand,
each vocabulary that provides a mustUnderstand model re-invents the mU
wheel.  XML could have provided an xml:mustUnderstand attribute and
model that each language could use.  Tim Berners-Lee articulated the
need for this in XML in his design note on mandatory extensions in Feb
2000 [18], but neither XML 1.0 nor 1.1 included this model.
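
To show what a shared model could look like to a receiver, here is a
small Python sketch.  XML defines no xml:mustUnderstand attribute, so
the attribute, the accept_children function, the NotUnderstood
exception and the sample vocabulary are all hypothetical.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
# Hypothetical attribute: XML 1.0 and 1.1 define no xml:mustUnderstand.
MUST_UNDERSTAND = f"{{{XML_NS}}}mustUnderstand"


class NotUnderstood(Exception):
    """Raised when a mandatory extension is not recognised."""


def accept_children(root, understood_tags):
    """Return the tags of the children this receiver will process.

    Unknown children are ignored unless they are flagged as mandatory,
    in which case the whole message is refused.
    """
    accepted = []
    for child in root:
        if child.tag in understood_tags:
            accepted.append(child.tag)
        elif child.get(MUST_UNDERSTAND) in ("true", "1"):
            raise NotUnderstood(child.tag)
        # Otherwise the unknown child is optional and is ignored.
    return accepted


# Hypothetical messages: one optional extension, one mandatory extension.
optional = ET.fromstring("<order><item/><gift-wrap/></order>")
mandatory = ET.fromstring(
    '<order><item/><encrypt xml:mustUnderstand="true"/></order>'
)
print(accept_children(optional, {"item"}))
```

With a single attribute like this, each vocabulary would get the
"ignore unless flagged" behaviour without defining its own flag.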

<<< End insert <<<


> -----Original Message-----
> From: noah_mendelsohn@us.ibm.com [mailto:noah_mendelsohn@us.ibm.com]
> Sent: Sunday, February 20, 2005 9:31 AM
> To: David Orchard; www-tag@w3.org
> Subject: [XMLVersioning-41] Comments and Suggestions on Draft
> Extensibility Finding
> 
> Background
> ----------
> 
> Dave Orchard is leading the TAG's effort on extensibility and
> versioning, and with help from co-editor Norm Walsh, Dave has been
> writing an extensive two part draft finding.  Copies of a revised
> draft were posted to this list in November, just before the TAG's
> Cambridge F2F [1].  Few TAG members read the revisions in time for
> the meeting, but Dave did walk us through them.  Dan Connolly
> submitted some comments later [2] which generated a bit of
> discussion [3,4].
> 
> At the meeting, I indicated that I thought the drafts would benefit
> from more focus on framing the broader issues relating to versioning,
> XML and the Web, perhaps at the expense of some details relating to
> XML Schema 1.0 and particular XML versioning idioms.  Such broader
> issues might include: 1) how versioning and extensibility choices
> affect the utility and stability of XML-based Web technologies, 2)
> investigation of a somewhat broader range of XML use cases, and 3)
> deeper exploration of the general characteristics that we might want
> from any particular solutions.
> 
> The TAG assigned me an action to make more detailed suggestions, and
> to help Dave moving forward.  This note is in fulfillment of the
> first part of that assignment, i.e. to set out some of the directions
> I'd like to see explored.  I hope to work informally with Dave on
> whether and how to integrate these ideas.  I'm sure we'll have lots
> of opportunity to talk at the plenary.  I should say that overall I
> like a lot of what he and Norm have written, and I hope these will be
> viewed as constructive suggestions.
> 
> Overview of Comments, Suggestions, Concerns
> -------------------------------------------
> 
> I. Pros and cons of extensibility
> 
> The "first rule" introduced in the draft is a Good Practice Note (GPN)
> that says [5]:  "Allow Extensibility rule: Languages SHOULD be
> designed for extensibility."  Other GPNs advocate specific idioms for
> doing this.  In my opinion, this somewhat jumps to a conclusion
> regarding one of the most difficult and important tradeoffs relating
> to extensibility: when do the benefits outweigh the costs?
> 
> I think it's fair to say that some of the most successful Web
> technologies have succeeded as much from the ways that they are
> inflexible as from the ways that they are extensible.  XML, which is
> arguably a success, had as one of its original goals: "The number of
> optional features in XML is to be kept to the absolute minimum,
> ideally zero." [6]  Except for the ability to define your own element
> and attribute names and choose character encodings, XML is remarkably
> inflexible and not particularly extensible.  Sometimes that's
> frustrating:  we couldn't use XML Schema in place of DTDs in the
> internal subset, and it's proving very hard to roll out the new
> content conventions for XML 1.1.  Users rightly value the very high
> compatibility that results from XML's inflexibility.  Although the
> draft correctly cites HTML's open content and "must ignore" tag rules
> as a success, there have also been serious interoperability problems
> as various vendors exploited that flexibility to introduce their own
> flavors of HTML.
> 
> 
> I suspect that similar tradeoffs will apply as XML vocabularies are
> designed for other purposes:  extensibility tends to stand in
> opposition to interoperability, and both are important.  I think the
> finding would be much stronger if it explored such tradeoffs, and gave
> some more nuanced guidance as to when things should be locked down and
> when they should be extensible.  In fact, such analysis could be one
> of the essential contributions of the finding.  Yes, the answer is
> often to provide for certain forms of extensibility, but we shouldn't
> recommend that blindly.  I think this is a subtle question that's
> particularly appropriate to the scope and mission of the TAG.
> 
> II. Relationship to namespaces
> 
> The recent semi-permathread on immutability of namespaces suggests
> that the community would welcome a lucid analysis of the relationship
> of namespaces to vocabularies, languages and to versioning of both.
> Part 2 of the drafts does discuss various strategies, but the
> permathread suggests that the community is looking for >principles<
> relating to the immutability or lack thereof of a namespace,
> principles relating the use of namespaces to the deployment of
> language versions and schemas, and perhaps principles explaining what
> role if any namespaces should play in determining how an application
> should interpret dialects of the vocabularies that it processes.
> 
> III. Dealing with partial understanding
> 
> The draft introduces definitions like "forwards-compatible" [7]:
> 
> "A language change is forwards compatible if older processors can
> process all instances of the newer language."
> 
> It also suggests that [8]:
> 
> "Forwards compatibility can only be achieved by providing a
> substitution mechanism for Version 2 instances or Version 1 extensions
> to V1 without knowledge of V2.  A V1 consumer must be able to
> transform any instances, such as V1 + extensions, to a V1 instance in
> order to process the instance."
> 
> The finding would be stronger if it stepped up to the fact that
> processing is a matter of degree.  In an extensible system, it's
> common that even an early version of an application will have partial
> ability to process features introduced later.  Consider a new element
> introduced into a vocabulary.  Can it be completely ignored, i.e.
> safely eliminated by a substitution?  Well, I suspect that if there is
> a signature on the document then the new element is signed along with
> the others, even if not otherwise processed.  If you save the document
> on disk, do you not save the elements you didn't understand in detail?
> Maybe; it depends why you're saving.  If you're a SOAP intermediary,
> do you relay the misunderstood elements?  SOAP gives you an attribute
> [9] that allows you to request such relay of content that was not
> otherwise understood, and SOAP specifically allows content from such
> elements to be used as input to other processing (e.g. digital
> signatures, logging, etc.).  If you have a function to print an XML
> document, do you print content from the new element?  Perhaps not, but
> you might also have default printing rules or heuristics that you
> could use.  The version 4 word processor mentioned in [7] may indeed
> successfully read version 5 documents, but may produce sub-optimal or
> incorrect output from some of them.  All of these are examples of
> systems in which partial understanding leads to useful processing.
> Furthermore, if two different applications are deployed based on
> version 1 of a language, those applications may differ in their
> ability to deal with constructs that are introduced later.
> 
> I think the drafts jump a bit too quickly to proposals like "a
> substitution mechanism" and "mustIgnore", and thus obscure important
> issues relating to partial understanding.  Indeed, I'm not convinced
> that simple substitution mechanisms are the right framework for
> dealing with partial interoperation.
> 
> By accurately modeling a more variable notion of compatibility, it
> also becomes possible to explore a question that the schema WG has
> been considering in detail:  how can a schema language help an
> application to sort out its different levels of understanding of
> particular content (e.g. what the application should store, what it
> should print, which content should be processed with what
> conventions)?  Various options have been suggested, including:  (a)
> because W3C XML schemas uniquely attribute each element in an instance
> to a particle in a schema content model, you can tell which elements
> were validated by wildcards -- that might suggest content you can
> tolerate but don't fully understand; (b) validate various subsets of
> the document (different substitutions) against multiple schemas or in
> various forms of fallback mode when content is not found to be fully
> valid.  The point is that, to explore such questions, you have to be
> very careful with assumptions about what it means for an application
> to "process" an instance, and how such assumptions relate to schema
> validity.
> 
> 
> Thus, I think the finding should more carefully deal with partial
> understanding of language constructs, and the relationship to schemas.
> 
> IV. Need general guidelines for XML and Schema solutions
> 
> I think it's healthy to set up goals and success criteria separately
> from proposed solutions.  The draft does some of this, insofar as it
> makes the case that flexible extensibility is a goal.  I think there
> are some more detailed goals that should be set out or considered
> before getting into particular XML and Schema idioms.  Some that
> occurred to me are in the white paper I wrote last year [10,11],
> including:
> 
> * The same vocabulary may be versioned or fixed repeatedly.
> Accordingly, any general approach should be convenient to use even
> after 20 or 30 such revisions.  Both instances and schemas of the
> later versions should be easy to create and use.
> 
> * The versioning mechanisms should (in most cases) not presume
> particular instance constructions such as <extension> elements.
> 
> * In some but not in all cases, some degree of forward and/or backward
> compatibility is required:  i.e. it should be possible but not
> essential to write early schemas that will somehow accept content that
> is not fully defined until later, and schemas for later versions will
> often but not always validate earlier forms of the vocabulary.  (The
> draft does cover this one, I think.)
> 
> * Conversely, breaking changes should not in all cases be forbidden.
> For example, it may be that an early construct is deprecated at some
> later time, and perhaps completely disallowed eventually.  Likewise,
> later versions may introduce constructs that are rejected outright by
> earlier ones.
> 
> * It should be possible to check for or force various sorts of forward
> or backward compatibility when desired (this is the notion of partial
> recognition and processing, mentioned in III above).
> 
> * Schemas for versions of a vocabulary may but need not form a
> sequence or tree, in which later versions somehow directly reference
> particular schema documents for earlier versions.  This flexibility
> allows for possible redefinition of the same vocabulary by multiple
> organizations or in more than one schema (e.g. there's a debug schema
> and a production schema, neither based explicitly on the other).
> 
> * A consequence of the point above is that the schema for version x is
> not necessarily expressed as a delta on or by direct reference to the
> schema for version x-1, if in fact the versions form a sequence at
> all.  Such incremental definition schemes are convenient, but do not
> necessarily scale to the case where the same vocabulary is revised 20
> or 30 times.  In such a case one would need up to 30 schema documents
> to assemble the effective schema.  Thus, such incremental schemes
> should be allowed where useful, but not presumed in all cases.
> 
> * No unnecessary assumptions should be made regarding the
> relationships between vocabularies and XML Namespaces.  Often, a
> vocabulary will be expressed primarily as a single XML namespace.
> Often, to maintain forward and backward compatibility, that same
> namespace will be used in subsequent versions as well.  Nothing in the
> overall XML mechanisms to support versioning (e.g. schema language
> constructs) should prohibit the use or coordinated evolution of
> multiple namespaces to define one or more languages, the addition of
> new namespaces in subsequent versions of a language, etc. (Here I
> admit I'm staking out a personal position on the Namespaces question
> raised in II above).
> 
> The above is NOT necessarily the right list, but I think the finding
> would make a contribution if it set out such principles separately
> from any proposed solutions.  If we do retain a Part 2 that discusses
> particular extensibility idioms, then they should each be rated
> against explicit goals such as the examples listed above.
> 
> V. The relationship between syntax and semantics
> 
> Though it mentions other options in passing, the finding deals
> primarily with examples in which the syntax of the XML more or less
> directly models the evolving semantics of the underlying data or
> application.  For example, a given parent element may allow for
> elements or attributes to be introduced to express features of the
> language as it evolves.  This is indeed a common idiom, and it's
> appropriate that the drafts explore it.
> 
> Nonetheless, such approaches do not cover the full spectrum of common
> mechanisms for versioning XML vocabularies.  Perhaps, as in SOAP
> encoding or RDF, the XML is a serialization for a higher level model,
> versioning of which is not well expressed at the element and attribute
> level.  We should go into more detail about the implications for XML
> and schemas, I think.  Sometimes new versions of a language specify
> coordinated updates to the use of or constraints on the contents of
> elements or attributes scattered throughout a document.  Perhaps an
> attribute changes the meaning of a legacy element (e.g.
> currency="peso").  Perhaps the specification of a SOAP header requires
> that it be used with other headers (which may be interspersed with
> other headers).  In all these cases, it becomes difficult to tell the
> versioning story entirely in terms of XML elements and attributes, and
> it's often problematic to do a useful job of expressing the pertinent
> constraints in XML Schema languages.
> 
> In such systems, the extensibility of semantics is only indirectly
> related to the syntactic structure of the XML.  If the finding is to
> achieve its goal of exploring the versioning of XML vocabularies, then
> it's as important to either deal with such approaches, or to make the
> case that they are not important.  I think they will be common and are
> important.  (BTW: I suspect that "mustIgnore" at the XML level does
> not cover such higher level versioning particularly well.)
> 
> Summary
> -------
> 
> Taken together, the above represent a proposal to focus the finding
> less on the details of particular XML constructions, and more on the
> general versioning and evolution strategies that are likely to be
> essential to the Web's and XML's continued success.  Indeed, there's
> some question as to whether the most useful finding would continue to
> focus only on XML, or also might introduce some general principles
> applicable to many media types, and then apply those to XML (or RDF,
> etc.) in particular.  I do recognize that issue XMLVersioning-41 [12]
> is currently scoped specifically to XML.
> 
> In general, following the precedent of the Architecture Document [13],
> we should explore high-level tradeoffs and principles, somewhat in
> preference to making detailed recommendations on syntactic mechanisms.
> While there's lots of good work in the drafts on XML Schema specifics,
> especially in Part 2, I think those are only the purview of the TAG
> insofar as they are necessary to motivate the broader themes and
> principles, or are truly central to the Web's success.
> 
> Other details of ensuring that W3C XML Schema is usable to support
> versioning scenarios are explicitly in the charter of the XML Schema
> WG [14];  indeed, I'm delighted that the TAG and Schema WG are now
> working more closely together.  I think the general balance should be
> that the Schema WG handles the schema-language-specific parts of the
> problem, with help from the TAG, and the TAG discusses the broader
> architectural issues, with help from (among others) the Schema WG.
> 
> There remains a question of whether the TAG will choose to do a formal
> finding in this area at all.  I am cautiously optimistic that we can
> and should, but I do feel that our focus should be more on broader
> themes, perhaps including those discussed above.  I certainly think
> it's worth continued effort in the coming weeks to see whether we can
> do something that the community would value.
> 
> My recent rereading of the drafts has reminded me once again what a
> careful and diligent job Dave has done to take us to this point, and
> speaking for myself it is much appreciated!  This start will prove to
> be very valuable, regardless of how we proceed, or whether any of the
> suggestions made above are adopted.  I look forward to helping Dave
> and Norm in any way that I can to improve the drafts.
> 
> Thank you all for your patience with this long note.
> 
> Noah
> 
> 
> [1] http://lists.w3.org/Archives/Public/www-tag/2004Nov/0071.html
> [2] http://lists.w3.org/Archives/Public/www-tag/2005Jan/0018.html
> [3] http://lists.w3.org/Archives/Public/www-tag/2005Jan/0019.html
> [4] http://lists.w3.org/Archives/Public/www-tag/2005Jan/0020.html
> [5] http://lists.w3.org/Archives/Public/www-tag/2004Nov/att-0071/versioning-part1.html#identify
> [6] http://www.w3.org/TR/1998/REC-xml-19980210#sec-origin-goals
> [7] http://lists.w3.org/Archives/Public/www-tag/2004Nov/att-0071/versioning-part1.html#terminology
> [8] http://lists.w3.org/Archives/Public/www-tag/2004Nov/att-0071/versioning-part1.html#div250901096
> [9] http://www.w3.org/TR/soap12-part1/#soaprelay
> [10] http://lists.w3.org/Archives/Public/www-tag/2004Aug/0010.html
> [11] http://lists.w3.org/Archives/Public/www-tag/2004Aug/att-0010/NRMVersioningProposal.html
> [12] http://www.w3.org/2001/tag/issues.html?type=1#XMLVersioning-41
> [13] http://www.w3.org/TR/webarch/
> [14] http://www.w3.org/2003/09/xmlap/xml-schema-wg-charter.html#Deliverables
> 
> 
> --------------------------------------
> Noah Mendelsohn
> IBM Corporation
> One Rogers Street
> Cambridge, MA 02142
> 1-617-693-4036
> --------------------------------------
> 
> 

Received on Monday, 21 February 2005 19:49:47 UTC