[Editorial Draft] Extending and Versioning Languages: Terminology With comments from Noah

Draft TAG Finding 18 May 2007

This version:

http://www.w3.org/2001/tag/doc/versioning-20070518.html ( xml )

Latest version:


Previous versions:

Unapproved Editors Drafts: http://www.w3.org/2001/tag/doc/versioning-20070326.html, http://www.w3.org/2001/tag/doc/versioning-20061212.html, http://www.w3.org/2001/tag/doc/versioning-20060726.html, http://www.w3.org/2001/tag/doc/versioning-20060717.html, http://www.w3.org/2001/tag/doc/versioning-20060710.html, http://www.w3.org/2001/tag/doc/versioning-20031116.html, http://www.w3.org/2001/tag/doc/versioning-20031003.html


David Orchard, BEA Systems, Inc. mailto:David.Orchard@BEA.com


This document provides terminology for discussing language versioning. Separate documents contain versioning strategies and XML language-specific discussion.

Status of this Document

This version includes comments from Noah on the first few pages.  I’m circulating this now because it may be several weeks before I get to transcribe more of my comments.

Comments with more asterisks (***) indicate particularly important points or global issues.  Fewer asterisks (*) or none indicate correspondingly less important points.

This document has been developed for discussion by the W3C Technical Architecture Group. It does not yet represent the consensus opinion of the TAG.

Publication of this finding does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time.

Additional TAG findings, both approved and in draft state, may also be available. The TAG expects to incorporate this and other findings into a Web Architecture Document that will be published according to the process of the W3C Recommendation Track.

Please send comments on this finding to the publicly archived TAG mailing list www-tag@w3.org (archive).

Table of Contents

1 Introduction
    1.1 Terminology
        1.1.1 Compatibility
        1.1.2 Partial Understanding
        1.1.3 Divergent Understanding and Compatibility
        1.1.4 Open or Closed systems
        1.1.5 Compatibility of languages vs compatibility of applications
2 Conclusion
3 References
4 Acknowledgements


A Change Log (Non-Normative)


1 Introduction

The evolution of languages by adding, deleting, or changing syntax or semantics is called versioning. Making versioning work in practice is one of the most difficult problems in computing. Arguably, the Web rose dramatically in popularity because HTML and HTTP provide effective support for extensibility and versioning. Both systems provide explicit extensibility points and rules for understanding extensions that enable their decentralized extension and versioning.

This finding describes terminology of languages and their versioning.

1.1 Terminology

The suggested terminology for describing languages, producers, consumers, information, constraints, syntax, evolvability, etc. follows. Let us consider an example. Two or more systems need to exchange information about people's names. Names may not be the perfect choice of example for internationalization reasons, but the example resonates strongly with a very large audience[NRM2]. The Name Language is created to be exchanged[NRM3]. [Definition: A producer is an agent that creates text.] Continuing our example, Fred is a producer of Name Language text. [Definition: An Act of Production is the creation of text.[NRM4]] A producer produces text with the intent of conveying information. When Fred does the actual creation of the text[NRM5], that is an act of production. [Definition: A consumer is an agent that consumes text.] We will use Barney and Wilma as consumers of text. [Definition: An Act of Consumption[NRM6] is the processing of text of a language.] Wilma and Barney consume the text separately from each other, each of these[NRM7] being a consumption event. A consumer is impacted by the instance that it consumes. That is, it interprets that instance and bases future processing, in part, on the information that it believes was present in that instance. Text can be consumed many times, by many consumers, and have many different impacts.

[Definition: A Language consists of a set of text, any syntactic constraints on the text[NRM8], a set of information, any semantic constraints on the information, and the mapping between texts and information.] [Definition: Text is a specific, discrete sequence of characters.][NRM9] Given that there are constraints on a language, any particular text may or may not have membership in a language. Indeed, a particular string[NRM10] of characters may be a member of many languages, and there typically will be many different strings of characters that are members of a given language. The texts of a language are the units of exchange between a producer and consumer. [Definition: When a text is the outermost unit of exchange, we call it a document.] Documents, in turn, may use smaller languages internally: for example, a document language might use a number language to represent integer values as strings of digits.

Documents are texts of a language. The Name Language consists of a text set that has 3 terms and specifies syntactic constraints: that a name consists of a given and a family. [Definition: A language has a set of constraints that apply to the set of strings in the language.] These constraints can be defined in machine-processable syntactic constraint languages such as XML Schema or microformats, described in human-readable text such as HTML descriptions, or embodied in software. Languages may or may not be defined by a schema in any particular schema language. The constraints on a language determine the strings that qualify for membership in the language. Vocabulary terms contribute to the set of strings, but they are not the only source of characters in the set of strings of a given language. The language strings may include characters outside of terms, such as punctuation. One reason for additional characters, such as whitespace and markup, is to distinguish or separate terms.






name="Dave Orchard"


<span class="fn">Dave Orchard</span>



The set of information in a language almost always has semantics.[NRM11] In the Name Language, given and family have the semantics of given and family names of people. The language also has the binding from the items in the information set[NRM12] to the text set. Any potential act of interpretation, that is, any consumption or production, conveys information from text according to the language's binding. The language is designed for acts of interpretation, that being the purpose of languages[NRM13]. In our example, this mapping is obvious and trivial, but in many languages it is not. Two languages may have the exact same strings but different meanings for them. In general, the intended meaning of a vocabulary term is scoped by the language in which the term is found. However, there is some expectation that terms drawn from a given vocabulary will have a consistent meaning across all languages in which they are used. Confusion often arises when terms have inconsistent meanings across languages. The Name terms might be used in other languages, but it is generally expected that they will still be "the same" in some meaningful sense.
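The point that two languages may share the same strings yet bind them to different information can be sketched in code. This is only an illustration under stated assumptions: the two-word text syntax, the regular expression, and the function names are hypothetical and are not part of the Name Language as defined in this finding.

```python
import re

# Assumed text syntax, for illustration only: two capitalised words
# separated by a single space.
NAME_TEXT = re.compile(r"[A-Z][a-z]+ [A-Z][a-z]+")


def in_text_set(text: str) -> bool:
    """Syntactic constraint: is the text a member of the (shared) text set?"""
    return NAME_TEXT.fullmatch(text) is not None


def info_given_first(text: str) -> dict:
    """Binding of a language that reads the first word as the given name."""
    first, second = text.split(" ")
    return {"given": first, "family": second}


def info_family_first(text: str) -> dict:
    """Binding of a different language that reads the first word as the family name."""
    first, second = text.split(" ")
    return {"given": second, "family": first}


text = "Dave Orchard"
same_text_set = in_text_set(text)            # one text, valid in both languages
as_given_first = info_given_first(text)      # information under the first binding
as_family_first = info_family_first(text)    # information under the second binding
```

Both bindings accept exactly the same texts, so the difference between the two languages lies entirely in the mapping from text to information.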



These terms and their relationships are shown below

Diagram of language terms

We say that Fred engages in an Act of Production that results in a Name Instance with respect to Name Language V1. The Name Instance is in the set of Name V1 Texts, that is the set of strings in the Name Language V1. The production of the Name Instance has the intent of conveying Information, which we call Information 1. This is shown below:

Production instance

We say that Barney engages in an Act of Consumption of a Name Instance with respect to Name Language V1. The consumption of the Name Instance has the impact of conveying Information 1. This is shown below:

Production and consumption instance

Versioning is an issue that affects almost all applications eventually. Whether the application is a processor styling documents in batch to produce PDF files, a Web service engaged in financial transactions, or an HTML browser, the language and instances will likely change over time. The versioning policies for a language, particularly whether the language is mutable or immutable, should be specified by the language owner. Versioning is closely related to extensibility, as extensible languages may allow different versions of instances than those known by the language designer. Applications may receive versions of a language that they aren't expecting.

If a Name Language V2 exists, with its own set of strings and Information set, Wilma may consume the same Name Instance but with respect to the Name Language V2, with the impact of conveying Information 2. Name Language V2 relates to V1 by relationship r2, which is forwards compatible comparing language V1 to V2 instances, and backwards compatible comparing language V2 to V1 instances. Similarly, Information 2 - as conveyed by Consumption 2 - relates to Information 1 - as conveyed by Consumption 1 - by relationship r1.

Production and 2 Consumptions Instance

Extensibility is a property that enables evolvability of software. It is perhaps the biggest contributor to loose coupling in systems as it enables the independent and potentially compatible evolution of languages. Languages are defined to be [Definition: Extensible if the syntax of a language allows information that is not defined in the current version of the language.]. The Name Language is extensible if it can include terms that aren't defined in the language, like a new middle term.

1.1.1 Compatibility

As languages evolve, it is possible to speak of backwards and forwards compatibility. A language change is backwards compatible if newer processors can process all instances of the old language. Backwards compatibility means that a newer version of a consumer can be rolled out in a way that does not break existing producers. A producer can send an older version of a message to a consumer that understands the new version and still have the message successfully processed. A software example is a word processor at version 5 being able to read and process version 4 documents. A schema example is a schema at version 5 being able to validate version 4 documents. In the case of Web services, this means that new Web service consumers, ones designed for the new version, will be able to process all instances of the old language.

A language change is forwards compatible if older processors can process all instances of the newer language. Forwards compatibility means that a newer version of a producer can be deployed in a way that does not break existing consumers. Of course the older consumer will not implement any new behavior, but a producer can send a newer version of an instance and still have the instance successfully processed. A software example is a word processor at version 4 being able to read and process version 5 documents. A schema example is a schema at version 4 being able to validate version 5 documents. In the case of Web services, this means that existing Web service consumers, designed for a previous version of the language, will be able to process all instances of the new language.

In general, backwards compatibility means that existing texts can be used by updated consumers, and forwards compatibility means that newer texts can be used by existing consumers. Another way of thinking of this is in terms of message exchanges. Backwards compatibility is where the consumer is updated and forwards compatibility is where the producer is updated, as shown below:

Example 2: Evolution of Producers and/or Consumers

Versioning Graphic

With respect to consumers and producers, backwards compatibility means that newer consumers can continue to use existing producers, and forwards compatibility means that existing consumers can be used by newer producers.

We need to be more precise about which parts of a language are compatible with which other parts. Every language has a Defined Text set, which contains only the texts explicitly defined by the language constraints. Typically, a language will define a mapping from each of the definitions to information. Each language has an Accept Text set, which contains texts that are allowed by the language constraints. Typically, the Accept Text set contains Texts that are not in the Defined Text set and do not have a mapping to information. For example, consider a language whose syntax says that a name consists of given followed by family followed by anything. A text that consists of a name with only a given and a family falls in both the Defined and Accept Text sets. A text that consists of a name with a given, a family and an extension such as a middle falls in the Accept Text set but not the Defined Text set. By definition, the Accept Text set is a superset of the Defined Text set.
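The distinction between the two sets can be sketched as a pair of membership tests. This is a minimal illustration, not part of the finding: it assumes a hypothetical XML syntax for the Name Language (a name element whose children are given, then family, then anything), and the function names are invented for the example.

```python
import xml.etree.ElementTree as ET


def child_tags(text):
    """Return the child element names of a <name> text, or None if it is not one."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return None
    if root.tag != "name":
        return None
    return [child.tag for child in root]


def in_defined_set(text):
    """Defined Text set: exactly a given followed by a family."""
    return child_tags(text) == ["given", "family"]


def in_accept_set(text):
    """Accept Text set: a given, then a family, then anything."""
    tags = child_tags(text)
    return tags is not None and tags[:2] == ["given", "family"]


PLAIN = "<name><given>Dave</given><family>Orchard</family></name>"
EXTENDED = "<name><given>Dave</given><family>Orchard</family><middle>B</middle></name>"

plain_result = (in_defined_set(PLAIN), in_accept_set(PLAIN))
extended_result = (in_defined_set(EXTENDED), in_accept_set(EXTENDED))
```

Under these assumptions, every text that passes in_defined_set also passes in_accept_set, which mirrors the statement that the Accept Text set is a superset of the Defined Text set.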

We have discussed backwards and forwards compatibility in general, but there are other flavours of compatibility, based upon compatibility between the Accept Text set, Defined Text set and Information conveyed. Syntactic compatibility is compatibility with respect to the Texts only, not the information conveyed. Because languages have Accept and Defined Text sets, some producers will adhere to the Defined Text set, and others may generate extensions that fall in the Accept Text set. Compatibility with Producers that produce only Texts in the Defined Text set is called "strict" compatibility. Compatibility with Producers that may produce Texts in the Accept Text set that are not in the Defined Text set is called "full" compatibility.

A more precise definition of compatibility is with respect to the texts, that is, whether all the texts in one language are also texts in another language. Another precise form of compatibility is with respect to the information conveyed, that is, whether the information conveyed by a text in one language is conveyed by the same text interpreted in another language. The texts could be compatible while the information conveyed is not. For example, the same text could mean different and incompatible things in the different languages. Most systems have different layers of software, each of which can view a text differently and affect compatibility. For example, the XML Schema PSVI view is different from the actual text. We can also differentiate between language compatibility and application compatibility. While it is often the case that they are directly related, sometimes they are not: two languages may be compatible but an application might be incompatible with one of them.

We provide mathematical definitions of a text's compatibility based upon our terminology.

··     Let L1 and L2 be Languages, where L2 is introduced "after" L1.

··     Let T be a text.

··     T is in L1 iff (T is valid per L1 | T is in L1's set of Texts).

··     Let I1 be the information conveyed by Text T per language L1.

··     Let I2 be the information conveyed by Text T per language L2.

··     Text T is "fully compatible" with language L2 if and only if I1 is compatible with I2 and (T is valid per L2 | T is in L2's set of Texts).

··     Text T is incompatible if any of the information in I2 is wrong (i.e. replaces a value in I1 with a different one) | (T is invalid per L2 | T is not in L2's set of Texts).
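The definitions above can be sketched as a small check. This is an illustration under assumptions that the finding does not state: a language is modelled as a function that returns the information conveyed by a text (or None when the text is invalid), information is a dictionary, and I1 is taken to be compatible with I2 when no value present in both differs. The whitespace-separated name syntax and all function names are hypothetical.

```python
from typing import Callable, Optional

Info = dict
# A language maps a text to the information it conveys, or None if the text is invalid.
Language = Callable[[str], Optional[Info]]


def name_v1(text: str) -> Optional[Info]:
    words = text.split()
    if len(words) != 2:
        return None
    return {"given": words[0], "family": words[1]}


def name_v2(text: str) -> Optional[Info]:
    """A later version that also defines an optional middle name."""
    words = text.split()
    if len(words) == 2:
        return {"given": words[0], "family": words[1]}
    if len(words) == 3:
        return {"given": words[0], "middle": words[1], "family": words[2]}
    return None


def name_family_first(text: str) -> Optional[Info]:
    """Same texts as name_v1, but the first word is read as the family name."""
    words = text.split()
    if len(words) != 2:
        return None
    return {"given": words[1], "family": words[0]}


def info_compatible(i1: Info, i2: Info) -> bool:
    # Assumption: I2 is wrong only if it replaces a value in I1 with a different one.
    return all(i2[key] == value for key, value in i1.items() if key in i2)


def fully_compatible(text: str, l1: Language, l2: Language) -> bool:
    """Text (valid per l1) is fully compatible with l2: valid per l2, with compatible information."""
    i1, i2 = l1(text), l2(text)
    return i1 is not None and i2 is not None and info_compatible(i1, i2)
```

For example, fully_compatible("Dave Orchard", name_v1, name_v2) holds, whereas "Dave B Orchard" is not compatible with name_v1 because it is invalid there, and "Dave Orchard" is not compatible with name_family_first because the information conveyed differs.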

We can also provide mathematical definitions of language compatibility:

··     L2 is "fully backwards compatible" with L1 if every text in L1 Accept Text set is fully compatible with L2.

··     L2 is "strictly backwards compatible" with L1 if every text in L1 Defined Text set is fully compatible with L2.

··     L2 is "strictly backwards incompatible" with L1 if any text in L1 Defined Text set is incompatible with L2.

··     L1 is "fully forwards compatible" with L2 if every text in L2 Accept Text set is fully compatible with L1.

··     L1 is "strictly forwards compatible" with L2 if every text in L2 Defined Text set is fully compatible with L1.

··     L1 is "forwards incompatible" with L2 if any text in L2 is incompatible with L1.

··     And combined together is: L1 is strictly compatible with L2 if every text in L2 Defined Text set is fully compatible with L1 AND if every text in L1 Defined Text set is fully compatible with L2.

We can draw a few conclusions. Given L2 is strictly backwards incompatible with L1 if any text in L1 Defined Text set is incompatible with L2, the only way that L2 can be backwards compatible with L1 is if the L2 Defined Text set is a superset of L1 Defined Text set. Roughly, that means the addition of optional items in L2. Given L1 is "fully forwards compatible" with L2 if every text in L2 Accept Text set is fully compatible with L1, the only way that L1 can be forwards compatible with L2 is if the L1 Accept Text set is a superset of the L2 Accept Text set. Roughly, that means L1 allows all of L2 and more. It is this superset relationship that is a key to forwards compatibility: the allowing of texts by L1 that will become defined in L2.

Compatibility can be restated in terms of superset/subset relationships.

··     Language L2 is strictly backwards compatible with Language L1 if L2 Defined Text set > (superset) L1 Defined Text Set AND every text in L1 Defined Text set is compatible with L2.

··     Language L1 is strictly forwards compatible with Language L2 if L1 Accept Text set > (superset) Language L2 Accept Text set AND every text in L2 Accept Text set is compatible with L1.

··     Language L2 is fully strictly compatible with Language L1 if L1 Accept Text set > (superset) Language L2 Accept Text set > (superset) L2 Defined Text set > (superset) L1 Defined Text Set AND every text in L1 Defined Text set is compatible with L2 AND every text in L2 Accept Text set is compatible with L1.
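The superset formulations can be checked mechanically over finite samples of texts. The following is a sketch only: real languages usually have unbounded text sets, so the finite sets, the key=value text syntax, the class, and the function names below are all assumptions made for illustration. Information is treated as compatible when no value present in both interpretations differs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    defined: frozenset  # Defined Text set
    info: dict          # every accepted text mapped to the information it conveys

    @property
    def accept(self) -> frozenset:
        """Accept Text set: all texts the language allows."""
        return frozenset(self.info)


def info_compatible(i1: dict, i2: dict) -> bool:
    # Assumption: incompatible only if a shared item has a different value.
    return all(i2[key] == value for key, value in i1.items() if key in i2)


def strictly_backwards_compatible(l2: Language, l1: Language) -> bool:
    """L2 Defined set is a superset of L1 Defined set, with compatible information."""
    return l2.defined >= l1.defined and all(
        info_compatible(l1.info[t], l2.info[t]) for t in l1.defined
    )


def strictly_forwards_compatible(l1: Language, l2: Language) -> bool:
    """L1 Accept set is a superset of L2 Accept set, with compatible information."""
    return l1.accept >= l2.accept and all(
        info_compatible(l2.info[t], l1.info[t]) for t in l2.accept
    )


PLAIN = "given=Dave family=Orchard"
EXTENDED = "given=Dave family=Orchard middle=B"
BASE = {"given": "Dave", "family": "Orchard"}

# V1 accepts the extension but only understands given and family.
V1 = Language(frozenset({PLAIN}), {PLAIN: BASE, EXTENDED: BASE})
# A closed variant of V1 with no extensibility.
V1_CLOSED = Language(frozenset({PLAIN}), {PLAIN: BASE})
# V2 defines the middle name.
V2 = Language(
    frozenset({PLAIN, EXTENDED}),
    {PLAIN: BASE, EXTENDED: {**BASE, "middle": "B"}},
)
```

With these samples, V2 is strictly backwards compatible with both V1 and V1_CLOSED, but only the extensible V1 is forwards compatible with V2, since V1_CLOSED does not accept the extended text.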

We have shown that forwards and backwards compatibility is only achievable through extensibility, and that compatible versioning is a process of gradually increasing the Defined Text Set, reducing or not changing the Accept Text Set, and ensuring the information conveyed is compatible. If ever the set relationships defined earlier do not hold, then the versions are not compatible.


An article on xml.com describes this theory of compatibility and provides graphical representation of the set relationships, at http://www.xml.com/pub/a/2006/12/20/a-theory-of-compatible-versions.html. Should part/all of the article be included in this document?

Composition

Many languages are compound languages consisting of multiple languages. For example, a purchase order language could use the name language for names. The forwards, backwards and full compatibility definitions account for composition of languages because the Defined and Accept sets of the used languages are incorporated into the using language. For example, the purchase order language Accept Set is the Accept Set of all the items defined OR used by the Purchase Order language, which includes the Accept Set of the name language.

1.1.2 Partial Understanding

So far, we have defined compatibility over all possible expressions of a language and we’ve been discussing full compatibility. However, there are many scenarios where a consumer may consume only part of the information set. Such partial understanding affects the Text set used and the Information conveyed. Partial understanding usually results in a subset of the information being conveyed, because only part of the information is understood by the consumer. Interestingly, such partial understanding consists of an increase or supersetting of the Accept Text Set and a parallel decrease or subsetting of the Defined Text Set. This is because the process of extracting a part of the text means that extra content, even that which was illegal under the earlier version’s syntax, becomes part of the Accept Text Set.

An example is an application that only looks at given names and ignores everything else. My favourite example of this is a "Baby Name" Wizard. The application might use a simple XPath expression to extract the given name from inside a name. The result is effectively a different version of the Name Language, which we will call the Given Name Language. The Accept Text set for the Given Name Language is anything, given, anything. The Defined Text set for the Given Name Language is given. The information set for the Given Name Language is given. Because the Given Name Language syntax is more relaxed than that of the Name Language V1, the addition of a middle name between the given and family is a compatible change for the Given Name Language. There are a variety of other now acceptable names in the Given Name Language.
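A consumer of the Given Name Language might be sketched as follows. This is an assumption-laden illustration: the XML element names and the function name are hypothetical, and the path expression uses the limited XPath subset supported by Python's standard xml.etree.ElementTree module rather than a full XPath processor.

```python
import xml.etree.ElementTree as ET


def given_name(text):
    """Extract only the given name, ignoring whatever surrounds it.

    Returns None when the text is not well-formed XML or has no given element.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return None
    node = root.find(".//given")  # first descendant named "given", wherever it is
    return None if node is None else node.text


V1_TEXT = "<name><given>Dave</given><family>Orchard</family></name>"
WITH_MIDDLE = (
    "<name><given>Dave</given><middle>B</middle><family>Orchard</family></name>"
)

from_v1 = given_name(V1_TEXT)
from_extended = given_name(WITH_MIDDLE)
```

Because the consumer looks only for the given element, inserting a middle element between given and family does not change what it extracts, which is why that change is compatible for this flavour of the language.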

The principles of compatibility and language with respect to versioning need not change to deal with partial understanding. Partial understanding of a language can be thought of as the creation of a new Language L1' that is compatible with Language L1. This is true if L1' Accept Text set > (superset) Language L1 Accept Text set > (superset) L1 Defined Text set > (superset) L1' Defined Text Set AND every text in L1' Defined Text set is compatible with L1 AND every text in L1 Accept Text set is compatible with L1'.

Interestingly, partially understanding a language is equivalent to creating a language V1', such that the V1 language is a compatible change with respect to V1'. There may be many different versions that are all partial understandings of a language. We call these related languages "flavours". It may be very difficult for a language designer to know how many different language flavours are in existence. However, a language designer can sometimes use the different flavours to their advantage in designing for a mixture of compatible and incompatible changes. Some changes could be compatible with some flavours but incompatible with others. It may be very useful to have some changes be compatible with some flavours, since consumers of those flavours do not need to be updated or changed.

It is crucial to point out that a consumer of a flavour of the language should not produce a partial version of the language. A client may have relaxed the restrictions on the consuming side, but no producer should do so on the production side of the language. If an agent that consumes a flavour of a language also produces the language, it should create instances that are valid according to the Language V1 rules, not the Language V1' rules. Perhaps the only exception is when the producer is guaranteed to be producing for compatible flavours. Typically this is not the case and is hard to determine, so the safest course is to produce according to the Language V1 rules.

We have shown how relaxing the constraints on a language when consuming instances of it can turn an otherwise incompatible change into a compatible change. We have also shown that abiding by the language constraints when producing instances is the safest course. This is said more eloquently by the Internet robustness principle, "be conservative in what you do, be liberal in what you accept from others", from [tcp].

We will call this style of versioning the "liberal" style of versioning. The "liberal" style of versioning is codified in:

Good Practice

Use Least Partial Languages for "liberal" versioning: Consumers should use a flavour of a language that has the least amount of understanding.

The flavor of a language that implies the smallest amount of understanding will also be the most liberal and have the largest possible Accept set.

The "liberal" style of versioning has a drawback, however. It can lead to fragile software that is hard to evolve because the "liberal"ness is difficult to code. In addition, it does not force producers to be correct in what they produce and can lead to a vicious cycle of complexity.


More information on pros/cons needed. Perhaps change best practices to positions, then make best practices for the different scenarios, i.e. Use Conservative Versioning for languages that are only processed by machines, Use Liberal Versioning for languages typically processed by humans or....

There is an opposite style of versioning that says the most effective way of evolving is to force producers to be correct by having strict consumers. We will call this the "conservative" style of versioning. The "conservative" style of versioning is codified in:

Good Practice

Use Only Full Languages for "conservative" versioning: Consumers should fully use and validate a language.

The greatest amount of understanding of a language will find the largest number of errors in producers.
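The contrast between the two consumer styles can be sketched as follows. This is an illustration under assumptions not made by the finding: names are carried as JSON objects, the language defines only the fields given and family, and the function names are hypothetical.

```python
import json

# Fields defined by the (assumed) version of the Name Language the consumer knows.
KNOWN_FIELDS = {"given", "family"}


def consume_conservative(text: str) -> dict:
    """Validate fully: reject any text that is not exactly what the language defines."""
    data = json.loads(text)
    unknown = set(data) - KNOWN_FIELDS
    missing = KNOWN_FIELDS - set(data)
    if unknown or missing:
        raise ValueError(f"unknown fields {sorted(unknown)}, missing fields {sorted(missing)}")
    return data


def consume_liberal(text: str) -> dict:
    """Extract what is understood and silently ignore everything else."""
    data = json.loads(text)
    return {key: data[key] for key in KNOWN_FIELDS if key in data}


EXTENDED = '{"given": "Dave", "middle": "B", "family": "Orchard"}'

understood = consume_liberal(EXTENDED)
try:
    consume_conservative(EXTENDED)
    rejected = False
except ValueError:
    rejected = True
```

The liberal consumer keeps working when a producer adds a middle field, while the conservative consumer surfaces the unexpected field as an error that the producer or the language owner must resolve.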

Whether "liberal" or "conservative" versioning is in use by consumers, the advice to producers is always the same:

Good Practice

Produce only full languages: Producers should produce the complete version of a language. They should never produce partial flavors.

"Liberal" consumers may allow correct operation with producers that aren't fully compatible with a full language. "Conservative" consumers will be less tolerant of faults in producers.

EdNote: I think related to principle of least power (http://www.w3.org/2001/tag/doc/leastPower) . The lower the power of the language, the easier to have partial understanding? For example, XPath is "lower power" than Java processing the DOM.

Compatibility is defined between the producer and consumer of an individual text. Most messaging specifications, such as Web Services, utilise both inputs and outputs. Using our definitions of compatibility, a Web service that updates its output message language is considered to be a newer producer, because it is sending a newer version of the message. Conversely, updating the input message language makes a service a newer consumer because it is consuming a newer version of the message. All systems that include inputs and outputs must consider both when making changes and determining compatibility. For full compatibility, any output message changes must be forwards compatible. This means that older consumers can receive them successfully. Similarly, input message changes must be backwards compatible, so that they can be received successfully from older producers.

1.1.3 Divergent Understanding and Compatibility

Our treatise so far has described a fairly straightforward evolution of a language, from a first version to a next version. However, extensibility and interoperability are usually closely related. It is an axiom in computing that the lower the optionality (which includes extensibility), the higher the chance of interoperability. Each and every place where extensibility is allowed in a language is also a place where a lack of interoperability can arise. Interoperability problems can arise, for example, when producers and consumers do not agree on which version of a language is being used in a text.

Even though a language has a formal definition and extensibility model, there is no guarantee that software that processes it will implement it exactly. Differences in understanding between different software agents are a significant source of divergence. A classic example of this is the so-called "tag soup" problem in HTML. Many of the applications commonly used to process HTML, and particularly browsers, have an Accept Text Set that is larger than the formal definition of the HTML language. For example, many situations of interleaved opening and closing of elements in HTML are processed without generating an error. This ensures the user experience, at least in the short term, is of higher quality. However, this approach suffers long term problems with interoperability when the illegal texts are copied by mechanisms such as "view source". The reason is that the more undocumented strings there are in an Accept Text Set, the more difficult it is to achieve interoperability. The more liberal an agent is in accepting texts, increasing the Accept Text Set by expanding the definition of the language, the more difficult interoperability becomes, because not every agent may have the same Accept Text Set.

At the other extreme is XML. XML allows almost no extensibility in its constructs. Name characters, tag closures, attribute quoting and attribute allowed values are all very fixed. This has increased interoperability between implementations of XML. However, it has also made it very difficult to move to XML 1.1, because the lack of extensibility makes almost all changes incompatible. The XML language design was very specifically trying to avoid the HTML "tag soup" problem, and it has arguably done that, at the cost of an inability to version. These two extremes of extensibility design exist because of well-thought design. The trade-off between extensibility, interoperability and the Accept Set was planned in advance. Language designers should do the same with their languages.


Which references to support this? TAG issue: http://www.w3.org/2001/tag/issues#TagSoupIntegration-54

Good Practice

Analyze Trade-offs for Language: Language designers should analyze the trade-offs between extensibility, interoperability, and actual language Accept Set.

1.1.4 Open or Closed systems

The cost of changes that are not backward or forward compatible is often very high. All the software that uses the affected language must be updated to support the newer version. The magnitude of the associated cost is directly related to whether the system in question is open or closed.

[Definition: A closed system is one in which all of the producers and consumers are more-or-less tightly connected and under the control of a single organization.] Closed systems can often provide integrity constraints across the entire system. A traditional database is a good example of a closed system: all of the database schemas are known at once, all of the tables are known to conform to the appropriate schema, and all of the elements in each row are known to be valid for the schema to which the table conforms.

From a versioning perspective, it might be practical, in a closed system, to say that a new version of a particular language is being introduced into the system at a specific time. At that time, all of the data that conforms to the previous version of the language is migrated to conform to the new version as part of the upgrade process.

[Definition: An open system is one in which some producers and consumers are loosely connected or are not controlled by the same organization. The internet is a good example of an open system.]

In an open system, it's simply not practical to handle language evolution with universal, simultaneous, atomic upgrades to all of the affected software components. Existing producers and consumers, who are outside the immediate control of the organization that is publishing an updated language, may continue to use the previous version for a considerable period of time.

Finally, it's important to remember that systems evolve over time and have different requirements at different stages in their life cycle. Often, early versions of systems will operate as if they are closed. During initial development, when the earliest versions of a language are under construction, it may be valuable to pursue a much more aggressive, draconian versioning strategy. Once a system is more widely deployed in production, it tends to behave more as an open system. There is likely to be an expectation of stability in the language it provides. Consequently, it may be necessary to proceed with more caution and to be prepared to provide forwards and backwards compatibility for changes.

1.1.5 Compatibility of languages vs compatibility of applications

From NoahM: The draft is on pretty firm ground when it talks about the information that can be determined from a given input text per some particular language L. I think there are important compatibility statements we can and should make at just that level (see suggestions above), and we should separate them from statements about the compatibility of a particular pair of applications that may communicate using the language. Both are important to include, I think, but they should be in separate chapters, one building on the other. Once you've cleanly told a story about which information can be reliably communicated when sender and receiver interpret using different language versions, you can go on to tell a separate story about whether the applications can indeed work well together. To illustrate what I mean, here are examples at each of the two levels.

Language-level incompatibility: Consider a situation in which the same input connotes different information in one version of a language than in another. Without reference to any particular application, we can say that the languages are in that respect incompatible. For example, we might imagine a version of a language in which array indexing is 1-based, and a later version in which 0-based indexing is used; the information conveyed by any particular array reference is clearly in some sense incompatible, regardless of the consuming application's needs.
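The indexing example can be sketched as follows. The two functions stand in for the two versions of a hypothetical language; the names and the sample array are assumptions made for illustration.

```python
def element_v1(items: list, index: int):
    """Version 1 of a hypothetical language: array references are 1-based."""
    if index < 1:
        raise IndexError("version 1 indexes start at 1")
    return items[index - 1]


def element_v2(items: list, index: int):
    """Version 2 of the same language: array references are 0-based."""
    if index < 0:
        raise IndexError("version 2 indexes start at 0")
    return items[index]


colors = ["red", "yellow", "green"]
reference = 1  # the same text, for example "colors[1]", under both versions

v1_info = element_v1(colors, reference)
v2_info = element_v2(colors, reference)

# The same text yields different information, whatever the application does with it.
incompatible = v1_info != v2_info
```

No application appears in the sketch: the disagreement is visible from the two mappings of text to information alone.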

Application-level incompatibility: Now consider two applications designed to render the same version of the HTML language. The same tags are supported, with the same layout semantics, etc. One of the applications, however, has a sub-optimal design. Its layout engine has overhead that grows geometrically with the number of layout elements. If you give it a table with 50 rows, it takes 3 seconds to run on some processor. If you give it a table with 5000 rows, it runs for 3 days. Question: is the second application "compatible" with the 5000 row input? In some ways yes, and in some no. It will eventually produce the correct output, but in practice a user would consider it incompatible. This illustrates that compatibility of applications ultimately has to be documented in terms meaningful to the applications. In this case, rendering time is an issue. I think we should not try in this finding to document specific levels of compatibility at the application level, and we should especially not fall into the trap of trying to claim it's a Boolean compatible/incompatible relation; in the performance example, it's a matter of degree. So, the terminology needs to be specific to the application and its domain. I do think we can talk about some meta-mechanisms that work at the application level, such as mustUnderstand, but they should be in a section that's separate from the exposition of texts, information, and the degree to which information may be safely extracted from a given text when sender and receiver operate under differing specifications.

From Rhys: Clearly this is important, but I’m not sure we necessarily need to go to this level of completeness in a TAG finding. Feels like there is a book in this!

The current draft tries to take the approach that we will model application compatibility by defining a new language that is the flavor (in this case, of HTML) that a particular consumer will successfully process, but the point is that "success" is sometimes a fuzzy concept. Do we have two languages for this example, one for the documents that completely break application #2 and another for those that just make it run slowly? That seems to be what the finding is doing today, and I'm not convinced it's the right approach. My proposal would be that we just point out the distinction and say: "This first section of the finding for the most part restricts its analysis to the limited question of: what information can be reliably conveyed when a producer and a consumer operate using different versions of what purport to be the same or similar languages? The later sections explore some techniques that can be used by applications to negotiate means of safe interoperation when sender and receiver are written to differing versions of a language specification."

From Rhys: I like Noah’s suggested approach because this is really the problem we are setting out to solve for specific types of XML usage.

2 Conclusion

This Finding is intended to provide a terminology basis for further versioning findings.

3 References


Free Online Dictionary of Computing. (See http://wombat.doc.ic.ac.uk/foldoc/.)


Flexible XML Processing Profile. (See http://www.upnp.org/download/draft-goland-fxpp-01.txt.)


RFC 793, TCP (See http://www.ietf.org/rfc/rfc793.txt.)


RFC 1521, MIME. (See http://www.ietf.org/rfc/rfc1521.txt.)

HTML 2.0

RFC 1866, HTML 2.0. (See http://www.ietf.org/rfc/rfc1866.txt.)

WebDAV XMLIgnore post

Yaron Goland. XML Ignore proposed for WebDAV. (See http://lists.w3.org/Archives/Public/w3c-dist-auth/1997AprJun/0190.html.)


RFC 2518, WebDAV (See http://www.ietf.org/rfc/rfc2518.txt.)


RFC 2616, HTTP (See http://www.ietf.org/rfc/rfc2616.txt.)

HTML 4.0

HTML 4.0. (See http://www.w3.org/TR/1998/REC-html40-19980424/.)

TBL Mandatory Extensions

Berners-Lee. Web Architecture: Mandatory extensions. (See http://www.w3.org/DesignIssues/Mandatory.html.)

TBL Extensible languages

Berners-Lee. Web Architecture: Extensible languages. (See http://www.w3.org/DesignIssues/Extensible.html.)

TBL Evolution

Berners-Lee. Web Architecture: Evolvability. (See http://www.w3.org/DesignIssues/Evolution.html.)

Web Architecture: Extensible Languages

Berners-Lee and Connolly, ed. Web Architecture: Extensible Languages World Wide Web Consortium, 1998. (See http://www.w3.org/TR/1998/NOTE-webarch-extlang-19980210.)

HTML Document types

Connolly, ed. HTML Document dialects World Wide Web Consortium, 1996. (See http://www.w3.org/MarkUp/WD-doctypes.)

SOAP 1.2

W3C Recommendation, SOAP 1.2 Part 1: Messaging Framework (See http://www.w3.org/TR/SOAP/.)

WSDL 1.1

W3C Note, WSDL 1.1 (See http://www.w3.org/TR/WSDL/.)

WS-Policy 1.2

W3C Note, WS-Policy 1.2 (See http://www.w3.org/Submissions/WS-Policy/.)

XML 1.0

W3C Recommendation, XML 1.0 (See http://www.w3.org/TR/REC-xml.)


W3C Working Draft, XML Inclusions (See http://www.w3.org/TR-Xinclude.)

XML Namespaces

W3C Recommendation, XML Namespaces (See http://www.w3.org/TR/REC-xml-names.)

XML Schema Part 2

W3C Recommendation, XML Schema, Part 2 (See http://www.w3.org/TR/xmlschema-2.)

XML Schema Wildcard Test Collection

XML Schema Wildcard Test collection (See http://www.w3.org/XML/2001/05/xmlschema-test-collection/result-ms-wildcards.htm.)

XFront Schema Best Practices

XFront Schema Best Practices (See http://www.xfront.com/BestPracticesHomepage.html.)

XML.com Schema Design Patterns

Dare Obasanjo. XML.com Schema design patterns. (See http://www.xml.com/pub/a/2002/07/03/schema_design.html.)

Dave Orchard writings on Extensibility and Versioning

Dave Orchard writings on extensibility and versioning (See http://www.pacificspirit.com/Authoring/Compatibility.)

4 Acknowledgements

The author thanks Norm Walsh for many contributions as co-editor until 2005. The author also thanks the many reviewers who have contributed to the document, particularly David Bau, William Cox, Ed Dumbill, Chris Ferris, Yaron Goland, Rhys Lewis, Hal Lockhart, Mark Nottingham, Jeffrey Schlimmer, Cliff Schmidt, and Norman Walsh.

A Change Log (Non-Normative)







Incorporated Rhys' comments, added version identifier story to forwards compatible evolution, split part 1 into terminology and strategies documents.


 [NRM1]General comments


·        **Not all languages are at the full document level.  We should be clear that the finding applies to any language consisting of sets of texts, whether whole document, subtree in XML, just the text content of some tag (e.g. the format of a floating point number), or plain text files.

·        Do you want to reference RFC 2119?

·        * I think you need to define “TEXT” before this.  There is a formal definition, but it comes after this first use

·        **The relationship between information sets and semantics seems to be unclear, yet one or both of them are crucial to the story you’re telling about versioning.


 [NRM2]I still don’t think it’s the best example, but I think we’ve agreed to disagree on that.  So, I’ll assume it stays.

* [NRM3]No.  The language is not exchanged.  Maybe: “Documents conformant with the name language are intended for exchange between computer applications.”

 [NRM4]Define TEXT first.

 [NRM5]What do we mean by “act of creation”?  Encoding as a string of bits? Deciding what the content is going to be?

 [NRM6]Sounds too much like it relates to tuberculosis.

 [NRM7]Antecedent of “these” is ambiguous.  Furthermore, the sentence parses to suggest that something is an event, but the nouns and noun phrases earlier in the sentence are Wilma, Barney and “each other”, none of which could be an event.

*** [NRM8]No, no, no.  We keep disagreeing on this, and others (Stuart?) have disagreed as well.  I strongly believe we’ll do better to say: The language is a set of texts and the mapping of texts to information.  Once you know the texts, the constraints are redundant.   Same for information resulting from the mappings, and constraints that those results happen to obey.


I would buy saying: “One way to specify the set of texts that comprise a language is intensionally, using a constraint language such as regular expressions, W3C XML schemas, etc. In other cases, a language may be defined extensionally, by just listing the texts that are legal (this is often practical with very small languages, such as “red”, “yellow”, “green”, which might be legal texts for a language capturing the state of a traffic light).”
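The two styles of definition can be sketched for the traffic-light language as follows. The function and constant names are illustrative assumptions; the regular expression is one possible constraint that happens to admit exactly the three listed texts.

```python
import re

# Extensional definition: list every legal text of the language.
TRAFFIC_LIGHT_TEXTS = frozenset({"red", "yellow", "green"})


def in_language_extensional(text: str) -> bool:
    return text in TRAFFIC_LIGHT_TEXTS


# Intensional definition: a constraint that describes the legal texts.
TRAFFIC_LIGHT_PATTERN = re.compile(r"red|yellow|green")


def in_language_intensional(text: str) -> bool:
    # fullmatch requires the whole text to satisfy the constraint.
    return TRAFFIC_LIGHT_PATTERN.fullmatch(text) is not None


candidates = ["red", "yellow", "green", "blue", "Red", ""]
agree = all(
    in_language_extensional(t) == in_language_intensional(t) for t in candidates
)
```

Both definitions pick out the same set of texts, which is the sense in which the constraints add nothing once the set itself is known.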

 [NRM9]Definition of TEXT is needed earlier.

 [NRM10]Why have we switched from text to string of chars?

 [NRM11]***Whoa!  You’re saying “information has semantics”?  Either that’s wrong, or it needs more careful explanation.  I think I know what you’re trying to say, but first of all it’s unclear, and secondly I’m not sure we want to get into semantics.  If we can just formulate an explanation of which documents convey the same information (the last name value is Mendelsohn) I think that’s enough to tell the versioning story.  Going into semantics (in Western societies, one part of the name is traditionally taken from the father, so saying that the last name field is Mendelsohn suggests that I actually have such a family name) is something we should mostly avoid, I think.  Semantics is important, but I think that if we show how to convey information reliably, others can build semantics and reasoning on top of that.

*** [NRM12]What’s an Information Set? It seems to be a key abstraction, but you’re using it without introducing it.

 [NRM13]First of all, this seems a bit smug in tone.  Secondly, it’s not entirely correct.  Languages are intended, I think, as a means by which information can be set down or encoded, as well as to support interpretation.  I’d kill the whole sentence.