[Editorial Draft] Extending and Versioning Languages: Terminology With
comments from Noah
This version:
http://www.w3.org/2001/tag/doc/versioning-20070518.html
( xml )
Latest version:
http://www.w3.org/2001/tag/doc/versioning
Previous versions:
Unapproved Editors Drafts: http://www.w3.org/2001/tag/doc/versioning-20070326.html,
http://www.w3.org/2001/tag/doc/versioning-20061212.html,
http://www.w3.org/2001/tag/doc/versioning-20060726.html,
http://www.w3.org/2001/tag/doc/versioning-20060717.html,
http://www.w3.org/2001/tag/doc/versioning-20060710.html,
http://www.w3.org/2001/tag/doc/versioning-20031116.html,
http://www.w3.org/2001/tag/doc/versioning-20031003.html
Editor:
David Orchard, BEA Systems, Inc. mailto:David.Orchard@BEA.com
Copyright © 2003 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability,
trademark, document use, and software licensing rules apply.
This document provides terminology for
discussing language versioning. Separate documents contain versioning
strategies and discussion specific to XML languages.
This version includes comments from Noah on the
first few pages. I’m circulating this
now because it may be several weeks before I get to transcribe more of my
comments.
Comments with more asterisks
(***) indicate particularly important points or global
issues. Fewer asterisks (*) or none
indicate correspondingly less important points.
This document has been developed for discussion by the W3C Technical Architecture Group. It
does not yet represent the consensus opinion of the TAG.
Publication
of this finding does not imply endorsement by the W3C Membership. This is a
draft document and may be updated, replaced or obsoleted by other documents at
any time.
Additional TAG findings, both
approved and in draft state, may also be available. The TAG expects to
incorporate this and other findings into a Web Architecture Document that will
be published according to the process of the W3C Recommendation
Track.
Please
send comments on this finding to the publicly archived TAG mailing list www-tag@w3.org (archive).
1 Introduction
1.1 Terminology
1.1.1 Compatibility
1.1.1.1 Composition
1.1.2 Partial Understanding
1.1.3 Divergent Understanding and Compatibility
1.1.4 Open or Closed systems
1.1.5 Compatibility of languages vs compatibility of applications
2 Conclusion
3 References
4 Acknowledgements
A Change Log (Non-Normative)
The evolution of languages by adding,
deleting, or changing syntax or semantics is called versioning. Making
versioning work in practice is one of the most difficult problems in computing.
Arguably, the Web rose dramatically in popularity because HTML and HTTP provide
effective support for extensibility and versioning. Both systems provide
explicit extensibility points and rules for understanding extensions that
enable their decentralized extension and versioning.
This finding describes terminology of
languages and their versioning.
Suggested terminology for describing languages, producers, consumers,
information, constraints, syntax, evolvability etc. follows. Let us consider an
example. Two or more systems need to exchange information about people's
names. Names may not be the perfect choice of example because of
internationalization reasons, but it resonates strongly with a very large
audience. The Name Language is created to be exchanged. [Definition: A producer
is an agent that creates text.] Continuing our example, Fred is a producer of
Name Language text. [Definition: An Act of Production is the creation of
text.] A producer produces text with the intent of conveying information. When
Fred does the actual creation of the text, that is an act of production.
[Definition: A consumer is an agent that consumes text.] We will use Barney and
Wilma as consumers of text. [Definition: An Act of Consumption is the
processing of text of a language.] Wilma and Barney consume the text separately
from each other, each of these being a consumption event. A consumer is
impacted by the instance that it consumes. That is, it interprets that instance
and bases future processing, in part, on the information that it believes was
present in that instance. Text can be consumed many times, by many consumers,
and have many different impacts.
[Definition: A Language consists of a set of texts, any syntactic constraints
on the texts, a set of information, any semantic constraints on the
information, and the mapping between texts and information.] [Definition:
Text is a specific, discrete sequence of characters.] Given that there are
constraints on a language, any particular text may or may not have membership
in a language. Indeed, a particular string of characters may be a member of
many languages, and there typically will be many different strings of
characters that are members of a given language. The texts of a language are
the units of exchange between a producer and consumer. [Definition: When a
text is the outermost unit of exchange, we call it a document.] Documents, in
turn, may use smaller languages internally: for example, a document language
might use a number language to represent integer values as strings of digits.
Documents are texts of a language. The Name Language consists of a text set
that has 3 terms and specifies syntactic constraints: that a name consists of
a given and a family. [Definition: A
language has a set of constraints that apply to the set of strings in
the language.] These constraints can be defined in machine processable
syntactic constraint languages such as XML Schema, in microformats, in human
readable textual descriptions such as HTML descriptions, or embodied in
software. Languages may or may not be defined by a schema in any particular
schema language. The constraints on a language determine the strings that
qualify for membership in the language. Vocabulary terms contribute to the set
of strings, but they are not the only source of characters to the set of
strings in a given language. The language strings may include characters
outside of terms, such as punctuation. One reason for additional characters is
to distinguish or separate terms, such as whitespace and markup.
<name>
<given>Dave</given>
<family>Orchard</family>
</name>
name="Dave Orchard"
<span class="fn">Dave Orchard</span>
urn:namescheme:given:Dave:family:Orchard
The set of information in a language almost
always has semantics. In the Name Language, given and family have
the semantics of given and family names of people. The language also has the
binding from the items in the information set to the text set. Any potential
act of interpretation, that is any consumption or production, conveys
information from text according to the language's binding. The language is
designed for acts of interpretation, that being the purpose of languages. In
our example, this mapping is obvious and trivial, but in many languages it is
not. Two languages may have the exact same strings
but different meanings for them. In general, the intended meaning of a
vocabulary term is scoped by the language in which the term is found. However,
there is some expectation that terms drawn from a given vocabulary will have a
consistent meaning across all languages in which they are used. Confusion often
arises when terms have inconsistent meaning across languages. The Name terms
might be used in other languages, but it is generally expected that they will
still be "the same" in some meaningful sense.
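The mapping between texts and information can be sketched in code. The
following is a minimal illustration only, assuming the XML form of the Name
Language shown above; the function name consume_name_v1 and the exact
syntactic check are hypothetical and not part of this finding.

```python
import xml.etree.ElementTree as ET


def consume_name_v1(text: str) -> dict:
    """Act of Consumption: map a Name Language text to the information it conveys.

    Assumed syntactic constraint (illustrative): a <name> element whose
    children are exactly <given> followed by <family>.
    """
    root = ET.fromstring(text)
    child_terms = [child.tag for child in root]
    if root.tag != "name" or child_terms != ["given", "family"]:
        raise ValueError("text is not a member of the Name Language")
    # The language's binding: each term maps to one item of information.
    return {"given": root.findtext("given"), "family": root.findtext("family")}


text = "<name><given>Dave</given><family>Orchard</family></name>"
info = consume_name_v1(text)
```

A text that violates the assumed constraint, such as a name without a given,
is simply not a member of this language and is rejected rather than
interpreted.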
*****TRANSCRIBED COMMENTS END HERE FOR NOW*********
These terms and their relationships are
shown below
We say that Fred engages in an Act of
Production that results in a Name Instance with respect to Name Language V1.
The Name Instance is in the set of Name V1 Texts, that is the set of strings in
the Name Language V1. The production of the Name Instance has the intent of
conveying Information, which we call Information 1. This is shown below:
We say that Barney engages in an Act of
Consumption of a Name Instance with respect to Name Language V1. The consumption
of the Name Instance has the impact of conveying Information 1. This is shown
below:
Versioning is an issue that affects
almost all applications eventually. Whether it's a processor styling documents
in batch to produce PDF files, Web services engaged in financial transactions,
or HTML browsers, the language and instances will likely change over time. The
versioning policies for a language, particularly whether the language is
mutable or immutable, should be specified by the language owner. Versioning is
closely related to extensibility as extensible languages may allow different
versions of instances than those known by the language designer. Applications
may receive versions of a language that they aren't expecting.
If a Name Language V2 exists, with its
set of strings and Information set, Wilma may consume the same Name Instance
but with respect to the Name Language V2 and have impact of Information 2. Name
Language V2 relates to V1 by relationship r2, which is forwards compatible
comparing language V1 to V2 instances, and backwards compatible comparing
language V2 to V1 instances. Similarly, Information 2 - as conveyed by
Consumption 2 - relates to Information 1 - as conveyed by Consumption 1 - by
relationship r1.
Extensibility is a property that
enables evolvability of software. It is perhaps the biggest contributor to
loose coupling in systems as it enables the independent and potentially
compatible evolution of languages. [Definition: A language is Extensible
if its syntax allows information that is not defined in the
current version of the language.] The Name Language is extensible if it can
include terms that aren't defined in the language, like a new middle term.
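As a rough sketch of what extensibility means for a consumer, the code below
processes a name that carries a term the consumer does not know. The
"ignore unknown terms" rule, the function name and the middle value are
assumptions made for illustration; the finding itself does not prescribe
them.

```python
import xml.etree.ElementTree as ET

KNOWN_TERMS = ("given", "family")  # terms defined by the current version


def consume_ignoring_unknown(text: str) -> dict:
    """Consume a name, skipping any term the current version does not define."""
    root = ET.fromstring(text)
    if root.tag != "name":
        raise ValueError("not a name")
    info = {}
    for child in root:
        if child.tag in KNOWN_TERMS:
            info[child.tag] = child.text
        # Any other term (for example <middle>) is skipped, not rejected.
    missing = [term for term in KNOWN_TERMS if term not in info]
    if missing:
        raise ValueError(f"missing required terms: {missing}")
    return info


# Illustrative text with an extension term; the middle value is made up.
extended = (
    "<name><given>Dave</given><middle>Example</middle>"
    "<family>Orchard</family></name>"
)
extended_info = consume_ignoring_unknown(extended)
```

The consumer conveys only the information it understands; the extension is
allowed by the syntax but carries no meaning for this version.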
As languages evolve, it is possible to
speak of backwards and forwards compatibility. A language change is backwards
compatible if newer processors can process all instances of the old language.
Backwards compatibility means that a newer version of a consumer can be rolled
out in a way that does not break existing producers. A producer can send an
older version of a message to a consumer that understands the new version and
still have the message successfully processed. A software example is a word
processor at version 5 being able to read and process version 4 documents. A
schema example is a schema at version 5 being able to validate version 4
documents. In the case of Web services, this means that new Web
services consumers, ones designed for the new version, will be able to process
all instances of the old language.
A language change is forwards
compatible if older processors can process all instances of the newer language.
Forwards compatibility means that a newer version of a producer can be deployed
in a way that does not break existing consumers. Of course the older consumer
will not implement any new behavior, but a producer can send a newer version of
an instance and still have the instance successfully processed. A software
example is a word processor at version 4 being able to read and process
version 5 documents. A schema example is a schema at version 4 being able to
validate version 5 documents. In the case of Web services, this means that
existing Web service consumers, designed for a previous version of the
language, will be able to process all instances of the new language.
In general, backwards compatibility
means that existing texts can be used by updated consumers, and forwards
compatibility means that newer texts can be used by existing consumers. Another
way of thinking of this is in terms of message exchanges. Backwards
compatibility is where the consumer is updated and forwards compatibility is
where the producer is updated, as shown below:
Example 2: Evolution of Producers and/or Consumers
With
respect to consumers and producers, backwards compatibility means that newer
consumers can continue to use existing producers, and forwards compatibility
means that existing consumers can be used by newer producers.
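The two directions can be illustrated with a small sketch. Messages are
modelled here as dictionaries rather than XML, and the two consumer functions
and the middle term are hypothetical stand-ins for an older and a newer
version of a Name Language consumer.

```python
def consume_v1(message: dict) -> dict:
    """Existing consumer: understands given and family, ignores anything else."""
    return {"given": message["given"], "family": message["family"]}


def consume_v2(message: dict) -> dict:
    """Newer consumer: additionally understands an optional middle term."""
    info = consume_v1(message)
    if "middle" in message:
        info["middle"] = message["middle"]
    return info


old_message = {"given": "Dave", "family": "Orchard"}
new_message = {"given": "Dave", "middle": "Example", "family": "Orchard"}

# Backwards compatibility: the newer consumer handles an existing producer's text.
backwards = consume_v2(old_message)
# Forwards compatibility: the existing consumer handles a newer producer's text.
forwards = consume_v1(new_message)
```

In both directions the given and family information survives; the existing
consumer simply does not see the new term.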
We
need to be more precise about which parts of a language are compatible with
which other parts. Every language has a Defined Text set, which contains only
the texts explicitly defined by the language constraints. Typically, a
language will define a mapping from each defined text to information. Each
language also has an Accept Text set, which contains all texts that are
allowed by the language constraints. Typically, the Accept Text set contains
texts that are not in the Defined Text set and do not have a mapping to
information. For example, consider a language whose syntax says that a name
consists of a given followed by a family followed by anything. A text
that consists of a name with only a given and a family falls in both the
Defined and Accept Text sets. A text that consists of a name with a given, a
family and an extension such as a middle falls in the Accept Text set but not
the Defined Text set. By definition, the Accept Text set is a superset of the
Defined Text set.
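The distinction between the two sets can be expressed as two membership
tests. This is a sketch under simplifying assumptions: a text is reduced to
the ordered tuple of its term names, and the function names are illustrative.

```python
def in_defined_set(terms: tuple) -> bool:
    """Defined Text set: exactly a given followed by a family."""
    return terms == ("given", "family")


def in_accept_set(terms: tuple) -> bool:
    """Accept Text set: a given, then a family, then anything."""
    return terms[:2] == ("given", "family")


plain = ("given", "family")
extended = ("given", "family", "middle")
```

Here plain is in both sets, while extended is accepted but not defined, which
matches the superset relationship described above.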
We
have discussed backwards and forwards compatibility in general, but there are
other flavours of compatibility, based upon compatibility between the Accept
Text set, Defined Text set and Information conveyed. Syntactic compatibility is
compatibility with respect to the Texts only, not the information conveyed.
Because languages have Accept and Defined Text sets, some producers will adhere
to the Defined Text set, and others may generate extensions that fall in the
Accept Text set. Compatibility with Producers that produce only texts in the
Defined Text set is called "strict" compatibility. Compatibility with
Producers that may produce Texts in the Accept Text Set that are not in the
Defined Text Set is called "full" compatibility.
A
more precise definition of compatibility is with respect to the texts, that is
whether all the texts in one language are also texts in another language.
Another precise form of compatibility is with respect to the information
conveyed, that is whether the information conveyed by a text in one language is
conveyed by the same text interpreted in another language. The texts could be
compatible while the information conveyed is not. For example, the
same text could mean different and incompatible things in the different
languages. Most systems have different layers of software, each of which can
view a text differently and affect compatibility. For example, the XML Schema
PSVI view is different from the actual text. We can also differentiate between
language compatibility and application compatibility. While it is often the
case that they are directly related, sometimes they are not: two languages
may be compatible but an application might be incompatible with one of them.
We
provide mathematical definitions of a text's compatibility based upon our
terminology.
·· Let L1 and L2 be Languages, where L2 is introduced "after" L1.
·· Let T be a text.
·· T is in L1 iff T is valid per L1 (equivalently, T is in L1's set of Texts).
·· Let I1 be the information conveyed by Text T per language L1.
·· Let I2 be the information conveyed by Text T per language L2.
·· Text T is "fully compatible" with language L2 if and only if I1 is
compatible with I2 and T is valid per L2 (T is in L2's set of Texts).
·· Text T is incompatible if any of the information in I2 is wrong (i.e.
replaces a value in I1 with a different one) or T is invalid per L2 (T is not
in L2's set of Texts).
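The text-level definitions above can be sketched as follows. Two points are
assumptions of this sketch rather than statements of the finding: a text is
modelled as a tuple of (term, value) pairs, and "I1 is compatible with I2" is
read as "no item present in both has a different value". All names are
illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Text = Tuple[Tuple[str, str], ...]  # e.g. (("given", "Dave"), ("family", "Orchard"))
Info = Dict[str, str]


@dataclass(frozen=True)
class Language:
    accepts: Callable[[Text], bool]    # membership in the language's set of Texts
    interpret: Callable[[Text], Info]  # mapping from a text to information


def info_compatible(i1: Info, i2: Info) -> bool:
    """Assumed reading: i2 never replaces a value of i1 with a different one."""
    return all(i2[key] == value for key, value in i1.items() if key in i2)


def fully_compatible(text: Text, source: Language, target: Language) -> bool:
    """Text (assumed to be in source) is valid per target and conveys compatible info."""
    if not target.accepts(text):
        return False
    return info_compatible(source.interpret(text), target.interpret(text))


def terms(text: Text) -> tuple:
    return tuple(term for term, _ in text)


name_v1 = Language(
    accepts=lambda t: terms(t)[:2] == ("given", "family"),
    interpret=lambda t: dict(t[:2]),
)
name_v2 = Language(
    accepts=lambda t: terms(t) in (("given", "family"), ("given", "family", "middle")),
    interpret=lambda t: dict(t),
)
# Same texts as V1 but a different meaning: given and family are swapped.
swapped = Language(
    accepts=name_v1.accepts,
    interpret=lambda t: {"given": t[1][1], "family": t[0][1]},
)

text = (("given", "Dave"), ("family", "Orchard"))
```

With these definitions, text is fully compatible with name_v2 but not with
swapped, even though swapped accepts exactly the same texts as name_v1.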
We
can also provide mathematical definitions of language compatibility:
·· L2 is "fully backwards compatible" with L1 if every text in L1 Accept Text
set is fully compatible with L2.
·· L2 is "strictly backwards compatible" with L1 if every text in L1 Defined
Text set is fully compatible with L2.
·· L2 is "strictly backwards incompatible" with L1 if any text in L1 Defined
Text set is incompatible with L2.
·· L1 is "fully forwards compatible" with L2 if every text in L2 Accept Text
set is fully compatible with L1.
·· L1 is "strictly forwards compatible" with L2 if every text in L2 Defined
Text set is fully compatible with L1.
·· L1 is "forwards incompatible" with L2 if any text in L2 is incompatible
with L1.
·· Combining these: L1 is strictly compatible with L2 if every text in L2
Defined Text set is fully compatible with L1 AND every text in L1 Defined Text
set is fully compatible with L2.
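The language-level definitions quantify over whole text sets. The sketch
below checks them over small, finite samples of texts, which is a
simplification: real Accept Text sets are usually unbounded. The class, the
sample texts and the suffix term are hypothetical, and information
compatibility is again assumed to mean that no shared item changes value.

```python
from typing import Dict, Set

Info = Dict[str, str]


def info_compatible(a: Info, b: Info) -> bool:
    return all(b[key] == value for key, value in a.items() if key in b)


class FiniteLanguage:
    """A language given by a finite sample: accepted text -> information."""

    def __init__(self, interpret: Dict[str, Info], defined: Set[str]):
        assert defined <= set(interpret), "Defined must be a subset of Accept"
        self.interpret = interpret
        self.accept = set(interpret)
        self.defined = set(defined)


def fully_compatible(text: str, source: FiniteLanguage, target: FiniteLanguage) -> bool:
    return text in target.accept and info_compatible(
        source.interpret[text], target.interpret[text]
    )


def fully_backwards_compatible(l2, l1) -> bool:
    return all(fully_compatible(t, l1, l2) for t in l1.accept)


def strictly_backwards_compatible(l2, l1) -> bool:
    return all(fully_compatible(t, l1, l2) for t in l1.defined)


def fully_forwards_compatible(l1, l2) -> bool:
    return all(fully_compatible(t, l2, l1) for t in l2.accept)


def strictly_forwards_compatible(l1, l2) -> bool:
    return all(fully_compatible(t, l2, l1) for t in l2.defined)


A = "given=Dave family=Orchard"
B = A + " middle=Example"
C = B + " suffix=Jr"  # an extension defined by neither version
GF = {"given": "Dave", "family": "Orchard"}
GFM = {**GF, "middle": "Example"}

v1 = FiniteLanguage({A: GF, B: GF, C: GF}, defined={A})
v2 = FiniteLanguage({A: GF, B: GFM, C: GFM}, defined={A, B})
v1_closed = FiniteLanguage({A: GF}, defined={A})  # V1 without extensibility
```

In this sample, v2 is backwards compatible with v1 and v1 is forwards
compatible with v2, while v1_closed is not forwards compatible with v2
because it does not accept the text carrying a middle term.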
We
can draw a few conclusions. Given L2 is strictly backwards incompatible with L1
if any text in L1 Defined Text set is incompatible with L2, the only way that
L2 can be backwards compatible with L1 is if the L2 Defined Text set is a
superset of L1 Defined Text set. Roughly, that means the addition of optional
items in L2. Given L1 is "fully forwards compatible" with L2 if every
text in L2 Accept Text set is fully compatible with L1, the only way that L1
can be forwards compatible with L2 is if the L1 Accept Text set is a superset
of the L2 Accept Text set. Roughly, that means L1 allows all of L2 and more. It
is this superset relationship that is a key to forwards compatibility: the
allowing of texts by L1 that will become defined in L2.
Compatibility
can be restated in terms of superset/subset relationships.
·· Language L2 is strictly backwards compatible with Language L1 if L2 Defined
Text set is a superset of L1 Defined Text set AND every text in L1 Defined Text
set is compatible with L2.
·· Language L1 is strictly forwards compatible with Language L2 if L1 Accept
Text set is a superset of L2 Accept Text set AND every text in L2 Accept Text
set is compatible with L1.
·· Language L2 is fully strictly compatible with Language L1 if L1 Accept Text
set is a superset of L2 Accept Text set, which is a superset of L2 Defined Text
set, which is a superset of L1 Defined Text set, AND every text in L1 Defined
Text set is compatible with L2 AND every text in L2 Accept Text set is
compatible with L1.
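The purely syntactic part of the last condition is a chain of superset
tests, which can be checked directly with set comparisons. This sketch uses
finite samples of term tuples (the suffix term is hypothetical) and
deliberately leaves out the information-compatibility conditions, which have
to be checked separately.

```python
GF = ("given", "family")
GFM = ("given", "family", "middle")
GFMS = ("given", "family", "middle", "suffix")

v1_defined = {GF}
v1_accept = {GF, GFM, GFMS}   # finite sample of "given, family, anything"
v2_defined = {GF, GFM}
v2_accept = {GF, GFM, GFMS}


def satisfies_superset_chain(l1_defined, l1_accept, l2_defined, l2_accept) -> bool:
    """L1 Accept >= L2 Accept >= L2 Defined >= L1 Defined (syntactic part only)."""
    return l1_accept >= l2_accept >= l2_defined >= l1_defined


chain_holds = satisfies_superset_chain(v1_defined, v1_accept, v2_defined, v2_accept)
# A V1 that accepts only its defined texts breaks the chain.
closed_chain_holds = satisfies_superset_chain(v1_defined, {GF}, v2_defined, v2_accept)
```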
We
have shown that forwards and backwards compatibility is only achievable through
extensibility, and compatible versioning is a process of gradually increasing
the Defined Text Set, reducing or not changing the Accept Text Set, and
ensuring the information conveyed is compatible. If ever the set relationships
defined earlier do not hold, then the versions are not compatible.
ed-note
An article on xml.com describes this theory of compatibility and provides
graphical representation of the set relationships, at
http://www.xml.com/pub/a/2006/12/20/a-theory-of-compatible-versions.html.
Should part/all of the article be included in this document?
Many languages are compound languages
consisting of multiple languages. For example, a purchase order language could
use the name language for names. The forwards, backwards and full compatibility
definitions account for composition of languages because the used languages'
Defined and Accept sets are incorporated into the language. For example, the
purchase order language Accept Set is the Accept Set of all the items defined
OR used by the Purchase Order language, which includes the Accept Set of the
name language.
So far, we have defined compatibility
over all possible expressions of a language and we’ve been discussing full compatibility.
However, there are many scenarios where a consumer may consume only part of the
information set. Such partial understanding affects the Text set used and the
Information conveyed. Partial understanding usually results in a subset of the
information being conveyed, because only part of the information is understood
by the consumer. Interestingly, such partial understanding consists of an
increase or supersetting of the Accept Text Set and a parallel decrease or
subsetting of the Defined Text Set. This is because the process of extracting a
part of the text means that extra content, even that which was illegal under
the earlier version’s syntax, becomes part of the Accept Text Set.
An example is an application that only
looks at given names and ignores everything else. My favourite example of this
is a "Baby Name" Wizard. The application might use a simple XPath
expression to extract the given name from inside a name. The result is
effectively a different version of the Name Language, which we will call the
Given Name Language. The Accept Text set for the Given Name Language is
anything, given, anything. The Defined Text set for the Given Name Language is
given. The information set for the Given Name language is given. Because the
Given Name Language syntax is more relaxed than that of the Name Language V1,
an addition of the middle name between the given and family is a compatible
change for the Given Name Language. There are a variety of other now acceptable
names in the Given Name Language.
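A consumer of the Given Name Language might look like the sketch below. It
uses the limited path support in Python's xml.etree.ElementTree as a stand-in
for the XPath expression mentioned above; the function name and the sample
texts are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def consume_given_name(text: str) -> str:
    """Extract only the given name; everything else in the name is ignored."""
    root = ET.fromstring(text)
    given = root.findtext("given")  # path expression: a <given> child of the root
    if given is None:
        raise ValueError("no given name found")
    return given


v1_text = "<name><given>Dave</given><family>Orchard</family></name>"
with_middle = (
    "<name><given>Dave</given><middle>Example</middle>"
    "<family>Orchard</family></name>"
)
reordered = "<name><family>Orchard</family><given>Dave</given></name>"

results = [consume_given_name(t) for t in (v1_text, with_middle, reordered)]
```

All three texts convey the same information to this consumer, including the
reordered one that a stricter Name Language V1 consumer might reject.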
The principles of compatibility and
language with respect to versioning need not change to deal with partial
understanding. Partial understanding of a language can be thought of as the
creation of a new Language L1' that is compatible with Language L1. This is
true if L1' Accept Text set is a superset of L1 Accept Text set, which is a
superset of L1 Defined Text set, which is a superset of L1' Defined Text set,
AND every text in L1' Defined Text set is compatible with L1 AND every text in
L1 Accept Text set is compatible with L1'.
Interestingly, partially understanding
a language is equivalent to creating a language V1', such that the V1 language
is a compatible change with the V1'. There may be many different versions that
are all partial understandings of a language. We call these related languages
"flavours". It may be very difficult for a language designer to know
how many different language flavours are in existence. However, a language
designer can sometimes use the different flavours to their advantage in
designing for a mixture of compatible and incompatible changes. Some changes
could be compatible with some flavours but incompatible with others. It may be
very useful to have some changes be compatible with some flavours, since
consumers of those flavours do not need to be updated or changed.
It is crucial to point out that a
consumer of the language should not produce a partial version of the language.
A client may have relaxed the restrictions on the consuming side, but no
producer should do so on the production side of the language. If a flavour of a
language is also used for production, it should create an instance
that is valid according to the Language V1 rules, not the Language V1' rules.
Perhaps the only exception is when producers are guaranteed that they will be
producing for compatible flavours. Typically this is not the case and is hard
to determine, so the safest course is to produce according to the Language V1
rules.
We have shown how relaxing the
constraints on a language when consuming instances of it can turn an otherwise
incompatible change into a compatible change. We have also shown that abiding
by the language constraints when producing instances is the safest course. Said
more eloquently is the internet robustness principle, "be conservative in
what you do, be liberal in what you accept from others" from [tcp].
We
will call this style of versioning the "liberal" style of versioning.
The "liberal" style of versioning is codified in:
Good Practice
Use Least Partial Languages for "liberal"
versioning: Consumers should use a flavour of a language that has the least
amount of understanding.
The
flavor of a language that implies the smallest amount of understanding will
also be the most liberal and have the largest possible Accept set.
The
"liberal" style of versioning has a drawback, however. It can lead to
fragile software that is hard to evolve because the
"liberal"ness is difficult to code. In addition, it does not force
producers to be correct in what they produce and can lead to a vicious cycle of
complexity.
ednote
More Information on pros/cons needed. Perhaps change best practices to
positions, then make best practices for the different scenarios, ie. Use
Conservative Versioning for languages that are only processed by machines, Use
Liberal Versioning for languages processed typically by humans or....
There
is an opposite style of versioning that says the most effective way of evolving
is to force producers to be correct by having strict consumers. We will call
this the "conservative" style of versioning. The
"conservative" style of versioning is codified in:
Good Practice
Use Only Full Languages for "conservative" versioning:
Consumers should fully use and validate a language.
The
greatest amount of understanding of a language will find the largest number of
errors in producers.
Whether
"liberal" or "conservative" versioning is in use by
consumers, the advice to producers is always the same:
Good Practice
Produce only full languages: Producers should produce the
complete version of a language. They should never produce partial flavors.
"Liberal"
consumers may allow correct operation with producers that aren't fully compatible
with a full language. "Conservative" consumers will be less tolerant
of faults in producers.
EdNote:
I think related to principle of least power
(http://www.w3.org/2001/tag/doc/leastPower) . The lower the power of the
language, the easier to have partial understanding? For example, XPath is
"lower power" than Java processing the DOM.
Compatibility
is defined between the producer and consumer of an individual text. Most
messaging specifications, such as Web Services, utilise both inputs and outputs.
Using our definitions of compatibility, a Web service that updates its output
message language is considered to be a newer producer, because it is sending a
newer version of the message. Conversely, updating the input message language
makes a service a newer consumer because it is consuming a newer version of the
message. All systems that include inputs and outputs must consider both when
making changes and determining compatibility. For full compatibility, any
output message changes must be forwards compatible. This means that older
consumers can receive them successfully. Similarly, input message changes must
be backwards compatible, so that they can be received successfully from older
producers.
Our treatise so far has described a
fairly straightforward evolution of a language, from a first version to a next
version. However, extensibility and interoperability are usually closely
related. It is an axiom in computing that the lower the optionality (which
includes extensibility), the higher the chance of interoperability. Each and
every place where extensibility is allowed in a language is also a place where
a lack of interoperability can arise. Interoperability problems can arise, for
example, when producers and consumers do not agree on which version of a
language is being used in a text.
Even though a language has a formal
definition and extensibility model, there is no guarantee that software that
processes it will implement it exactly. Differences in understanding between
different software agents are a significant source of divergence. A
classic example of this is the so-called "tag soup" problem in HTML.
Many of the applications commonly used to process HTML, and particularly
browsers, have an Accept Text Set that is larger than the formal definition of
the HTML language. For example, many situations of interleaved opening and
closing of elements in HTML are processed without generating an error. This
ensures the user experience, at least in the short term, is of higher quality.
However, it does suffer long term problems with interoperability when the
illegal texts are copied by mechanisms such as "view source". The
reason is that the more undocumented strings that are in an Accept Text Set,
the more difficult it is to achieve interoperability. The more liberal an agent
is in accepting texts, increasing the Accept Text Set by expanding the
definition of the language, the more difficult interoperability is because not
every agent may have the same Accept Text Set.
At the other extreme is XML. XML allows
almost no extensibility in its constructs. Name characters, tag closures,
attribute quoting and attribute allowed values are all very fixed. This has
increased interoperability between implementations of XML. However, it has also
made it very difficult to move to XML 1.1 because almost all changes are
incompatible because of the lack of extensibility. The XML language design was
very specifically trying to avoid the HTML "tag soup" problem, and it
has arguably done that, at a cost of inability to version. These two extremes
of design of extensibility exist because of well-thought design. The trade-off
between extensibility, interoperability and the Accept Set was planned in
advance. Language designers should do the same with their languages.
ed-note
Which references to
support this? TAG issue:
http://www.w3.org/2001/tag/issues#TagSoupIntegration-54
Good Practice
Analyze Trade-offs for Language:
Language designers should analyze the trade-offs between extensibility,
interoperability, and actual language Accept Set.
The cost of changes that are not
backward or forward compatible is often very high. All the software that uses
the affected language must be updated to support the newer version. The
magnitude of the associated cost is directly related to whether the system in
question is open or closed.
[Definition: A closed
system is one in which all of the producers and consumers are more-or-less
tightly connected and under the control of a single organization.] Closed
systems can often provide integrity constraints across the entire system. A
traditional database is a good example of a closed system: all of the database
schemas are known at once, all of the tables are known to conform to the
appropriate schema, and all of the elements in each row are known to be valid
for the schema to which the table conforms.
From
a versioning perspective, it might be practical, in a closed system, to say
that a new version of a particular language is being introduced into the system
at a specific time. At that time, all of the data that conforms to the previous
version of the language is migrated to conform to the new version as part of
the upgrade process.
[Definition: An open system is
one in which some producers and consumers are loosely connected or are not
controlled by the same organization. The internet is a good example of an open
system.]
In
an open system, it's simply not practical to handle language evolution with
universal, simultaneous, atomic upgrades to all of the affected software
components. Existing producers and consumers, who are outside the immediate
control of the organization that is publishing an updated language, may
continue to use the previous version for a considerable period of time.
Finally,
it's important to remember that systems evolve over time and have different
requirements at different stages in their life cycle. Often, early versions of
systems will operate as if they are closed. During initial development, when
the earliest versions of a language are under construction, it may be valuable
to pursue a much more aggressive, draconian versioning strategy. Once a system
is more widely deployed in production, it tends to behave more as an open
system. There is likely to be an expectation of stability in the language it
provides. Consequently, it may be necessary to proceed with more caution and to
be prepared to provide forwards and backwards compatibility for changes.
From NoahM: The draft is on pretty firm
ground when it talks about the information that can be determined from a given
input text per some particular language L. I think there are important
compatibility statements we can and should make at just that level (see
suggestions above), and we should separate them from statements about the
compatibility of a particular pair of applications that may communicate using
the language. Both are important to include, I think, but they should be in
separate chapters, one building on the other. Once you've cleanly told a story
about which information can be reliably communicated when sender and receiver
interpret using different language versions, you can go on to tell a separate
story about whether the applications can indeed work well together. To
illustrate what I mean, here are examples at each of the two levels.
Language level incompatibility:
Consider a situation in which the same input connotes different information in
one version of a language or another. Without reference to any particular
application, we can say that the languages are in that respect incompatible.
For example, we might imagine a version of a language in which array indexing
is 1-based, and a later version in which 0-based indexing is used; the
information conveyed by any particular array reference is clearly in some sense
incompatible, regardless of the consuming application's needs.
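The indexing example can be sketched as follows, assuming two hypothetical
versions of a small language in which a text is an array reference written as a
decimal index. The function names and the sample array are illustrative
assumptions only.

```python
def element_v1(items, index_text):
    """Hypothetical version 1: array references are 1-based."""
    return items[int(index_text) - 1]


def element_v2(items, index_text):
    """Hypothetical version 2: array references are 0-based."""
    return items[int(index_text)]


colors = ["red", "yellow", "green"]
text = "2"  # the same text, read under each version of the language

info_v1 = element_v1(colors, text)
info_v2 = element_v2(colors, text)
```

The text is legal in both versions, yet it selects a different element in each,
so the mismatch exists before any consuming application is considered.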
Application-level incompatibility: Now
consider two applications designed to render the same version of the HTML
language. The same tags are supported, with the same layout semantics, etc. One
of the applications, however, has a sub-optimal design. Its layout engine has
overhead that grows geometrically with the number of layout elements. If you
give it a table with 50 rows, it takes 3 seconds to run on some processor. If
you give it a table with 5000 rows it runs for 3 days. Question: is the second
application "compatible" with the 5000 row input? In some ways yes,
and in some no. It will eventually produce the correct output, but in practice
a user would consider it incompatible. This illustrates that compatibility of
applications ultimately has to be documented in terms meaningful to the applications.
In this case, rendering time is an issue. I think we should not try in this
finding to document specific levels of compatibility at the application level
and we should especially not fall into the trap of trying to claim it's a
Boolean compatible/incompatible relation; in the performance example, it's a
matter of degree. So, the terminology needs to be specific to the application
and its domain. I do think we can talk about some meta-mechanisms that work at
the application level, such as mustUnderstand, but they should be in a section
that's separate from the exposition of texts, information, and the degree to
which information may be safely extracted from a given text when sender and
receiver operate under differing specifications.
From Rhys: Clearly this is important,
but I’m not sure we necessarily need to go to this level of completeness in a
TAG finding. Feels like there is a book in this!
The current draft tries to take the
approach that we will model application compatibility by defining a new
language that is the flavor (in this case, of HTML) that a particular consumer
will successfully process, but the point is that "success" is
sometimes a fuzzy concept. Do we have two languages for this example, one for
the documents that completely break application #2 and another for those that
just make it run slowly? That seems to be what the finding is doing today, and
I'm not convinced it's the right approach. My proposal would be that we just
point out the distinction and say: "This first section of the finding for
the most part restricts its analysis to the limited question of: what
information can be reliably conveyed when a producer and a consumer operate
using different versions of what purport to be the same or similar languages?
The later sections explore some techniques that can be used by applications to
negotiate means of safe interoperation when sender and receiver are written to
differing versions of a language specification."
From Rhys: I like Noah’s suggested
approach because this is really the problem we are setting out to solve for
specific types of XML usage.
This Finding is intended to provide a
terminology basis for further versioning findings.
The author thanks Norm Walsh for many
contributions as co-editor until 2005. Thanks also to the many reviewers who have
contributed to the document, particularly David Bau, William Cox, Ed Dumbill,
Chris Ferris, Yaron Goland, Rhys Lewis, Hal Lockhart, Mark Nottingham, Jeffrey
Schlimmer, Cliff Schmidt, and Norman Walsh.
Changes
Who | When | What
DBO | 20070518 | Incorporated Rhys' comments, added version identifier story to forwards compatible evolution, split part 1 into terminology and strategies documents.
[NRM1] General comments
· ** Not all languages are at the full document level. We should be clear that
the finding applies to any language consisting of sets of texts, whether a whole
document, a subtree in XML, just the text content of some tag (e.g. the format
of a floating point number), or a plain text file.
· Do you want to reference RFC 2119?
· * I think you need to define “TEXT” before this. There is a formal
definition, but it comes after this first use.
· ** The relationship between information sets and semantics seems to be
unclear, yet one or both of them is crucial to the story you’re telling about
versioning.
[NRM2] I still don’t think it’s the best example, but I think
we’ve agreed to disagree on that. So, I’ll assume it stays.
* [NRM3] No. The language is not exchanged. Maybe: “Documents
conformant with the name language are intended for exchange between computer
applications.”
[NRM4] Define TEXT first.
[NRM5] What do we mean by “act of creation”? Encoding as a string of bits?
Deciding what the content is going to be?
[NRM6] Sounds too much like it relates to tuberculosis.
[NRM7] Antecedent of “these” is ambiguous. Furthermore, the sentence parses to
suggest that something is an event, but the nouns and noun phrases earlier in the
sentence are Wilma, Barney and “each other”, none of which could be an event.
*** [NRM8] No, no, no. We keep disagreeing on this, and others (Stuart?) have
disagreed as well. I strongly believe we’ll do better to say: The language is a
set of texts and the mapping of texts to information. Once you know the texts,
the constraints are redundant. The same goes for the information resulting from
the mappings, and for constraints that those results happen to obey. I would buy
saying: “One way to specify the set of texts that comprise a language is
intensionally, using a constraint language such as regular expressions, W3C XML
schemas, etc. In other cases, a language may be defined extensionally, by just
listing the texts that are legal (this is often practical with very small
languages, such as “red”, “yellow”, “green”, which might be legal texts for a
language capturing the state of a traffic light).”
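The two styles of definition named in this comment might be sketched as
follows. This is only an illustration: the traffic light texts come from the
comment itself, while the decimal-number pattern and all identifiers are
assumptions chosen for the example.

```python
import re

# Extensional definition: the language is given by listing its legal texts.
TRAFFIC_LIGHT_TEXTS = frozenset({"red", "yellow", "green"})


def in_traffic_light_language(text: str) -> bool:
    return text in TRAFFIC_LIGHT_TEXTS


# Intensional definition: the language is given by a constraint, here a
# regular expression for simple decimal numbers (an assumed example language).
DECIMAL_PATTERN = re.compile(r"[+-]?\d+(\.\d+)?")


def in_decimal_language(text: str) -> bool:
    # fullmatch requires the whole text to satisfy the constraint.
    return DECIMAL_PATTERN.fullmatch(text) is not None
```

Either way the result is the same kind of thing, a set of texts; the constraint
is one way of describing that set rather than a separate part of the language.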
[NRM9] Definition of TEXT is needed earlier.
[NRM10] Why have we switched from text to string of chars?
*** [NRM11] Whoa! You’re saying “information has semantics”? Either that’s
wrong, or it needs more careful explanation. I think I know what you’re trying
to say, but first of all it’s unclear, and secondly I’m not sure we want to get
into semantics. If we can just formulate an explanation of which documents
convey the same information (the last name value is Mendelsohn) I think that’s
enough to tell the versioning story. Going into semantics (in Western
societies, one part of the name is traditionally taken from the father, so
saying that the last name field is Mendelsohn suggests that I actually have such
a family name) is something we should mostly avoid, I think. Semantics is
important, but I think that if we show how to convey information reliably,
others can build semantics and reasoning on top of that.
*** [NRM12] What’s an Information Set? It seems to be a key abstraction, but
you’re using it without introducing it.
[NRM13] First of all, this seems a bit smug in tone. Secondly, it’s not
entirely correct. Languages are intended, I think, as a means by which
information can be set down or encoded, as well as to support interpretation.
I’d kill the whole sentence.