Comments on Extending and Versioning Languages: Strategies

In preparation for the F2F, I'm doing another readthrough of the 
Versioning Strategies document [1].  Here are some comments, in document 
order, not priority order:

--------

Section 1.2:

> Some typical backwards- and forwards-compatible changes:
> 
>     *  adding optional components ( in XML, this is generally 
> elements and/or attributes)

Optional elements are only compatible changes if their presence doesn't 
modify or conflict with the semantics of an existing element.  If I add a 
component that changes the semantics of some other component (e.g. ignore 
this, mustUnderstand, currencyUnit=Pesos) then the change is often not 
compatible (and especially not forwards compatible), even if the new 
component is syntactically optional.
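
To make the currencyUnit case concrete, here is a minimal sketch in Python.  
The <price> element, the USD default and both reader functions are 
hypothetical, invented only for illustration; they are not taken from the 
finding.

```python
import xml.etree.ElementTree as ET

def v1_read_price(xml_text):
    """Hypothetical version 1 consumer: ignores attributes it does not know."""
    elem = ET.fromstring(xml_text)
    # Assumption of this sketch: version 1 treats every amount as US dollars.
    return float(elem.text), "USD"

def v2_read_price(xml_text):
    """Hypothetical version 2 consumer: honours the optional currencyUnit."""
    elem = ET.fromstring(xml_text)
    return float(elem.text), elem.get("currencyUnit", "USD")

old_text = "<price>100</price>"
new_text = '<price currencyUnit="Pesos">100</price>'

print(v1_read_price(old_text))  # (100.0, 'USD')
print(v2_read_price(old_text))  # (100.0, 'USD')
print(v1_read_price(new_text))  # (100.0, 'USD')   <- silently wrong
print(v2_read_price(new_text))  # (100.0, 'Pesos')
```

The attribute is syntactically optional and the old consumer accepts the 
new text without complaint, yet it reports 100 dollars where the producer 
meant 100 pesos.  That is the sense in which the change is not forwards 
compatible.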

--------

Section 2:

> In broad terms, the strategies to versioning fall into a numberof 
classes

(editorial) I think that should be:  "In broad terms, strategies for 
versioning fall into a number of classes" 

-or-

"In broad terms, strategies for versioning may be grouped into a number of 
classes" 


--------

Section 2:

> Different application domains will choose different approaches. 

(Editorial) I don't think a domain is something that typically makes a 
choice.  Suggest something along the lines of:

"Different choices may be appropriate for different applications"

-or-

"Different choices may be appropriate for different application domains."


--------

Section 2:

> The dependencies makes it imperative to plan for versioning from the 
start.

Two concerns:

1) It's not clear what dependencies you're talking about,

2) (grammar) "dependencies" is plural, so I think the phrase should be 
"The dependencies >make< it imperative"

--------
Section 2.1:

> "Big bang" is a very coarse-grained approach to versioning. It 
> establishes a single version identifier, either a version 
> number or namespace name, for an entire text. 

I'm confused about this at a few levels.  First, it seems to imply that 
using a single namespace for all of the content in one particular instance 
document (a text) is a mistake.  I don't think that's what you meant.  I 
suspect that the point you're making really doesn't have to do directly 
with namespaces or version identifiers, but has to do with policies for 
knowing how much of a document a consumer can continue to safely process 
given that some of the content is not what was expected.  That issue can 
arise, and there are solutions that are possible, without any reference to 
either namespaces or version identifiers, I think.  Just as one example, 
most any syntactic marker that indicates that processing of contents is 
optional (such as mustUnderstand="false" in SOAP) seems to achieve 
non-big-bang versioning without any mention of either namespaces or 
version ids.

If I'm right about this, then the remainder of 2.1 needs to be rewritten 
to define big bang in a way that's less focussed on version identifiers.

-----------
Section 2.2 starts with a hyperlink to "Forwards compatible".  Shouldn't 
that hyperlink be where the term is first used?  Note that there is text 
that seems quite close to another definition of forwards and backward 
compatible in the bullets immediately under the start of section 2.

-----------
Section 2.2.2:

> Forwards compatible evolution of a language means that 
> producers of texts in language should be able 

(typo) should be "in >a< language"

----------
Section 2.2.2:

> A supreme example of the benefits of extensibility is HTML. 

Seems a bit strong.  Suggest:

"A good example of the benefits of extensibility is HTML. "

----------
Section 2.2.2.1:

Formatting bug.  The header is indented under the Good Practice Note 
above.  This is a bug in either the XSL or CSS stylesheets.  I've found 
that this happens if a GPN is the last thing in a section.  You can beat 
it by putting an empty paragraph in the XML just after the GPN, and ahead 
of the header for 2.2.2.1. 

---------
Section 2.2.2.1:

> "By the definition of Extensibility, there is a mapping from 
> all texts with additional syntax to texts without."

This formulation keeps coming up, and I still believe it's much too 
limiting.  Version 1 need not know how to map all extension content to 
some text in which the extension content doesn't appear:  it must have 
some default rule for interpreting the extension content.  That rule need 
not be to ignore it entirely for all applications.  For example, there's 
no reason a version 1 storage application shouldn't store the entire 
contents of a document it receives, including markup it doesn't fully 
understand.  Of course, it will need to understand any markup that 
controls the storage operation, but other content it can blindly store. 
Maybe it does more than that:  maybe by default it removes whitespace from 
extension content and stores the results.  Many important and interesting 
applications do extensibility this way.  I'm really reluctant to imply 
that the only default processing rule is a mapping that makes the 
extension content disappear.
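
As a rough sketch of the difference, the Python below contrasts two default 
processing rules for the same text.  The element names and the KNOWN set 
are hypothetical, and trimming surrounding whitespace stands in for 
whatever normalization a real storage application might apply.

```python
import xml.etree.ElementTree as ET

KNOWN = {"title"}  # hypothetical: the only child element version 1 defines

def ignore_unknowns(xml_text):
    """Default rule A: map the text to one without the extension content."""
    root = ET.fromstring(xml_text)
    for child in list(root):
        if child.tag not in KNOWN:
            root.remove(child)
    return ET.tostring(root, encoding="unicode")

def store_unknowns(xml_text):
    """Default rule B: keep extension content, trimming its whitespace."""
    root = ET.fromstring(xml_text)
    for child in root:
        if child.tag not in KNOWN:
            for node in child.iter():
                if node.text:
                    node.text = node.text.strip()
    return ET.tostring(root, encoding="unicode")

doc = "<doc><title>Report</title><rating> 4 stars </rating></doc>"

print(ignore_unknowns(doc))
# <doc><title>Report</title></doc>
print(store_unknowns(doc))
# <doc><title>Report</title><rating>4 stars</rating></doc>
```

Both functions give version 1 a well-defined default for content it does 
not understand, but only the first corresponds to a mapping onto a text 
without the extension.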

------------
Section 2.2.2.1:

> "Must Accept Unknowns Rule: Consumers MUST accept any text 
> portion that they do not recognize."

This is way too strong in my opinion.  Let's say I decide to make an 
extension to XML that allows attributes to be quoted with backquotes as 
well as forward quotes:

        <element a=`backquotedattr`/>

Are you really saying that XML violates good practice because XML 
processors will reject this "text that it does not recognize"?  Taken to 
its extreme, this GPN seems to imply that an XML processor should quietly 
accept a FORTRAN program as just one giant "text that it doesn't 
recognize".

In fact, almost all extensible languages have syntactic rules that are 
fixed and that cannot be changed without breaking compatibility.  Within 
those rules, extensible languages encourage processors to accept and 
provide some default interpretation to content that wasn't specified in 
detail in the original specification.  (Note that the first versions of 
HTML do allow for <img> just as they allow for <frog> and <banana>; what 
they don't do is call out those tags specifically or give them any 
distinguished interpretation).
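
The distinction between fixed syntax and open vocabulary is easy to 
observe with an ordinary XML parser.  The sketch below uses Python's 
standard xml.etree.ElementTree; the accepts() helper is hypothetical and 
only reports whether the parser considers the text well-formed.

```python
import xml.etree.ElementTree as ET

def accepts(text):
    """Return True if the XML parser accepts the text as well-formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(accepts('<element a="quotedattr"/>'))        # True
print(accepts("<element a=`backquotedattr`/>"))    # False
print(accepts("<frog><banana/></frog>"))           # True
print(accepts("      PROGRAM HELLO\n      END\n"))  # False
```

Unknown element names such as <frog> and <banana> pass, because they stay 
within the fixed syntactic rules.  The backquoted attribute and the FORTRAN 
text do not, because they change those rules.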

-----------
Section 2.2.2.1:

> Preserve existing information Rule: An Extensible Language MUST
> require that any texts with extensions MUST be compatible with 
> a text without the extensions.

For the reasons stated above, I don't think this is right in all cases 
either.  I think an extensible language must provide for a default 
interpretation of any extensions that may be present.  Why that 
interpretation should be equivalent to some text without the extension 
isn't motivated in the finding at all.  In fact, I think SOAP is a 
counterexample, if you're willing to grant that headers are extensions.  I'm 
fairly sure that if in SOAP I have a header that's mustUnderstand="false", 
I need not process it locally, but I think that if I'm an intermediary I 
in general MUST relay that header downstream, even though I don't 
understand it.  If the GPN requires that the message be equivalent to one 
without the header, how can I do that?  So, that's another example of a 
default processing rule for extension content:  don't ignore it, relay it.

Ah... I see now that you acknowledge this use case:

> Must Accept and Preserve Unknowns Rule (Must Accept variant 2):
> Consumers MUST accept and preserve any text portion that they 
> do not recognize.

I feel pretty strongly that this flat out contradicts the first GPN quoted 
above.  Either the text is equivalent to one without the extension OR you 
must preserve it.  I don't think you can have it both ways.  The HTTP 
proxy in your example is not acting as if the document had some equivalent 
without the extensions.  I would drop that first GPN, probably replacing 
it with one that requires a default processing rule for extension content.

-----------
Section 2.2.2.1

> In tree based languages, which includes all markup languages,

I can't invent a markup language that would, for example, represent graphs 
more general than trees?

-----------
Section 2.2.2.1: GENERAL COMMENT

I think that we usually use Good Practice Notes for things that you 
usually SHOULD do.  Some of the GPN's in this finding seem to be 
alternatives that are advertised as being more or less equally good 
choices.  I think you need to do some explaining as to which GPNs are to 
be followed as good practice more or less all the time, and which are 
intended as alternatives.  For example "Must Accept ALL" vs. "Must accept 
container" seem to be just two choices, and they are mutually exclusive. 

----------
Section 2.2.2.3

> Languages MUST provide a substitution model for version identifiers for 
forwards-compatible evolution.

(Do we put MUST's in GPN's?  Usually, SHOULD feels better in a GPN. 
Maybe it's OK, not sure)

Anyway, as an example of this you give:

> There could be an algorithmic approach. For version numbers, 
> one could say that version numbers will only have a "major" 
> change if there is an incompatible change. For example, version
> 1.1 of a language is by definition compatible with version 1.0 
> and version 2.0 is incompatible. 

I understand why this is a common strategy and sometimes a good one.  I'm 
less clear on why it's an example of a substitution rule.  What's being 
substituted for what when I say "gee, I can't process this document 
because it's got a newer major version number than I understand?"
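
For reference, one plausible reading of the major/minor convention is 
sketched below in Python.  The "major.minor" string format and both 
function names are assumptions made for illustration.

```python
def parse_version(version):
    """Split a hypothetical 'major.minor' identifier into two integers."""
    major, minor = version.split(".")
    return int(major), int(minor)

def can_process(consumer_version, text_version):
    """A consumer accepts any text that shares its major version number."""
    consumer_major, _ = parse_version(consumer_version)
    text_major, _ = parse_version(text_version)
    return consumer_major == text_major

print(can_process("1.0", "1.1"))  # True:  minor change, treated as compatible
print(can_process("1.0", "2.0"))  # False: major change, treated as incompatible
```

Written this way the rule is a yes/no compatibility test.  Nothing in it 
replaces one identifier, or one piece of content, with another.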

--------
Section 3.0:

> there are some key requirements the language designer consider 
> in choosing a strategy and design.

(typo) there are some key requirements the language designer >should< 
consider in choosing a strategy and design.

--------
Section 3.0:

> It is sometimes desirable to prevent 3rd parties from extending
> languages, but it does happen. 

> An example may be a tightly constrained security environment 
> where distributed authoring is considered a "bug" rather than a feature.

The first sentence doesn't parse unambiguously.  Is it "preventing" that 
does happen, or "extending"?  On first reading, I assumed you meant 
extending, then when I saw the next sentence I thought probably not. 
Either way, I think the sentence should be reworded.  How about:

"Allowing 3rd parties to extend a language can be extremely valuable for 
applying the language to new use cases, but in other situations such 
extensibility may be undesirable.  For example, there may be situations in 
which security concerns dictate that the language specification be 
centrally maintained."

--------
Section 3.3:

> If so, a substitution mechanism is required for forwards compatibility.

Suggest:  "If so, a default processing rule is required for forwards 
compatibility."  (same reason as discussed earlier)

------

I've skimmed the rest of the draft, but not reviewed it in as much detail.  
I think the general nature of my comments would be similar.  I hope they 
are helpful.  Thank you.

Noah

[1]    http://www.w3.org/2001/tag/doc/versioning-strategies-20070917.html 

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Received on Tuesday, 26 February 2008 01:43:19 UTC