Re ISSUE-41 versioning (again)

On Fri, Oct 16, 2009 at 3:30 AM, Larry Masinter <masinter@adobe.com> wrote:
> In response to an administrative request to review issues, I thought
> I would send this more publicly: ...
> ISSUE-41: "designing extensible languages and for
>    handling versioning?"
>  I'd put Noah's overview of DE and the JAR and LMM (and Raman?)
>    work on versioning and payoffs as further actions.
>
> Larry
> --
> http://larry.masinter.net

I know this is probably tiring given the previous messages I've sent
on the subject, but I am sending it so that it is in
the email archive, both in case I want to refer to it at the F2F, and
for the amusement of those who have a perverse fascination with this
kind of thing. Also to get it out of my system.  -Jonathan

The following is a *model*, a caricature of language change.  Any
resemblance it bears to reality is just lucky.

I talk about "meanings" without depending on their having any
particular properties.  The narrative works with almost anything else
substituted, such as "number" or "command".

Communication happens when a message correctly conveys a meaning,
i.e. when a consumer understands what a producer "means" by a message.

That is, the producer P wants to convey a meaning Q, so it composes a
message M = P(Q) whose meaning is Q.  The consumer C receives M and
assigns it a meaning Q' = C(M).  The outcome is a payoff to producer
and consumer (whose interests I assume to be aligned) based on how
well the intended meaning Q matches the understood meaning Q'.  Call
this payoff z(Q, Q').

(Note that a language designer or specification writer might influence
P and C, but payoff is determined by uncontrolled factors.)
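
A minimal sketch of this setup in Python may help fix the notation. The
vocabulary, the dictionary representation of P and C, and the particular
payoff values (+1 for a match, -1 for a mismatch) are all assumptions made
for illustration; the model itself does not depend on them.

```python
# Hypothetical toy vocabulary; not part of the model itself.
PRODUCER = {"rule": "<hr>", "emphasis": "<em>"}   # P: meaning -> message
CONSUMER = {"<hr>": "rule", "<em>": "emphasis"}   # C: message -> meaning


def z(intended, understood):
    """Toy payoff (an assumption): +1 on a match, -1 on a mismatch."""
    return 1 if intended == understood else -1


def communicate(producer, consumer, meaning):
    """Return (M, Q', z(Q, Q')) for an intended meaning Q."""
    message = producer[meaning]          # M = P(Q)
    understood = consumer.get(message)   # Q' = C(M); None if C has no reading
    return message, understood, z(meaning, understood)


# A consumer that reads "<hr>" differently from what the producer intends.
MISREADING_CONSUMER = {"<hr>": "emphasis"}

good = communicate(PRODUCER, CONSUMER, "rule")
bad = communicate(PRODUCER, MISREADING_CONSUMER, "rule")
```

Here `good` carries a positive payoff and `bad` a negative one, which is
all the later arguments rely on.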

  If z(Q, Q) is not positive, then a self-interested producer will
  never voluntarily send a message M for which C(M) = Q: such messages
  will be unused (def: "used" = in the image of P).

  There might be meanings Q that are not in the image of C, i.e. no
  message elicits them.  These meanings might be called
  "inexpressible".  If the payoff z(Q, Q) of an inexpressible meaning
  is high, there will be pressure to modify the language so that the
  meaning becomes expressible.

  One definition of "language" (or "language version") might be: a
  factoring of the identity function through a message space, i.e. a
  pair of functions (P, C) such that C(P(Q)) = Q.  (P is necessarily
  injective.)
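
The three definitions above (used, inexpressible, language) can be
sketched as small checks, again assuming P and C are finite dictionaries.
The function names and the sample vocabulary are hypothetical.

```python
def used_messages(P):
    """Messages in the image of P."""
    return set(P.values())


def inexpressible(meanings, C):
    """Meanings that no message elicits, i.e. not in the image of C."""
    return set(meanings) - set(C.values())


def is_language(P, C):
    """True when C(P(Q)) = Q for every meaning Q that P can express."""
    return all(C.get(P[q]) == q for q in P)


MEANINGS = {"rule", "video"}
P = {"rule": "<hr>"}
C = {"<hr>": "rule"}

# A non-injective P cannot be inverted by any C, so it fails the check.
P_COLLIDING = {"rule": "<hr>", "video": "<hr>"}
```

With these values "video" is inexpressible, (P, C) is a language, and
(P_COLLIDING, C) is not.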

Consider a language-change scenario in which producers and consumers
change one at a time.  When a producer changes it assigns a new
message to some meaning: P(Q) = M becomes P'(Q) = M'.  When a consumer
changes it assigns a new meaning to some message: C(M) = Q becomes
C'(M) = Q'.  When both change in concert we might end up with
z(Q', Q') > z(Q, Q), in which case everyone will be happier,
especially if Q' had formerly been inexpressible.

Ideally a language change does not entail intermediate states in which
some producers or consumers are "disenfranchised" in the sense that a
formerly available nonnegative payoff decreases, even ephemerally.

If the consumer changes first, i.e. Q' = C'(M) != C(M) = Q for some
message M, we are OK as long as either (a) M is not used by old
producers, or (b) z(Q, Q') >= z(Q, Q), i.e. the new meaning is somehow
"not any worse" than the one that was expected.  This might be called
a "backward compatible" change or "extension".  (Language
specification bug fixes might be related to this?)
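
As a rough sketch, conditions (a) and (b) can be written as a check over
every message whose meaning changes.  The dictionary representation, the
function name, and the +1/-1 payoff are assumptions for illustration only.

```python
def z(intended, understood):
    """Toy payoff (an assumption): +1 on a match, -1 on a mismatch."""
    return 1 if intended == understood else -1


def backward_compatible(P_old, C_old, C_new, payoff):
    """Check a consumer-first change against conditions (a) and (b)."""
    used = set(P_old.values())
    for m in set(C_old) | set(C_new):
        q_old, q_new = C_old.get(m), C_new.get(m)
        if q_new == q_old:
            continue                      # meaning of M did not change
        if m not in used:
            continue                      # (a) no old producer sends M
        if payoff(q_old, q_new) >= payoff(q_old, q_old):
            continue                      # (b) new meaning is not any worse
        return False
    return True


P_OLD = {"rule": "<hr>"}
C_OLD = {"<hr>": "rule"}
C_EXTENDED = {"<hr>": "rule", "<video>": "video"}   # colonizes an unused message
C_BREAKING = {"<hr>": "video"}                      # reassigns a used message
```

The extension passes because "<video>" was unused; the reassignment of
"<hr>" fails because old producers still send it and the new reading pays
less.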

If the producer changes first, i.e. M' = P'(Q) != P(Q) = M for some
meaning Q (n.b. if Q was inexpressible there is no M), we are OK as
long as z(Q, C(M')) >= 0, i.e. the producer is not punished for
sending M' instead of M.  Since it's the producer that's changing, it
should expect that it might be talking to old consumers, so it has no
right to be particularly disappointed with M' being interpreted as
having its old meaning C(M'), and it will prepare accordingly.  But if
the payoff z(Q, C(M')) when an old consumer C receives M' were
negative, the new producer would be punished for sending to old
consumers, and that would have an inhibiting effect.
A change of this kind has been called "forward compatible".
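
The producer-first condition can be sketched the same way.  Here I assume
that an old consumer with no reading for M' simply discards it for a payoff
of 0; that choice, the function name, and the vocabulary are all
hypothetical.

```python
def z(intended, understood):
    """Toy payoff (an assumption): 0 if discarded, +1 match, -1 mismatch."""
    if understood is None:
        return 0
    return 1 if intended == understood else -1


def forward_compatible(P_old, P_new, C_old, payoff):
    """Check that z(Q, C(M')) >= 0 for every changed or new assignment."""
    for q, m_new in P_new.items():
        if P_old.get(q) == m_new:
            continue                          # assignment unchanged
        if payoff(q, C_old.get(m_new)) < 0:
            return False                      # old consumers punish M'
    return True


P_OLD = {"rule": "<hr>"}
C_OLD = {"<hr>": "rule", "<b>": "bold"}
P_SAFE = {"rule": "<hr>", "video": "<video>"}   # old consumers discard <video>
P_CLASH = {"rule": "<hr>", "video": "<b>"}      # old consumers read <b> as bold
```

P_SAFE is forward compatible under these assumptions; P_CLASH is not,
since old consumers misread the new message.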

...

Language change can occur only through use of a communication channel.
Somehow producer and consumer must decide, or learn, in concert, new
message/meaning assignments.  The special channel could be a side
channel using a completely different language (e.g. a language spec
sent to a software developer), or it could be a feedback channel to be
used for reinforcement learning (e.g. rejection of a deprecated
message), or it could be a special way to use the language itself
(e.g. a message whose meaning is to change the state of the consuming
agent so that it replaces its meaning function C with C').

...

Examples:

- A new message/meaning assignment should be expected to spread only
  when it confers some substantial benefit (the meaning is both new
  and useful).

  E.g. <video> is likely to spread; <hrule> as a synonym for <hr> wouldn't.

- Neutral changes can spread as "parasites" of beneficial changes (as
  when changes are batched in a software release).

- Languages that permit multiplexed communication to different
  consumer populations ought to best promote the propagation of
  changes (e.g. #ifdef in C - "if you are an old consumer, treat this
  message as M, but if you are a new one, then treat it as M' ") [but
  features like #ifdef have other costs!]

- Languages for which P is surjective (no unused messages) and payoff
  is positive (z(Q, Q) > 0 for all Q in the image of C) are extremely
  difficult to change, as the use of every possible message will be
  claimed or entrenched by the language community.  An example would
  be ASCII.

- If C(M) is "an error" (must lead to an error report or something
  similar, as opposed to being processed in a useful way), how would a
  population resist a language change that converts C(M) to some
  useful behavior?  Answer: It can't, unless "cheating" (making
  changes) is somehow more painful (or less beneficial, lower payoff)
  than the error behavior.  If there's no cheating, it's only because
  the outcome of doing so is perceived to be even worse than being
  punished by an error report.

...

Suppose the old and new languages support communication of a "language
version indicator".  That is, there is a third language that is the
disjoint sum of two others: a message M'' consists of a version 1
indicator followed by a message M in language version 1, or a version
2 indicator followed by a message M' in language version 2.  (This is
similar to the #ifdef example above.)

An indicator could be used in a variety of ways:

  . A version 2 indicator could be a signal to version 1 consumers
    that they won't understand what's coming.  This won't enable
    communication in any way, but when a language change is
    incompatible it might prevent errors of misinterpretation (<0
    payoff) by forcing the message to be discarded entirely (0
    payoff).

  . A version 1 indicator could be a signal to version 2 consumers
    that version 1 meaning assignments should be chosen for incoming
    messages.  This would be useful if (and only if) the language
    change were incompatible (lower payoffs).

  . An indicator might serve a function at a different niche in the
    ecology.  For example, a transforming consumer/producer (e.g. an
    editor) could use a version 1 indicator as a signal of limited
    capabilities of some downstream consumer, and avoid introducing
    version 2 features into messages.  (thanks Larry)
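
The first two uses can be sketched by modeling a message in the sum
language as a (version, message) pair.  The two meaning tables and the
function names are hypothetical, chosen so that "<b>" changes meaning
incompatibly between versions.

```python
# Hypothetical meaning functions for the two language versions.
C_V1 = {"<b>": "bold"}
C_V2 = {"<b>": "strong emphasis", "<video>": "video"}


def consume_v1(tagged):
    """Version 1 consumer: discard anything not marked version 1."""
    version, message = tagged
    if version != 1:
        return None                # discarded: 0 payoff, no misreading
    return C_V1.get(message)


def consume_v2(tagged):
    """Version 2 consumer: choose the meaning table named by the indicator."""
    version, message = tagged
    table = C_V1 if version == 1 else C_V2
    return table.get(message)


old_reads_new = consume_v1((2, "<b>"))
new_reads_old = consume_v2((1, "<b>"))
new_reads_new = consume_v2((2, "<b>"))
```

The old consumer drops the version 2 message rather than misreading it,
and the new consumer recovers the version 1 meaning when told to.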

...

The unused messages constitute a frontier that can be colonized
through language changes.  The pressure to colonize will be constant
in any language population, as there will always be inexpressible
meanings striving to find expression.  Poorly coordinated changes risk
conflict, e.g. two independently arising changes might define
conflicting behavior for messages containing "<circle>", thereby
lowering payoffs.
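
A small sketch of the frontier and of a collision between two
uncoordinated changes, assuming a finite message space and changes
expressed as message-to-meaning dictionaries.  All names and values here
are hypothetical.

```python
def frontier(message_space, P):
    """Unused messages: the message space minus the image of P."""
    return set(message_space) - set(P.values())


def conflicts(change_a, change_b):
    """Messages to which two changes assign different meanings."""
    shared = change_a.keys() & change_b.keys()
    return {m for m in shared if change_a[m] != change_b[m]}


MESSAGE_SPACE = {"<hr>", "<circle>", "<video>"}
P = {"rule": "<hr>"}

CHANGE_A = {"<circle>": "draw a circle"}
CHANGE_B = {"<circle>": "circled text", "<video>": "video"}

open_territory = frontier(MESSAGE_SPACE, P)
contested = conflicts(CHANGE_A, CHANGE_B)
```

Both changes draw on the same frontier, and "<circle>" ends up contested.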

Change coordination is a sociotechnological problem with a variety of
solutions.  The simplest strategy is to eliminate the frontier by
defining beneficial consumer behavior for all messages and encouraging
investment in the entire territory.  A central coordination point is
another strategy; hierarchical delegation is yet another;
randomization is yet another.

Received on Sunday, 29 November 2009 00:38:22 UTC