Raman's review of versioning strategies from Dave Orchard on 2008-05-22 (www-tag@w3.org from May 2008)

From: Dave Orchard <orchard@pacificspirit.com>
Date: Thu, 22 May 2008 12:50:31 -0700
To: <raman@google.com>, <www-tag@w3.org>
Message-ID: <000b01c8bc45$17fed0c0$6a01a8c0@amer.bea.com>
Hi Raman,
 
This was a most excellent review.  I've made changes based on all of your comments except one, and I've put my proposals
inline to help with any subsequent reviews.
 
Thanks again,
Dave


On Mon, May 19, 2008 at 6:41 PM, T.V Raman <raman@google.com> wrote:


Hi Dave,

congratulations on all the progress you've made with your work on
the versioning document over the last few months.

I'm attaching my review of sections 3, 4 and 5 as promised; I hope
you get time to discuss these during the F2F.




Comments On Finding On Versioning


1 Comments On Finding On Versioning <http://www.w3.org/2001/tag/doc/versioning-compatibility-strategies> 


Getting these in in time for the F2F. 


1.1 Comments On Section 3: Incompatible


No Comments on present content. 


1.1.1 Suggestions: 


*	Perhaps add a paragraph about how such incompatibilities might be prevented in the context of distributed extensibility using
techniques such as modularized namespaces? Compare and contrast such distributed extensions with centralized vocabularies.

DBO>>There's now the following in section 2 "Additionally, the design can facilitate the use of a particular strategy. For example,
XML Namespaces are designed for distributed extensibility and allow some language changes or extensions to be compatible changes
rather than incompatible changes.".  
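
To make the namespace point concrete, here is a minimal Python sketch of a consumer that reads only elements in the namespace it
knows and skips third-party extensions in other namespaces. The namespace URIs, element names, and the read_known_fields helper are
all hypothetical, chosen only for illustration; they do not come from the finding.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace owned by the core language (an assumption for this sketch).
CORE_NS = "http://example.org/name/1"

# A document that mixes core elements with a third-party extension element.
DOC = """<name xmlns="http://example.org/name/1"
      xmlns:ext="http://example.com/third-party-extension">
  <given>Ada</given>
  <family>Lovelace</family>
  <ext:nickname>AL</ext:nickname>
</name>"""


def read_known_fields(xml_text):
    """Return the core-namespace children as a dict, skipping foreign-namespace ones."""
    root = ET.fromstring(xml_text)
    known = {}
    for child in root:
        # ElementTree spells qualified names as "{namespace}localname".
        if child.tag.startswith("{" + CORE_NS + "}"):
            local_name = child.tag.split("}", 1)[1]
            known[local_name] = child.text
    return known


fields = read_known_fields(DOC)
```

Because the extension lives in its own namespace, the older consumer can skip it without misreading the core content, which is the
sense in which the extension stays a compatible change.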


1.2 Comments On Section 4


1.2.1 4.1: Producers And Consumers


Might be worthwhile addressing how a new producer should behave when the user requests content that is explicitly for an old
consumer. It would also be worthwhile to distinguish between evolution strategies where authors are insulated from knowing the
distinction between old vs new (implicit creation?) vs those scenarios where authors are explicitly aware of the distinction, and
therefore are given control on whether they produce content with a new producer that specifically avoids all new features. 

DBO>>I added "Note that a newer producer may be informed by the consumer as to which versions of a language it prefers. Also, the
producer may explicitly provide metadata about the languages it supports, allowing a consumer to deliberately choose which version
of the language the producer will produce. This can give the consumer the opportunity to select a version of the language that
avoids any new features. There are two significant ways that backward compatibility can be supported."
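
One way to picture the selection described in that note is the following Python sketch. It assumes the producer publishes a list of
versions it can emit and the consumer supplies its preferences in order; the choose_version function and the version strings are
hypothetical and only illustrate the idea.

```python
def choose_version(producer_supports, consumer_prefers):
    """Return the first consumer-preferred version the producer can emit, else None."""
    for version in consumer_prefers:
        if version in producer_supports:
            return version
    return None


# Hypothetical metadata: a newer producer that can still emit older versions.
producer_supports = ["1.0", "1.1", "2.0"]

# An old consumer asks only for the version it understands.
consumer_prefers = ["1.0"]

selected = choose_version(producer_supports, consumer_prefers)
```

If no version is shared, the sketch returns None, leaving it to the caller to decide whether to fail or fall back to some default.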


1.3 4.1 (Replacement):


This section could do with an example. 

DBO>> Added.


1.4 Comments On 4.2:


Minor nit: there is a repeated period --- search for "..". The example used --- URI for a resource that is http or https --- raises
the complicated issue of identity that might perhaps complicate the present issue where we don't need such complexity. The issue of
URI for resource identity --- and the consequences that result in this context from the URI being tied to a protocol --- is
unfortunate, and in my opinion a profound architectural issue; I'd just like to keep it out of this document for now. 

DBO>> I changed the example.


2 Comments On 5 (Forward Compatible)


Top-level generic comment: this section reads well. However, we might be giving ourselves too much credit with respect to pointing
at the addition of the img tag to HTML1 as a shining example of extensibility. Looking at it in the cold light of the mess that is
the Web today, I postulate that extensibility in 1991 was easy because there wasn't much content to extend. To see this, let's ask
ourselves what would happen if one tried to view the Web through a lens today that cannot handle the img tag. I assert that it's far
more complex than "it just works without the images" --- this happens because of the large amount of significant content and behavior
that has been stuck away behind the img tag. In 1992, if you browsed the Web with a user-agent that didn't understand the img tag,
the losage would have been minimal; not so in 2008. 

Another point to remember about the img tag; it looks simple when described as an empty element whose appearance is dictated by one
of two attributes --- an src attribute for the image source, and an alt attribute for fallback text. Yet, more than 14 years after
its invention, introduction of the data: scheme in the value of the src attribute causes browsers that don't understand that idiom to
display a black hole in place of the image, i.e., there is more to the img tag than meets the eye! 

DBO>> I see your point, and I added "Note that HTML does not have perfect extensibility either, for example the use of data: scheme
in the IMG src attribute causes browsers that don't understand that use of the src attribute to display a black hole in place of the
image. ".   
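
As a rough illustration of the failure mode, the Python sketch below models a consumer that checks the scheme of the src value and
uses the alt text when it does not recognize the scheme, rather than showing nothing useful. The set of supported schemes and the
render_img function are assumptions made for this sketch, not a description of how any actual browser behaves.

```python
from urllib.parse import urlsplit

# Hypothetical older consumer: understands http, https and relative references ("").
SUPPORTED_SCHEMES = {"http", "https", ""}


def render_img(src, alt):
    """Return a placeholder for the image, or the alt text if the src scheme is unknown."""
    scheme = urlsplit(src).scheme
    if scheme in SUPPORTED_SCHEMES:
        return "[image: " + src + "]"
    # Unknown scheme (for example data:): fall back to the alt attribute.
    return alt


old_style = render_img("http://example.org/logo.png", "Company logo")
new_style = render_img("data:image/png;base64,iVBORw0KGgo=", "Company logo")
```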


2.1 5.1 Must Accept Unknowns


The term "text" glosses over whether the unknowns appear at the tokenization phase or during the post-tokenization phase. I believe
this is a valuable distinction to expound on --- especially from the view of designing parsers. In the paragraph that starts with 

    Another way of looking at this combination is that there are two
    languages. The browser renderer understands HTML which involves ignoring
    unknown elements or attributes. By our language definitions, the renderer's

the jump to HTML browsers is both surprising and jarring. I'd suggest refactoring or rearranging this and the preceding paragraphs
so that we move sequentially through arguments that cover tokenization, serialization and DOM structures, with examples from XML
tokenization, HTTP header handling and HTML DOM going hand-in-hand with each. 
 
DBO>> those are good points, but I'd like to stick with just HTML.  How about "Many languages are consumed in multiple stages, each
of which may have different variants of these models. An example is a web browser processing HTML. It will typically tokenize HTML
first, and HTML specifies that unknown markup is tokenized to nothing. The results of tokenization are used as the basis for
rendering. The same markup is still placed in the DOM and is available for CSS or other DOM related technologies. Another way of
looking at this combination is that there are two languages. The browser tokenizer understands HTML which involves ignoring unknown
elements or attributes. By our language definitions, the renderer's Defined Text set does not include unknown elements or
attributes, though the Accept Text Set does. The browser DOM understands any HTML elements or attributes. The DOM Defined and Accept
Text set includes any elements or attributes."
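
The two-stage view in the proposed text can be sketched in Python with the standard html.parser module. In this sketch the rendering
stage drops tags it does not know but keeps their character data, while a DOM-like stage records every tag. The KNOWN_TAGS set and
the TwoStageConsumer class are hypothetical, and this is a simplification, not how a real browser is built.

```python
from html.parser import HTMLParser

# Hypothetical Defined Text set for the rendering stage.
KNOWN_TAGS = {"p", "b", "i"}


class TwoStageConsumer(HTMLParser):
    """Feeds one token stream to a strict renderer and a permissive DOM-like list."""

    def __init__(self):
        super().__init__()
        self.rendered = []  # what the rendering stage acts on
        self.dom_tags = []  # every start tag, known or not

    def handle_starttag(self, tag, attrs):
        self.dom_tags.append(tag)
        if tag in KNOWN_TAGS:
            self.rendered.append("<" + tag + ">")

    def handle_endtag(self, tag):
        if tag in KNOWN_TAGS:
            self.rendered.append("</" + tag + ">")

    def handle_data(self, data):
        # Character data is kept even when the enclosing tag is unknown.
        self.rendered.append(data)


consumer = TwoStageConsumer()
consumer.feed("<p>Hello <blink>new</blink> world</p>")
consumer.close()
rendered = "".join(consumer.rendered)
```

Here the unknown blink element contributes nothing to the rendering beyond its text, yet it remains visible in dom_tags for any
later stage that wants it.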



Not sure if I believe the argument about the script element with respect to must ignore children. From memory --- and this still
shows in the bizarre tricks that authors trying to validate use to hide scripts --- the issue was a mixture of: 

*	A desire to make pre-script browsers ignore script. 

*	A reluctance to use CData sections for holding the scripts. 

*	Problems that arise when using -- in scripts inside an HTML/SGML comment. 

*	Only content in the head was ignored by older browsers. 

DBO>>I had hoped my solution covered this situation for pre-script browsers ignoring script.  What am I missing?

2.2 Comments on 5.2:


In the context of MIME providing fallback support, it is interesting to observe the resulting chaos that gets created in the
following usage scenarios: 

*	Web mail agent sends out email encoded as HTML. 

*	Attaches a text/plain alternative. 

*	Recipient opens mail with a preference for text/plain. 

*	Replies with in-line quoting. 

*	Response opened by someone else on the mail thread -- who then follows up in HTML. 

Consequence: the two parts quickly fall out of sync, and one is left with a mush that makes TAG-Soup look appetizing. 

DBO>> that is a really great point, that fallbacks can cause problems too.  How about "There can be difficulties with fallbacks when
the consumer chooses one alternative, modifies it and then a subsequent or different consumer chooses a different alternative. In
the multipart/alternative example, an email could be offered as HTML and plain text. If the consumer chooses plain text, replies
(becoming a producer) with in-line quoting, then a next consumer chooses to reply to the new message using HTML, then the
alternatives are not synchronized and are not true alternatives any more. "
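
The drift described here can be reproduced in a few lines of Python with the standard email package. The sketch assumes a consumer
that prefers text/plain and replies with in-line quoting; the subject lines and body text are made up for illustration.

```python
from email.message import EmailMessage

# Producer: one message offered as both plain text and HTML.
original = EmailMessage()
original["Subject"] = "Versioning review"
original.set_content("Plain text body\n")
original.add_alternative("<p>HTML <b>body</b></p>\n", subtype="html")

# Consumer: prefers text/plain, so it only ever sees that alternative.
plain_part = original.get_body(preferencelist=("plain",))
plain_text = plain_part.get_content()
quoted = "".join("> " + line for line in plain_text.splitlines(keepends=True))

# The consumer becomes a producer: the reply is built from the plain part alone.
reply = EmailMessage()
reply["Subject"] = "Re: Versioning review"
reply.set_content(quoted + "Reply written in plain text.\n")

# The reply carries no HTML alternative that matches its quoted content.
html_part = reply.get_body(preferencelist=("html",))
```

Any later participant who answers in HTML has to regenerate that alternative from scratch, which is where the two parts stop being
true alternatives.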


2.3 Comments On 5.3:


From the text, it sounds like we're equating minor versions in a manner similar to how we want to handle backward/forward
compatibility for unversioned languages. It might be worth pointing this out explicitly if that is indeed what we're saying --- AKA,
reduce to previously unsolved problem. 

DBO>> Sorry, I didn't quite parse this.  I don't recall talking about compatibility of unversioned languages. 

With respect to supporting additional functionality it might be worth noting that once the author has tested for the availability of
a given functionality, she is back in the situation of having to provide fallback content for cases where the test returns false. 

DBO>>Good point, I put in "A language can provide a mechanism for explicit testing, and if so, the language also needs a mechanism
for the text to contain a <specref ref="fallbackprovided"/> for when the test returns false."
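
A minimal Python sketch of explicit testing paired with fallback content follows. The feature names, the CONSUMER_FEATURES set, and
the choose_content function are hypothetical; the point is only that every test needs an answer for the false branch.

```python
# Hypothetical set of features this consumer implements.
CONSUMER_FEATURES = {"img", "table"}


def choose_content(required_feature, enhanced, fallback, features=CONSUMER_FEATURES):
    """Return the enhanced content if the feature test passes, else the fallback."""
    if required_feature in features:
        return enhanced
    return fallback


# The author tests for "video" and must still supply something for consumers without it.
shown = choose_content("video", "<video src='talk.ogv'></video>", "Transcript of the talk")
```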

Author: T.V Raman <raman@google.com>

Date: 2008/05/19 10:39:10





David Orchard writes:
 > I sent a message to www-tag on tuesday night with a link to the latest
 > version of the versioning and I listed you as needing to do a review...
 >
 > The link is
 > http://www.w3.org/2001/tag/doc/versioning-compatibility-strategies
 >
 > Thanks in advance for the review..
 >
 > Cheers,
 > Dave
 >
 > On Fri, May 16, 2008 at 4:14 PM, T.V Raman <raman@google.com> wrote:
 >
 > > weren't you supposed to have sent me a pointer to poke me into
 > > reviewing action?
 > >
 > > --
 > >  Best Regards,
 > > --raman
 > >
 > > Title:  Research Scientist
 > > Email:  raman@google.com
 > > WWW:    http://emacspeak.sf.net/raman/
 > > Google: tv+raman
 > > GTalk:  raman@google.com, tv.raman.tv@gmail.com
 > > PGP:    http://emacspeak.sf.net/raman/raman-almaden.asc
 > >
 > >
Received on Thursday, 22 May 2008 19:51:22 UTC