Default Semantics, Banana Elements, and Versioning

First a caveat:  I have not yet read Dave's new drafts on versioning.  I'm 
hoping to do so on the plane on Sunday night, so this is not a comment on 
them.

Anyway, I see that we are planning to have some discussion of Versioning 
at the upcoming F2F meeting in Southampton, and I wanted to remind 
everyone of the issues that I summarized in [1], which has as its subject 
"Defined Sets, Accept Sets, and Banana Elements."  I hope we can spend a 
bit of time at the F2F exploring these issues, and getting comfortable (or 
not) with how our drafts address them.

As a reminder, [1] explores in detail a use case I raised at the Google 
F2F, and sets out what I understood to be Tim BL's analysis of it.  In 
short, the issue centers around HTML-like languages and instances such as:

        <PHTML>
         <BODY>
           <P>
            <BANANA>Versioning is hard.</BANANA>
           </P>
         </BODY>
        </PHTML>

We assume that the PHTML V1 spec says nothing special about <BANANA> 
elements:  like all other tags not explicitly named, they are to be 
ignored.  There is one exception:  other languages, such as CSS, are not 
prohibited from discovering their presence, so:

        BANANA {font-weight:bold}

will cause "Versioning is hard" to be boldfaced.  Furthermore, we may 
assume that some future version of PHTML, say V2, provides an explicit 
semantic for <BANANA>, perhaps indicating that it causes content to be 
rendered in parentheses:

        (Versioning is hard.)
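
To make the two behaviours concrete, here is a rough Python sketch.  
Everything in it (the render function, the list-of-runs output, the CSS 
dictionary) is made up for illustration and comes from no PHTML spec; it 
also assumes the instance is well-formed XML.

```python
import xml.etree.ElementTree as ET

DOC = (
    "<PHTML><BODY><P>"
    "<BANANA>Versioning is hard.</BANANA>"
    "</P></BODY></PHTML>"
)

# Hypothetical stand-in for the rule BANANA {font-weight:bold}
CSS = {"BANANA": {"font-weight": "bold"}}


def render(doc, version, css=None):
    """Return a list of (text, style) runs for a PHTML document.

    version 1: <BANANA> has no PHTML semantics of its own.
    version 2: <BANANA> content is wrapped in parentheses (assumed V2 rule).
    In both versions a stylesheet may still select any tag by name.
    """
    css = css or {}
    runs = []

    def walk(elem, inherited):
        # CSS can see every element, named by the PHTML spec or not.
        style = {**inherited, **css.get(elem.tag, {})}
        wrap = version >= 2 and elem.tag == "BANANA"
        if wrap:
            runs.append(("(", style))
        if elem.text and elem.text.strip():
            runs.append((elem.text.strip(), style))
        for child in elem:
            walk(child, style)
            # Text after a child belongs to the current element.
            if child.tail and child.tail.strip():
                runs.append((child.tail.strip(), style))
        if wrap:
            runs.append((")", style))

    walk(ET.fromstring(doc), {})
    return runs


def plain_text(runs):
    """Concatenate the text of all runs, dropping style information."""
    return "".join(text for text, _ in runs)
```

Calling render(DOC, 1) gives the text with no styling, render(DOC, 1, CSS) 
gives the same text marked bold, and plain_text(render(DOC, 2)) gives the 
parenthesized form.  The point of the sketch is only that the stylesheet 
already makes <BANANA> observable under V1.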
 
My question is: what do our defined-set/accept-set models provide to 
explain the evolution of PHTML from V1 to V2?   If I understood Tim 
correctly (see [1]), his view is that even documents with <BANANA> are in 
the defined set for the combined language PHTML V1+CSS.  That makes sense, 
insofar as we've shown that the presence of <BANANA> elements can't be 
ignored.  What concerns me is that this seems to imply that our model adds 
very little value to any versioning system in which the default semantic 
is not "ignore completely".  What does this architecture say is happening 
when PHTML V2 comes out?  All the documents are in the defined sets of all 
language versions.  How does that help us?
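
A toy Python model of the concern, under a loudly stated assumption:  I 
take "in the defined set" to mean that every tag in the document 
contributes meaning, either because the language version names it or 
because a stylesheet could select it.  That reading, the tag vocabularies, 
and the function name are all illustrative guesses, not definitions taken 
from the drafts.

```python
# A document is reduced to the set of tag names it uses.
DOC_TAGS = {"PHTML", "BODY", "P", "BANANA"}

# Hypothetical vocabularies for the two language versions.
V1_TAGS = {"PHTML", "BODY", "P"}
V2_TAGS = V1_TAGS | {"BANANA"}


def in_defined_set(doc_tags, language_tags, with_css=False):
    """Assumed membership test for the defined set of a language.

    Without CSS, a document qualifies only if the language names every tag.
    With CSS in the combined language, any tag can affect rendering, so
    every document qualifies.
    """
    return with_css or doc_tags <= language_tags


for name, tags in (("V1", V1_TAGS), ("V2", V2_TAGS)):
    for with_css in (False, True):
        label = name + ("+CSS" if with_css else "")
        print(label, in_defined_set(DOC_TAGS, tags, with_css))
```

In this model, membership changes between V1 and V2 only when CSS is left 
out.  Once CSS is part of the combined language, the test answers the 
same way for both versions, so set membership alone no longer records 
that V2 gave <BANANA> an explicit semantic.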

So, with the caveat that I'd like to read Dave's drafts to see what they 
say (I know he said he was picking up some stuff from my earlier notes), 
I'd like to suggest that we discuss this question a bit at the F2F.  The 
reason I keep raising this is that I think it's common, perhaps the norm, 
for languages to have semantics other than "ignore completely" even for 
the content that they don't specify in detail.  On a purchase order, 
you'll probably find a way to print the fields that you don't otherwise 
understand.  Almost surely you'll store them in your database, and 
probably you'll apply digital signatures to them.  We've already seen that 
in the presence of CSS, every possible element can contribute to the 
rendering of an HTML document -- does that mean the presence of CSS 
prevents us from explaining that something very interesting happens when a 
new version of HTML provides explicit semantics for, say, <BANANA>?   I 
think this is all very important.

Noah

[1] http://lists.w3.org/Archives/Public/www-tag/2007Jun/0092.html

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Received on Tuesday, 4 September 2007 23:47:52 UTC