Defined sets, accept sets, and <banana> elements

During our discussions of versioning at the June 2007 TAG F2F I raised 
concerns about our notions of "defined text set" and "accept text set" 
[1].  If you're reading the minutes at [2], look for the bit that begins: 
"NM: I think by the way that the defined set and the accept set is less 
useful than we thought...".  At the meeting, I challenged the group with a 
sketch of an example, asking how it would be handled by the 
defined-set/accept-set model.  I didn't hear an answer that satisfied me 
in the room, but Tim and I happened to sit together on the flight home. He 
expressed to me some support for the defined-set/accept-set formulation, 
and he suggested how he would apply it to the example I had in mind.  The 
purpose of this note is to record the example and Tim's explanation, as I 
understood it, and then to comment a bit.  I assume he'll correct any 
misunderstandings on my part.

The Example Language: PHTML
---------------------------

The example is motivated by HTML, as styled by CSS.  To avoid ratholes 
relating to particular historical details of those languages, I'll here 
use two mythical languages, PHTML and PCSS (pretend HTML and pretend 
CSS), pertinent details of which are as follows:

Assume that PHTML is a language of well-formed XML documents that must 
have a root tag <PHTML>.  A few HTML-like tags, such as <P> for paragraph 
and <BODY> for body, are defined in the PHTML version 1 specification.  As 
with HTML, PHTML allows for the appearance of arbitrary tags such as 
<BANANA> not named explicitly in the specification;  it defines any 
document containing such an extension tag to have the same semantics 
as a similar document from which that tag (but not the content it wraps) 
has been deleted.  Thus, per the PHTML spec., the following two documents 
have the same meaning:

  doc1.phtml:

        <PHTML>
         <BODY>
          <P>Versioning is hard.</P>
         </BODY>
        </PHTML>

-and-

  doc2.phtml:

        <PHTML>
         <BODY>
          <BANANA>
           <P>Versioning is hard.</P>
          </BANANA>
         </BODY>
        </PHTML>

This is basically the example language we discussed at the F2F.  (The 
application of CSS to this language is discussed later in this note.)
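
The deletion rule above can be sketched in a few lines of Python.  This is 
only an illustration, not part of any PHTML specification: the KNOWN_TAGS 
set, the content() and meaning() helpers, and the choice to ignore 
whitespace-only text are all assumptions made for the sketch.

```python
import xml.etree.ElementTree as ET

# Assumption: the tags named explicitly by the PHTML version 1 spec.
KNOWN_TAGS = {"PHTML", "BODY", "P"}

DOC1 = "<PHTML><BODY><P>Versioning is hard.</P></BODY></PHTML>"
DOC2 = ("<PHTML><BODY><BANANA><P>Versioning is hard.</P>"
        "</BANANA></BODY></PHTML>")


def content(elem):
    """List the meaningful content of elem, with extension tags deleted."""
    items = []
    if elem.text and elem.text.strip():
        items.append(elem.text.strip())
    for child in elem:
        if child.tag in KNOWN_TAGS:
            items.append((child.tag, tuple(content(child))))
        else:
            # Extension tag: drop the tag itself, keep what it wraps.
            items.extend(content(child))
        if child.tail and child.tail.strip():
            items.append(child.tail.strip())
    return items


def meaning(text):
    """Return a comparable stand-in for the meaning of a PHTML document."""
    root = ET.fromstring(text)
    if root.tag != "PHTML":
        raise ValueError("a PHTML document must have a <PHTML> root")
    return ("PHTML", tuple(content(root)))


# Per the rule above, the two sample documents mean the same thing.
print(meaning(DOC1) == meaning(DOC2))
```

Under these assumptions, two documents have the same meaning exactly when 
meaning() returns equal values for them.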

Application of defined text sets and accept text sets to PHTML
--------------------------------------------------------------

The invariant in the defined-set/accept-set formulation is that every 
document in the accept-set conveys the same meaning as some particular 
document in the defined set.  doc1.phtml above is in the defined set for 
PHTML, because all of its content has a meaning supplied directly by the 
specification.  doc2.phtml is in the accept set;  that document too is in 
the PHTML language, but the semantics of doc2.phtml are defined by means 
of its equivalence to a defined-set document, doc1.phtml.
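
As a rough sketch of how this classification might be automated, the 
Python below sorts a text into one of the two sets.  It follows this 
note's usage, in which a document such as doc2.phtml is "in the accept 
set" because its meaning is borrowed from a defined-set document.  The 
PHTML_V1_TAGS set and the classify() function are hypothetical names 
introduced for the illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the tags named explicitly by the PHTML version 1 spec.
PHTML_V1_TAGS = {"PHTML", "BODY", "P"}


def classify(text, known_tags=PHTML_V1_TAGS):
    """Label a well-formed XML text as 'defined', 'accept', or 'not PHTML'."""
    root = ET.fromstring(text)
    if root.tag != "PHTML":
        return "not PHTML"
    used = {elem.tag for elem in root.iter()}
    # Every tag has a meaning supplied directly by the spec: defined set.
    # Otherwise the meaning comes from an equivalent defined-set document.
    return "defined" if used <= known_tags else "accept"


doc1 = "<PHTML><BODY><P>Versioning is hard.</P></BODY></PHTML>"
doc2 = ("<PHTML><BODY><BANANA><P>Versioning is hard.</P>"
        "</BANANA></BODY></PHTML>")

print(classify(doc1))
print(classify(doc2))
```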

So far, so good.  All of this makes sense to me.  Defined sets and accept 
sets do a good job of explaining this extensibility.

The challenge
-------------

Now we come to the interesting part of the example.  We allow our pretend 
CSS language to style the markup in PHTML documents, and crucially, the 
styles can be applied to <BANANA> elements as well as to paragraphs.  For 
example:

<PHTML>
 <HEAD>
  <STYLE type="text/pcss">
   P {font-size: 120%}
   BANANA {color:yellow}
  </STYLE>
 </HEAD>
 <BODY>
  <BANANA>
   <P>Versioning is hard.</P>
  </BANANA>
 </BODY>
</PHTML>

The paragraph will have a large font and will be yellow.  My challenge to 
the TAG was: how do defined and accept sets explain this sort of 
extensibility?  (Note that the equivalent is allowed for real CSS applied 
to real HTML.) 

My understanding of Tim's preferred answer is:  "PHTML as redefined by PCSS 
is a different language than PHTML on its own, and all of the legal 
strings (texts) in that new language are in its defined set -- the accept 
set is the empty set.  The PCSS specification is the one that gives a 
non-vacuous meaning to <BANANA> elements, and indeed in the presence of 
PCSS, a document with a <BANANA> is no longer equivalent to one without. 
Thus, according to PHTML as redefined by PCSS, all PHTML documents are in 
the defined set, and none are in the accept set."
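
To make the loss of equivalence concrete, here is a toy Python sketch of 
PCSS being applied.  Everything about it is an assumption made for 
illustration: PCSS is taken to consist only of "TAG {property: value}" 
rules, every property is taken to be inherited from ancestor elements, 
and parse_pcss() and paragraph_styles() are hypothetical helpers.

```python
import re
import xml.etree.ElementTree as ET

STYLE = "<HEAD><STYLE>P {font-size: 120%} BANANA {color:yellow}</STYLE></HEAD>"
WITH_BANANA = ("<PHTML>" + STYLE + "<BODY><BANANA>"
               "<P>Versioning is hard.</P></BANANA></BODY></PHTML>")
WITHOUT_BANANA = ("<PHTML>" + STYLE + "<BODY>"
                  "<P>Versioning is hard.</P></BODY></PHTML>")


def parse_pcss(css):
    """Parse toy 'TAG {prop: value; ...}' rules into {tag: {prop: value}}."""
    rules = {}
    for selector, body in re.findall(r"(\w+)\s*\{([^}]*)\}", css):
        decls = rules.setdefault(selector, {})
        for decl in body.split(";"):
            if ":" in decl:
                prop, value = decl.split(":", 1)
                decls[prop.strip()] = value.strip()
    return rules


def paragraph_styles(text):
    """Return the styles in effect for each <P>, in document order."""
    root = ET.fromstring(text)
    style = root.find("HEAD/STYLE")
    rules = parse_pcss(style.text or "" if style is not None else "")
    found = []

    def walk(elem, inherited):
        # Assumption: a child inherits every property of its ancestors.
        current = {**inherited, **rules.get(elem.tag, {})}
        if elem.tag == "P":
            found.append(current)
        for child in elem:
            walk(child, current)

    walk(root, {})
    return found


print(paragraph_styles(WITH_BANANA))
print(paragraph_styles(WITHOUT_BANANA))
```

In this sketch the paragraph picks up the color only when the <BANANA> 
wrapper is present, so deleting the tag changes the result.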

Some Comments
-------------

I found this analysis to be tremendously helpful.  It certainly meets my 
challenge at the F2F, which was to show how the defined-set/accept-set 
model can be coherently applied to this example.  I hope Tim can confirm 
that I've correctly captured the essence of his analysis, and I'll be very 
curious to see whether others who've been advocating the 
defined-set/accept-set approach would apply it the same way.

As to my own position, I want to give it some more thought.  My initial 
concern (Dave and I discussed this at great length at dinner in Mountain 
View) was that I didn't really see how to apply the concepts to my 
example, and Tim has resolved that concern.  What remains is a worry that 
by putting everything into the defined-set, our versioning model is no 
longer saying much about the sense in which the PHTML+PCSS V1 language is 
indeed extensible.  When PHTML+PCSS version 2 comes along and defines 
semantics for some elements like <BANANA>, I'm not sure how the model's 
going to help us explain what happened, because everything was in the V1 
defined set to begin with.  Many languages in fact provide nontrivial 
default semantics for their extension content --  our mythical PHTML/PCSS 
allowed the extension content to be styled, but a PDOM might well have 
allowed scripts to address the extensions, and many, many non-HTML 
languages provide interesting default semantics for extension content 
(store it, print it, etc.).  I think these are really interesting use 
cases, and I'll be disappointed if our formal models don't explain them 
well.

Nonetheless, I think Tim and others are making a strong point that, in the 
interesting sub-case where extension content is truly and completely 
ignored, the defined-set/accept-set model gives a nice, clean, 
set-oriented explanation.  I buy that.  So, I think this is progress. 
Thanks to Tim for being patient in working through this with me.

Noah

[1] http://www.w3.org/2001/tag/doc/versioning-20070518
[2] http://www.w3.org/2001/tag/2007/05/30-minutes#item06

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Received on Wednesday, 20 June 2007 16:06:40 UTC