- From: <noah_mendelsohn@us.ibm.com>
- Date: Wed, 20 Jun 2007 12:06:52 -0400
- To: www-tag@w3.org, Tim Berners-Lee <timbl@w3.org>
- Cc: "David Orchard" <dorchard@bea.com>
During our discussions of versioning at the June 2007 TAG F2F I raised
concerns about our notions of "defined text set" and "accept text set"
[1]. If you're reading the minutes at [2], look for the bit that begins:
"NM: I think by the way that the defined set and the accept set is less
useful than we thought...". At the meeting, I challenged the group with a
sketch of an example, asking how it would be handled by the
defined-set/accept-set model. I didn't hear an answer that satisfied me
in the room, but Tim and I happened to sit together on the flight home. He
expressed to me some support for the defined-set/accept-set formulation,
and he suggested how he would apply it to the example I had in mind. The
purpose of this note is to record the example and Tim's explanation, as I
understood it, and then to comment a bit. I assume he'll correct any
misunderstandings on my part.
The Example Language: PHTML
---------------------------
The example is motivated by HTML, as styled by CSS. To avoid ratholes
relating to particular historical details of those languages, I'll instead
use two mythical languages, PHTML and PCSS (pretend HTML and pretend
CSS), whose pertinent details are as follows:
Assume that PHTML is a language of well-formed XML documents that must
have a root tag <PHTML>. A few HTML-like tags, such as <P> for paragraph
and <BODY> for body, are defined in the PHTML version 1 specification. As
with HTML, PHTML allows the appearance of arbitrary tags such as
<BANANA> that are not named explicitly in the specification; it defines any
document containing such an extension tag to have the same semantics
as a similar document from which that tag has been deleted. Thus, per the
PHTML spec., the following two documents have the same meaning:
doc1.phtml:
<PHTML>
<BODY>
<P>Versioning is hard.</P>
</BODY>
</PHTML>
-and-
doc2.phtml:
<PHTML>
<BODY>
<BANANA>
<P>Versioning is hard.</P>
</BANANA>
</BODY>
</PHTML>
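The deletion rule can be illustrated (only as a sketch; PHTML defines no API) with a canonical-form function: two PHTML documents have the same meaning when their canonical forms are equal. `KNOWN_TAGS`, `canonical`, and `same_meaning` are hypothetical names, the tag list is an assumed stand-in for whatever the PHTML v1 spec actually enumerates, and the sketch assumes whitespace-only text is insignificant.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the element names the PHTML v1 specification defines explicitly.
KNOWN_TAGS = {"PHTML", "HEAD", "STYLE", "BODY", "P"}


def _content(elem):
    """List the meaningful content of elem, unwrapping unknown (extension) tags."""
    items = []
    if elem.text and elem.text.strip():  # assumption: whitespace-only text is ignorable
        items.append(elem.text.strip())
    for child in elem:
        if child.tag in KNOWN_TAGS:
            items.append(canonical(child))
        else:
            # Extension tag such as <BANANA>: drop the tag, keep what it contains.
            items.extend(_content(child))
        if child.tail and child.tail.strip():
            items.append(child.tail.strip())
    return items


def canonical(elem):
    """Canonical form of a known element: name, sorted attributes, unwrapped content."""
    return (elem.tag, tuple(sorted(elem.attrib.items())), tuple(_content(elem)))


def same_meaning(text_a, text_b):
    return canonical(ET.fromstring(text_a)) == canonical(ET.fromstring(text_b))


DOC1 = """<PHTML>
<BODY>
<P>Versioning is hard.</P>
</BODY>
</PHTML>"""

DOC2 = """<PHTML>
<BODY>
<BANANA>
<P>Versioning is hard.</P>
</BANANA>
</BODY>
</PHTML>"""

print(same_meaning(DOC1, DOC2))  # True
```

Note that the sketch unwraps <BANANA> rather than deleting its whole subtree: the rule deletes the tag, so the paragraph it encloses survives.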
This is basically the example language we discussed at the F2F. (The
application of CSS to this language is discussed later in this note.)
Application of defined text sets and accept text sets to PHTML
--------------------------------------------------------------
The invariant in the defined-set/accept-set formulation is that every
document in the accept-set conveys the same meaning as some particular
document in the defined set. doc1.phtml above is in the defined set for
PHTML, because all of its content has a meaning supplied directly by the
specification. doc2.phtml is in the accept set; that document too is in
the PHTML language, but the semantics of doc2.phtml are defined by means
of its equivalence to a defined-set document, doc1.phtml.
So far, so good. All of this makes sense to me. Defined sets and accept
sets do a good job of explaining this extensibility.
The challenge
-------------
Now we come to the interesting part of the example. We allow our pretend
CSS language to style the markup in PHTML documents, and crucially, the
styles can be applied to <BANANA> elements as well as to paragraphs.
<PHTML>
<HEAD>
<STYLE type="text/pcss">
P {font-size: 120%}
BANANA {color:yellow}
</STYLE>
</HEAD>
<BODY>
<BANANA>
<P>Versioning is hard.</P>
</BANANA>
</BODY>
</PHTML>
The paragraph will be rendered in a larger font and, because it inherits
the color of the enclosing <BANANA>, in yellow. My challenge to
the TAG was: how do defined and accept sets explain this sort of
extensibility? (Note that the equivalent is allowed for real CSS applied
to real HTML.)
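To make the problem concrete, here is a toy PCSS cascade in Python. It assumes, hypothetically, that PCSS rules are simple `TAG {property: value; ...}` declarations and that every property is inherited from parent to child (as `color` and `font-size` are in real CSS). Under those assumptions the paragraph picks up `color: yellow` from its <BANANA> parent, so deleting the <BANANA> tag changes the computed style. The function names are illustrative only.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical PCSS grammar: only "TAG {prop: value; prop: value}" rules.
RULE = re.compile(r"([A-Za-z]+)\s*\{([^}]*)\}")


def parse_pcss(sheet):
    """Map each element name to its declared properties (values kept as strings)."""
    rules = {}
    for tag, body in RULE.findall(sheet):
        decls = rules.setdefault(tag.upper(), {})
        for decl in body.split(";"):
            if ":" in decl:
                prop, value = decl.split(":", 1)
                decls[prop.strip()] = value.strip()
    return rules


def computed_styles(elem, rules, inherited=None, out=None):
    """Each element inherits its parent's styles, then applies its own rule.
    Percentages such as 120% are not resolved; declared strings are kept."""
    out = [] if out is None else out
    style = dict(inherited or {})
    style.update(rules.get(elem.tag, {}))
    out.append((elem.tag, style))
    for child in elem:
        computed_styles(child, rules, style, out)
    return out


def paragraph_style(text):
    root = ET.fromstring(text)
    sheet = "".join(s.text or "" for s in root.iter("STYLE"))
    rules = parse_pcss(sheet)
    return [style for tag, style in computed_styles(root, rules) if tag == "P"]


STYLED = """<PHTML>
<HEAD>
<STYLE type="text/pcss">
P {font-size: 120%}
BANANA {color:yellow}
</STYLE>
</HEAD>
<BODY>
<BANANA>
<P>Versioning is hard.</P>
</BANANA>
</BODY>
</PHTML>"""

# The "equivalent" document with the <BANANA> tag deleted, per the PHTML rule.
UNWRAPPED = STYLED.replace("<BANANA>\n", "").replace("</BANANA>\n", "")

print(paragraph_style(STYLED))     # [{'color': 'yellow', 'font-size': '120%'}]
print(paragraph_style(UNWRAPPED))  # [{'font-size': '120%'}]
```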
My understanding of Tim's preferred answer is: "PHTML as redefined by PCSS
is a different language than PHTML on its own, and all of the legal
strings (texts) in that new language are in its defined set -- the accept
set is the empty set. The PCSS specification is the one that gives a
non-vacuous meaning to <BANANA> elements, and indeed in the presence of
PCSS, a document with a <BANANA> is no longer equivalent to one without.
Thus, according to PHTML as redefined by PCSS, all PHTML documents are in
the defined set, and none are in the accept set."
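The difference between the two readings can be sketched as two membership predicates applied to the same documents. Under PHTML alone, a document is in the defined set only if every element name it uses is spec-defined. Under Tim's reading of PHTML+PCSS, PCSS can give any element name a meaning, so every document with a <PHTML> root lands in the defined set and the accept set comes out empty. `KNOWN_TAGS` and all function names here are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical: element names the PHTML v1 specification defines explicitly.
KNOWN_TAGS = {"PHTML", "HEAD", "STYLE", "BODY", "P"}


def defined_under_phtml(root):
    # PHTML alone: extension tags carry no meaning of their own.
    return all(elem.tag in KNOWN_TAGS for elem in root.iter())


def defined_under_phtml_pcss(root):
    # PHTML+PCSS as Tim reads it: PCSS can give any element name a meaning.
    return root.tag == "PHTML"


def partition(texts, is_defined):
    """Split a language's texts into (defined set, accept set)."""
    defined, accept = [], []
    for text in texts:
        (defined if is_defined(ET.fromstring(text)) else accept).append(text)
    return defined, accept


DOCS = [
    "<PHTML><BODY><P>Versioning is hard.</P></BODY></PHTML>",
    "<PHTML><BODY><BANANA><P>Versioning is hard.</P></BANANA></BODY></PHTML>",
]

print([len(s) for s in partition(DOCS, defined_under_phtml)])       # [1, 1]
print([len(s) for s in partition(DOCS, defined_under_phtml_pcss)])  # [2, 0]
```

Only the membership predicate changes between the two runs: the documents are the same, but the language, and therefore its defined set, is different.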
Some Comments
-------------
I found this analysis to be tremendously helpful. It certainly meets my
challenge at the F2F, which was to show how the defined-set/accept-set
model can be coherently applied to this example. I hope Tim can confirm
that I've correctly captured the essence of his analysis, and I'll be very
curious to see whether others who've been advocating the
defined-set/accept-set approach would apply it the same way.
As to my own position, I want to give it some more thought. My initial
concern (Dave and I discussed this at great length at dinner in Mountain
View) was that I didn't really see how to apply the concepts to my
example, and Tim has resolved that concern. What remains is a worry that
by putting everything into the defined-set, our versioning model is no
longer saying much about the sense in which the PHTML+PCSS V1 language is
indeed extensible. When PHTML+PCSS version 2 comes along and defines
semantics for some elements like <BANANA>, I'm not sure how the model's
going to help us explain what happened, because everything was in the V1
defined set to begin with. Many languages in fact provide nontrivial
default semantics for their extension content -- our mythical PHTML/PCSS
allowed the extension content to be styled, but a PDOM might well have
allowed scripts to address the extensions, and many, many non-HTML
languages provide interesting default semantics for extension content
(store it, print it, etc.). I think these are really interesting use
cases, and I'll be disappointed if our formal models don't explain them
well.
Nonetheless, I think Tim and others are making a strong point that, in the
interesting sub-case where extension content is truly and completely
ignored, the defined-set/accept-set model gives a nice, clean,
set-oriented explanation. I buy that. So, I think this is progress.
Thanks to Tim for being patient in working through this with me.
Noah
[1] http://www.w3.org/2001/tag/doc/versioning-20070518
[2] http://www.w3.org/2001/tag/2007/05/30-minutes#item06
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Received on Wednesday, 20 June 2007 16:06:40 UTC