RE: What "ignore" means (was: RE: Defined sets, accept sets, and <banana> elements)

David Orchard writes:

> perhaps "Must Accept Unknowns".

I'm not convinced the word "unknown" is right either.   For example, the 
HTML 2.0 specification says [1]:

-------
"A name consists of a letter followed by letters, digits, periods, or 
hyphens. The length of a name is limited to 72 characters by the `NAMELEN' 
parameter in the SGML declaration for HTML, section SGML Declaration for 
HTML. Element and attribute names are not case sensitive, but entity names 
are. For example, `<BLOCKQUOTE>', `<BlockQuote>', and `<blockquote>' are 
equivalent, whereas `&amp;' is different from `&AMP;'."
-------

So, in terms of defining legal tags, <banana> is "known" as well as <p>. 
Not just any tag is legal though:  <2bananas> is not.  So, in what sense 
exactly is <banana> "unknown"?  Well, it's part of a very large (but 
bounded, due to the 72 character limit) set of tags which are 
differentiated from each other only by their names.  In other respects, 
their differences are not called out by the specification.  I don't think 
"unknown" is particularly suggestive of that.  What you're trying to 
capture, obviously, is that <p> and <blockquote> are distinguished by more 
than the spelling of their tags in V1:  particular and differing semantics 
are explicitly supplied for each.
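
To make the three categories concrete, here is a minimal Python sketch of the name rule quoted above. The 72-character limit and the "letter followed by letters, digits, periods, or hyphens" pattern come from the quoted text; the names is_legal_name, classify, and V1_DEFINED are my own, and V1_DEFINED is only a two-element stand-in for the elements that get explicit semantics, not the real HTML 2.0 element list.

```python
import re

# From the quoted passage: NAMELEN limits a name to 72 characters.
NAMELEN = 72

# A letter followed by letters, digits, periods, or hyphens.
_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9.-]*")


def is_legal_name(name: str) -> bool:
    """Return True if `name` satisfies the quoted HTML 2.0 name rule."""
    return len(name) <= NAMELEN and _NAME_RE.fullmatch(name) is not None


# Assumption: a hypothetical stand-in for the elements to which the
# specification gives explicit, differing semantics.
V1_DEFINED = {"p", "blockquote"}


def classify(name: str) -> str:
    """Sort a tag name into one of the three categories discussed above."""
    if not is_legal_name(name):
        return "illegal"      # e.g. 2bananas
    if name.lower() in V1_DEFINED:
        return "defined"      # e.g. p, blockquote
    return "legal-only"       # e.g. banana: legal, but differs only by name
```

With these definitions, classify("banana") gives "legal-only" and classify("2bananas") gives "illegal", which is the sense in which <banana> is hardly "unknown" to the specification.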

By the way, the quote above also happens to illustrate the 
uppercase/lowercase equivalence mentioned in my note of earlier today. The 
specification makes absolutely clear that "`<BLOCKQUOTE>', `<BlockQuote>', 
and `<blockquote>' are equivalent", yet surely at least one of them is 
"known" in the sense you mean.  Are the others?
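
One way to picture this, borrowing the generic semantics from my earlier note quoted below (retain case for printing and storage, otherwise treat uppercase as mapped to lowercase), is the following Python sketch. The Name class and the equivalent function are hypothetical, and plain lowercasing is assumed to be an adequate case mapping for these ASCII names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    """A tag name that remembers how it was spelled in the input."""

    supplied: str  # exact spelling, kept for display, printing, storage

    @property
    def key(self) -> str:
        # Assumption: simple lowercasing is a sufficient case mapping here.
        return self.supplied.lower()


def equivalent(a: Name, b: Name) -> bool:
    """Two names are equivalent when they differ at most in case."""
    return a.key == b.key
```

Under this sketch no character is thrown away: Name("BlockQuote").supplied is still "BlockQuote", yet it compares as equivalent to Name("blockquote").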

I'm increasingly convinced that the key to the sort of versioning we're 
talking about is not so much what's known and unknown, but which 
differences in V1 input are significant, and which differences aren't. 
When we go to version 2, we tend to provide more explicit semantics for 
things that were distinguished in V1 only by their tags (or the contents 
of their text).  Thus, in your name language, V2 tells us that <middle> 
really has the semantics of being a middle name.  In V1, all we knew was 
that the tag <middle> was spelled differently from <last> or <banana>.  I 
have some vague ideas about how one might formulate a story, but probably 
no time to think them through before vacation.  I think your proposal to 
start improvising around integrating the HTML <banana> use case into the 
draft is a good next step.   Thanks!
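
For what it's worth, the V1/V2 distinction I have in mind can be sketched in a few lines of Python. Everything here is hypothetical: the generic handler, the V2_HANDLERS table, and the output strings are invented for illustration and are not taken from your name language or from the draft finding.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def generic(tag: str) -> Handler:
    """Generic semantics: a tag is distinguished only by its spelling."""
    return lambda text: f"{tag}: {text}"


# Assumption: V2 supplies explicit semantics for <middle> only.
V2_HANDLERS: Dict[str, Handler] = {
    "middle": lambda text: f"middle name = {text}",
}


def process_v1(tag: str, text: str) -> str:
    # V1 treats every legal tag the same way, apart from its name.
    return generic(tag)(text)


def process_v2(tag: str, text: str) -> str:
    # V2 uses the explicit semantics where defined, else the generic ones.
    return V2_HANDLERS.get(tag, generic(tag))(text)
```

In this sketch <middle> and <banana> already differ in V1, but only by name; V2 changes how <middle> is treated while <banana> is handled exactly as before.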

Noah

[1] http://www.w3.org/MarkUp/html-spec/html-spec_3.html#SEC3.2.3

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------








"David Orchard" <dorchard@bea.com>
Sent by: www-tag-request@w3.org
06/25/2007 02:25 PM
 
        To:     <noah_mendelsohn@us.ibm.com>
        cc:     "Tim Berners-Lee" <timbl@w3.org>, <www-tag@w3.org>
        Subject:        RE: What "ignore" means (was: RE: Defined sets, 
accept sets, and <banana> elements)



As I proposed on the TAG call, perhaps "Must Accept Unknowns".

I'll think about your use case a bit.

Cheers,
Dave

> -----Original Message-----
> From: noah_mendelsohn@us.ibm.com [mailto:noah_mendelsohn@us.ibm.com] 
> Sent: Monday, June 25, 2007 11:18 AM
> To: David Orchard
> Cc: Tim Berners-Lee; www-tag@w3.org
> Subject: What "ignore" means (was: RE: Defined sets, accept 
> sets, and <banana> elements)
> 
> Dave,
> 
> > I agree that the definition of "ignore" needs elaboration.
> 
> As we discussed on the TAG call a few minutes ago, I think 
> you're focussing on just the right question, though I remain 
> unsure that "ignore" 
> will in most cases be the most appropriate word.  Repeating 
> what we said there, my intuition is that "accept" is about 
> the right word to describe a text that is consumable by some 
> application or conformant with some 
> specification.   I think many languages provide generic semantics for 
> handling content that is not explicitly described 
> (whitespace, comments, extension elements, differences in 
> text that are not significant, etc.) 
> 
> Here's a use case I mentioned in passing.  I think it's a 
> common one, and a good test of our terminology.  Consider a 
> name language in which, in version 1, the case of the input 
> (i.e., upper or lower case) is no more (or
> less) relevant than <banana> elements in HTML.  So:
> 
>         name1.nam:
> 
>         bob smith
> 
> is for most purposes semantically equivalent to 
> 
>         name2.nam:
> 
>         Bob Smith
> 
> I say for the most part, because just as the HTML DOM allows 
> you to see <banana> elements, the V1 specification for my 
> language says:
> 
> "Case is in general not significant in version 1 of this 
> language.  A keyword such as APPLE is generally treated the 
> same way as apple. However, when displaying, printing or 
> storing the text of a document, applications SHOULD preserve 
> the supplied case.  Note:  it is possible that subsequent 
> versions of this language specification will consider case to 
> be significant.  Accordingly, version 1 documents SHOULD be 
> created as lower case only (indeed, applications receiving 
> mixed or uppercase documents may assume that they were 
> created by software written to later versions of this specification.)"
> 
> So, I'd say that's a good example of future proofing.  If the 
> defined set is lowercase, then each acceptable document 
> either is in or has an equivalent document in the defined 
> set.  The interesting question is: 
> what's being ignored?  It's certainly not any of the 
> characters.  While it so happens that in some encodings, case 
> is a separate bit, we can't assume that. 
> 
> Note that when we talk about generic semantics, the story 
> gets easier than if we're looking for something to ignore. 
> The generic semantics is: 
> retain case for printing, storage, etc., and otherwise treat 
> as if uppercase is mapped to lowercase. 
> 
> I think it would be good if our terminology were suitable for 
> such forms. 
> First of all, they're simple and quite common.  Secondly, 
> it's a good way to make sure we haven't slipped in 
> assumptions we didn't intend. 
> 
> Anyway:  I'm really glad we're starting to look more 
> carefully at the assumptions behind "ignore".  I've always 
> felt that to be where the meat of the problem is.  Thanks!
> 
> Noah
> 
> 
> "David Orchard" <dorchard@bea.com>
> 06/21/2007 04:59 PM
> 
>         To:     <noah_mendelsohn@us.ibm.com>
>         cc:     "Tim Berners-Lee" <timbl@w3.org>, <www-tag@w3.org>
>         Subject:        RE: Defined sets, accept sets, and <banana> 
> elements
> 
> 
> I agree that the definition of "ignore" needs elaboration.  I think
> there are at least 2 major flavours: ignore and delete, and ignore and
> retain. 
> 
> Given that you agree that "weaving" is a good next step, what do you
> think about "weaving" by reference to a micro-finding rather than
> weaving into the text? 
> 
> Cheers,
> Dave 
> 
> > -----Original Message-----
> > From: noah_mendelsohn@us.ibm.com 
> [mailto:noah_mendelsohn@us.ibm.com] 
> > Sent: Thursday, June 21, 2007 1:01 PM
> > To: David Orchard
> > Cc: Tim Berners-Lee; www-tag@w3.org
> > Subject: RE: Defined sets, accept sets, and <banana> elements
> > 
> > David Orchard wrote:
> > 
> > > I had an action item, either official or unofficial, to 
> > weave a story 
> > > like this into the finding.  I suggested to you and the tag 
> > that your 
> > > material could be either incorporated into the finding or as a 
> > > separate micro-finding, and that I'd do any extra work 
> > required.  You 
> > > didn't support either of those options,
> > 
> > To be clear, you are welcome to "weave" what I wrote into the 
> > finding if you think that's the right next step.  I was 
> > merely suggesting a direction that I thought would be 
> > interesting, and that would involve doing a bit more 
> > investigation and consensus building before we decide what to 
> > put in the finding.  One way or the other, I strongly believe 
> > that we need to think hard about, and probably tell a story 
> > in the finding about, languages which have semantics other 
> > than "ignore" for extension content. 
> > By all means, if you think my email is the right basis for 
> > telling that story, then integrate it, and we'll see what the 
> > reaction is.  Sorry if my original email was confusing.
> > 
> > 
> > "David Orchard" <dorchard@bea.com>
> > 06/21/2007 03:17 PM
> > 
> >         To:     <noah_mendelsohn@us.ibm.com>
> >         cc:     "Tim Berners-Lee" <timbl@w3.org>, <www-tag@w3.org>
> >         Subject:        RE: Defined sets, accept sets, and <banana> 
> > elements
> > 
> > 
> > I had an action item, either official or unofficial, to 
> weave a story
> > like this into the finding.  I suggested to you and the tag 
> that your
> > material could be either incorporated into the finding or as 
> > a separate
> > micro-finding, and that I'd do any extra work required.  You didn't
> > support either of those options, so I'm not interested in 
> duplicating
> > such work by completing my action using separate material.
> > 
> > Cheers,
> > Dave
> > 
> > > -----Original Message-----
> > > From: noah_mendelsohn@us.ibm.com 
> > [mailto:noah_mendelsohn@us.ibm.com] 
> > > Sent: Thursday, June 21, 2007 12:14 PM
> > > To: David Orchard
> > > Cc: Tim Berners-Lee; www-tag@w3.org
> > > Subject: RE: Defined sets, accept sets, and <banana> elements
> > > 
> > > David Orchard writes:
> > > 
> > > > I had an action item, either official or unofficial, to 
> > > weave a story 
> > > > like this into the finding.  I think that action is now closed..
> > > 
> > > Well, I've suggested a direction for deciding what to do, but 
> > > it's just my opinion.  Are you suggesting that you are going 
> > > to go through the steps I suggested and update the finding 
> > > depending on what results?  Thanks.
> > > 
> > > Noah
> > > 

Received on Monday, 25 June 2007 21:55:48 UTC