Re: Draft text for summary attribute definition from Smylers on 2009-03-01 (public-html@w3.org from March 2009)

From: Smylers <Smylers@stripey.com>
Date: Sun, 1 Mar 2009 16:59:22 +0000
To: public-html@w3.org
Message-ID: <20090301165922.GF16780@stripey.com>
Philip TAYLOR writes:

> Smylers wrote:
> 
> > Philip TAYLOR writes:
> 
> > > [...] I believe that the specification should set out to address
> > > what is, and what is not, syntactically valid HTML; I do not think
> > > that it should attempt to define what is /semantically/ valid
> > > HTML,
> > 
> > I'm struggling to see how that'd be possible: how can we define, say,
> > the <h1> element without saying that its contents will be interpreted as
> > a heading?
> 
> Agreed.
> 
> > That's saying that the contents of <h1> have the semantics of a
> > heading -- and therefore that to put something inside <h1> that
> > isn't a heading will cause an HTML-5-compliant interpretation of the
> > document to ascribe to it a different meaning from that intended by
> > its author.  In other words, the document with the meaning the
> > author intended isn't valid according to HTML 5.
> 
> Not agreed :-)

I'm presuming that of the two sentences of mine you quote above you only
disagree with the second.

> The specification must indicate the semantics of each element, but if
> an author chooses to use (or erroneously uses) the element with
> different semantics, then that cannot make the document invalid.

Well it means that the meaning of the document (so far as its author is
concerned) can't be obtained by following the specification.  A reader
would have to explicitly violate the standard in order to get the
author's interpretation.

That seems asymmetric to me: we'd be assigning semantics to elements and
telling consumers to use them when interpreting documents, but not
telling producers they'd have to use those semantics to generate
documents, that it'd be acceptable for producers to make up their own
semantics.

That would mean that given a valid HTML 5 document, an HTML 5 parser
isn't necessarily sufficient to be able to interpret it.

In which case we'd need another word to describe a document which is
both syntactically _and_ semantically valid -- one that can usefully and
interoperably be interpreted according to the spec.

> I'm not even sure what adjective does describe such a document, but I
> am convinced that "invalid" is inappropriate here.

I'd say it's syntactically valid -- where the modifier before "valid"
indicates that this is a subset of being completely valid.  Or possibly
"machine checkably valid" -- automated tools aren't able to spot any
invalidity (though that's possibly unnecessarily ruling out advances in
validator technology).

> The classical example is as follows :
> 
>    1. Colorless green ideas sleep furiously.
>    2. Furiously sleep ideas green colorless.
> 
> (1) is a sentence; it follows all the syntactic rules of the English
> language, yet makes no sense.

OK, I follow the analogy.  But I think what matters is what the author
of that sentence intended its meaning to be[*1].  If the author
intended it to be confusing gibberish then readers can attempt to
interpret it according to the normal rules of English and correctly will
discover it to be gibberish.

However if the author intended it to convey a particular meaning then
it's unlikely that any of his audience would correctly divine that,
without any out-of-band assistance.  In which case it fails as English,
since it has not communicated its intent; the author needs to be told
that for successful communication he needs to reformulate his sentence.

Similarly with HTML, if an author has failed to express semantics which
will be correctly understood by her audience then it isn't really HTML.

> much as I am on the side of those who want tables to be used solely to
> present tabular data, I am not convinced that there do not exist
> borderline cases in which the author believes that what he or she is
> communicating is tabular data whilst others would assert that the
> table is being (ab)used to communicate layout.

I agree there are borderline cases; I think it's almost inevitable that
there will be documents where it's arguable whether their semantics are
within those defined in HTML 5.  But since HTML 5 is going to define
the meaning of elements anyway, that will be the case regardless of
whether HTML 5 proscribes documents that fail to use those semantics.

So I don't see what avoiding such proscription gains us.

Smylers

[*1]  Yes, I'm aware of the author of that particular sentence and that
he intended it as an example.  But let's pretend somebody had used it
earnestly in prose.
Received on Sunday, 1 March 2009 17:00:04 UTC