- From: Smylers <Smylers@stripey.com>
- Date: Sun, 1 Mar 2009 16:59:22 +0000
- To: public-html@w3.org
Philip TAYLOR writes:

> Smylers wrote:
>
> > Philip TAYLOR writes:
> >
> > > [...] I believe that the specification should set out to address
> > > what is, and what is not, syntactically valid HTML; I do not think
> > > that it should attempt to define what is /semantically/ valid
> > > HTML,
> >
> > I'm struggling to see how that'd be possible: how can we define, say,
> > the <h1> element without saying that its contents will be interpreted as
> > a heading?
>
> Agreed.
>
> > That's saying that the contents of <h1> have the semantics of a
> > heading -- and therefore that to put something inside <h1> that
> > isn't a heading will cause an HTML-5-compliant interpretation of the
> > document to ascribe to it a different meaning from that intended by
> > its author. In other words, the document with the meaning the
> > author intended isn't valid according to HTML 5.
>
> Not agreed :-)

I'm presuming that, of the two sentences of mine you quote above, you only disagree with the second.

> The specification must indicate the semantics of each element, but if
> an author chooses to use (or erroneously uses) the element with
> different semantics, then that cannot make the document invalid.

Well, it means that the meaning of the document (so far as its author is concerned) can't be obtained by following the specification. A reader would have to explicitly violate the standard in order to get the author's interpretation.

That seems asymmetric to me: we'd be assigning semantics to elements and telling consumers to use them when interpreting documents, but not telling producers that they have to use those semantics when generating documents; in effect we'd be saying it's acceptable for producers to make up their own semantics. That would mean that, given a valid HTML 5 document, an HTML 5 parser isn't necessarily sufficient to interpret it.
In which case we'd need another word to describe a document which is both syntactically _and_ semantically valid -- one that can usefully and interoperably be interpreted according to the spec.

> I'm not even sure what adjective does describe such a document, but I
> am convinced that "invalid" is inappropriate here.

I'd say it's syntactically valid -- where the modifier before "valid" indicates that this is a subset of being completely valid. Or possibly "machine checkably valid" -- automated tools aren't able to spot any invalidity (though that possibly rules out advances in validator technology unnecessarily).

> The classical example is as follows :
>
> 1. Colorless green ideas sleep furiously.
> 2. Furiously sleep ideas green colorless.
>
> (1) is a sentence; it follows all the syntactic rules of the English
> language, yet makes no sense.

OK, I follow the analogy. But I think what matters is what the author of that sentence intended its meaning to be[*1]. If the author intended it to be confusing gibberish, then readers can attempt to interpret it according to the normal rules of English and will correctly discover it to be gibberish. However, if the author intended it to convey a particular meaning, then it's unlikely that any of his audience would correctly divine that without some out-of-band assistance. In which case it fails as English, since it has not communicated its intent; the author needs to be told that, for successful communication, he must reformulate his sentence.

Similarly with HTML: if an author has failed to express semantics which will be correctly understood by her audience, then it isn't really HTML.

> much as I am on the side of those who want tables to be used solely to
> present tabular data, I am not convinced that there do not exists
> borderline cases in which the author believes that what he or she is
> communicating is tabular data whilst others would assert that the
> table is being (ab)used to communicate layout.
I agree there are borderline cases; I think it's almost inevitable that there will be documents where it's arguable whether their semantics are within those defined in HTML 5. But (since HTML 5 is going to define the meaning of elements anyway) that will be the case regardless of whether HTML 5 proscribes documents that fail to use those semantics. So I don't see what avoiding such proscription gains us.

Smylers

[*1] Yes, I'm aware of the author of that particular sentence and that he intended it as an example. But let's pretend somebody had used it earnestly in prose.
Received on Sunday, 1 March 2009 17:00:04 UTC