Re: Write-up about semantics in HTML5 from A List Apart

Hi Martin,

On Jan 7, 2009, at 12:17 PM, Martin Atkins wrote:

>
> Julian Reschke wrote:
>> Ian Hickson wrote:
>>> ...
>>>> But in XML based languages you can extend the vocabulary, and  
>>>> this you can't in HTML. At least not the way it's currently  
>>>> defined.
>>>
>>> Can you name a single unilateral extension to the HTML element  
>>> vocabulary that was a positive step forward in the development of  
>>> HTML? HTML is almost 20 years old now, and (despite this being
>>> non-conforming) it has had its element name vocabulary unilaterally
>>> extended many times. If being able to do this was ever going to be  
>>> a good thing, we'd have seen it by now. Have we?
>>> ...
>> Again, the "who" is not the problem. We discussed this before.
>> The problem is that even *if* a future W3C Working Group wants to  
>> add new elements, a change to the HTML parsing spec will be required.
>
> I think this is the most important point.
>
> It would be ideal if future versions of HTML would be parsable by  
> today's parsers, even if they ultimately ignore elements they don't
> understand.
>
> The best example of this is void elements that get parsed as
> non-void by legacy parsers; it is therefore not possible to use new
> void elements without breaking software that employs legacy parsers,
> since the entire tree after the new void element will be incorrect.
>
> A solution to this has been offered in the form of having the  
> <element/> form be treated as void for all unknown elements.
>
> I get the impression that Ian thinks this cure is worse than the  
> disease. It is true that some authors will go on writing <img>  
> (without the slash) and then be surprised when HTML5's new void  
> element <foo> requires a slash, or when <script /> doesn't work. It  
> seems like this sort of thing could be addressed with warnings from a
> validator, though I concede that many authors don't employ  
> validators and would go on producing funky markup.


The use of a slash for unknown and newly introduced elements doesn't
really affect the parsing of <img> and <script /> since those are both
excepted from the slash void rule. True, those would be document
conformance errors for authors to include <img> without the slash or
<script> with the slash. But HTML5 will have clearly specified
error-recovery mechanisms that are compatible with existing content,
so that's not a problem.
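To make the distinction concrete, here is a toy tree builder. It is only a sketch, not the HTML5 tree-construction algorithm: the token format, the element sets and the function names are all made up for illustration. It shows a legacy parser nesting everything after an unknown <foo/> inside it, the proposed rule treating <foo/> as void, and known elements like <img> and <script> behaving the same either way.

```python
# Toy model of the proposed "trailing slash makes an unknown element void" rule.
# Hypothetical assumptions: tokens are (kind, name, self_closing) tuples and
# only the element names below are "known". Not the real HTML5 algorithm.

KNOWN_VOID = {"img", "br", "hr", "meta", "link"}
KNOWN_NON_VOID = {"body", "div", "p", "span", "script"}

def is_void(name, self_closing, slash_rule):
    if name in KNOWN_VOID:
        return True      # <img> and <img/> are both void
    if name in KNOWN_NON_VOID:
        return False     # <script/> still opens a script element
    # Unknown element: void only when the proposed rule applies and it has a slash.
    return slash_rule and self_closing

def build_tree(tokens, slash_rule):
    root = ("#root", [])
    stack = [root]
    for kind, name, self_closing in tokens:
        if kind == "start":
            node = (name, [])
            stack[-1][1].append(node)          # attach to the current element
            if not is_void(name, self_closing, slash_rule):
                stack.append(node)             # non-void: later nodes go inside it
        elif kind == "end":
            # Toy recovery: pop back to the nearest matching open element, if any.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i][0] == name:
                    del stack[i:]
                    break
    return root

def render(node):
    name, children = node
    if not children:
        return name
    return name + "(" + ",".join(render(c) for c in children) + ")"

def serialize(tokens, slash_rule):
    return ",".join(render(child) for child in build_tree(tokens, slash_rule)[1])

# <body><foo/><p></p></body>
doc = [("start", "body", False), ("start", "foo", True),
       ("start", "p", False), ("end", "p", False), ("end", "body", False)]
print(serialize(doc, slash_rule=False))  # body(foo(p))  legacy: <p> ends up inside <foo>
print(serialize(doc, slash_rule=True))   # body(foo,p)   proposed: <foo/> is void
```

Real parsers also treat script content as raw text and do far more error recovery; the toy only models which elements are allowed to take children.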

The other side of the coin, which has not been discussed, is the
current HTML5 parsing problem with unknown non-void head elements. We
should also be specifying future parsing to allow for these, but the
current parsing algorithm does not allow for that. With current
parsing, void elements such as eventsource or command could be
head-only, but by adding the slash void-element parsing rule, future
HTML specifications would have greater flexibility in introducing new
elements (and solving future problems in the future).
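A rough sketch of the head problem, assuming a simplified "in head" state in which a short list of known head elements stays in head and anything else implicitly closes head and opens body. The element lists, the `place` function and the `keep_self_closed_unknown` flag are hypothetical; the flag is just one possible reading of how a slash rule could let a new void element stay in head, not anything in the current draft. Here eventsource is treated as unknown, as it would be to a parser that predates it.

```python
# Toy model of where start tags land while a parser is "in head".
# Hypothetical assumptions: tokens are (name, self_closing) start tags only,
# the element lists are simplified, and once body opens everything goes there.

HEAD_ELEMENTS = {"base", "link", "meta", "script", "style", "title"}
KNOWN = HEAD_ELEMENTS | {"body", "div", "p", "img", "br"}

def place(tokens, keep_self_closed_unknown=False):
    """Return (head, body) lists of element names."""
    head, body = [], []
    in_head = True
    for name, self_closing in tokens:
        if in_head and name in HEAD_ELEMENTS:
            head.append(name)
        elif (in_head and keep_self_closed_unknown
              and self_closing and name not in KNOWN):
            # Hypothetical variant: an unknown <foo/> is void and stays in head.
            head.append(name)
        else:
            # Anything else implicitly closes head; the rest lands in body.
            in_head = False
            body.append(name)
    return head, body

tokens = [("meta", False), ("eventsource", True), ("link", False)]
print(place(tokens))                                 # (['meta'], ['eventsource', 'link'])
print(place(tokens, keep_self_closed_unknown=True))  # (['meta', 'eventsource', 'link'], [])
```

The point of the first call is that one unknown element in head drags everything after it, including the <link>, into body.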

Take care,
Rob

Received on Wednesday, 7 January 2009 18:35:29 UTC