Re: Write-up about semantics in HTML5 from A List Apart

Hi Thomas,

On Jan 10, 2009, at 8:53 PM, Thomas Broyer wrote:

> On Fri, Jan 9, 2009 at 11:46 PM, Robert J Burns wrote:
>>
>> On Jan 9, 2009, at 3:46 AM, Thomas Broyer wrote:
>>
>>> On Thu, Jan 8, 2009 at 11:25 PM, Robert J Burns wrote:
>>>>
>>>> That is not true. IE parses unknown elements as void. <element></element>
>>>> becomes:
>>>>
>>>> element
>>>> /element
>>>>
>>>> where two void elements are created as siblings of one another, one with
>>>> the name prefixed with a solidus.
>>>
>>> It actually depends whether you've used some script before or not:
>>> http://blog.whatwg.org/supporting-new-elements-in-ie
>>
>> Of course. Using DOM scripting, the elements can be added to the DOM
>> correctly regardless of their content model. Likewise, using the XML
>> serialization makes all of this work. The thrust of this thread as I
>> understand it, however, is the text/html serialization and what can be
>> done to improve the parsing and document conformance norms for that.
>
> And that's the point of the linked article: if you do a
> document.createElement() in IE, then you change the parser's behavior
> for that element!

Yes, I understand, but resorting to JavaScript is not what I had in
mind. Ideally we don't want to make changes to HTML's vocabulary
dependent on JavaScript.

>
>
>>> You are proposing introducing a discrepancy in the language ("legacy"
>>> void elements do not require a trailing slash and "legacy" "block
>>> elements" imply a </P>; while "new" void elements require a trailing
>>> slash and "new" "block elements" require you to explicitly close your
>>> paragraphs if you don't want the element to become a child of it!),
>>> which will last for years; while there are workarounds (include "new"
>>> void elements at the end of an element, possibly wrapping them in a
>>> <span>, as Ian proposed, or use an end tag for the void element; and
>>> explicitly close your elements before opening a "new" "block element")
>>> to have elements parsed appropriately by "legacy" parsers (HTML5
>>> parsers when you author an HTML6 document; HTML4 parsers when you
>>> author an HTML5 document, though it's a bit different in this case as
>>> HTML4 parsers have no "standard" behavior), which are only needed
>>> short-term to mid-term (depending on how fast new browser versions are
>>> adopted, and your audience).
>>>
>>> I'd expect those workarounds to be part of test suites.
>>
>> Actually, I'm not proposing introducing a discrepancy.
>
> "legacy" void elements would still have to be parsed as void elements
> in the absence of a trailing slash. If the same parsing rule isn't
> used for "new" void elements, I call it a discrepancy.

I wouldn't call that a discrepancy. Every element in the text/html  
serialization has unique treatment. They can often be grouped into
like elements, but with my suggestion going forward we would have an
end to such discrepancies. The only discrepancies are the ones that  
already exist for the legacy support of existing elements (and  
existing content in particular).

>
>
>> Rather, I'm proposing making the language self-consistent permanently
>> in a way that will not require these silly workarounds in the future
>> (only for the HTML5 transition). I'm suggesting that for document
>> conformance void elements always have a slash in text/html to indicate
>> they are void elements (e.g., <img/> and <eventsource/>). Similarly I'm
>> suggesting that for document conformance the P element always be
>> explicitly closed (<p>some paragraph text</p>). So the document
>> conformance norms I'm proposing have no discrepancy whatsoever. They are
>> also quite consistent and easy to understand. In terms of document
>> conformance we have only void and non-void elements (no more
>> distinction between structural elements, which implicitly close a
>> paragraph, and phrasing elements, which do not). Will authors learn the
>> real secrets of HTML5 and know precisely when they can omit </p> tags or
>> that they can omit the "/" from <img/>? Certainly, but they will also be
>> producing non-conforming HTML5 (under what I'm suggesting). So the
>> document conformance rules are quite simple and quite consistent with
>> no discrepancies. The parsing, on the other hand, is complicated and
>> riddled with special cases and exceptions. However, as I said before,
>> that's the case no matter how we specify the document conformance.
>
> So a conforming HTML4 document cannot ever be a conforming HTML5
> document (as soon as it uses a void element), and the other way
> around; because the "/" in SGML would end the tag, and the ">" would
> end up being character data (which causes problems in HEAD).
> As a test, take any text/html document with a <link /> or <meta /> and
> give it to the W3C Validator (and force validation as HTML4 if the
> DOCTYPE says XHTML).

Right, it would not necessarily be valid HTML 4.01 (though it could be
made valid 4.02 with a simple change to the DTD). SGML serializations  
should not be our concern since an SGML processing application will  
expect changes to the schema and be able to process those changes from  
any new DTD (4.02, 5 or otherwise). Our only concern then needs to be  
with text/html specific serialization processing applications. Those  
will treat the existing void elements as void even without the slash  
and can be updated to treat unknown elements with a slash as void as  
well. Problem solved.
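
A minimal sketch of the parsing rule described above, written in Python and assuming the standard library's html.parser as the tokenizer: elements in a known-void set are leaves whether or not they carry a slash, an element the parser does not recognize is treated as void only when written with "/>", and anything else opens a container. KNOWN_VOID, TreeBuilder and outline are hypothetical names for illustration; the sketch ignores implied end tags, error recovery, and the rest of what a real text/html parser does.

```python
from html.parser import HTMLParser

# Hypothetical, illustrative subset of the legacy void elements. "eventsource"
# is deliberately absent: it stands in for a new element the parser doesn't know.
KNOWN_VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class Node:
    def __init__(self, name):
        self.name = name
        self.children = []  # child Nodes or text strings


class TreeBuilder(HTMLParser):
    """Toy tree builder: legacy void elements are void with or without "/",
    unknown elements are void only when written with a trailing slash."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in KNOWN_VOID:
            self.stack.append(node)  # open a container and wait for its end tag

    def handle_startendtag(self, tag, attrs):
        # Called for "<tag/>": every element written this way becomes a leaf,
        # whether it is a legacy void element or an unknown one.
        self.stack[-1].children.append(Node(tag))

    def handle_endtag(self, tag):
        # Pop back to the matching open element, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].name == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def outline(node):
    """Render the tree as nested (name, [children]) tuples for inspection."""
    return (node.name, [c if isinstance(c, str) else outline(c) for c in node.children])


builder = TreeBuilder()
builder.feed("<p>Updates: <eventsource/> live <img src=a.png> text</p>")
builder.close()
print(outline(builder.root))
```

Here <eventsource/> is unknown to the builder but still comes out as a leaf, so the text after it stays a direct child of the paragraph. Written as <eventsource> without the slash, it would instead open a container and swallow the rest of the paragraph.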

>
>
> And conformance is one thing, but what matters to authors is also
> consistent parsing behavior. So you propose that the workarounds be
> needed only for the tagsoup-to-HTML5 transition, at the cost of making
> parsing less lenient (and quite a bit more complex for implementors,
> which is probably not a good thing, as adoption of HTML5 mostly depends
> on implementors, not only authors), instead of between each later HTML
> revision (but then only for new void and non-phrasing elements)?
> Well, my preference would still go for the latter.

I'm not following you here at all. If authors can use a transitional
serialization that works in both old and current browsers, and we can
at the same time make future changes to HTML easier, then I don't see
the problem. As I said before, everything is consistent for authors.
Void elements have a slash. Non-void elements do not have a slash. If
our goal is to make this work with SGML too, then we simply need to add
a DTD (say one for HTML 4.02 as well). No inconsistencies and no
discrepancies (other than the discrepancies HTML parsing is already
riddled with, such as rearranging table descendant elements).
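
The conformance rules proposed in this thread (void elements carry a slash, non-void elements never do, and every <p> is closed explicitly) are simple enough to check mechanically. A minimal Python sketch, again assuming html.parser for tokenizing; VOID_ELEMENTS, CLOSES_P, ConformanceChecker and check_conformance are hypothetical names, and both element sets are illustrative subsets rather than the lists from any specification.

```python
from html.parser import HTMLParser

# Hypothetical, illustrative subsets; not the full lists from any specification.
# "eventsource" is included because it is cited as a new void element above.
VOID_ELEMENTS = {"area", "base", "br", "col", "hr", "img", "input", "link",
                 "meta", "param", "eventsource"}
# Start tags that a text/html parser treats as implicitly closing an open <p>.
CLOSES_P = {"p", "div", "ul", "ol", "dl", "table", "h1", "h2", "h3", "h4",
            "h5", "h6", "pre", "blockquote", "form", "hr"}


class ConformanceChecker(HTMLParser):
    """Collect violations of the proposed rules:
    1. void elements are written with a trailing slash (<img/>);
    2. non-void elements are never written with one;
    3. every <p> is closed by an explicit </p>."""

    def __init__(self):
        super().__init__()
        self.errors = []
        self.p_open = False

    def _check_p(self, tag):
        # An open <p> that would be closed implicitly by this start tag.
        if self.p_open and tag in CLOSES_P:
            self.errors.append(f"<{tag}> would implicitly close an open <p>")
            self.p_open = False

    def handle_starttag(self, tag, attrs):
        self._check_p(tag)
        if tag in VOID_ELEMENTS:
            self.errors.append(f"void element <{tag}> lacks a trailing slash")
        elif tag == "p":
            self.p_open = True

    def handle_startendtag(self, tag, attrs):
        self._check_p(tag)
        if tag not in VOID_ELEMENTS:
            self.errors.append(f"non-void element <{tag}/> written with a slash")

    def handle_endtag(self, tag):
        if tag == "p":
            self.p_open = False

    def close(self):
        super().close()
        if self.p_open:
            self.errors.append("<p> left open at end of document")
            self.p_open = False


def check_conformance(markup):
    """Return a list of rule violations found in the markup."""
    checker = ConformanceChecker()
    checker.feed(markup)
    checker.close()
    return checker.errors
```

For example, check_conformance('<p>One<p>Two</p><br><span/>') reports the implicitly closed paragraph, the slash-less <br>, and the self-closed <span/>, while '<p>See <img src="a.png"/>.</p>' passes. Attribute values should be quoted: in an unquoted value such as src=a.png/ the slash is read as part of the value, so no self-closing tag is reported.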

Take care,
Rob

Received on Sunday, 11 January 2009 03:44:12 UTC