Re: Write-up about semantics in HTML5 from A List Apart from Thomas Broyer on 2009-01-11 (public-html@w3.org from January 2009)

From: Thomas Broyer <t.broyer@ltgt.net>
Date: Sun, 11 Jan 2009 03:53:10 +0100
To: public-html <public-html@w3.org>
Message-ID: <a9699fd20901101853u6b7afbe1vd8997ff4d991081b@mail.gmail.com>
On Fri, Jan 9, 2009 at 11:46 PM, Robert J Burns wrote:
>
> On Jan 9, 2009, at 3:46 AM, Thomas Broyer wrote:
>
>> On Thu, Jan 8, 2009 at 11:25 PM, Robert J Burns wrote:
>>>
>>> That is not true. IE parses unknown elements as void. <element></element>
>>> becomes:
>>>
>>> element
>>> /element
>>>
>>> where two void elements are created as siblings of one another, one with
>>> the name prefixed with a solidus.
>>
>> It actually depends whether you've used some script before or not:
>> http://blog.whatwg.org/supporting-new-elements-in-ie
>
> Of course. Using DOM scripting the elements can be added to the DOM
> correctly regardless of their content model. Likewise using the XML
> serialization makes all of this work. The thrust of this thread as I
> understand it however, is the text/html serialization and what can be done
> to improve the parsing and document conformance norms for that.

And that's the point of the linked article: if you do a
document.createElement() in IE, then you change the parser's behavior
for that element!
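
As a rough illustration of the technique the linked article describes, here
is a minimal sketch. The helper name registerUnknownElements and the
ElementFactory interface are made up for this example; only
document.createElement() comes from the article. In a browser you would pass
the global document and run this before the parser reaches any markup that
uses the new elements.

```typescript
// Minimal shape of the object we need: anything with createElement().
// In a browser, the global `document` satisfies this interface.
interface ElementFactory {
  createElement(name: string): unknown;
}

// Hypothetical helper: calls createElement() once per element name so that
// IE's parser subsequently treats those names as known elements.
// The created elements are discarded; only the side effect matters.
function registerUnknownElements(
  doc: ElementFactory,
  names: readonly string[]
): string[] {
  const registered: string[] = [];
  for (const name of names) {
    doc.createElement(name);
    registered.push(name);
  }
  return registered;
}
```

Usage would be something like registerUnknownElements(document, ["section",
"eventsource"]) in a script placed in HEAD. Taking the document as a
parameter is only there so the helper can be exercised without a browser.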

>> You are proposing introducing a discrepancy in the language ("legacy"
>> void elements do not require a trailing slash and "legacy" "block
>> elements" imply a </P>; while "new" void elements require a trailing
>> slash and "new" "block elements" require you to explicitly close your
>> paragraphs if you don't want the element to become a child of it!),
>> which will last for years; while there are workarounds (include "new"
>> void elements at the end of an element, possibly inserting them within
>> a <span>, as Ian proposed, or use an end tag for the void element; and
>> explicitly close your elements before opening a "new" "block element")
>> to have elements parsed appropriately by "legacy" parsers (HTML5
>> parsers when you author an HTML6 document; HTML4 parsers when you
>> author an HTML5 document, though it's a bit different in this case as
>> HTML4 parsers have no "standard" behavior), which are only needed
>> short-term to mid-term (depending on how fast new browser versions are
>> adopted, and your audience).
>>
>> I'd expect those workarounds to be part of test suites.
>
> Actually, I'm not proposing introducing a discrepancy.

"legacy" void elements would still have to be parsed as void elements
in the absence of a trailing slash. If the same parsing rule isn't
used for "new" void elements, I call it a discrepancy.

> Rather I'm proposing
> making the language self-consistent permanently in a way that will not
> require these silly work-arounds in the future (only for the HTML5
> transition). I'm suggesting that for document conformance void elements
> always have a slash in text/html to indicate they are void elements (e.g.,
> <img/> and <eventsource/>). Similarly I'm suggesting that for document
> conformance the P element always be explicitly closed (<p>some paragraph
> text</p>). So the document conformance norms I'm proposing have no
> discrepancy whatsoever. They are also quite consistent and easy to
> understand. In terms of document conformance we have only void and non-void
> elements (no more distinction between phrase which implicitly close a
> paragraph and structure which do not). Will authors learn the real secrets
> of HTML5 and know precisely when they can omit </p> tags or that they can omit
> the "/" from <img/>? Certainly, but they will also be producing non-document
> conforming HTML5 (with what I'm suggesting). So the document conformance
> rules are quite simple and quite consistent with no discrepancies. The
> parsing on the other hand is complicated and riddled with special cases and
> exceptions. However as I said before, that's the case no matter how we
> specify the document conformance.

So a conforming HTML4 document could never be a conforming HTML5
document as soon as it uses a void element, and vice versa: in SGML
the "/" would end the tag, and the ">" would end up being character
data (which causes problems in HEAD, where character data is not
allowed).
As a test, take any text/html document with a <link /> or <meta /> and
give it to the W3C Validator (and force validation as HTML4 if the
DOCTYPE says XHTML).
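
To make the difference concrete, here is a toy model of the two readings of
"<link />". It is a sketch only, not a real SGML or HTML5 tokenizer: the
function readTag, the Token type and the netEnabled flag are invented for
this example, and it only handles a single tag without attributes.

```typescript
type Token =
  | { kind: "startTag"; name: string }
  | { kind: "text"; data: string };

// Toy model: reads one attribute-less tag at the start of `source`.
// netEnabled = true  mimics the SGML reading, where "/" ends the tag.
// netEnabled = false mimics a parser that accepts "/>" as the tag end.
function readTag(source: string, netEnabled: boolean): Token[] {
  const match = /^<([A-Za-z][A-Za-z0-9]*)\s*/.exec(source);
  if (match === null) {
    // Not a tag at all: everything is character data.
    return source === "" ? [] : [{ kind: "text", data: source }];
  }
  const name = (match[1] ?? "").toLowerCase();
  const tokens: Token[] = [{ kind: "startTag", name }];
  let rest = source.slice(match[0].length);
  if (netEnabled && rest.charAt(0) === "/") {
    // SGML reading: the "/" closes the tag; a following ">" is left over.
    rest = rest.slice(1);
  } else if (rest.slice(0, 2) === "/>") {
    rest = rest.slice(2);
  } else if (rest.charAt(0) === ">") {
    rest = rest.slice(1);
  }
  if (rest !== "") {
    tokens.push({ kind: "text", data: rest });
  }
  return tokens;
}
```

With netEnabled set to true, readTag("<link />", true) yields a start tag
followed by a text token containing ">", which is the stray character data
that gets reported in HEAD. With netEnabled set to false the same input
yields only the start tag.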

And conformance is one thing, but what matters to authors is also
consistent parsing behavior. So you propose that the workarounds be
needed only for the tagsoup-to-HTML5 transition, at the cost of making
parsing less lenient (and considerably more complex for implementors, which
is probably not a good thing, as adoption of HTML5 mostly depends on
implementors, not only authors), instead of between each later HTML
revision (but then only for new void and non-phrasing elements)?
Well, my preference would still go to the latter.



-- 
Thomas Broyer
Received on Sunday, 11 January 2009 02:53:46 UTC