Re: Write-up about semantics in HTML5 from A List Apart from Robert J Burns on 2009-01-09 (public-html@w3.org from January 2009)

From: Robert J Burns <rob@robburns.com>
Date: Fri, 9 Jan 2009 16:46:59 -0600
To: HTML WG <public-html@w3.org>
Message-Id: <2E7C3830-DECF-4ACB-8A22-4726CEFB221A@robburns.com>

Hi Thomas,

On Jan 9, 2009, at 3:46 AM, Thomas Broyer wrote:

> On Thu, Jan 8, 2009 at 11:25 PM, Robert J Burns wrote:
>> Hi Thomas,
>>
>> On Jan 7, 2009, at 7:08 PM, Thomas Broyer wrote:
>>>
>>> On Wed, Jan 7, 2009 at 7:17 PM, Martin Atkins wrote:
>>>>
>>>> A solution to this has been offered in the form of having the
>>>> <element/> form be treated as void for all unknown elements.
>>>
>>> It wouldn't solve anything short-term, as current browsers parse
>>> unknown elements as non-void, whatever the "/>" syntax; only
>>> <element></element> would (and could lead to different DOMs being
>>> produced if an author uses <element>foo</element>: "foo" as a child of
>>> <element> in HTML5 but "foo" as a sibling and </element> ignored in
>>> HTMLx).
>>
>> That is not true. IE parses unknown elements as void.
>> <element></element>
>> becomes:
>>
>> element
>> /element
>>
>> where two void elements are created as siblings of one another, one
>> with the name prefixed with a solidus.
>
> It actually depends whether you've used some script before or not:
> http://blog.whatwg.org/supporting-new-elements-in-ie

Of course. With DOM scripting the elements can be added to the DOM
correctly regardless of their content model. Likewise, using the XML
serialization makes all of this work. The thrust of this thread as I
understand it, however, is the text/html serialization and what can be
done to improve the parsing and document conformance norms for that.

>>> There cannot be a single rule that would allow compatibility all over
>>> the place (HTML6 docs in HTML5 UA, HTML5 in HTML6 UA; with the same
>>> DOM being produced); except not introducing any new void element
>>> and/or non-phrasing element; which is probably worse than having
>>> authors use middle-term workarounds.
>>
>> The use of the slash would solve the problem (along with the other
>> proposed solutions). In other words in both HTML5 and HTML6 parsers,
>> the UAs would know that any unknown (unknown as of right now) element
>> with a slash is to be parsed as a void element and any unknown element
>> without the slash is to be parsed as a non-void element.
>>
>> The difference between phrasing and structure elements is simply that
>> the latter implicitly closes a P element. I see no reason to make any
>> more elements that need to implicitly close a p element
>
> You are proposing introducing a discrepancy in the language ("legacy"
> void elements do not require a trailing slash and "legacy" "block
> elements" imply a </P>; while "new" void elements require a trailing
> slash and "new" "block elements" require you to explicitly close your
> paragraphs if you don't want the element to become a child of it!),
> which will last for years; while there are workarounds (include "new"
> void elements at the end of an element –eventually inserting it within
> a <span>– as Ian proposed, or use an end tag for the void element; and
> explicitly close your elements before opening a "new" "block element")
> to have elements parsed appropriately by "legacy" parsers (HTML5
> parsers when you author an HTML6 document; HTML4 parsers when you
> author an HTML5 document, though it's a bit different in this case as
> HTML4 parsers have no "standard" behavior), which are only needed
> short-term to mid-term (depending on how fast new browser versions are
> adopted, and your audience).
>
> I'd expect those workarounds to be part of test suites.

Actually, I'm not proposing to introduce a discrepancy. Rather, I'm
proposing to make the language self-consistent permanently, in a way
that will not require these silly workarounds in the future (only
for the HTML5 transition). I'm suggesting that, for document
conformance, void elements always carry a slash in text/html to indicate
that they are void (e.g., <img/> and <eventsource/>). Similarly,
I'm suggesting that, for document conformance, the P element always be
explicitly closed (<p>some paragraph text</p>). So the document
conformance norms I'm proposing have no discrepancy whatsoever. They
are also consistent and easy to understand. In terms of document
conformance we have only void and non-void elements (no more
distinction between structure elements, which implicitly close a
paragraph, and phrasing elements, which do not). Will authors learn the
real secrets of HTML5 and know precisely when they can omit </p> tags,
or that they can omit the "/" from <img/>? Certainly, but they will
then be producing non-conforming HTML5 documents (under what I'm
suggesting). So the document conformance rules are simple and
consistent. The parsing, on the other hand, is complicated and
riddled with special cases and exceptions. However, as I said before,
that's the case no matter how we specify the document conformance.
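
To make the two rules concrete, here is a rough sketch in Python. It is
an illustration only, not a proposed algorithm for the spec: the names
LEGACY_VOID, TAG, parse and conformance_errors are made up for this
example, the void list is partial, and the toy tokenizer ignores
comments, attribute values containing ">", and the implied </p>
handling that real parsers perform.

```python
import re

# Void elements that parsers already know about (partial list, assumed
# here purely for illustration).
LEGACY_VOID = {"br", "hr", "img", "input", "link", "meta"}

# Toy tag matcher: optional "/" (end tag), name, attributes, optional "/".
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^<>]*?(/?)>")


def parse(markup):
    """Build a nested (name, children) tree using the slash rule:
    an unknown element with "/>" is void, without it is non-void."""
    root = ("#root", [])
    stack = [root]
    pos = 0
    for m in TAG.finditer(markup):
        text = markup[pos:m.start()].strip()
        if text:
            stack[-1][1].append(text)
        pos = m.end()
        closing, name, slash = m.group(1), m.group(2).lower(), m.group(3)
        if closing:
            # Pop back to the matching open element; ignore stray end tags.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i][0] == name:
                    del stack[i:]
                    break
            continue
        node = (name, [])
        stack[-1][1].append(node)
        # Only non-void elements stay open to receive children.
        if not (slash or name in LEGACY_VOID):
            stack.append(node)
    tail = markup[pos:].strip()
    if tail:
        stack[-1][1].append(tail)
    return root


def conformance_errors(markup):
    """Crude document-conformance check: known void elements must carry
    a slash, and every <p> must have an explicit </p> (counted only)."""
    errors = []
    open_p = 0
    for m in TAG.finditer(markup):
        closing, name, slash = m.group(1), m.group(2).lower(), m.group(3)
        if name in LEGACY_VOID and not closing and not slash:
            errors.append(f"<{name}> should be written <{name}/>")
        if name == "p":
            open_p += -1 if closing else 1
    if open_p > 0:
        errors.append(f"{open_p} <p> element(s) lack an explicit </p>")
    return errors
```

With this sketch, parse("<widget/>foo") yields "foo" as a sibling of
widget, while parse("<widget>foo</widget>") yields it as a child, with
no need for the parser to know the element in advance. The parser still
accepts <img> without a slash; only the conformance check objects to it.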

>
>
>> (the implied close of a p was probably a bad idea in retrospect to
>> avoid a very short tag).
>
> HTML was supposed to be SGML-based, where an UA would use the
> appropriate DTD (from its catalog), as given in the DOCTYPE, to
> determine how to parse a document.

Yes, I understand that. That's why I said "in retrospect": we now know
that implementors would ignore, eschew and denigrate DTDs.

Take care,
Rob
Received on Friday, 9 January 2009 22:47:36 UTC