- From: Robert J Burns <rob@robburns.com>
- Date: Sat, 10 Jan 2009 21:43:25 -0600
- To: HTML WG <public-html@w3.org>
Hi Thomas,

On Jan 10, 2009, at 8:53 PM, Thomas Broyer wrote:

> On Fri, Jan 9, 2009 at 11:46 PM, Robert J Burns wrote:
>>
>> On Jan 9, 2009, at 3:46 AM, Thomas Broyer wrote:
>>
>>> On Thu, Jan 8, 2009 at 11:25 PM, Robert J Burns wrote:
>>>>
>>>> That is not true. IE parses unknown elements as void.
>>>> <element></element> becomes:
>>>>
>>>> element
>>>> /element
>>>>
>>>> where two void elements are created as siblings of one another, one
>>>> with the name prefixed with a solidus.
>>>
>>> It actually depends on whether you've used some script before or not:
>>> http://blog.whatwg.org/supporting-new-elements-in-ie
>>
>> Of course. Using DOM scripting, the elements can be added to the DOM
>> correctly regardless of their content model. Likewise, using the XML
>> serialization makes all of this work. The thrust of this thread as I
>> understand it, however, is the text/html serialization and what can be
>> done to improve the parsing and document conformance norms for that.
>
> And that's the point of the linked article: if you do a
> document.createElement() in IE, then you change the parser's behavior
> for that element!

Yes, I understand, but resorting to JavaScript is not what I had in mind. Ideally, we don't want to make changes to HTML's vocabulary dependent on JavaScript.
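The legacy IE behavior quoted above, and the way a prior document.createElement() call changes it, can be illustrated with a toy model. The Python sketch below is hypothetical: it does not reproduce IE's real parser, the names LegacyIEModel, register() and parse() are invented for illustration, and register() merely stands in for document.createElement(). It only mimics the two outcomes described in the thread: an unknown element parsed as void, with its end tag becoming a sibling named "/name", versus a registered element that contains its content.

```python
from html.parser import HTMLParser

# Hypothetical set of names the toy "legacy parser" already knows as containers.
KNOWN_CONTAINERS = {"html", "body", "div", "p", "span"}


class LegacyIEModel(HTMLParser):
    """Toy model only: unknown elements are treated as void, and an unknown
    end tag becomes an empty sibling named "/name". register(name) stands in
    for document.createElement(name), after which the name nests content."""

    def __init__(self):
        super().__init__()
        self.known = set(KNOWN_CONTAINERS)
        self.root = ("#root", [])       # nodes are (name, children) pairs
        self.stack = [self.root]

    def register(self, name):
        self.known.add(name)

    def _append(self, node):
        self.stack[-1][1].append(node)

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self._append(node)
        if tag in self.known:
            self.stack.append(node)     # container: later content nests inside

    def handle_endtag(self, tag):
        if tag in self.known:
            if len(self.stack) > 1 and self.stack[-1][0] == tag:
                self.stack.pop()
        else:
            # Unknown end tag: an empty sibling whose name starts with a solidus.
            self._append(("/" + tag, []))

    def handle_data(self, data):
        if data.strip():
            self._append(("#text:" + data.strip(), []))


def parse(markup, registered=()):
    parser = LegacyIEModel()
    for name in registered:
        parser.register(name)
    parser.feed(markup)
    parser.close()
    return parser.root[1]
```

For example, parse("<section>hi</section>") yields three flat siblings (section, the text, and /section), while parse("<section>hi</section>", registered=["section"]) yields a single section element containing the text.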
>
>
>>> You are proposing introducing a discrepancy in the language ("legacy"
>>> void elements do not require a trailing slash and "legacy" "block
>>> elements" imply a </P>; while "new" void elements require a trailing
>>> slash and "new" "block elements" require you to explicitly close your
>>> paragraphs if you don't want the element to become a child of it!),
>>> which will last for years; while there are workarounds (include "new"
>>> void elements at the end of an element, possibly inserting it within
>>> a <span>, as Ian proposed, or use an end tag for the void element; and
>>> explicitly close your elements before opening a "new" "block element")
>>> to have elements parsed appropriately by "legacy" parsers (HTML5
>>> parsers when you author an HTML6 document; HTML4 parsers when you
>>> author an HTML5 document, though it's a bit different in this case as
>>> HTML4 parsers have no "standard" behavior), which are only needed
>>> short-term to mid-term (depending on how fast new browser versions are
>>> adopted, and your audience).
>>>
>>> I'd expect those workarounds to be part of test suites.
>>
>> Actually, I'm not proposing introducing a discrepancy.
>
> "legacy" void elements would still have to be parsed as void elements
> in the absence of a trailing slash. If the same parsing rule isn't
> used for "new" void elements, I call it a discrepancy.

I wouldn't call that a discrepancy. Every element in the text/html serialization has unique treatment. Elements can often be grouped with like elements, but with my suggestion, going forward we would have an end to such discrepancies. The only discrepancies are the ones that already exist for the legacy support of existing elements (and existing content in particular).

>
>
>> Rather, I'm proposing making the language self-consistent permanently
>> in a way that will not require these silly workarounds in the future
>> (only for the HTML5 transition).
>> I'm suggesting that for document conformance void elements always
>> have a slash in text/html to indicate they are void elements (e.g.,
>> <img/> and <eventsource/>). Similarly, I'm suggesting that for document
>> conformance the P element always be explicitly closed (<p>some
>> paragraph text</p>). So the document conformance norms I'm proposing
>> have no discrepancy whatsoever. They are also quite consistent and easy
>> to understand. In terms of document conformance we have only void and
>> non-void elements (no more distinction between structural elements,
>> which implicitly close a paragraph, and phrase elements, which do not).
>> Will authors learn the real secrets of HTML5 and know precisely when
>> they can omit </p> tags, or that they can omit the "/" from <img/>?
>> Certainly, but they will also be producing HTML5 that is not
>> document-conforming (under what I'm suggesting). So the document
>> conformance rules are quite simple and quite consistent, with no
>> discrepancies. The parsing, on the other hand, is complicated and
>> riddled with special cases and exceptions. However, as I said before,
>> that's the case no matter how we specify document conformance.
>
> So a conforming HTML4 document cannot ever be a conforming HTML5
> document (as soon as it uses a void element), and the other way
> around, because the "/" in SGML would end the tag, and the ">" would
> end up being character data (which causes problems in HEAD).
> As a test, take any text/html document with a <link /> or <meta /> and
> give it to the W3C Validator (and force validation as HTML4 if the
> DOCTYPE says XHTML).

Right, it would not necessarily be valid HTML 4.01 (though it could be made valid HTML 4.02 with a simple change to the DTD). SGML serializations should not be our concern, since an SGML processing application will expect changes to the schema and be able to process those changes from any new DTD (4.02, 5 or otherwise).
Our only concern, then, needs to be with processing applications specific to the text/html serialization. Those will treat the existing void elements as void even without the slash, and can be updated to treat unknown elements with a slash as void as well. Problem solved.

>
>
> And conformance is one thing, but what matters to authors is also
> consistent parsing behavior. So you propose that the workarounds be
> needed only for the tagsoup-to-HTML5 transition, at the cost of making
> parsing less lenient (and considerably more complex for implementors,
> which is probably not a good thing, as adoption of HTML5 mostly depends
> on implementors, not only authors), instead of between each later HTML
> revision (but then only for new void and non-phrasing elements)?
> Well, my preference would still go for the latter.

I'm not following you here at all. If authors can use a transitional serialization that works in both old and current browsers, and at the same time we can help make future changes to HTML easier, then I don't see the problem. As I said before, everything is consistent for authors: void elements have a slash; non-void elements do not. If our goal is to make this work with SGML too, then we simply need to add a DTD (say, one for HTML 4.02 as well). No inconsistencies and no discrepancies (other than the ones HTML parsing is already riddled with, like rearranging table descendant elements and the like).

Take care,
Rob
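The document-conformance rules Rob proposes (void elements always written with a trailing slash, non-void elements never, and every <p> closed explicitly) are simple enough to check mechanically. The following is a minimal Python sketch under stated assumptions: the element lists are an illustrative, non-normative sample, the names ProposedRulesChecker and check() are hypothetical, and the paragraph check is deliberately simplified. It does not model end tags implied by block elements; it only verifies that each <p> is followed by </p> before the next <p> or the end of the input.

```python
from html.parser import HTMLParser

# Illustrative subset of void elements (an assumption, not a normative list);
# "eventsource" is the new void element used as an example in this thread.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "eventsource",
                 "hr", "img", "input", "link", "meta", "param", "source"}
# Hypothetical sample of known non-void elements that must never use "/>".
NON_VOID_ELEMENTS = {"a", "div", "em", "li", "p", "span", "strong", "ul"}


class ProposedRulesChecker(HTMLParser):
    """Checks a text/html fragment against the proposed conformance rules."""

    def __init__(self):
        super().__init__()
        self.errors = []
        self.p_open = False

    def handle_starttag(self, tag, attrs):
        # Called for plain start tags, i.e. written without a trailing slash.
        if tag in VOID_ELEMENTS:
            self.errors.append(f"<{tag}> is void but lacks a trailing slash")
        if tag == "p":
            if self.p_open:
                self.errors.append("<p> opened before the previous <p> was closed")
            self.p_open = True

    def handle_startendtag(self, tag, attrs):
        # Called for tags written as <tag/>; unknown names are accepted as void.
        if tag in NON_VOID_ELEMENTS:
            self.errors.append(f"<{tag}/> uses a slash on a non-void element")

    def handle_endtag(self, tag):
        if tag == "p":
            self.p_open = False


def check(markup):
    """Return a list of rule violations (empty if the fragment conforms)."""
    checker = ProposedRulesChecker()
    checker.feed(markup)
    checker.close()
    if checker.p_open:
        checker.errors.append("<p> never closed")
    return checker.errors
```

For example, check('<p>one<p>two</p><img src="a.png">') reports both the unclosed first paragraph and the missing slash on <img>. Unknown names such as <foo/> pass, in line with the suggestion that text/html parsers be updated to treat unknown elements with a slash as void.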
Received on Sunday, 11 January 2009 03:44:12 UTC