- From: <bugzilla@jessica.w3.org>
- Date: Sat, 29 Jan 2011 12:50:32 +0000
- To: public-html-bugzilla@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=11909

--- Comment #5 from Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 2011-01-29 12:50:31 UTC ---

Following the discussion with David, I would reformulate and expand my suggested principles section like so:

]]
Section I: Principle and base rules

HTML-compatible XHTML documents are, syntactically, XML documents that are authored according to conditions set by the HTML DOM and scripted according to the limitations defined by XML. In addition, the HTML parser is triggered to use the rendering mode that is closest to XML (no-quirks mode), and the same CSS can be used in both XML-mode and HTML-mode. Thus HTML-compatibility means equivalence in the fields of DOM, CSS and scripting, irrespective of HTML-parsing or XML-parsing. Conformance (validity) of an HTML-compatible XHTML document is governed by the HTML-standard that the author has followed - this document exemplifies how to create HTML5-conforming polyglot markup.

The above leads to the following sentences about what HTML-compatible XHTML is:

Polyglot Markup
1) is about how to replicate HTML's automatic DOM in XML;
2) follows a subset of well-formed XML where, HTML-conformance notwithstanding, it is the resulting HTML DOM which defines the XML-syntax rules;
3) is scripted according to the rules of XML (no document.write);
4) triggers no-quirks mode in HTML parsers, since this is closest to how XML mode renders, with regard to both DOM and CSS;
5) has some exceptions w.r.t. DOM-equivalence on attribute level, due to some required XML namespace attributes;
6) rules out some HTML elements because they are impossible to replicate in an XML parser;
7) results in the same encoding and the same language in both HTML-mode and XML-mode;
8) is validated for conformance according to an applicable HTML-standard - the HTML-conformance rules affect the DOM exceptions, i.e. which inequalities are tolerable;
9) does not need to be XML-valid.
   XML-validity requires a DTD, but HTML (in particular HTML5) seeks to avoid DTDs as they have no effect in HTML-parsers. DTD-authoring advice.

<-- then I would outline those sentences/principles before, finally, describing HTML5-conforming polyglots: -->

== 1. Replicating HTML's automatic DOM in XML ==

Extra rules from HTML's point of view - but also from XML's point of view: In HTML, it is permitted to drop lots of syntax, as it gets auto-created in the DOM. In XML there is no such automation, thus the code must be written explicitly. Thus one must use the "</p>", one must use <html>, <body>, <head>, <colgroup> etc. [Provide a list of the automated DOM-productions that HTML offers - this list can be updated as HTML6 is specced and so on.]

Extra rule from HTML's point of view: Attribute normalization belongs here. Links to relevant sections in XML 1.0 and HTML5.

== 2. Subset of well-formed XML - governed by the resulting HTML-DOM ==

Describe exceptions from XML's POV: when <foo/> can be used and when <foo></foo> must be used. Etc. Without mixing conformance into the issue.

Describe the (most important) extra rules from HTML's POV: escaping '<' and '&' etc.

== 3. Scripting ==

Document.write is forbidden - etc.

== 4. No-quirks mode ==

Only no-quirks doctypes are permitted, or else the page is rendered differently in HTML vs XML. For the same reason, a doctype that triggers no-quirks mode is required (except inside the @srcdoc attribute). Also, say that in some legacy HTML-parsers, <?xml version="1.0" ?> triggers quirks. The same also happens (in IE6, IE7, IE8) if there is a <!--comment--> before the DOCTYPE. No-quirks is an absolute requirement. If legacy user agents with such behavior are not an issue, then neither the XML declaration nor such comments are a problem (however, HTML-conformance rules may forbid them).

== 5. Equality exceptions ==

xml:lang, xmlns etc. are permitted even though they result in a different DOM. Justification: required by XML.
Unless these differences were accepted, polyglots would not be possible.

== 6. Banning of some HTML elements ==

Some HTML elements can't be used in XML, e.g. noscript, plaintext, etc.

== 7. Internationalization ==

Polyglot Markup needs both xml:lang and lang, or else we get a language difference.

Polyglot Markup should use UTF-8, for such and such reasons: it can be detected by an XML-parser, HTML5-conformance permits it, and more. Polyglot Markup which isn't UTF-8 or UTF-16 could use <?xml version="1.0" encoding="ISO-8859-1" ?>. However, this could lead to non-polyglotness (quirks mode) in some legacy parsers, as well as non-validity in HTML5. If this is an issue, then - for non-UTF-8/16 encodings - authors *must* use an external HTTP header to set the encoding. Polyglot Markup RECOMMENDS UTF-8.

== 8. Validation according to an HTML-standard ==

This specification does not say which HTML-standard to validate against, but defines general rules. However, HTML5 is the basis for our thinking. HTML5-validation is the only validation we are aware of which properly takes the DOM into account - other validation services, such as XHTML 1.0 validation by W3C, are known for not taking the DOM into account. (That said, HTML5-validation follows many rules that are not at all related to the DOM.)

== 9. XML-validity ==

XML-validity is only an issue if the DOCTYPE contains a DTD. Some advice about how to author a DTD, should one be needed - say that @id should be CDATA and so on. Say that @id in a polyglot is CDATA, and thus not subject to XML 1.0's name production.

Section II: HTML5-specific examples

<!-- Here most of what is already in the spec can be used. -->
[[
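
To make a few of the proposed rules concrete, here is a minimal sketch in Python of how some of them could be checked mechanically. It is only an illustration, not part of the proposal: the function name check_polyglot_basics is hypothetical, it uses the standard library's xml.etree.ElementTree parser, and it covers only the DOCTYPE-first rule (section 4), XML well-formedness (section 2), the xmlns exception (section 5) and the lang/xml:lang pairing (section 7). It assumes the language attributes sit on the root element and does nothing about void elements, scripting or encoding.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def check_polyglot_basics(source):
    """Return a list of problems found; an empty list means these checks passed."""
    problems = []

    # Section 4: nothing (XML declaration, comment) should precede the DOCTYPE,
    # since some legacy HTML parsers then fall back to quirks mode.
    if not source.lstrip().upper().startswith("<!DOCTYPE"):
        problems.append("document does not start with a DOCTYPE")

    # Section 2: the document must be well-formed XML.
    try:
        root = ET.fromstring(source)
    except ET.ParseError as err:
        problems.append(f"not well-formed XML: {err}")
        return problems

    # Section 5: the XHTML namespace declaration is required on the root.
    if root.tag != f"{{{XHTML_NS}}}html":
        problems.append("root element is not <html> in the XHTML namespace")

    # Section 7: lang and xml:lang must both be present and agree.
    lang = root.get("lang")
    xml_lang = root.get(f"{{{XML_NS}}}lang")
    if lang is None or xml_lang is None:
        problems.append("root needs both lang and xml:lang")
    elif lang != xml_lang:
        problems.append("lang and xml:lang differ")

    return problems
```

Calling check_polyglot_basics() on a document string returns a list of human-readable problems, so an omitted "</p>" shows up as a well-formedness error, while a missing xml:lang is reported separately. A real checker would also need an HTML parser to compare the two resulting DOMs, which this sketch does not attempt.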
Received on Saturday, 29 January 2011 12:50:33 UTC