- From: Jon Barnett <jonbarnett@gmail.com>
- Date: Mon, 23 Jul 2007 13:02:45 -0500
- To: "Robert Burns" <rob@robburns.com>
- Cc: public-html <public-html@w3.org>
On 7/23/07, Robert Burns <rob@robburns.com> wrote:
>
> This would not tell us how many people thought they were serving
> XHTML as XML. It would show us how many people were authoring to XHTML
> (possibly appendix C), perhaps for validation and authoring simplicity
> reasons, but vending as text/html. It also might serve as a
> measure of how many pages will face some difficulties when switching
> to XML. There's nothing wrong with producing XHTML code that meets
> the appendix C guidelines and serving it as text/html.

At risk of beating a dead horse... Is your premise that there are authors who:

- Use an XHTML 1.0 DOCTYPE declaration
- Include the html/@xmlns attribute, and maybe even a <?xml?> PI
- Use XML syntax (ending empty elements with />, lowercase tags, etc.)
- Follow the guidelines of XHTML 1.0 Appendix C
- Send this document as text/html
- Are fully aware that this document is not treated as XHTML by any UA, and is parsed as plain HTML by every UA
- Never intend to switch to serving the document as application/xhtml+xml, or to have it parsed as XML, without mowing through a long list of caveats not covered by Appendix C
- ... and do all of these things solely because they prefer the syntax? (This is the important point, because I believe it's the crux of your premise.)

and then,

- are aware that they would have this same feature set by using an HTML 4.01 DOCTYPE and removing some / characters from the document (though HTML 5 allows them)

By observing the cowpath, that's not what I see. For example, I used to (and still rarely do) write XHTML documents and serve them as text/html. But I only did this with the intention that an XML parser could read them the same way, and that I would serve them as XML in the future.
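The intention described in the last paragraph, that an XML parser should read the same markup the same way, can be checked mechanically. A minimal sketch, assuming Python's standard-library `xml.etree.ElementTree` as a stand-in for a browser's XML parser. It tests well-formedness only, not validity against the XHTML 1.0 DTD, and the sample documents are made up for illustration:

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# XHTML-style markup: empty elements closed with " />", lowercase tags.
appendix_c_style = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Test</title></head>"
    "<body><p>Line one<br />Line two</p></body>"
    "</html>"
)

# The same content in HTML 4.01 style: <br> is never closed and the
# tag case is mixed, both of which an XML parser rejects.
html4_style = (
    "<HTML><HEAD><TITLE>Test</TITLE></HEAD>"
    "<BODY><P>Line one<br>Line two</P></BODY></html>"
)

print(is_well_formed(appendix_c_style))    # True
print(is_well_formed(html4_style))         # False
# Well-formed, even though no XHTML 1.0 validator would accept it:
print(is_well_formed("<p><ol></ol></p>"))  # True
```

The last case matters for the rest of the thread: well-formedness says nothing about which elements may contain which, so that check is left to a validator.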
> >
> > I'm afraid I can't offer anything
> > other than anecdotes (experience on lots of forums, personal
> > conversations, etc., the fact my college professor was teaching
> > exactly what I learned to a number of other students), but the fact
> > that this page exists says something:
> >
> > http://www.hixie.ch/advocacy/xhtml
>
> I'm not sure it does say anything. There are lots of authors
> authoring pages as XHTML and serving them as text/html (presumably
> following more or less appendix C or they would be failing now).
>
> >
> >> Telling authors they've somehow made a mistake because they're beating
> >> down a cowpath that, for some strange reason, you think is misguided,
> >> does not make it any less of a cowpath.
> > It's how you interpret the cowpath. I interpret it to mean that
> > authors misunderstand how XHTML actually works. I think that teaching
> > HTML as having XHTML-like syntax would lead to shock when the author
> > first tries to do <p><ol></ol></p>
>
> That's invalid HTML4.01 and invalid XHTML1.0. This will be a problem
> when we introduce HTML5 and authors run their documents through an
> HTML5 conformance checker if the conformance checker doesn't throw up
> an error. An XHTML1 validator should throw up an error for that too.
>
> >> No one has ever, as far as I
> >> am aware, explained in a logical way what could possibly be
> >> wrong with authoring content that adheres to XHTML appendix C. It has
> >> simply become a mantra amidst a certain web development clique.
> >> ...
> >> Those are very minor differences that would only be gotchas for those
> >> ignoring Appendix C. Often authors are told to go with external
> >> stylesheets and external scripts (so that takes care of CDATA
> >> sections). Do that; don't count on implicit elements; use Unicode
> >> characters instead of named character entities; stick with DOM1
> >> through DOM3; and you'll be fine (oh, and don't count on IE consuming
> >> your content).
> >> There's no need to raise the Homeland Security alert
> >> level over XHTML. It's just a few things to understand about it
> >> before vending as XML. However, all that has nothing to do with the
> >> other reason for following an appendix C syntax: for its consistency
> >> and readability.
> >
> > All of those things you just mentioned are caveats when serving XHTML
> > as text/html, and none of them are mentioned in XHTML 1.0 Appendix
> > C.
>
> No, those are not caveats for serving XHTML1.0 as text/html. Those
> are caveats for those who, though they are successfully serving
> XHTML1 as text/html, want to move to XML instead.
>
> However, from appendix C[1]:
> quote/
> C.4. Embedded Style Sheets and Scripts
> Use external style sheets if your style sheet uses < or & or ]]> or
> --. Use external scripts if your script uses < or & or ]]> or --.
> Note that XML parsers are permitted to silently remove the contents
> of comments. Therefore, the historical practice of "hiding" scripts
> and style sheets within "comments" to make the documents backward
> compatible is likely to not work as expected in XML-based user agents.
>
> /unquote
>
> The other points only relate to vending that XHTML content as XML.
> Authors vending as text/html need not concern themselves with those
> (which is why they're not mentioned in appendix C; perhaps there
> should have been an appendix E: moving your appendix C content to XML).
>
> > To that, I'll add that document.createElement(), one of the most basic
> > DOM methods, creates an element without a namespace. If this
> > quasi-XHTML eventually gets served as XHTML, even
> > document.createElement would have unintended consequences.
> >
> >> And it's not just a pedagogical issue. XML actually separates two
> >> things that cannot be clearly separated in HTML: well-formedness and
> >> validity. Take Henri's favorite example from HTML5: <p><ol></ol></p>.
> >> In HTML5, this is perfectly valid and well-formed (presuming it's
> >> properly placed in a larger document). It's a part of a valid DOM
> >> tree state. It's a valid XML serialization. However, it's not
> >> possible to express this in HTML4 with MIME type text/html (I was
> >> under the impression that it would be valid in HTML5, but Henri
> >> suggests otherwise).
> >
> > It's mentioned here:
> > http://www.whatwg.org/specs/web-apps/current-work/#element-restrictions
>
> I see. I hadn't yet read that part of the spec. I definitely support
> this forking here, but we need to be extra careful about informing
> readers of our recommendation about that. We might even want to
> include some notation in the semantics chapter to help draw attention
> to these varied content models. I had been wondering how those
> content models were going to work in text/html, but I assumed that
> the testing had already been done.
>
> >> Is it invalid in that the author
> >> put an ordered list in a paragraph where it didn't belong? Or is it
> >> ill-formed in that the author included a closing </p> tag where it
> >> didn't belong?
> >
> > The latter. It's invalid (or malformed) because there's a closing
> > </p> tag where it didn't belong. The <p> element was implicitly
> > closed when the parser reached the opening <ol> tag.
>
> No, you missed the point. It is definitely not the latter. The
> author's intention was to invalidly place an ordered list into a
> paragraph. The parser mis-guesses that it's instead an ill-formed
> document fragment. That's the point of the example. Most of the HTML
> error recovery behaves this way. The point of the example is we start
> from the author's intention: here, to do something invalid. The
> text/html parser cannot tell the difference. So it assumes the wrong
> thing here (wrong as in not what the author intended).
> (again this is off-topic, but that is what XML introduces: a way
> for the parser to always tell the difference, though it has to be
> well-formed before it can move to the next step)

I understood the point. You end up with an unintended consequence because an author understood a strict syntax and not what the parser would actually do. To prevent this, the author must understand HTML syntax for what it is. My interpretation of the mistake is correct: the parser followed the parsing rules of the spec. It's not the parser's fault if the author was taught XML-like HTML and not the actual rules of HTML.
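The disagreement above turns on how each kind of parser treats `<p><ol></ol></p>`. A toy sketch in Python that implements only two HTML5 tree-construction rules (a block start tag such as `<ol>` closes an open `<p>`; a `</p>` with no open `<p>` produces an empty `<p>`), next to the standard-library XML parser. The tokenizer, the `CLOSES_P` set, and all function names are hypothetical simplifications: the real algorithm checks for a `p` element "in button scope", lists many more tags, and handles text and attributes, none of which this sketch does.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, greatly simplified tokenizer: lowercase tags, no attributes.
TAG_RE = re.compile(r"<(/?)([a-z]+)>")

# A small subset of the start tags that close an open <p> in HTML5.
CLOSES_P = {"ol", "ul", "div"}


class Node:
    def __init__(self, name):
        self.name = name
        self.children = []

    def serialize(self):
        inner = "".join(child.serialize() for child in self.children)
        return f"<{self.name}>{inner}</{self.name}>"


def pop_through(stack, name):
    """Pop open elements until one called `name` has been popped."""
    while len(stack) > 1:  # never pop the root <body>
        if stack.pop().name == name:
            return


def toy_html_parse(markup):
    """Build a tree using only two HTML5 tree-construction rules."""
    body = Node("body")
    stack = [body]  # stack of open elements
    for match in TAG_RE.finditer(markup):
        is_end, name = match.group(1) == "/", match.group(2)
        open_names = [node.name for node in stack]
        if not is_end:
            # Rule 1: a block start tag implicitly closes an open <p>.
            if name in CLOSES_P and "p" in open_names:
                pop_through(stack, "p")
            node = Node(name)
            stack[-1].children.append(node)
            stack.append(node)
        elif name == "p" and "p" not in open_names:
            # Rule 2: a stray </p> is a parse error that yields an empty <p>.
            stack[-1].children.append(Node("p"))
        elif name in open_names:
            # Ordinary end tag: close the matching element.
            pop_through(stack, name)
    return "".join(child.serialize() for child in body.children)


def xml_children(markup):
    """Parse with an XML parser and report the nesting it keeps."""
    root = ET.fromstring(f"<body>{markup}</body>")
    return [(child.tag, [grandchild.tag for grandchild in child]) for child in root]


markup = "<p><ol></ol></p>"
print(toy_html_parse(markup))  # <p></p><ol></ol><p></p>
print(xml_children(markup))    # [('p', ['ol'])]
```

The HTML side ends up with three sibling elements, the implicit close described in the thread; the XML side keeps the `<ol>` inside the `<p>` exactly as written and leaves the content-model error for a validator to report.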
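The remark quoted earlier, that `document.createElement()` creates an element without a namespace, can be illustrated outside a browser. A minimal sketch, assuming Python's `xml.dom.minidom` (which implements the W3C DOM `createElement`/`createElementNS` pair) as a stand-in for a browser DOM. Browsers are not guaranteed to behave the same way, so treat this as an illustration of the DOM interfaces rather than of any particular user agent:

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A document parsed as XML, with the XHTML namespace declared on <html>.
doc = minidom.parseString(f'<html xmlns="{XHTML_NS}"><body></body></html>')
body = doc.getElementsByTagNameNS(XHTML_NS, "body")[0]

# DOM Level 1 method: the new element gets no namespace.
plain_p = doc.createElement("p")
# DOM Level 2 method: the namespace is given explicitly.
xhtml_p = doc.createElementNS(XHTML_NS, "p")

body.appendChild(plain_p)
body.appendChild(xhtml_p)

print(body.namespaceURI)     # http://www.w3.org/1999/xhtml
print(plain_p.namespaceURI)  # None
print(xhtml_p.namespaceURI)  # http://www.w3.org/1999/xhtml

# A namespace-aware lookup finds only the element made with createElementNS.
print(len(doc.getElementsByTagNameNS(XHTML_NS, "p")))  # 1
```

Both new elements share the tag name `p`, but only one of them is an XHTML element, which is the kind of unintended consequence the remark describes.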
Received on Monday, 23 July 2007 18:02:49 UTC