- From: Robert Burns <rob@robburns.com>
- Date: Mon, 23 Jul 2007 10:07:53 -0500
- To: Jon Barnett <jonbarnett@gmail.com>
- Cc: public-html <public-html@w3.org>
On Jul 23, 2007, at 8:30 AM, Jon Barnett wrote:

> On 7/21/07, Robert Burns <rob@robburns.com> wrote:
>
>> It may not be a popularity contest, but it is a relatively well-beaten cowpath. By cowpath principle, authors are demonstrating that an HTML syntax (and implementation enhancements) that allows writing once and deploying either as XML or as text/html is desirable.
>
> If anything, authors are demonstrating that they think that they *are* using XML, even though they're serving it as text/html. There is an astounding number of authors who write HTML for money and don't know the difference. I used to be one of them.

Well, I'm not sure we can conclude from one anecdote what everyone else thinks about their HTML. In some ways, though, I think you may be right: authors think they're writing XHTML (they are using XML, even if they don't serve it as such). And many of them are. Sure, there are all sorts of subtle distinctions that might hit these authors if they switched to actually vending their XHTML as XML. That's a bit of confusion that needs to be cleared up, and I would certainly take every opportunity to make sure others understood that. However, it still indicates that many authors want to author using XHTML and to follow appendix C.

If authors author with XHTML and follow appendix C, I see few difficulties they would face if they wanted to switch over. The biggest problems would have to do with the immaturity of XHTML implementations. Many fairly recent versions of popular browsers will break on things like HTML named character entities. I'm talking fatal parse errors here. The other problems anyone might encounter (other than immature implementations) relate to using non-standards or not following appendix C. This to me is a cowpath: authors are telling us in droves that they want to author with XHTML, and they may even want to vend that XHTML as XML. However, the implementations have not caught up to the authors.

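As a rough illustration of the named-entity point, here is a minimal sketch using Python's standard-library XML parser as a stand-in for a strict XML consumer. The sample markup and the helper name `parses_as_xml` are made up for this example, and a browser's XHTML parser may behave differently (for instance, if it loads the XHTML DTD that declares these entities).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup: the same text written with HTML named
# entities and with numeric character references.
NAMED = "<p>caf&eacute;&nbsp;au lait</p>"
NUMERIC = "<p>caf&#233;&#160;au lait</p>"


def parses_as_xml(markup: str) -> bool:
    """Return True if a non-validating XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        # Without a DTD declaring them, &eacute; and &nbsp; are undefined
        # entities, which is a fatal error for the XML parser.
        return False
    return True


print("named entities parse:", parses_as_xml(NAMED))
print("numeric references parse:", parses_as_xml(NUMERIC))
```

Numeric references (or the literal Unicode characters) avoid the problem because they need no entity declarations.
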
Telling authors they've somehow made a mistake because they're beating down a cowpath that, for some strange reason, you think is misguided does not make it any less of a cowpath. No one has ever, as far as I am aware, explained in a logical way what could possibly be wrong with authoring content that adheres to XHTML appendix C. It has simply become a mantra amidst a certain web development clique.

> They are not demonstrating that they know the difference, but like to switch back and forth. There are better tools for that, if an author actually knows what he is doing and wants to do that.

I don't think they switch back and forth (though I know there are some who inadvisably promote that). I think they just like the authoring style of XHTML (fewer intricacies to remember) and they would like to take advantage of all of the other features of XML whenever the implementations catch up to the authors.

> Henri pointed out some differences other than syntactic differences that authors won't catch on to. There are plenty others including parsing rules, DOM functions and more.

Those are very minor differences that would only be gotchas for those ignoring Appendix C. Often authors are told to go with external stylesheets and external scripts (so that takes care of CDATA sections). Do that; don't count on implicit elements; use Unicode characters instead of named character entities; and stick with DOM1 through DOM3 and you'll be fine (oh, and don't count on IE consuming your content). There's no need to raise the Homeland Security alert level over XHTML. There are just a few things to understand about it before vending as XML. However, all that has nothing to do with the other reason for following an appendix C syntax: its consistency and readability.

>> So I would say that the XHTML1-appendix C-like syntax is one of those cowpaths we should be considering even if we aren't just trying to judge a popularity contest.
>
> Appendix C is one of the reasons we now have the problem of HTML pages pretending to be XML, and why authors would be royally confused if they ever tried to actually serve those pages as XML.

What exactly is the problem with their "pretending" to be XHTML pages? Will the server crash? Will it give me bad breath? There's a lot of scare language around this issue that just has no technical justification.

> So, as I take it, your reason for wanting to encourage XML-like syntax is to smooth the conversion of an HTML document to an XHTML document. I contend that is a bad reason because it will encourage confusion between the differences between the two. There are useful tools out there to convert the syntax. Even after converting syntax, authors still have to contend with differences in available DOM methods, content models, parsing rules, and server configuration.

No, that is not really my reason for wanting an appendix-C-like syntax. There are many reasons, but one of them is that it is a much easier syntax to understand. As a related story, I think Dreamweaver 2003 has been touted as the first authoring tool to produce valid well-formed HTML (this may just be advertising lingo, but there's some truth to it). Even the authoring tools couldn't keep HTML minimization rules straight. With the introduction of XML and XHTML, I sensed that a light bulb suddenly went on for web developers around the world. "Aha!" they said. "That's what proper nesting is all about!" The authoring tools started to get it right. Authors finally understood.

And it's not just a pedagogical issue. XML actually separates two things that cannot be clearly separated in HTML: well-formedness and validity. Take Henri's favorite example from HTML5: <p><ol></ol></p>. In HTML5, this is perfectly valid and well-formed (presuming it's properly placed in a larger document). It's a part of a valid DOM tree state. It's a valid XML serialization.
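
The well-formedness half of that distinction can be sketched with Python's standard-library XML parser. This is only an illustration: `is_well_formed` and the mis-nested sample are assumptions made up for this example, and ElementTree does no DTD or schema validation, so it says nothing about whether an `ol` is *allowed* inside a `p`.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Check XML well-formedness only; no validity checking is performed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


NESTED = "<p><ol></ol></p>"     # the example from the message
MISNESTED = "<p><ol></p></ol>"  # hypothetical ill-formed variant

# Properly nested tags are well-formed, whatever the content model says.
print("nested:", is_well_formed(NESTED))
# Overlapping tags are a well-formedness error, independent of any DTD.
print("mis-nested:", is_well_formed(MISNESTED))

# The well-formed input yields an unambiguous tree: an ol inside a p.
root = ET.fromstring(NESTED)
print(root.tag, [child.tag for child in root])
```

Whether the `ol` belongs inside the `p` is then a separate question for a validator, which is the separation being described here.
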
However, it's not possible to express this in HTML4 with MIME type text/html (I was under the impression that it would be valid in HTML5, but Henri suggests otherwise). Moreover (and this is what I want to illustrate), there's no way for the parser to determine whether this is invalid or ill-formed (it's got to be one of them because text/html HTML4 forces that by its validity and implicit </p> rules). Is it invalid in that the author put an ordered list in a paragraph where it didn't belong? Or is it ill-formed in that the author included a closing </p> tag where it didn't belong? The parser has to make a decision on this, and that decision will affect the rendering of the page. In XML this would be a clear invalidity violation and clearly not an ill-formedness error.

XML has basically made invalidity less of a problem because it separates out the worst part that was lumped together in text/html: ill-formedness. That is why the parser error rules are so strict in XML: because we're talking about ill-formedness errors and not simply invalidity errors. I know HTML5 wants to address ill-formedness by specifying a recovery for all the possible errors. However, drawing on the existing implementations requires that most of those recovery techniques result in assuming ill-formedness (very unhelpful when trying to extend HTML's vocabulary).

Anyway, this is getting off topic. The main thing is that there are many reasons to go with an XML-like syntax, even for text/html. There are not really any of the horrifying impacts that some hint at in doing so. And I think it would be good for HTML5 to foster this cowpath. By minimizing the differences between text/html and XML, HTML5 can get the word out better on how to handle those subtle differences. Since we're defining our own DOM, we can also specify what DOM APIs should be there.
I could see breaking from text/html if we had good reason to deprecate those DOM calls, but it shouldn't just be based on the fact that some implementations didn't implement document.write() for XML. It should be because we want to deprecate document.write() (if that's what we indeed want to do).

Take care,
Rob
Received on Monday, 23 July 2007 15:08:15 UTC