Re: HTML and XML

Elliotte Rusty Harold, 2009-02-10 17:32 -0800:

>  On this point, I have to call B.S. again. That a document is served as 
>  text/html does not make it HTML. Much less does it make it not XML. If a 
>  document satisfies the BNF grammar and the various well-formedness 
>  constraints,

That's exactly the point. Much if not most of the XHTML content
being served as text/html on the Web does not satisfy XML
well-formedness constraints. The only reason it doesn't become
completely unusable in browsers is that it gets processed by HTML
parsers in browsers instead of by their XML parsers. If it were to
be served with a proper XML MIME type instead -- or if browsers
were to do sniffing for the XHTML doctype or namespace and
actually parse it as XML instead of as HTML -- it would stop being
accessible on the Web, because an XML parser has to halt at the
first well-formedness error.
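
To make that concrete, here is a minimal sketch of the kind of check I
mean, using the XML parser in Python's standard library. The function
name is_well_formed is just something I made up for illustration, and
the sample strings are invented, not taken from any real page.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if markup parses as XML, False otherwise.

    Hypothetical helper for illustration only.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # An XML parser treats any well-formedness error as fatal.
        return False

# Well-formed: every element is closed and the ampersand is escaped.
print(is_well_formed('<p>one<br/>two &amp; three</p>'))

# Typical text/html content: unclosed <br>, so not well-formed.
print(is_well_formed('<p>one<br>two</p>'))

# Unescaped ampersand in an attribute value, also not well-formed.
print(is_well_formed('<a href="?a=1&b=2">link</a>'))
```

An HTML parser recovers from the second and third cases without
complaint, which is the only reason such content stays usable.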

> it is XML, whatever you call it.

I guess some might call it broken XML.

> It may also be HTML, and perhaps other things as well.
>  The MIME type is not normative. That someone has labeled a document as one 
>  thing or another does not make it that thing.

That may be the general case. In this specific case, serving a
document on the Web with the text/html MIME type makes it one very
definite thing: A document that will get processed as HTML
by browsers, not as XML.
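
Roughly speaking, the choice of parser looks like the sketch below.
This is a simplification under my own assumptions, not code from any
browser; the function name parser_for and the "other" fallback are
invented for illustration.

```python
def parser_for(content_type):
    """Pick a parser from a Content-Type header value (simplified sketch)."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "text/html":
        # The doctype and namespace inside the document are not consulted.
        return "html"
    if mime in ("text/xml", "application/xml") or mime.endswith("+xml"):
        return "xml"
    return "other"

print(parser_for("text/html; charset=utf-8"))
print(parser_for("application/xhtml+xml"))
```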

It also makes it a document that should (if the producer of the
document wants to ensure that browsers can actually process it as
expected, without needing to fall back to error correction) follow
HTML-specific constraints.

That means the producers of such documents would need to follow
some constraints that XML tools are not able to check; for
example, they need to make sure they don't use self-closing tags
for elements that have required end tags, such as <script> or
<a name> instances. And they need to make sure that any <script>
or <style> element content follows HTML constraints, not XML/XHTML
constraints.
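
As a rough illustration of a check that an XML tool won't do for you,
the sketch below flags self-closing syntax on elements that need an
end tag in HTML. It is regex-based and deliberately naive (it will be
confused by ">" inside attribute values or by unquoted values ending
in "/"), and the name risky_self_closing_tags is hypothetical.

```python
import re

# Void elements in HTML have no end tag, so a trailing "/>" on them is harmless.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}

# A start tag written with XML self-closing syntax, e.g. <script src="x.js"/>.
SELF_CLOSING_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b[^<>]*/>")

def risky_self_closing_tags(markup):
    """Return names of non-void elements written with self-closing syntax."""
    return [
        match.group(1).lower()
        for match in SELF_CLOSING_TAG.finditer(markup)
        if match.group(1).lower() not in VOID_ELEMENTS
    ]

sample = '<script src="a.js"/><a name="top"/><br/>'
print(risky_self_closing_tags(sample))  # ['script', 'a']
```

Every tag in that sample is fine as XML, but an HTML parser treats
the <script> and <a> as start tags that are never closed.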

>  If people are serving well-formed XML, it is likely they do so because they 
>  find it useful to do so, whatever MIME type happens to be assigned.

I don't think that's necessarily likely at all. They may have just
copied an XHTML document from somewhere else and used it as a
template for their own content. Or they may be using an editor
that by default produces XHTML-namespaced documents with an XHTML
doctype. Or they may be attempting to produce XHTML (using an
XHTML doctype and namespace, quoting attribute values, using
self-closing-tag syntax on empty elements) just because they've
heard or been taught that's what they should be doing, without
having any real understanding of the supposed benefits of doing it.


Michael(tm) Smith

Received on Wednesday, 11 February 2009 07:22:26 UTC