Re: text/html for xml extensions of XHTML

Ian Hickson <ian@hixie.ch> writes:

> Remember that the XML declaration is optional, and that giving the XML
> declaration is discouraged by the XHTML compatibility guidelines (see
> section C.1), which are supposed to be followed in order to send XHTML
> as text/html.

> If you are willing to use the XML declaration as a signal to use XML,
> you might as well use text/xml since it's not going to be compatible
> with older browsers anyway.

Appendix C is addressed to content providers.  It's not something for
XML-capable user agents to hide behind.

A content provider ought to be free to decide whether to put
"<?xml ..." at the top of a document.

> >>>     b.  The instance has a string matching the case-sensitive
> >>>         pattern "<!DOCTYPE html PUBLIC .*XHTML" before the first
> >>>         document instance tag.
> >> Hmm, the valid HTML document above also matches that string.
> >
> > Well, yes, if you look beyond the end of the "<!DOCTYPE ...>". My
> > intention was that the string "XHTML" should be inside the value of
> > the FPI, and perhaps the string should be "DTD XHTML".
> >
> > For the moment I don't know exactly how I would express it. Still I
> > think that an xml capable user agent will look bad rolling past a
> > correct document type declaration for XHTML.
> 
> The moment you get more complicated than "look for a pattern at the
> start of the document" you end up having to write a fully fledged
> parser. Extreme case in point:
> 
>    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN"
>           [ <!-- SYSTEM "not XHTML" --> ]>

Alas, email discussions do not always have the precision required of
code.

Here we have a document type declaration for the name "html" with a
formal public identifier that matches the pattern "DTD HTML" rather
than the pattern "DTD XHTML".  So that is the decision point.

Let's try again:

-----
An item served as "text/html" should be handled as an XML version of
"html" if

a.  It begins, apart from white space, with the string "<?xml ".
    (See the comment above regarding Appendix C of the XHTML spec.)

OR

b.  The document begins with zero or more comments and processing
    instructions that conform to the XML specification followed by
    one of the following:

    (i)  A document type declaration not containing internal comments
         with formal public identifier matching the pattern
         "DTD XHTML".

    (ii) An open tag for the element "html" with a value specification
         for the attribute "xmlns".

-----
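
To make the rule concrete, here is a rough sketch of it in Python.  It
is only an illustration, not a proposal for how any user agent should
implement it.  The function name treat_as_xml is made up for this
example.  One simplification is an assumption of mine: rather than
looking for comments inside the document type declaration, the sketch
refuses any declaration that has an internal subset at all, which is
stricter than (b)(i) but avoids parsing the subset.

```python
import re

# An XML comment: "--" may not appear inside it.
COMMENT = r"<!--(?:(?!--).)*-->"
# A processing instruction: anything up to the first "?>".
PI = r"<\?(?:(?!\?>).)*\?>"
# Zero or more comments and PIs, with white space allowed between them.
LEADING = re.compile(r"(?:\s|" + COMMENT + r"|" + PI + r")*", re.DOTALL)

# A quoted literal (public or system identifier).
QUOTED = r"""(?:"[^"]*"|'[^']*')"""
# Document type declaration for "html" with a public identifier, an
# optional system identifier, and NO internal subset (simplification).
DOCTYPE = re.compile(
    r"<!DOCTYPE\s+html\s+PUBLIC\s+(" + QUOTED + r")(?:\s+" + QUOTED + r")?\s*>"
)
# Open tag for the element "html"; group 1 holds the attribute text.
HTML_TAG = re.compile(r"<html(\s[^>]*)?>", re.DOTALL)
XMLNS = re.compile(r"\sxmlns\s*=")


def treat_as_xml(text):
    """Return True if a text/html entity should be handled as XML."""
    # Rule (a): apart from white space, it begins with "<?xml ".
    if text.lstrip().startswith("<?xml "):
        return True

    # Rule (b): skip leading comments and processing instructions.
    rest = text[LEADING.match(text).end():]

    # (b)(i): document type declaration whose FPI contains "DTD XHTML".
    m = DOCTYPE.match(rest)
    if m:
        fpi = m.group(1)[1:-1]  # strip the surrounding quotes
        return "DTD XHTML" in fpi

    # (b)(ii): open tag for "html" that specifies "xmlns".
    m = HTML_TAG.match(rest)
    if m:
        return XMLNS.search(m.group(1) or "") is not None

    return False
```

With this sketch, the extreme case quoted above is handled as plain
HTML: its formal public identifier contains "DTD HTML", not
"DTD XHTML", and the internal subset is never examined.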

Rolling through any initial comments and PIs conforming to the XML
standard should be quick, but, if there is still concern about
performance, perhaps these could be banned when XHTML is served as
"text/html".

> > But does Mozilla call its xml parser for http://www.w3.org/ ?
> 
> Nope. If it did, it would render the page without any expanded
> character entity references, since Mozilla is not a validating parser
> and thus skips parsing the DTD and thus doesn't know what &nbsp;,
> &middot; and &copy; are. Not to mention that it would end up ignoring

So perhaps there are other issues.

But let's not allow the other issues to distract us from the facts
that

  1. the resource identified by "http://www.w3.org/" is XHTML.

  2. XHTML is current HTML.

                                    -- Bill

Received on Wednesday, 2 May 2001 09:30:03 UTC