Re: text/html for xml extensions of XHTML

Ian Hickson <ian@hixie.ch> writes:

> > 2.  The instance is served through http as "text/html" and any of
> >     the following is true:
> >
> >     a.  The instance begins with the string "<?xml" .
> 
> Nope. Here is a document that is valid text/html, but non-well-formed
> text/xml, and which should therefore be sent through the HTML parser:

SGML validation passes no judgment on the content of PIs.  In today's
world the appearance of "<?xml " at the beginning of a text/html item
clearly indicates XML.  Since the xml PI is present, I think that a
sane XML-aware user agent should reject the example quoted below: it
is not conforming XML, even though it might validate as SGML (though
perhaps not without warnings).

>    <?xml this is not?>
>    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">
>    <!-- -- -->
>      This is a comment. This document is not XHTML.
>      <html xmlns="http://www.w3.org/1999/xhtml"/>
>      Ok, I'm done now. -->
>    <html>
>     <title> Need a title in HTML! </title>
>     <p> This is a valid HTML document.
>    </html>
[snip]
> >     b.  The instance has a string matching the case-sensitive pattern
> >         "<!DOCTYPE html PUBLIC .*XHTML" before the first document
> >         instance tag.
> 
> Hmm, the valid HTML document above also matches that string.

Well, yes, if you look beyond the end of the "<!DOCTYPE ...>".  My
intention was that the string "XHTML" should be inside the value of
the FPI, and perhaps the string should be "DTD XHTML".
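
A minimal sketch of that restriction, in Python.  The pattern accepts
"DTD XHTML" only between the quotes of the public identifier, so text
after the end of the declaration cannot match.  The names FPI_XHTML and
doctype_names_xhtml are made up for illustration.  The sketch assumes a
double-quoted identifier and does not check that the declaration comes
before the first tag.

```python
import re

# Hypothetical refinement of criterion (b): "DTD XHTML" must occur inside
# the quoted formal public identifier, not merely somewhere after it.
# Assumption: the identifier is written with double quotes.
FPI_XHTML = re.compile(r'<!DOCTYPE html PUBLIC\s+"[^">]*DTD XHTML[^">]*"')


def doctype_names_xhtml(text):
    """Return True if a doctype's quoted FPI contains "DTD XHTML"."""
    return FPI_XHTML.search(text) is not None


# An HTML 4.0 doctype followed by a comment that mentions XHTML.
html4 = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">\n'
    '<!-- -- -->\n'
    '  This document is not XHTML.\n'
)

# An XHTML 1.0 Strict doctype.
xhtml1 = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    '  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
)

print(doctype_names_xhtml(html4))
print(doctype_names_xhtml(xhtml1))
```

The match is case-sensitive, as in the original wording of (b).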

For the moment I don't know exactly how I would express it.  Still, I
think that an XML-capable user agent will look bad if it rolls past a
correct document type declaration for XHTML without calling its XML
parser.

> >     c.  The first document instance tag is an open tag for the element
> >         "html" (all lower case) with a value specified for the attribute
> >         "xmlns".
> 
> How do you know it is the first instance tag without having a full XML
> parser to skip past PIs, comments, internal subsets, and the like?

Surely a user agent in classical mode has a way of knowing what is a
tag and what is not a tag.

Since many user agents appear to ignore PIs and document type
declarations, and many existing HTML documents have no document type
declaration, (c) might reasonably be the sole criterion for calling
the XML parser.
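
A rough sketch of (c) in Python, to show that the check needs only a
small scanner rather than a full XML parser.  All names here are
hypothetical.  The sketch assumes that PIs end at "?>", that comments
end at the first "-->" (XML style, not SGML comment rules), that
doctype declarations have no internal subset, and that no attribute
value contains ">".  A classical-mode tokenizer may draw the boundaries
differently.

```python
import re

# Hypothetical helper names.  Comments are treated XML-style (ending at
# the first "-->"), which is an assumption, not SGML comment parsing.
HTML_OPEN_TAG = re.compile(r'<html(?=[\s>/])([^>]*)>')
XMLNS_ATTR = re.compile(r'\sxmlns\s*=\s*["\']')

# (opening delimiter, closing delimiter) for constructs skipped before
# the first tag.  The doctype entry ignores internal subsets.
PROLOG_ITEMS = [('<?', '?>'), ('<!--', '-->'), ('<!DOCTYPE', '>')]


def skip_prolog(text):
    """Return the index just past leading whitespace, PIs, comments,
    and doctype declarations."""
    pos = 0
    while True:
        while pos < len(text) and text[pos].isspace():
            pos += 1
        for opener, closer in PROLOG_ITEMS:
            if text.startswith(opener, pos):
                end = text.find(closer, pos + len(opener))
                if end == -1:
                    return len(text)  # unterminated construct
                pos = end + len(closer)
                break
        else:
            return pos  # nothing more to skip


def first_tag_is_html_with_xmlns(text):
    """Criterion (c): the first thing after the prolog is a lower-case
    <html ...> tag carrying an xmlns attribute."""
    match = HTML_OPEN_TAG.match(text, skip_prolog(text))
    return bool(match and XMLNS_ATTR.search(match.group(1)))
```

Because the tag match is anchored at the position returned by
skip_prolog, character data before the first tag makes the check fail
rather than search further into the document.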

Nonetheless a new user agent should be able to handle (a) and (b).

But does Mozilla call its xml parser for http://www.w3.org/ ?

                                    -- Bill

Received on Tuesday, 1 May 2001 12:16:04 UTC