- From: Asbjørn Ulsberg <asbjorn@tigerstaden.no>
- Date: Mon, 08 Nov 2004 22:42:58 +0100
- To: trejkaz@xaoza.net, "James Cerra" <jfcst24_public@yahoo.com>
- Cc: www-html@w3.org
On Tue, 9 Nov 2004 08:05:01 +1100, Trejkaz Xaoza <trejkaz@xaoza.net> wrote:

> If it doesn't start with "<?xml" but has a DOCTYPE near the top, then
> it's SGML, and you perform similar rules based on what you see after it.

As far as I can see, an XHTML document can start in any of these ways:

  1. <?xml ...?>
  2. <!DOCTYPE ...>
  3. <html xmlns="http://...">

Not all of these are valid prologs for an XHTML document, though some are
valid for XML documents in general. The XML declaration is nonetheless
optional, so any valid XHTML document may start with just a DOCTYPE. HTML
documents may start that way as well, so you actually have to parse the
DOCTYPE to know what type of (X)HTML document it is.

What I'd do is the following:

  1. Trigger XML parsing mode if:

     1.1. The document starts with <?xml ...?>

     1.2. The document element is <html> with an attribute called 'xmlns'
          whose value is 'http://www.w3.org/1999/xhtml'.

  2. Trigger SGML parsing mode if:

     2.1. The document starts with a DOCTYPE that says it's HTML:

          <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" ...>
          <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" ...>
          <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" ...>

          You may of course cater for more HTML versions than 4.01, but
          that works just the same way: add the DOCTYPEs to your checker.

     2.2. The document element is <html> with no 'xmlns' attribute.

I could have added the point «1.3. The document starts with a DOCTYPE that
says it's XHTML», but that isn't necessary, as all XHTML documents must have
the <html> element in the XHTML namespace.

I would also do the checks in this order, so that you fall back to SGML if
any of the XHTML checks fail. Falling back to XML from SGML would give a
much higher failure rate, I think.

-- 
Asbjørn Ulsberg -=|=- asbjornu@hotmail.com
«He's a loathsome offensive brute, yet I can't look away»
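The checks listed in the message (1.1, 1.2, 2.1, 2.2, in that order) can be
sketched in Python roughly as follows. This is only an illustration under
some assumptions: the function name sniff_parsing_mode, the regular
expressions, and the (mode, rule) return value are all hypothetical, the
sniffing is done with regexes rather than a real parser, and anything that
matches no rule is treated as SGML, which is one reading of the "fall back to
SGML" remark.

```python
import re

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Public identifiers quoted in the message; add more HTML versions as needed.
HTML_PUBLIC_IDS = (
    "-//W3C//DTD HTML 4.01 Frameset//EN",
    "-//W3C//DTD HTML 4.01 Transitional//EN",
    "-//W3C//DTD HTML 4.01//EN",
)

# Rough patterns (assumptions): they ignore comments, CDATA and odd quoting.
_XML_DECL_RE = re.compile(r"<\?xml\s")
_DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]*)"', re.IGNORECASE)
_HTML_TAG_RE = re.compile(r"<(html)\b([^>]*)>", re.IGNORECASE)
_XMLNS_RE = re.compile(r"""\bxmlns\s*=\s*(["'])(.*?)\1""")


def sniff_parsing_mode(text):
    """Return (mode, rule): mode is 'xml' or 'sgml', rule names the check hit."""
    # Skip a byte order mark and leading whitespace before looking at the prolog.
    head = text.lstrip("\ufeff \t\r\n")

    # 1.1: the document starts with an XML declaration.
    if _XML_DECL_RE.match(head):
        return ("xml", "1.1")

    # Find the first <html ...> start tag and its xmlns attribute, if any.
    html_tag = _HTML_TAG_RE.search(head)
    xmlns = None
    if html_tag:
        attr = _XMLNS_RE.search(html_tag.group(2))
        if attr:
            xmlns = attr.group(2)

    # 1.2: <html> (lower case, since XML is case-sensitive) in the XHTML namespace.
    if html_tag and html_tag.group(1) == "html" and xmlns == XHTML_NS:
        return ("xml", "1.2")

    # 2.1: the document starts with a known HTML DOCTYPE.
    doctype = _DOCTYPE_RE.match(head)
    if doctype and doctype.group(1) in HTML_PUBLIC_IDS:
        return ("sgml", "2.1")

    # 2.2: <html> with no xmlns attribute at all.
    if html_tag and xmlns is None:
        return ("sgml", "2.2")

    # Assumed fallback: nothing matched, so treat the document as SGML.
    return ("sgml", "fallback")
```

For example, sniff_parsing_mode('<html><body>') returns ('sgml', '2.2'),
while a document that begins with an XHTML DOCTYPE but whose <html> element
carries the XHTML namespace is caught by rule 1.2, which is why a separate
rule 1.3 is not needed.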
Received on Monday, 8 November 2004 21:41:45 UTC