- From: Nick Kew <nick@webthing.com>
- Date: Mon, 6 Sep 2004 01:42:59 +0100 (BST)
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- Cc: public-qa-dev@w3.org
On Sun, 5 Sep 2004, Bjoern Hoehrmann wrote:

> * the document has a document type declaration with a public
>   identifier that when split at // has a third component which
>   matches /^DTD\s+(\S+)/ for which $1 matches /XHTML/
>
> * no public/system identifier but a <html> root element with an
>   explicitly *specified* xmlns attribute with a value of
>   "http://www.w3.org/1999/xhtml"

That's too eager IMO. Appendix C applies only to XHTML 1.0. So we should permit XHTML-as-text/html only if the document uses one of the three XHTML 1.0 FPIs.

For documents served as text/html that are not identifiably XHTML 1.0, we should expect HTML, and emit a stern warning if they look like XML, as in any document starting with an xmldecl.

Will your code do a better job of dealing with Hixie's pathological use of comments? Do we parse them as SGML or XML to determine whether the document is SGML or XML? Hixie gives a first line that claims to be valid SGML, but presenting it as valid HTML 4 seems to be stretching a point. Hixie's valid point is that Appendix C is trouble, but we can't do anything about that.

> I would like to know whether there are any good reasons to use a
> different algorithm to determine the parse mode, whether everyone is
> okay to use SGML::Parser::OpenSP to do that, where I could maintain the
> tests in CVS and where code as the fragment above should go at this
> point (CVS repository, module names, etc.)

I'm not sure. But using Hixie's contrivance as a yardstick looks to be the way of madness.

-- 
Nick Kew
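The two detection rules discussed in this exchange can be contrasted in a short sketch. It is written in Python rather than Perl and assumes the public identifier, system identifier, root element name, explicitly specified xmlns value and the presence of a leading XML declaration have already been extracted by a parser (for instance SGML::Parser::OpenSP, not shown). The function names `eager_rule` and `xhtml10_only_rule` and the mode labels "xhtml"/"html" are hypothetical, chosen only for illustration.

```python
import re
from typing import List, Optional, Tuple

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The three XHTML 1.0 formal public identifiers (FPIs).
XHTML10_FPIS = frozenset({
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "-//W3C//DTD XHTML 1.0 Frameset//EN",
})


def eager_rule(public_id: Optional[str], system_id: Optional[str],
               root: Optional[str], specified_xmlns: Optional[str]) -> bool:
    """Bjoern's proposal: any 'DTD ...XHTML...' FPI, or a bare <html xmlns=...>."""
    if public_id is not None:
        parts = public_id.split("//")
        if len(parts) < 3:
            return False
        # Third component, e.g. "DTD XHTML 1.0 Strict"; group(1) plays the role of $1.
        m = re.match(r"DTD\s+(\S+)", parts[2])
        return m is not None and "XHTML" in m.group(1)
    # No public identifier: also require no system identifier, an <html> root,
    # and an xmlns attribute that was explicitly specified (not defaulted by a DTD).
    return system_id is None and root == "html" and specified_xmlns == XHTML_NS


def xhtml10_only_rule(public_id: Optional[str],
                      starts_with_xmldecl: bool) -> Tuple[str, List[str]]:
    """Nick's counter-proposal for text/html: XHTML only for the XHTML 1.0 FPIs."""
    if public_id in XHTML10_FPIS:
        return "xhtml", []  # treat as an Appendix C document
    warnings = []
    if starts_with_xmldecl:
        warnings.append("text/html document looks like XML (starts with an XML declaration)")
    return "html", warnings


if __name__ == "__main__":
    fpi11 = "-//W3C//DTD XHTML 1.1//EN"
    print(eager_rule(fpi11, None, "html", None))   # True
    print(xhtml10_only_rule(fpi11, False))         # ('html', [])
```

With the XHTML 1.1 FPI, the eager rule reports XHTML while the stricter rule falls back to HTML, which is the case objected to above: Appendix C does not cover XHTML 1.1.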
Received on Monday, 6 September 2004 00:43:36 UTC