Re: Doctype detection

Jan Roland Eriksson wrote:
[interesting archeological dig thru specs omitted...]
> RFC1866 is still a "winner" by having the only _normative_
> part of a spec on this so far; lets go on...

Keep in mind that the IETF has taken RFC1866
off the standards track; it has Status: HISTORIC
as of the publication of

The 'text/html' Media Type
Request for Comments: 2854
Obsoletes: 2070, 1980, 1942, 1867, 1866
http://www.ietf.org/rfc/rfc2854.txt


> Don't use "doctype-sniffing" for the wrong purpose, doing that
> will only create a new set of problems that we need to discuss
> again some years from now.

I disagree that the specs mandate any particular behaviour
in the absence of <!DOCTYPE ...>.

In any case...

I think namespaces are a better indicator of the
intended semantics of a document, since <!DOCTYPE...>
information disappears during parsing anyway.
I used to think <!DOCTYPE...> was some sort of
declaration of semantics, but I was assured
by SGML experts that it is *only* a mechanism
for referring to a DTD, and that there is *no*
difference in semantics between

	<!DOCTYPE html [
	... paste declarations from your favorite HTML spec here ...
	]>
	<html>...</html>

and

	<!DOCTYPE html public "-//identify your
		favorite HTML spec DTD here//">
	<html>...</html>

and that a "structure controlled application" should
not distinguish the two in any way.
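
One way to see the point that <!DOCTYPE...> information disappears
during parsing is the sketch below. It assumes Python's standard
xml.etree.ElementTree module, which is only a stand-in for a
"structure controlled application"; the helper name parsed_form and
the two sample documents are made up for illustration. The samples
use XML syntax (uppercase PUBLIC plus a system identifier), and the
internal subset deliberately declares no attribute defaults or
entities, since those would change what the parser reports.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: declarations pasted into the internal subset.
INTERNAL = """<!DOCTYPE html [
<!ELEMENT html (#PCDATA)>
]>
<html>hello</html>"""

# Hypothetical sample: the same document referring to a DTD by public identifier.
PUBLIC = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>hello</html>"""


def parsed_form(document: str) -> bytes:
    """Parse a document and serialize the resulting element tree.

    The tree built by ElementTree carries elements, attributes and text
    only; the document type declaration is not part of it.
    """
    root = ET.fromstring(document)
    return ET.tostring(root)


if __name__ == "__main__":
    # Both forms produce the same tree once parsed.
    print(parsed_form(INTERNAL) == parsed_form(PUBLIC))
```

The parser used here does not fetch the external DTD, so the
comparison only shows what survives in the tree, not what a
validating parser would do with either form.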

While a <!DOCTYPE ...> is required for strict
conformance to XHTML 1.0, its only purpose
is to aid in syntactic validation of a document;
i.e. to help an author keep from making mistakes.
I expect schema validation based on namespaces to supply that
functionality fairly soon, so I expect <!DOCTYPE...>
to become obsolete.

For user agents, I recommend that they
	-- start by using a conforming XML parser
	-- if the first start tag "event" includes
		a namespace declaration that applies
		to the root element, you're in business:
		you know what language you're dealing with
		(if you get a namespace you don't recognize,
		you might try accessing it to see if you
		get a schema that relates the types used
		in the document to types you know).
	-- if not, i.e. if the first start tag
		has no namespace declaration, or if
		you get a well-formedness error before
		the first start tag is done, go
		into "there be dragons" mode.
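
The steps above can be sketched roughly as follows. This is only an
illustration under stated assumptions: it uses Python's
xml.parsers.expat bindings as the conforming XML parser, the function
name sniff_root_namespace and the three mode labels are invented
here, and the set of "known" namespaces contains just the XHTML 1.0
namespace. Fetching a schema for an unrecognized namespace is left
out.

```python
import xml.parsers.expat

XHTML_NS = "http://www.w3.org/1999/xhtml"
KNOWN_NAMESPACES = {XHTML_NS}  # assumption: the languages this agent understands


class _RootSeen(Exception):
    """Raised from the handler to stop parsing after the first start tag."""


def sniff_root_namespace(data: bytes):
    """Classify a document by the namespace of its root element.

    Returns ("known", ns), ("unknown-namespace", ns) or ("dragons", None).
    """
    # With a namespace separator set, expat reports names as "uri local".
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    found = {}

    def on_start(name, attrs):
        uri, _, _local = name.rpartition(" ")
        found["ns"] = uri or None
        raise _RootSeen  # only the first start tag "event" matters

    parser.StartElementHandler = on_start
    try:
        parser.Parse(data, True)
    except _RootSeen:
        pass
    except xml.parsers.expat.ExpatError:
        # Well-formedness error before the first start tag was done.
        return ("dragons", None)

    ns = found.get("ns")
    if ns is None:
        # Root element has no namespace that applies to it.
        return ("dragons", None)
    if ns in KNOWN_NAMESPACES:
        return ("known", ns)
    return ("unknown-namespace", ns)


if __name__ == "__main__":
    print(sniff_root_namespace(b'<html xmlns="http://www.w3.org/1999/xhtml"/>'))
    print(sniff_root_namespace(b"<body bgcolor=red>"))
```

Stopping at the first start tag means a well-formedness error later
in the document does not change the answer; it only matters whether
the root element's start tag parsed cleanly and carried a namespace.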

For an implementation of "there be dragons" mode,
I recommend using tidy
	http://www.w3.org/People/Raggett/tidy/
to turn the input into XHTML, and parse that.

This would defeat progressive display, since tidy works
on the whole document at once, but I'd like to see what would
happen if suddenly all pages that weren't XHTML
loaded slower ;-)

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/

Received on Thursday, 27 July 2000 15:29:45 UTC