Doctype options lacking

[I discussed this briefly off-list several weeks ago with Charles
Reitzel.  He suggested bringing it up here for discussion.]

The 'doctype' option is flawed as currently implemented.  It can take
one of six values: auto, omit, strict, loose, transitional, or a user
specified FPI (i.e. a string).  This is does not scale well, now or in
the future.

'auto' and 'omit' need not concern us here, so we are left to discuss
the remaining four, three of which are an enum (and two of those are
synonyms!), and the other a string.  Ouch.

Plus, in only being able to choose between 'strict', 'transitional'
('loose' means exactly the same thing as 'transitional'), we have
effectively frozen ourself in time to 1998, since those choices are only
applicable to HTML 4.x and XHTML 1.0.  To determine which syntax to use,
Tidy also has to look at the 'output-xhtml' option, which if set to 'no'
(the default), will use the HTML 4.01 declaration.

So, want Tidy to output XHTML Basic, or XHTML 1.1?  Forget it.  

Want to use your own custom DTD?  Forget it -- unless you're using some
roll-your-own SGML-based HTML along with a "catalog" to look up the FPI.
Why?  Because the 'doctype' option offers no "user specified SYSTEM ID"
like it does a "user specified FPI".  

For the SGML (and thus "classic" HTML) Doctype declaration, the SYSTEM
identifier is optional, the PUBLIC one required.  For XML, the reverse
is true: FPIs are optional, SYSTEM IDs are required.  (!)

So, what to do?  A proposal:

1)  Two new options, for custom declarations:

	doctype-public	// FPI 
	doctype-system	// SYSTEM ID (i.e, URL) 

These have the same name (and purpose!) as the corresponding attributes
in XSLT's <xsl:output ...> element.  

2)  A third new option, an enum, similar in function to the current
'doctype' option:

	html-doctype	// no name collision with 'doctype'

It could take one of the following values:

	auto
	omit
	html-32
	html-4
	html-4-transitional
	html-iso
	xhtml-10
	xhtml-10-transitional
	xhtml-basic
	xhtml-11
	xhtml-20	// we can wait on this one.

(Have I left out any other "standard" doctypes?)

Authors would choose between 2 and 1 above: "from the menu" or
"build-your-own".  

If option 2 were specified, any of the values beginning with 'xhtml'
would turn on (and override!) 'output-xhtml'.  Specifying 'output-xhtml'
explicitly is only needed with option #1.

Also, option #1 above would work for generic XML.

[Note: Making these changes would in effect deprecate Tidy's current
'doctype' option.  But leave 'doctype' in and let it continue to do as
it does now, because it's no doubt in wide use and a lot of things may
depend on it.  Just ignore 'doctype' if any of the new options are set.]

I can provide the FPIs and SYSTEM IDs for option #2 if necessary.


IMO, FWIW, and HTH,  :)


/Jelks

Received on Saturday, 24 August 2002 00:06:22 UTC