Re: ISSUE-54: doctype-legacy-compat

Sam Ruby wrote:
> [...]
> 
> From a technical perspective, here is my understanding of the problem.  The
> following string is the smallest and simplest string that will trigger
> standards compatibility mode in all browsers (there was some confusion over
> this, but that was resolved[5]) and can be produced by all known tools.
> 
> <!DOCTYPE html "">

Which tools are "all known tools"?

There are tools which have an HTML4 or XHTML1.0 doctype hardcoded, and 
we can't do anything about them, so I assume they must be excluded.

Looking at the source code for TagSoup 
(<http://home.ccil.org/~cowan/XML/tagsoup/>), I believe its "XMLWriter" 
(actually an XML-incompatible HTML writer) can only output:

   <!DOCTYPE html SYSTEM "">
or
   <!DOCTYPE html PUBLIC "x" "">

(where "x" stands for a string of at least one character, and "" for a 
string of zero or more characters).


XSLT's HTML output method 
(<http://www.w3.org/TR/xslt#section-HTML-Output-Method>, 
<http://www.w3.org/TR/xslt-xquery-serialization/#HTML_DOCTYPE>) can output:

   <!DOCTYPE html PUBLIC "">
   <!DOCTYPE html PUBLIC "" "">
   <!DOCTYPE html SYSTEM "">

though I assume implementations will differ to some extent.


Apache's HTML serialiser 
(<http://xml.apache.org/xalan-j/apidocs/org/apache/xml/serializer/ToHTMLStream.html>) 
seems to be able to generate:

   <!DOCTYPE html>
   <!DOCTYPE html PUBLIC "">
   <!DOCTYPE html PUBLIC "" "">
   <!DOCTYPE html SYSTEM "">


Genshi's HTMLSerializer 
(<http://genshi.edgewall.org/browser/trunk/genshi/output.py#L456>) does:

   <!DOCTYPE html>
   <!DOCTYPE html PUBLIC "">
   <!DOCTYPE html PUBLIC "" "">
   <!DOCTYPE html SYSTEM "">


The Perl module XML::Handler::HTMLWriter 
(<http://cpansearch.perl.org/src/MSERGEANT/XML-Handler-HTMLWriter-2.01/HTMLWriter.pm>) 
can do:

   <!DOCTYPE HTML PUBLIC "x">
   <!DOCTYPE HTML PUBLIC "x" "x">
   <!DOCTYPE HTML SYSTEM "x">


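This list could (and probably should?) be extended further, but I don't 
know how to easily find more HTML serialiser libraries. Apache's 
serialiser and Genshi can already output <!DOCTYPE html>, so they need 
no alternative. For the remaining tools in the current list: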

<!DOCTYPE html PUBLIC ""> would only help XSLT.

<!DOCTYPE html PUBLIC "(something)"> would help XSLT and 
XML::Handler::HTMLWriter.

<!DOCTYPE html SYSTEM ""> and <!DOCTYPE html PUBLIC "(something)" ""> 
would help XSLT and TagSoup.

<!DOCTYPE html SYSTEM "(something)"> and <!DOCTYPE html PUBLIC 
"(something)" "(something)"> would help XSLT, TagSoup and 
XML::Handler::HTMLWriter.

So if the goal is to work in as many tools as possible, the shortest 
option is <!DOCTYPE html SYSTEM "x">.
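
The comparison above can be checked mechanically. What follows is only a 
minimal sketch in Python, restricted to the three tools the summary 
names, and it encodes my reading of each tool as listed above rather 
than anything verified against the tools themselves. The names TOOLS, 
CANDIDATES, can_emit, supporters and shortest_universal are made up for 
illustration, "x" stands in for "(something)", and the html/HTML case 
difference is ignored.

```python
# Each form a tool can emit is (keyword, min_public_len, min_system_len).
# None means that identifier is absent from the form; 0 means an empty
# string is allowed; 1 means at least one character is required.
# These tables are an assumption based on the lists in this message.
TOOLS = {
    "TagSoup": [("SYSTEM", None, 0), ("PUBLIC", 1, 0)],
    "XSLT": [("PUBLIC", 0, None), ("PUBLIC", 0, 0), ("SYSTEM", None, 0)],
    "XML::Handler::HTMLWriter": [
        ("PUBLIC", 1, None), ("PUBLIC", 1, 1), ("SYSTEM", None, 1),
    ],
}

# Candidate doctypes as (keyword, public_id, system_id).
CANDIDATES = [
    ("PUBLIC", "", None),
    ("PUBLIC", "x", None),
    ("SYSTEM", None, ""),
    ("PUBLIC", "x", ""),
    ("SYSTEM", None, "x"),
    ("PUBLIC", "x", "x"),
]


def can_emit(forms, keyword, public, system):
    """Return True if any of the tool's forms can produce this doctype."""
    for kw, min_pub, min_sys in forms:
        if kw != keyword:
            continue
        # The form and the candidate must use the same set of identifiers.
        if (min_pub is None) != (public is None):
            continue
        if (min_sys is None) != (system is None):
            continue
        if public is not None and len(public) < min_pub:
            continue
        if system is not None and len(system) < min_sys:
            continue
        return True
    return False


def render(keyword, public, system):
    """Build the doctype string for a candidate."""
    parts = ["<!DOCTYPE html", keyword]
    parts += ['"%s"' % ident for ident in (public, system) if ident is not None]
    return " ".join(parts) + ">"


def supporters(candidate):
    """Sorted names of the tools that can emit the candidate."""
    return sorted(n for n, forms in TOOLS.items() if can_emit(forms, *candidate))


def shortest_universal():
    """Shortest candidate string that every tool in TOOLS can emit."""
    universal = [c for c in CANDIDATES if len(supporters(c)) == len(TOOLS)]
    return min((render(*c) for c in universal), key=len)


if __name__ == "__main__":
    for cand in CANDIDATES:
        print(render(*cand), "->", ", ".join(supporters(cand)))
    print("shortest for all tools:", shortest_universal())
```

Adding another serialiser is a matter of adding one entry to TOOLS, so 
the conclusion can be re-checked as the list of tools grows.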

-- 
Philip Taylor
pjt47@cam.ac.uk

Received on Friday, 16 January 2009 13:57:47 UTC