
HTML to XML yet again

From: Michael Tan <MTan@LA.Opus360.com>
Date: Mon, 6 Nov 2000 14:22:27 -0500 (EST)
Message-ID: <59CCDC67EB1BD41184D7009027DC78BD6F02B4@mail.la.opus360.com>
To: "'html-tidy@w3.org'" <html-tidy@w3.org>
Hi, I'm using Tidy to convert HTML to XML.  I've read the other posts and
still do not understand why Tidy does certain things:

1) Why does Tidy insert a <!DOCTYPE HTML ...> declaration when I specify
output-xml:yes and doctype:omit?  The only way I can eliminate it is to
use output-xhtml:yes together with doctype:omit.

2) Why doesn't Tidy escape the special characters in text nodes by default
for output-xml, since that is required for well-formed XML?  I read
David Raggett's response
(http://lists.w3.org/Archives/Public/html-tidy/2000JulSep/0310.html), but
shouldn't the special characters (&, <, >) be escaped in every text node for
legitimate XML output?  You could also use CDATA sections, but that seems to
modify the original document structure.
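
To make question 2 concrete, here is a minimal Python sketch of the escaping
I would expect for a text node.  This is not Tidy code and says nothing about
how Tidy works internally; escape_text_node is a hypothetical name used only
for illustration, and the sketch assumes the input is already-decoded
character data (an existing "&amp;" in the input would be escaped again).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def escape_text_node(text):
    """Escape &, < and > so the text can sit inside an XML element.

    Hypothetical helper for illustration only; assumes `text` is raw
    character data, not text that already contains entity references.
    """
    # escape() replaces "&" first, then "<" and ">", so the ampersands it
    # introduces are not escaped a second time.
    return escape(text)


raw = "AT&T says 1 < 2 and 3 > 2"
escaped = escape_text_node(raw)

# Wrapping the escaped text in an element yields well-formed XML, and a
# parser recovers the original characters unchanged.
element = ET.fromstring("<p>" + escaped + "</p>")
print(escaped)
print(element.text == raw)
```

Without the escaping step, the same wrapper string would be rejected by an
XML parser because of the bare "&" and "<".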

Received on Monday, 6 November 2000 16:07:15 UTC
