- From: Jany Quintard <quintard.j@cgi.fr>
- Date: Thu, 20 Jan 2000 17:57:52 +0100 (CET)
- To: html-tidy@w3.org
On Wed, 19 Jan 2000, Peter Levine wrote:

> Hi,
>
> When I set output-xml: yes why does the output include <html>, <head>,
> <title> and <body> tags when my original file doesn't include these
> tags?
>
> I'm using tidy as a last cleanup step after stripping those tags from an
> HTML file. The idea is to get my 'almost' XML' file cleaned up by tidy
> before presenting it to an XML parser.
>
> TIA,
> Pete

XML files are SGML files that use a special SGML declaration. That
declaration contains the following:

    FEATURES
      MINIMIZE
        DATATAG NO
        OMITTAG NO

So you are not allowed to omit tags (or elements). In an SGML file there
are actually many elements that you do not see, because of the OMITTAG
settings. They are present all the same, and when a parser builds a tree
from your document, those elements are there.

In XML, everything must be explicit. This is why an XML DTD never contains
the - -, - O, O - tag-omission markers that you can encounter in a looser
SGML DTD. Compare:

SGML (http://www.w3.org/TR/REC-html40/loose.dtd):

    <!ELEMENT OL - - (LI)+ -- ordered list -->

XML (http://www.w3.org/TR/xhtml1/DTD/transitional.dtd):

    <!-- Ordered (numbered) list -->
    <!ELEMENT ol (li)+>

The two DTDs describe transitional HTML 4 in its two forms. Notice that
the XML version is stricter about letter case and comments.

So if you strip those tags from your XML file, I guess the parser will say
it is not valid, even if it is well formed. It depends on what you intend
to do.

Jany.
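
To illustrate the difference between "well formed" and "valid" mentioned above, here is a minimal Python sketch. It is not from the original message. The standard-library parser only checks well-formedness, so the function has_xhtml_skeleton is a hypothetical, hand-written stand-in for real DTD validation, and the sample strings are invented for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as a single well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def has_xhtml_skeleton(text):
    """Hypothetical stand-in for DTD validation (assumption, not a real
    validator): the root must be <html> with <head><title> and <body>."""
    root = ET.fromstring(text)
    if root.tag != "html":
        return False
    head = root.find("head")
    has_title = head is not None and head.find("title") is not None
    return has_title and root.find("body") is not None


# A fragment with the outer tags stripped: every tag is closed explicitly.
stripped = "<ol><li>one</li><li>two</li></ol>"

# SGML-style markup relying on omitted </li> end tags.
sgml_style = "<ol><li>one<li>two</ol>"

# The same list inside the skeleton that tidy adds.
full = (
    "<html><head><title>t</title></head>"
    "<body><ol><li>one</li><li>two</li></ol></body></html>"
)

print(is_well_formed(stripped))      # well formed
print(is_well_formed(sgml_style))    # not well formed: end tags omitted
print(has_xhtml_skeleton(stripped))  # well formed, but lacks the skeleton
print(has_xhtml_skeleton(full))      # has the skeleton
```

The stripped fragment parses without error, yet it would not pass a check against the XHTML DTD, which is why tidy puts the outer elements back.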
Received on Thursday, 20 January 2000 11:58:27 UTC