Re: extra tags in output from Jany Quintard on 2000-03-24 (html-tidy@w3.org from January to March 2000)

From: Jany Quintard <quintard.j@cgi.fr>
Date: Fri, 24 Mar 2000 11:45:35 -0600
To: html-tidy@w3.org
Message-ID: <OFFAA29653.891A6ECD-ON8625686C.005DA487@rfdinc.com>

On Wed, 19 Jan 2000, Peter Levine wrote:

> Hi,
>
> When I set output-xml: yes why does the output include <html>, <head>,
> <title> and <body> tags when my original file doesn't include these
> tags?
>
> I'm using tidy as a last cleanup step after stripping those tags from an
> HTML file. The idea is to get my 'almost XML' file cleaned up by tidy
> before presenting it to an XML parser.
>
> TIA,
> Pete
>
XML documents are SGML documents that use a specific SGML declaration.
In this declaration, you have the following code:
     FEATURES
         MINIMIZE
             DATATAG NO
             OMITTAG NO

So you are not allowed to omit tags (or elements). Actually, in an SGML
file there are many elements that you do not see, because of the OMITTAG
settings. They are present anyway, and when a parser builds a tree from
your document, those elements are there.
In XML, everything must be explicit.
This is why, in an XML DTD, you never see the - -, - O, O - that you can
encounter in a looser SGML DTD. Compare:

SGML (http://www.w3.org/TR/REC-html40/loose.dtd)
<!ELEMENT OL - - (LI)+                 -- ordered list -->

XML (http://www.w3.org/TR/xhtml1/DTD/transitional.dtd)
<!-- Ordered (numbered) list -->
<!ELEMENT ol (li)+>

The two DTDs describe HTML 4 Transitional in its two forms.
You can see that the use of case and comments is stricter in the
XML version.
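
To see the effect of OMITTAG NO and of case sensitivity in practice, here is a
minimal sketch using Python's standard xml.etree.ElementTree parser. The helper
name is_well_formed and the sample strings are my own illustration, not
anything tidy produces; the parser checks well-formedness only and does not
read a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, False otherwise (illustrative helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# End tags omitted, as SGML HTML allows for LI: an XML parser rejects this.
omitted = "<ol><li>one<li>two</ol>"
# Every element closed explicitly: accepted.
explicit = "<ol><li>one</li><li>two</li></ol>"
# XML names are case-sensitive, so <OL> is not closed by </ol>.
mixed_case = "<OL><li>one</li></ol>"

for sample in (omitted, explicit, mixed_case):
    print(is_well_formed(sample), sample)
```

Only the sample with every tag written out, in consistent case, should parse.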

So, if you strip those tags from your XML file, I guess the parser will say
it is not valid, even if it is well formed. It depends on what you intend to do.
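
The difference between well formed and valid can be sketched as follows. This
is only a rough stand-in, assuming that checking for the html/head/title/body
skeleton is enough to make the point: the function has_html_skeleton is
hypothetical and is not a DTD validator (ElementTree does not validate). A real
check would need a validating parser and the XHTML DTD.

```python
import xml.etree.ElementTree as ET

def has_html_skeleton(text):
    """Hypothetical stand-in for validation: checks only the outer skeleton.

    Raises ET.ParseError if the text is not well formed.
    """
    root = ET.fromstring(text)
    if root.tag != "html":
        return False
    # The root is expected to contain exactly head followed by body.
    if [child.tag for child in root] != ["head", "body"]:
        return False
    # head is expected to contain a title.
    return root.find("head/title") is not None

# Stripped fragment: well formed, but without the skeleton.
stripped = "<ol><li>one</li></ol>"
# Same content with the tags that were stripped put back.
full = ("<html><head><title>t</title></head>"
        "<body><ol><li>one</li></ol></body></html>")

print(has_html_skeleton(stripped))
print(has_html_skeleton(full))
```

Both strings parse without error, so both are well formed; only the second one
has the structure the skeleton check looks for.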

Jany.

Received on Friday, 24 March 2000 13:14:52 UTC