Fw: JTidy praise and a question from Andy Quick on 2000-02-02 (html-tidy@w3.org from January to March 2000)

From: Andy Quick <ac.quick@sympatico.ca>
Date: Tue, 1 Feb 2000 19:49:59 -0500 (EST)
To: <html-tidy@w3.org>
Message-ID: <001b01bf6d17$347ac640$4cceacce@quick>

Would it make sense to add the following snippet of
code to Lexer.fixXMLPI(Node root) after the call to
addStringLiteral("xml version=...") ?

    if (this.configuration.CharEncoding == Configuration.UTF8)
        addStringLiteral(" encoding=\"UTF-8\"");
    else if (this.configuration.CharEncoding == Configuration.LATIN1)
        addStringLiteral(" encoding=\"ISO-8859-1\"");

Could an encoding attribute also be added for ISO2022 or MACROMAN?
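
To make the question concrete, here is a rough standalone sketch of the
mapping I have in mind. It is not JTidy code: the class name, the helper
methods, and the numeric values of the constants are all made up for
illustration, and only the constant names mirror Configuration. I am
assuming "macintosh" (the IANA-registered label for Mac OS Roman) would
be the right name for MACROMAN. I left ISO2022 without a label because I
don't know of a single registered name that covers it (the registered
ones, such as ISO-2022-JP, are language-specific).

```java
public class XmlDeclSketch {
    // Hypothetical stand-ins for JTidy's Configuration encoding constants.
    // The numeric values are illustrative only.
    static final int RAW = 0, ASCII = 1, LATIN1 = 2, UTF8 = 3, ISO2022 = 4, MACROMAN = 5;

    /** Returns the XML encoding label for a constant, or null if none is known. */
    static String xmlEncodingName(int charEncoding) {
        switch (charEncoding) {
            case UTF8:     return "UTF-8";
            case LATIN1:   return "ISO-8859-1";
            case MACROMAN: return "macintosh"; // assumption: IANA label for Mac OS Roman
            default:       return null;        // RAW, ASCII, ISO2022: emit no attribute
        }
    }

    /** Builds the XML declaration, adding encoding="..." only when a label is known. */
    static String xmlDeclaration(int charEncoding) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"");
        String name = xmlEncodingName(charEncoding);
        if (name != null) {
            sb.append(" encoding=\"").append(name).append('"');
        }
        return sb.append("?>").toString();
    }

    public static void main(String[] args) {
        System.out.println(xmlDeclaration(LATIN1));
        System.out.println(xmlDeclaration(ISO2022));
    }
}
```

Falling back to no attribute keeps today's behaviour for any encoding
without an agreed label, so the change would only affect the cases
where the name is unambiguous.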

Regards,

Andy Quick

----- Original Message ----- 
From: Paul Silvey <psilvey@mitre.org>
To: <ac.quick@sympatico.ca>
Sent: January 28, 2000 3:52 PM
Subject: JTidy praise and a question


> I'm using JTidy to convert HTML files from an on-line Spanish 
> Newspaper into clean XML, so that I can run an XSLT processor on 
> them.  The source documents are encoded using ISO-8859-1 (LATIN1) 
> characters, but when I use the setCharEncoding method with the 
> Configuration.LATIN1 argument and output XML, the prologue line looks 
> as follows:
> 
> <?xml version="1.0"?>
> 
> When I then try to apply XSL transformations, I get UTF-8 encoding 
> errors.  Apparently, the XML parsers that I've tried (Sun's and 
> IBM's) assume UTF-8 if there is no encoding attribute specified.  If 
> I manually change the prologue line to be:
> 
> <?xml version="1.0" encoding="ISO-8859-1"?>
> 
> the parsers are happy and the file is properly transformed.
>
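
The behaviour Paul describes can probably be reproduced with a small
test along these lines. This is only a sketch, assuming the JAXP parser
bundled with a current JDK rather than the Sun and IBM parsers he used;
the class and method names are made up. It writes the same document as
ISO-8859-1 bytes with and without the encoding attribute and reports
whether each version parses.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

public class EncodingDeclDemo {
    /** Encodes the XML text as ISO-8859-1 bytes and reports whether it parses. */
    static boolean parsesAsLatin1Bytes(String xml) {
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (Exception e) {
            // Malformed byte sequences surface as a parse or I/O exception.
            return false;
        }
    }

    public static void main(String[] args) {
        // \u00f1 is the n-with-tilde, a single byte (0xF1) in ISO-8859-1
        // that is not a valid UTF-8 sequence when followed by plain ASCII.
        String body = "<p>Espa\u00f1a</p>";

        System.out.println("without encoding: "
                + parsesAsLatin1Bytes("<?xml version=\"1.0\"?>" + body));
        System.out.println("with encoding:    "
                + parsesAsLatin1Bytes("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body));
    }
}
```

Without the attribute the parser has to fall back to UTF-8, so the
first non-ASCII byte is what trips it up.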

Received on Wednesday, 2 February 2000 09:56:37 UTC