Fw: JTidy praise and a question from Andy Quick on 2000-03-24 (html-tidy@w3.org from January to March 2000)

From: Andy Quick <ac.quick@sympatico.ca>
Date: Fri, 24 Mar 2000 11:45:42 -0600
To: <html-tidy@w3.org>
Message-ID: <OFBB2C8F1D.84092B25-ON86256879.00522816@rfdinc.com>

Would it make sense to add the following snippet of
code to Lexer.fixXMLPI(Node root) after the call to
addStringLiteral("xml version=...") ?

    if (this.configuration.CharEncoding == Configuration.UTF8)
        addStringLiteral(" encoding=\"UTF8\"");
    else if (this.configuration.CharEncoding == Configuration.LATIN1)
        addStringLiteral(" encoding=\"ISO-8859-1\"");

Could an encoding attribute also be added for ISO2022 or MACROMAN?

Regards,

Andy Quick

----- Original Message -----
From: Paul Silvey <psilvey@mitre.org>
To: <ac.quick@sympatico.ca>
Sent: January 28, 2000 3:52 PM
Subject: JTidy praise and a question


> I'm using JTidy to convert HTML files from an on-line Spanish
> Newspaper into clean XML, so that I can run an XSLT processor on
> them.  The source documents are encoded using ISO-8859-1 (LATIN1)
> characters, but when I use the setCharEncoding method with the
> Configuration.LATIN1 argument and output XML, the prologue line looks
> as follows:
>
> <?xml version="1.0"?>
>
> When I then try to apply XSL transformations, I get UTF-8 encoding
> errors.  Apparently, the XML parsers that I've tried (Sun's and
> IBM's) assume UTF-8 if their is no encoding attribute specified.  If
> I manually change the prologue line to be:
>
> <?xml version="1.0" encoding="ISO-8859-1"?>
>
> the parsers are happy and the file is properly transformed.
>

Received on Friday, 24 March 2000 13:14:48 UTC