- From: Andy Quick <ac.quick@sympatico.ca>
- Date: Tue, 1 Feb 2000 19:49:59 -0500 (EST)
- To: <html-tidy@w3.org>
Would it make sense to add the following snippet of
code to Lexer.fixXMLPI(Node root), right after the call to
addStringLiteral("xml version=...")?
    // XML spells the encoding name "UTF-8" (with a hyphen), not "UTF8".
    if (this.configuration.CharEncoding == Configuration.UTF8)
        addStringLiteral(" encoding=\"UTF-8\"");
    else if (this.configuration.CharEncoding == Configuration.LATIN1)
        addStringLiteral(" encoding=\"ISO-8859-1\"");
Could an encoding attribute also be added for ISO2022 or MACROMAN?
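
As a rough, standalone sketch of how the mapping might look (not JTidy's actual code): the enum below is a hypothetical stand-in for the Configuration constants, and the helper names are made up for illustration. The XML specification writes the name as "UTF-8", and "macintosh" is the IANA-registered name for Mac Roman, though XML processors are not required to support it. ISO2022 is left unmapped here on the assumption that the setting alone does not say which member of the ISO-2022 family (for example ISO-2022-JP) is in use.

```java
import java.util.Optional;

class XmlEncodingSketch {
    // Hypothetical stand-in for JTidy's Configuration encoding constants.
    enum CharEncoding { LATIN1, UTF8, ISO2022, MACROMAN }

    // Returns the name to put in the XML declaration, or empty when
    // no single safe name is known for the setting.
    static Optional<String> xmlEncodingName(CharEncoding enc) {
        switch (enc) {
            case UTF8:
                return Optional.of("UTF-8");
            case LATIN1:
                return Optional.of("ISO-8859-1");
            case MACROMAN:
                // IANA name for Mac Roman; parser support is optional.
                return Optional.of("macintosh");
            default:
                // ISO2022: the specific family member is unknown (assumption).
                return Optional.empty();
        }
    }

    // Builds the declaration, omitting the encoding attribute when unmapped.
    static String xmlDeclaration(CharEncoding enc) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"");
        xmlEncodingName(enc).ifPresent(
            name -> sb.append(" encoding=\"").append(name).append('"'));
        return sb.append("?>").toString();
    }

    public static void main(String[] args) {
        for (CharEncoding enc : CharEncoding.values()) {
            System.out.println(enc + " -> " + xmlDeclaration(enc));
        }
    }
}
```

Leaving the attribute off for an unmapped setting keeps the current behaviour, so nothing gets worse for encodings the sketch does not cover.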
Regards,
Andy Quick
----- Original Message -----
From: Paul Silvey <psilvey@mitre.org>
To: <ac.quick@sympatico.ca>
Sent: January 28, 2000 3:52 PM
Subject: JTidy praise and a question
> I'm using JTidy to convert HTML files from an on-line Spanish
> newspaper into clean XML, so that I can run an XSLT processor on
> them. The source documents are encoded using ISO-8859-1 (LATIN1)
> characters, but when I use the setCharEncoding method with the
> Configuration.LATIN1 argument and output XML, the prologue line looks
> as follows:
>
> <?xml version="1.0"?>
>
> When I then try to apply XSL transformations, I get UTF-8 encoding
> errors. Apparently, the XML parsers that I've tried (Sun's and
> IBM's) assume UTF-8 if there is no encoding attribute specified. If
> I manually change the prologue line to be:
>
> <?xml version="1.0" encoding="ISO-8859-1"?>
>
> the parsers are happy and the file is properly transformed.
>
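
The behaviour described in the quoted message can be reproduced with a small sketch along these lines. It uses the JDK's built-in javax.xml.parsers API rather than the Sun and IBM parsers mentioned above; the class name, method name, and sample text are made up for illustration. The document is serialised as ISO-8859-1 bytes, so without an encoding declaration the parser should read it as UTF-8 and reject the byte for "ñ".

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

class EncodingDeclarationDemo {
    // Encodes the text as ISO-8859-1 bytes and reports whether it parses.
    static boolean parses(String xml) {
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (Exception e) {
            // Covers malformed-byte I/O errors as well as SAX parse errors.
            return false;
        }
    }

    public static void main(String[] args) {
        String body = "<p>Espa\u00f1a</p>"; // contains a non-ASCII character

        // No declared encoding: the parser falls back to UTF-8.
        System.out.println("without encoding: "
            + parses("<?xml version=\"1.0\"?>" + body));

        // Declared encoding matches the actual bytes.
        System.out.println("with encoding:    "
            + parses("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body));
    }
}
```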
Received on Wednesday, 2 February 2000 09:56:37 UTC