- From: Andy Quick <ac.quick@sympatico.ca>
- Date: Fri, 24 Mar 2000 11:45:42 -0600
- To: <html-tidy@w3.org>
Would it make sense to add the following snippet of code to Lexer.fixXMLPI(Node root), after the call to addStringLiteral("xml version=...")?

    if (this.configuration.CharEncoding == Configuration.UTF8)
        addStringLiteral(" encoding=\"UTF-8\"");
    else if (this.configuration.CharEncoding == Configuration.LATIN1)
        addStringLiteral(" encoding=\"ISO-8859-1\"");

Could an encoding attribute also be added for ISO2022 or MACROMAN?

Regards,
Andy Quick

----- Original Message -----
From: Paul Silvey <psilvey@mitre.org>
To: <ac.quick@sympatico.ca>
Sent: January 28, 2000 3:52 PM
Subject: JTidy praise and a question

> I'm using JTidy to convert HTML files from an on-line Spanish
> newspaper into clean XML, so that I can run an XSLT processor on
> them. The source documents are encoded using ISO-8859-1 (LATIN1)
> characters, but when I use the setCharEncoding method with the
> Configuration.LATIN1 argument and output XML, the prologue line looks
> as follows:
>
>     <?xml version="1.0"?>
>
> When I then try to apply XSL transformations, I get UTF-8 encoding
> errors. Apparently, the XML parsers that I've tried (Sun's and
> IBM's) assume UTF-8 if there is no encoding attribute specified. If
> I manually change the prologue line to be:
>
>     <?xml version="1.0" encoding="ISO-8859-1"?>
>
> the parsers are happy and the file is properly transformed.
>
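For illustration, the suggestion above can be sketched as a small standalone program that builds the XML declaration from the configured encoding. This is only a sketch under stated assumptions, not JTidy's actual code: the class XmlDeclSketch, the Encoding enum, and the xmlDeclaration method are hypothetical stand-ins for Lexer.fixXMLPI and the Configuration constants. ISO2022 and MACROMAN are deliberately left without an encoding attribute, since which names to emit for them is the open question.

```java
public class XmlDeclSketch {
    // Hypothetical stand-in for the Configuration.* encoding constants.
    enum Encoding { ASCII, LATIN1, UTF8, ISO2022, MACROMAN }

    // Builds the XML declaration, adding an encoding attribute only for
    // the two encodings named in the proposal.
    static String xmlDeclaration(Encoding enc) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"");
        switch (enc) {
            case UTF8:
                sb.append(" encoding=\"UTF-8\"");
                break;
            case LATIN1:
                sb.append(" encoding=\"ISO-8859-1\"");
                break;
            default:
                // ISO2022, MACROMAN and ASCII: no attribute is emitted here,
                // so a parser falls back to its default detection.
                break;
        }
        return sb.append("?>").toString();
    }

    public static void main(String[] args) {
        System.out.println(xmlDeclaration(Encoding.LATIN1));
        System.out.println(xmlDeclaration(Encoding.ISO2022));
    }
}
```

With LATIN1 this yields the declaration that the quoted message reports the parsers accepting; with an unhandled encoding the output is the bare declaration that tidy emits today.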
Received on Friday, 24 March 2000 13:14:48 UTC