- From: Dan Connolly <connolly@w3.org>
- Date: Thu, 05 Nov 1998 11:01:32 -0600
- To: dsr@w3.org
- CC: www-html@w3.org
The tidy[1] -asxml feature is a pretty cool idea,
but it's broken in the 1Sep release[2].
[1] http://www.w3.org/People/Raggett/tidy/
[2] http://www.w3.org/People/Raggett/tidy01sep98.tgz
It writes
<?XML version="1.0"> but it's supposed to be lower case:
=============
http://www.w3.org/TR/REC-xml#sec-prolog-dtd
<?xml version="1.0"?>
<greeting>Hello, world!</greeting>
=============
Here's a patch.
=============
connolly@pancake ../tidy01sep98[1104] less ,patch
retrieving revision 1.1
diff -u -r1.1 lexer.c
--- lexer.c 1998/11/05 16:48:44 1.1
+++ lexer.c 1998/11/05 16:49:15
@@ -714,7 +714,7 @@
{
s = &lexer->lexbuf[root->content->start];
- if (s[0] == 'X' && s[1] == 'M' && s[2] == 'L')
+ if (s[0] == 'x' && s[1] == 'm' && s[2] == 'l')
return true;
}
@@ -728,7 +728,7 @@
root->content = xml;
lexer->txtstart = lexer->txtend = lexer->lexsize;
- AddStringLiteral(lexer, "XML version=\"1.0\"");
+ AddStringLiteral(lexer, "xml version=\"1.0\"");
lexer->txtend = lexer->lexsize;
xml->start = lexer->txtstart;
=============
Also, the XML declaration should be
-- nothing if the encoding is UTF-8 (or US-ASCII) or UTF-16
-- <?xml version="1.0" encoding="iso-8859-1"?>
if the tidy output is -latin1 (version has to come before
encoding, and the declaration closes with ?>)
and similar for -iso2022, but I don't know the
details.
So FixDocType should take another argument for the encoding.
I haven't hacked that up yet, but it should be easy.
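For what it's worth, here is a rough sketch of the encoding-to-declaration
mapping described above. It is not tidy's code: CharsetForFlag and
XmlDeclContent are hypothetical names, the -ascii and -utf8 flag spellings
are assumptions (only -latin1 and -iso2022 are mentioned above), and
-iso2022 is reported as unsupported rather than guessed at. The result is
the text that would go between "<?" and "?>", the same shape as the string
the patch passes to AddStringLiteral.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Map an output-encoding flag to the charset label to declare.
 * On success returns 0 and sets *label; a NULL label means the XML
 * declaration can be omitted. Returns -1 for flags not covered here.
 * The flag spellings other than "-latin1" are assumptions.
 */
static int CharsetForFlag(const char *flag, const char **label)
{
    assert(flag != NULL && label != NULL);
    *label = NULL;

    if (strcmp(flag, "-latin1") == 0)
    {
        *label = "iso-8859-1";
        return 0;
    }

    /* US-ASCII and UTF-8 need no declaration at all */
    if (strcmp(flag, "-ascii") == 0 || strcmp(flag, "-utf8") == 0)
        return 0;

    /* e.g. "-iso2022": the right label is left open, so refuse */
    return -1;
}

/*
 * Write the declaration content (the text between "<?" and "?>")
 * into buf. Returns the length written, 0 if no declaration is
 * needed, or -1 if the flag is unsupported or buf is too small.
 */
static int XmlDeclContent(const char *flag, char *buf, size_t size)
{
    const char *label;
    int n;

    assert(buf != NULL && size > 0);
    buf[0] = '\0';

    if (CharsetForFlag(flag, &label) != 0)
        return -1;

    if (label == NULL)
        return 0;

    /* version must precede encoding in an XML declaration */
    n = snprintf(buf, size, "xml version=\"1.0\" encoding=\"%s\"", label);

    if (n < 0 || (size_t)n >= size)
    {
        buf[0] = '\0';
        return -1;
    }

    return n;
}
```

With something like this, FixDocType could call XmlDeclContent with the
output flag and only add the xml node when the result is positive. How the
encoding actually reaches FixDocType is a separate question that this
sketch does not answer.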
--
Dan Connolly
http://www.w3.org/People/Connolly/
phone:+1-512-310-2971 (office, mobile)
Received on Thursday, 5 November 1998 12:00:44 UTC