- From: Dan Connolly <connolly@w3.org>
- Date: Thu, 05 Nov 1998 11:01:32 -0600
- To: dsr@w3.org
- CC: www-html@w3.org
The tidy[1] -asxml feature is a pretty cool idea, but it's
broken in the 1Sep release[2].

[1] http://www.w3.org/People/Raggett/tidy/
[2] http://www.w3.org/People/Raggett/tidy01sep98.tgz

It writes

	<?XML version="1.0">

but it's supposed to be lower case:

=============
http://www.w3.org/TR/REC-xml#sec-prolog-dtd

<?xml version="1.0"?>
<greeting>Hello, world!</greeting>
=============

Here's a patch.

=============
retrieving revision 1.1
diff -u -r1.1 lexer.c
connolly@pancake ../tidy01sep98[1104] less ,patch
--- lexer.c     1998/11/05 16:48:44     1.1
+++ lexer.c     1998/11/05 16:49:15
@@ -714,7 +714,7 @@
         {
             s = &lexer->lexbuf[root->content->start];
-            if (s[0] == 'X' && s[1] == 'M' && s[2] == 'L')
+            if (s[0] == 'x' && s[1] == 'm' && s[2] == 'l')
                 return true;
         }
@@ -728,7 +728,7 @@
     root->content = xml;
     lexer->txtstart = lexer->txtend = lexer->lexsize;
-    AddStringLiteral(lexer, "XML version=\"1.0\"");
+    AddStringLiteral(lexer, "xml version=\"1.0\"");
     lexer->txtend = lexer->lexsize;
     xml->start = lexer->txtstart;
=============

Also, the XML declaration should be

  -- nothing if the encoding is UTF-8 (or US-ASCII) or UTF-16
  -- <?xml version="1.0" encoding="iso-8859-1"?>
     if the tidy output is -latin1 (version has to come
     before encoding, and the declaration ends with ?>)

and similar for -iso2022, but I don't know the details.

So FixDocType should take another argument for the encoding.
I haven't hacked that up yet, but it should be easy.

-- 
Dan Connolly
http://www.w3.org/People/Connolly/
phone:+1-512-310-2971 (office, mobile)
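A rough sketch of the encoding-dependent declaration described above, for illustration only. The OutEncoding enum and the xml_declaration helper are hypothetical names, not part of tidy's source, and this is not how FixDocType is actually structured. ISO-2022 is left out because the details are not settled in this message.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical output-encoding identifiers (assumption: tidy's own
   option names and values may differ). */
typedef enum { ENC_ASCII, ENC_UTF8, ENC_UTF16, ENC_LATIN1 } OutEncoding;

/* Write the XML declaration for the given output encoding into buf.
   Returns the number of characters the declaration needs, or 0 when
   no declaration is needed (UTF-8, US-ASCII, UTF-16). */
static int xml_declaration(OutEncoding enc, char *buf, size_t size)
{
    if (enc == ENC_LATIN1) {
        /* version comes before encoding; the declaration ends with "?>" */
        return snprintf(buf, size,
                        "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>");
    }

    /* UTF-8, US-ASCII and UTF-16 output: emit nothing */
    if (size > 0)
        buf[0] = '\0';
    return 0;
}
```

A caller would pass the encoding chosen on the command line and only add the declaration to the output when the return value is non-zero.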
Received on Thursday, 5 November 1998 12:00:44 UTC