
tidy -asxml fix

From: Dan Connolly <connolly@w3.org>
Date: Thu, 05 Nov 1998 11:01:32 -0600
Message-ID: <3641D9EC.EB0@w3.org>
To: dsr@w3.org
CC: www-html@w3.org
The tidy[1] -asxml feature is a pretty cool idea,
but it's broken in the 1Sep release[2].

[1] http://www.w3.org/People/Raggett/tidy/
[2] http://www.w3.org/People/Raggett/tidy01sep98.tgz

It writes
<?XML version="1.0"> but the name is supposed to be lower case
(XML names are case-sensitive):

=============
http://www.w3.org/TR/REC-xml#sec-prolog-dtd

<?xml version="1.0"?>
  <greeting>Hello, world!</greeting>
=============

Here's a patch.

=============
retrieving revision 1.1
diff -u -r1.1 lexer.c
connolly@pancake ../tidy01sep98[1104] less ,patch
--- lexer.c     1998/11/05 16:48:44     1.1
+++ lexer.c     1998/11/05 16:49:15
@@ -714,7 +714,7 @@
        {
                s = &lexer->lexbuf[root->content->start];

-               if (s[0] == 'X' && s[1] == 'M' && s[2] == 'L')
+               if (s[0] == 'x' && s[1] == 'm' && s[2] == 'l')
                        return true;
        }

@@ -728,7 +728,7 @@
        root->content = xml;

     lexer->txtstart = lexer->txtend = lexer->lexsize;
-       AddStringLiteral(lexer, "XML version=\"1.0\"");
+       AddStringLiteral(lexer, "xml version=\"1.0\"");
     lexer->txtend = lexer->lexsize;

     xml->start = lexer->txtstart;
=============
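
One thing the three-character test in the first hunk does not catch is a
processing instruction such as <?xml-stylesheet ...?>, whose target also
starts with "xml". A stricter check, sketched below, also requires
whitespace after the name. LooksLikeXmlDecl and its (s, len) parameters
are hypothetical names for illustration, not functions from the tidy
source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of tidy.
 * s points at the text just after "<?", len is the number of bytes available.
 * Returns 1 only for lower-case "xml" followed by whitespace, which is how
 * an XML declaration must start; returns 0 otherwise. */
int LooksLikeXmlDecl(const char *s, size_t len)
{
    if (len < 4 || memcmp(s, "xml", 3) != 0)
        return 0;

    return s[3] == ' ' || s[3] == '\t' || s[3] == '\r' || s[3] == '\n';
}
```

With this, "xml version=..." is accepted while "XML version=..." and
"xml-stylesheet ..." are both rejected.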

Also, the XML declaration should be
	-- nothing if the encoding is UTF-8 (or US-ASCII) or UTF-16
	-- <?xml version="1.0" encoding="iso-8859-1"?>
		if the tidy output is -latin1
		and similar for -iso2022, but I don't know the
		details.

So FixDocType should take another argument for the encoding.
I haven't hacked that up yet, but it should be easy.
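
A rough sketch of the encoding rule, assuming a small enum for the output
encoding. OutEncoding, its ENC_* values, and XmlDeclFor are made-up names,
not tidy's actual option handling. The returned string omits the "<?" and
"?>" delimiters, matching the literal passed to AddStringLiteral in the
patch. ISO-2022 is left out because the right encoding name is not settled
above.

```c
#include <stddef.h>

/* Hypothetical encoding tags; tidy's real representation may differ. */
typedef enum { ENC_ASCII, ENC_UTF8, ENC_UTF16, ENC_LATIN1 } OutEncoding;

/* Returns the declaration text to emit, or NULL when no XML declaration
 * is needed (UTF-8, US-ASCII and UTF-16 can be detected without one). */
const char *XmlDeclFor(OutEncoding enc)
{
    switch (enc) {
    case ENC_LATIN1:
        /* version must come before encoding in an XML declaration */
        return "xml version=\"1.0\" encoding=\"iso-8859-1\"";
    case ENC_ASCII:
    case ENC_UTF8:
    case ENC_UTF16:
    default:
        return NULL;
    }
}
```

The caller would skip the AddStringLiteral step entirely when this
returns NULL.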

-- 
Dan Connolly
http://www.w3.org/People/Connolly/
phone:+1-512-310-2971 (office, mobile)
Received on Thursday, 5 November 1998 12:00:44 GMT
