- From: <www-html-request@w3.org>
- Date: Mon, 9 Nov 1998 09:24:00 -0500 (EST)
Received: from www19.w3.org ([18.29.0.19]) by gateway2.teracom.se with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2232.9)
id WHN1ZHVP; Thu, 5 Nov 1998 20:10:52 +0100
Received: (from daemon@localhost)
by www19.w3.org (8.9.0/8.9.0) id NAA16054;
Thu, 5 Nov 1998 13:50:13 -0500 (EST)
Resent-Date: Thu, 5 Nov 1998 13:50:13 -0500 (EST)
Resent-Message-Id: <199811051850.NAA16054@www19.w3.org>
Message-ID: <3641F391.6032@w3.org>
Date: Thu, 05 Nov 1998 12:50:57 -0600
From: Dan Connolly <connolly@w3.org>
Organization: World Wide Web Consortium (http://www.w3.org/)
X-Mailer: Mozilla 3.04 (WinNT; I)
MIME-Version: 1.0
To: dsr@w3.org, www-html@w3.org
References: <3641D9EC.EB0@w3.org>
Content-Type: multipart/mixed; boundary="------------274B38103A1E"
Subject: Re: tidy -asxml fix
Resent-From: www-html@w3.org
X-Mailing-List: <www-html@w3.org> archive/latest/736
X-Loop: www-html@w3.org
Sender: www-html-request@w3.org
Resent-Sender: www-html-request@w3.org
Precedence: list
This is a multi-part message in MIME format.
--------------274B38103A1E
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Dan Connolly wrote:
>
> The tidy[1] -asxml feature is a pretty cool idea,
> but it's broken in the 1Sep release[2].
>
> [1] http://www.w3.org/People/Raggett/tidy/
> [2] http://www.w3.org/People/Raggett/tidy01sep98.tgz
[...]
> Also, the XML declaration should be
> -- nothing if the encoding is UTF-8 (or US-ASCII) or UTF-16
> -- <?xml encoding="iso-8859-1" version="1.0">
> if the tidy output is -latin1
> and similar for -iso2022, but I don't know the
> details.
>
> So FixDocType should take another argument for the encoding.
> I haven't hacked that up yet, but it should be easy.
OK... done. Patch attached. (Turns out the encoding is
a global variable, so I didn't have to add an argument.)
There are some limitations:
+ AddStringLiteral(lexer, "xml version=\"1.0\" encoding=\"");
+ /* @@FIXME: ISO2022 isn't any one character set
+ in the sense of
+ http://www.isi.edu/in-notes/iana/assignments/character-sets
+
+ And if it's raw, we really don't know
+ */
+ AddStringLiteral(lexer, CharEncoding==LATIN1 ? "iso-8859-1" : "???");
+
+ AddStringLiteral(lexer, "\"");
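The rule behind that excerpt can be sketched as a standalone function. This is only an illustration, not code from tidy: the function name xml_declaration and the enum values are hypothetical (only the names ASCII, LATIN1, UTF8 come from the patch), and it returns NULL where the patch currently writes "???".

```c
#include <stddef.h>

/* Hypothetical encoding codes: the names ASCII, LATIN1 and UTF8 appear in
   the patch, but RAW, ISO2022 and all the values here are illustrative. */
enum { RAW, ASCII, LATIN1, UTF8, ISO2022 };

/* Returns the XML declaration to emit for a given output encoding:
   ""   when no declaration is needed (US-ASCII or UTF-8),
   NULL when no single charset name is known (e.g. ISO2022 or raw). */
const char *xml_declaration(int encoding)
{
    switch (encoding) {
    case ASCII:
    case UTF8:
        return "";
    case LATIN1:
        return "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>";
    default:
        return NULL;
    }
}
```

A caller would treat NULL as "refuse or warn" rather than writing a placeholder encoding name into the output.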
The patch also adds -ansi to the gcc invocation;
I got some warnings about redeclaration of uint in platform.h
on Linux, and this fixed it. But it made the // style comments
generate errors, so I supplemented them with #if 0/#endif.
--
Dan Connolly
http://www.w3.org/People/Connolly/
phone:+1-512-310-2971 (office, mobile)
--------------274B38103A1E
Content-Type: text/plain; charset=us-ascii; name=",patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename=",patch"
--- lexer.c 1998/11/05 18:44:46 1.1
+++ lexer.c 1998/11/05 17:57:19
@@ -34,6 +34,8 @@
#include "platform.h"
#include "html.h"
+extern int CharEncoding; /* from tidy.c */
+
AttVal *ParseAttrs(Lexer *lexer, bool *isempty); /* forward references */
void CheckAttributes(Lexer *lexer, Node *node);
Node *CommentToken(Lexer *lexer);
@@ -714,7 +716,12 @@
{
s = &lexer->lexbuf[root->content->start];
- if (s[0] == 'X' && s[1] == 'M' && s[2] == 'L')
+ if (s[0] == 'x' && s[1] == 'm' && s[2] == 'l')
+ return true;
+ }
+
+ if( CharEncoding == ASCII ||
+ CharEncoding == UTF8 ){
return true;
}
@@ -728,7 +735,16 @@
root->content = xml;
lexer->txtstart = lexer->txtend = lexer->lexsize;
- AddStringLiteral(lexer, "XML version=\"1.0\"");
+ AddStringLiteral(lexer, "xml version=\"1.0\" encoding=\"");
+ /* @@FIXME: ISO2022 isn't any one character set
+ in the sense of
+ http://www.isi.edu/in-notes/iana/assignments/character-sets
+
+ And if it's raw, we really don't know
+ */
+ AddStringLiteral(lexer, CharEncoding==LATIN1 ? "iso-8859-1" : "???");
+
+ AddStringLiteral(lexer, "\"");
lexer->txtend = lexer->lexsize;
xml->start = lexer->txtstart;
--- pprint.c 1998/11/05 18:44:09 1.1
+++ pprint.c 1998/11/05 18:42:50
@@ -394,7 +394,7 @@
{
if (c > 127 && CharEncoding == ASCII)
{
- sprintf(entity, "&#x%x;", c);
+ sprintf(entity, "&#%d;", c);
for (p = entity; *p; ++p)
AddC(*p, linelen++);
@@ -407,7 +407,8 @@
/* default treatment for ASCII */
if (c > 126 || (c < ' ' && c != '\t'))
{
- if ((p = EntityName(c)) != null)
+ if (((p = EntityName(c)) != null)
+ && XmlOut == false) /* don't use named entities in XML */
sprintf(entity, "&%s;", p);
else if (c > 255)
sprintf(entity, "&#x%x;", c);
@@ -503,8 +504,10 @@
if (c == '\n')
{
PFlushLine(fout, indent);
+#if 0
//indent = 0; /* kludge */
//InAttVal = true;
+#endif
continue;
}
--- Makefile 1998/11/05 18:44:46 1.1
+++ Makefile 1998/11/05 17:56:31
@@ -1,6 +1,6 @@
# Makefile - for tidy
-CC= gcc
+CC= gcc -ansi
CFLAGS= -O
--------------274B38103A1E--
Received on Monday, 9 November 1998 09:24:44 UTC