[patch] from Jun Kuriyama on 1999-10-13 (html-tidy@w3.org from October to December 1999)

From: Jun Kuriyama <kuriyama@sky.rim.or.jp>
Date: Wed, 13 Oct 1999 23:36:41 +0900
To: html-tidy@w3.org
Cc: Jun Kuriyama <kuriyama@sky.rim.or.jp>
Message-ID: <14340.39161.781471.78002K@localhost.sky.rim.or.jp>

When I use the -raw option with EUC-JP encoded input, some entity references
(such as &copy;) are converted to ISO-8859-1 (?) character codes.  But
with -raw those codes are not converted back to entity references on output.

# EUC-JP uses the 8th bit.  Japanese characters in this encoding
# may include bytes in the 0xA0-0xFF range, so EUC-JP cannot co-exist
# with other 8-bit encodings.
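
To illustrate the collision, here is a minimal sketch.  It is not part of
the patch: the helper name is hypothetical, and it assumes the usual EUC-JP
layout in which both bytes of a JIS X 0208 character lie in 0xA1-0xFE
(slightly narrower than the 0xA0-0xFF range quoted above).

```c
#include <stdio.h>

/* Hypothetical helper, not from tidy: returns 1 if the byte could be
   one half of a two-byte EUC-JP (JIS X 0208) character, assuming the
   0xA1..0xFE range for such bytes. */
static int is_eucjp_high_byte(unsigned char b)
{
    return b >= 0xA1 && b <= 0xFE;
}

/* ISO-8859-1 code point for the copyright sign (&copy;). */
static const unsigned char COPY_LATIN1 = 0xA9;

/* Prints whether a decoded Latin-1 byte would be ambiguous in EUC-JP text. */
static void report_overlap(unsigned char b)
{
    printf("0x%02X %s an EUC-JP high byte\n",
           b, is_eucjp_high_byte(b) ? "looks like" : "is not");
}
```

Calling report_overlap(COPY_LATIN1) reports an overlap, which is why a
decoded &copy; cannot be told apart from half of a Japanese character.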

So I would like the -raw option to leave entity references in the input
unmodified and print them out as-is.  Can tidy accept this approach?
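
The intended behaviour can be sketched in isolation as follows.  This is
only an illustration under stated assumptions, not tidy's lexer: the
function name and signature are hypothetical, the input is assumed to be a
reference terminated by ';', and only &copy; is decoded in non-raw mode.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: writes the entity reference starting at src
   (e.g. "&copy;") into dst as a NUL-terminated string and returns the
   number of input bytes consumed, or 0 if dst is too small or no ';'
   is found.  In raw mode the reference is copied byte for byte;
   otherwise "&copy;" is decoded to its ISO-8859-1 code, 0xA9. */
static size_t emit_entity(const char *src, char *dst, size_t dstsize, int raw)
{
    const char *semi = strchr(src, ';');
    size_t len;

    if (semi == NULL || dstsize == 0)
        return 0;

    len = (size_t)(semi - src) + 1;   /* length including the ';' */

    if (!raw && len == 6 && strncmp(src, "&copy;", 6) == 0)
    {
        if (dstsize < 2)
            return 0;
        dst[0] = (char)0xA9;          /* decoded Latin-1 byte */
        dst[1] = '\0';
        return len;
    }

    /* raw mode, or an entity this sketch does not decode: copy as-is */
    if (len >= dstsize)
        return 0;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

With raw set, "&copy; 1999" yields "&copy;" unchanged, so no 8-bit byte is
introduced into the EUC-JP stream.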


Index: lexer.c
===================================================================
RCS file: /tmp/tidycvs/tidy/lexer.c,v
retrieving revision 1.1.1.3
diff -u -r1.1.1.3 lexer.c
--- lexer.c	1999/10/13 14:06:29	1.1.1.3
+++ lexer.c	1999/10/13 14:06:39
@@ -358,15 +358,23 @@
             ReportEntityError(lexer, MISSING_SEMICOLON, lexer->lexbuf+start, c);
         }
 
-        lexer->lexsize = start;
-        AddCharToLexer(lexer, ch);
+        if (lexer->in->encoding == RAW)
+        {
+            if (semicolon)
+                AddCharToLexer(lexer, ';');
+        }
+        else
+        {
+            lexer->lexsize = start;
+            AddCharToLexer(lexer, ch);
 
-        if (ch == '&' && !QuoteAmpersand)
-        {
-            AddCharToLexer(lexer, 'a');
-            AddCharToLexer(lexer, 'm');
-            AddCharToLexer(lexer, 'p');
-            AddCharToLexer(lexer, ';');
+            if (ch == '&' && !QuoteAmpersand)
+            {
+                AddCharToLexer(lexer, 'a');
+                AddCharToLexer(lexer, 'm');
+                AddCharToLexer(lexer, 'p');
+                AddCharToLexer(lexer, ';');
+            }
         }
     }
 }
Index: pprint.c
===================================================================
RCS file: /tmp/tidycvs/tidy/pprint.c,v
retrieving revision 1.1.1.3
diff -u -r1.1.1.3 pprint.c
--- pprint.c	1999/10/13 14:06:30	1.1.1.3
+++ pprint.c	1999/10/13 14:06:39
@@ -291,7 +291,7 @@
     }
 
     /* except in CDATA map < to &lt; etc. */
-    if (! (mode & CDATA) )
+    if (!(mode & CDATA) && CharEncoding != RAW)
     {
         if (c == '<')
         {


Jun Kuriyama // kuriyama@sky.rim.or.jp
            // kuriyama@FreeBSD.org

Received on Wednesday, 13 October 1999 10:37:03 UTC