- From: Jun Kuriyama <kuriyama@sky.rim.or.jp>
- Date: Wed, 13 Oct 1999 23:36:41 +0900
- To: html-tidy@w3.org
- Cc: Jun Kuriyama <kuriyama@sky.rim.or.jp>
When I use the -raw option with EUC-JP encoding, some entity references
(such as &copy;) are converted to ISO-8859-1 (?) character codes, and
those codes are not converted back to entity references on output.
# EUC-JP uses the 8th bit: Japanese characters in this encoding may
# contain bytes in the 0xA0-0xFF range, so EUC-JP cannot coexist
# with other 8-bit encodings.
So I would like the -raw option to leave entity references in the input
unmodified and print them out as-is. Could tidy adopt this approach?
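
To make the byte-range conflict described above concrete, here is a minimal sketch in C. The function names are hypothetical and are not part of tidy. It relies on the fact that EUC-JP encodes JIS X 0208 characters as two bytes, each in 0xA1-0xFE, which overlaps the printable upper half of ISO-8859-1 (where 0xA9 is the copyright sign).

```c
#include <stddef.h>

/* Hypothetical helper, not part of tidy.
 * EUC-JP encodes JIS X 0208 characters as two bytes, each in 0xA1..0xFE. */
int is_eucjp_high_byte(unsigned char b)
{
    return b >= 0xA1 && b <= 0xFE;
}

/* Count the bytes in buf that could be part of an EUC-JP two-byte
 * character. Every such byte is also a printable ISO-8859-1 character,
 * so its meaning depends on which encoding the reader assumes. */
size_t count_ambiguous_bytes(const unsigned char *buf, size_t len)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++)
    {
        if (is_eucjp_high_byte(buf[i]))
            n++;
    }
    return n;
}
```

For example, the hiragana letter "a" is the byte pair 0xA4 0xA2 in EUC-JP, and a decoded &copy; adds a lone 0xA9 to the same stream. All three bytes pass the check, so a reader of the output cannot tell them apart by value alone.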
Index: lexer.c
===================================================================
RCS file: /tmp/tidycvs/tidy/lexer.c,v
retrieving revision 1.1.1.3
diff -u -r1.1.1.3 lexer.c
--- lexer.c 1999/10/13 14:06:29 1.1.1.3
+++ lexer.c 1999/10/13 14:06:39
@@ -358,15 +358,21 @@
ReportEntityError(lexer, MISSING_SEMICOLON, lexer->lexbuf+start, c);
}
- lexer->lexsize = start;
- AddCharToLexer(lexer, ch);
+ if (lexer->in->encoding == RAW)
+ if (semicolon)
+ AddCharToLexer(lexer, ';');
+ else
+ {
+ lexer->lexsize = start;
+ AddCharToLexer(lexer, ch);
- if (ch == '&' && !QuoteAmpersand)
- {
- AddCharToLexer(lexer, 'a');
- AddCharToLexer(lexer, 'm');
- AddCharToLexer(lexer, 'p');
- AddCharToLexer(lexer, ';');
+ if (ch == '&' && !QuoteAmpersand)
+ {
+ AddCharToLexer(lexer, 'a');
+ AddCharToLexer(lexer, 'm');
+ AddCharToLexer(lexer, 'p');
+ AddCharToLexer(lexer, ';');
+ }
}
}
}
Index: pprint.c
===================================================================
RCS file: /tmp/tidycvs/tidy/pprint.c,v
retrieving revision 1.1.1.3
diff -u -r1.1.1.3 pprint.c
--- pprint.c 1999/10/13 14:06:30 1.1.1.3
+++ pprint.c 1999/10/13 14:06:39
@@ -291,7 +291,7 @@
}
/* except in CDATA map < to &lt; etc. */
- if (! (mode & CDATA) )
+ if (!(mode & CDATA) && CharEncoding != RAW)
{
if (c == '<')
{
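
The behavior requested in this message can be sketched in isolation as follows. This is only an illustration under stated assumptions, not tidy's code: `entity_code` and `copy_text` are hypothetical names, the entity table holds just two entries, and the decoded character is written as a single ISO-8859-1 byte.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical two-entry entity table; a real table is much larger. */
static int entity_code(const char *name, size_t len)
{
    if (len == 4 && memcmp(name, "copy", 4) == 0) return 0xA9;
    if (len == 3 && memcmp(name, "amp", 3) == 0) return '&';
    return -1;
}

/* Copy src into dst (capacity dstsize, always NUL-terminated).
 * raw != 0: entity references are copied through byte for byte.
 * raw == 0: known entities are replaced by their single-byte code.
 * Returns the number of bytes written, excluding the terminator. */
size_t copy_text(const char *src, char *dst, size_t dstsize, int raw)
{
    size_t out = 0;

    if (dstsize == 0)
        return 0;

    while (*src != '\0' && out + 1 < dstsize)
    {
        if (*src == '&' && !raw)
        {
            const char *name = src + 1;
            size_t len = 0;
            int code;

            while (isalnum((unsigned char)name[len]))
                len++;

            code = entity_code(name, len);
            if (code >= 0)
            {
                dst[out++] = (char)code;
                src = name + len;
                if (*src == ';')   /* swallow the terminator if present */
                    src++;
                continue;
            }
        }
        dst[out++] = *src++;       /* raw mode, plain text, unknown entity */
    }
    dst[out] = '\0';
    return out;
}
```

With `raw` set, the input "a&copy;b" comes out unchanged. With `raw` clear, it becomes the three bytes 'a', 0xA9, 'b', and that 0xA9 is exactly the byte that collides with EUC-JP text. The sketch braces every branch because in C an `else` attaches to the nearest unmatched `if`, which is worth keeping in mind for nested conditions such as the one in the lexer.c hunk.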
Jun Kuriyama // kuriyama@sky.rim.or.jp
// kuriyama@FreeBSD.org
Received on Wednesday, 13 October 1999 10:37:03 UTC