- From: Matej Vela <vela@debian.org>
- Date: Sat, 23 Dec 2000 15:13:03 +0100
- To: html-tidy@w3.org
- Cc: "Gregor N. Purdy" <gregor@focusresearch.com>, Joey Hess <joey@kitenet.net>, 67685-forwarded@bugs.debian.org
On Tue, Nov 28, 2000 at 03:23:51PM -0500, Gregor N. Purdy wrote:
> I was using tidy to process an XML file that used some custom entities.
> Since tidy wasn't aware of those entities, it refused to generate a
> tidied file. I did a little rooting around and modified the code just
> enough to produce the desired effect. There are some TODO comments in
> the code to point out where it is suboptimal. Nonetheless, I'm posting
> it here in case anyone else wants to make use of it or in case someone
> wants to tidy (ahem) it up for inclusion in the next release.
Here's a cleaned-up version. (It doesn't expand custom entities
because I feel that's beyond Tidy's scope.)
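
To illustrate what "doesn't expand" means in practice, here is a small standalone sketch of the idea the patch relies on: a declared entity is registered with the code '&' as a sentinel, and a reference that resolves to that sentinel is kept as written. The names declare_entity, entity_code and resolve are hypothetical stand-ins, not Tidy functions, and the unknown-entity branch assumes quote-ampersand is enabled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not Tidy's real data structures. */
#define MAX_DECLARED 16
#define PASS_THROUGH '&'   /* sentinel code given to declared entities */

static const char *declared[MAX_DECLARED];
static int ndeclared;

/* Same idea as DefineEntity() in the patch: record the name only,
   with no replacement value attached. Returns 1 on success, 0 if full. */
static int declare_entity(const char *name)
{
    assert(name != NULL);
    if (ndeclared >= MAX_DECLARED)
        return 0;
    declared[ndeclared++] = name;
    return 1;
}

/* Returns PASS_THROUGH for a declared name and 0 for an unknown one. */
static unsigned entity_code(const char *name)
{
    int i;

    for (i = 0; i < ndeclared; ++i)
        if (strcmp(declared[i], name) == 0)
            return PASS_THROUGH;
    return 0;
}

/* Writes into out what a lexer following this scheme would keep
   for the reference "&name;". */
static void resolve(const char *name, char *out, size_t outsize)
{
    if (entity_code(name) == PASS_THROUGH)
        snprintf(out, outsize, "&%s;", name);     /* kept verbatim, not expanded */
    else
        snprintf(out, outsize, "&amp;%s;", name); /* unknown: the '&' is escaped */
}
```

With "disclaimer" declared, resolve() leaves a reference to it as &disclaimer;, while an undeclared name has its ampersand escaped. The real patch does the equivalent inside the lexer buffer rather than in a separate output string.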
--- tidy4aug00.orig/Overview.html Fri Aug 4 18:21:05 2000
+++ tidy4aug00/Overview.html Sat Dec 23 15:07:13 2000
@@ -422,6 +422,13 @@
this feature is to allow Tidy to be applied to Cold Fusion
files.</p>
+<p>You can also teach Tidy about new entities by declaring them
+in the configuration file. The syntax is:</p>
+
+<pre>
+ new-entities: <em>entity1, entity2, entity3</em>
+</pre>
+
<p class="c7">I am working on ways to make it easy to customize
the permitted document syntax using <a
href="http://www.w3.org/People/Raggett/dtdgen/Docs/">assertion
@@ -1088,6 +1095,14 @@
CDATA elements (similar to script).</dd>
</dl>
+<dt>new-entities: <em>entity1, entity2, entity3</em></dt>
+
+<dd>Use this to declare new entities. The option takes a space-
+or comma-separated list of entity names. There is no mechanism
+for specifying the values of the entities. Note that tidy still
+does not read custom entities from any internal document subset.
+</dd>
+
<h4>Sample Config File</h4>
<p>This is just an example to get you started.</p>
@@ -1115,6 +1130,7 @@
mprescripts, mtable, mtr, mtd, mth
new-blocklevel-tags: cfoutput, cfquery
new-empty-tags: cfelse
+new-entities: disclaimer
</pre>
<h3><a id="scripts" name="scripts">Using Tidy from
--- tidy4aug00.orig/config.c Fri Aug 4 18:21:05 2000
+++ tidy4aug00/config.c Sat Dec 23 15:07:13 2000
@@ -105,6 +105,7 @@
static char *block_tags;
static char *empty_tags;
static char *pre_tags;
+static char *entities;
typedef struct _plist PList;
@@ -182,6 +183,7 @@
{"new-blocklevel-tags", {(int *)&block_tags}, ParseTagNames},
{"new-empty-tags", {(int *)&empty_tags}, ParseTagNames},
{"new-pre-tags", {(int *)&pre_tags}, ParseTagNames},
+ {"new-entities", {(int *)&entities}, ParseTagNames},
{"char-encoding", {(int *)&CharEncoding}, ParseCharEncoding},
{"doctype", {(int *)&doctype_str}, ParseDocType},
{"fix-backslash", {(int *)&FixBackslash}, ParseBool},
@@ -665,6 +667,8 @@
DefineEmptyTag(buf);
else if (location.string == &pre_tags)
DefinePreTag(buf);
+ else if (location.string == &entities)
+ DefineEntity(buf);
i = 0;
}
--- tidy4aug00.orig/entities.c Fri Aug 4 18:21:05 2000
+++ tidy4aug00/entities.c Sat Dec 23 15:07:13 2000
@@ -349,6 +349,11 @@
return 0; /* zero signifies unknown entity name */
}
+void DefineEntity(char *name)
+{
+ install(name, '&');
+}
+
void InitEntities(void)
{
struct entity *ep;
--- tidy4aug00.orig/html.h Fri Aug 4 18:21:05 2000
+++ tidy4aug00/html.h Sat Dec 23 15:07:13 2000
@@ -501,6 +501,8 @@
Bool IsWord2000(Node *root);
/* entities.c */
+void DefineEntity(char *name);
+
void InitEntities(void);
void FreeEntities(void);
uint EntityCode(char *name);
--- tidy4aug00.orig/lexer.c Fri Aug 4 18:21:05 2000
+++ tidy4aug00/lexer.c Sat Dec 23 15:07:13 2000
@@ -296,7 +296,7 @@
{
uint start, map;
Bool first = yes, semicolon = no;
- int c, ch, startcol;
+ int c, ch, startcol, i;
start = lexer->lexsize - 1; /* to start at "&" */
startcol = lexer->in->curcol - 1;
@@ -352,6 +352,15 @@
}
else /* naked & */
ReportEntityError(lexer, UNESCAPED_AMPERSAND, lexer->lexbuf+start, ch);
+
+ if (QuoteAmpersand)
+ {
+ for (i = 0; i < 4; ++i)
+ AddCharToLexer(lexer, '\0');
+ for (i = lexer->lexsize - 1; i > start + 4; --i)
+ lexer->lexbuf[i] = lexer->lexbuf[i - 4];
+ wstrncpy (lexer->lexbuf + start + 1, "amp;", 4);
+ }
}
else
{
@@ -363,19 +372,16 @@
ReportEntityError(lexer, MISSING_SEMICOLON, lexer->lexbuf+start, c);
}
- lexer->lexsize = start;
-
- if (ch == 160 && (mode & Preformatted))
- ch = ' ';
+ if (ch == '&')
+ AddCharToLexer(lexer, ';');
+ else
+ {
+ lexer->lexsize = start;
- AddCharToLexer(lexer, ch);
+ if (ch == 160 && (mode & Preformatted))
+ ch = ' ';
- if (ch == '&' && !QuoteAmpersand)
- {
- AddCharToLexer(lexer, 'a');
- AddCharToLexer(lexer, 'm');
- AddCharToLexer(lexer, 'p');
- AddCharToLexer(lexer, ';');
+ AddCharToLexer(lexer, ch);
}
}
}
--- tidy4aug00.orig/pprint.c Fri Aug 4 18:21:05 2000
+++ tidy4aug00/pprint.c Sat Dec 23 15:07:13 2000
@@ -362,21 +362,6 @@
return;
}
- /*
- naked '&' chars can be left alone or
- quoted as &amp; The latter is required
- for XML where naked '&' are illegal.
- */
- if (c == '&' && QuoteAmpersand)
- {
- AddC('&', linelen++);
- AddC('a', linelen++);
- AddC('m', linelen++);
- AddC('p', linelen++);
- AddC(';', linelen++);
- return;
- }
-
if (c == '"' && QuoteMarks)
{
AddC('&', linelen++);
--- tidy4aug00.orig/release-notes.html Fri Aug 4 18:21:05 2000
+++ tidy4aug00/release-notes.html Sat Dec 23 15:07:13 2000
@@ -73,6 +73,14 @@
current workload means that I don't get much time left to work on
HTML Tidy.</p>
+<h2>December 2000</h2>
+
+<p>Gregor N. Purdy &lt;gregor@focusresearch.com&gt; made a quick
+hack to permit the definition of custom entities in the config
+file via the new-entities option, and Matej Vela
+&lt;vela@debian.org&gt; cleaned it up. This is handy for tidying
+XML files.</p>
+
<h2>August 2000</h2>
<p>Ann Navarro comments that the "appears to" message is
Received on Saturday, 23 December 2000 12:23:34 UTC