Re: Ampersands in text from Raffaele Sena on 1999-06-17 (www-lib@w3.org from April to June 1999)

From: Raffaele Sena <raff@nuvomedia.com>
Date: Thu, 17 Jun 1999 14:27:03 -0700
To: "Allen Comer" <allen.comer@entropic.com>, <www-lib@w3.org>
Message-ID: <004901beb908$24554ec0$52c0a8c0@nuvomedia.com>

> As a follow-up to yesterday's message.  I seem to be running into more
> than my fair share of problems and I'm wondering what I might be doing
> wrong.  The latest problem I've found is that ampersands in plain text
> areas of an HTML document seem to confuse the HTML parser.  There are no
> unclosed <form> tags anywhere nor is there anything else that looks
> potentially troublesome.
>
> Any suggestions or ideas would be appreciated.
>
    Yep! If you check SGML.c, you will see that an ampersand is always
    treated as the start of an entity reference, valid or invalid.

    One way to put it back into the text is to register a callback for
    unparsed entities.

    This is a quick hack in the 'showtext' example
    (libwww/Library/Examples/showtext.c).
    It still eats a white space after the ampersand, but you get the idea
    (and I think the check for an isolated ampersand, as in "x & y", should
    go in SGML.c).

        diff -r1.2 showtext.c
        49a50,55
        > PRIVATE void unparsedEntity (HText * text, const char * buf, int len)
        > {
        >     fputc('&', stdout);
        >     if (buf) fwrite(buf, 1, len, stdout);
        > }
        >
        74a81
        >     HText_registerUnparsedEntityCallback(unparsedEntity);

-- Raffaele

Received on Thursday, 17 June 1999 17:27:02 UTC