Character entities in ALT text

From: Harvey Bingham <hbingham@acm.org>
Date: Sun, 07 May 2000 01:16:59 -0400
Message-Id: <>
To: w3c-wai-eo@w3.org

At 2000-05-04 19:03-0400, Kathleen Anderson wrote:
>Harvey: ...

>Just to clarify, though - are you speaking of the code and images
>supplied by affiliate programs? If so, I have another item for your
>list. Please encourage them to use '&amp;' instead of '&', which doesn't
>validate and then I have to correct it, which goes against their terms
>and conditions (you're not supposed to modify the code they supply).

1. I appreciate your broadening this suggestion to include delivery of images
with accompanying alt-text consolidated from any affiliated third parties.
They needn't be advertisers.

2. I believe we in the User Agent and Web Content groups have focused on
what is delivered to the client user. It is possible that affiliate
programs are called by the portal application that serves the client.
In passing what it receives from them on to the user, the portal is
responsible for supplying the alt-text, including restoring any character
entities in it that the XML/HTML parser may have replaced.

Kathleen reminds us that tools that do not rely on a prior HTML (or
XML) parser should check the text of attribute values for proper use of
character entities in place of characters that would otherwise be
syntactically confusing. An XML parser normalizes each attribute value
before passing it on to the application by:

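     stripping the surrounding matching pair of single or double quotes,
     replacing character and entity references by the characters they denote,
     replacing each literal whitespace character (space, tab, carriage
         return, linefeed) by a single space, and
     only for attributes declared with a type other than CDATA, discarding
         leading and trailing spaces and replacing each run of internal
         spaces by a single space.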
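Those normalization steps can be sketched in Python. This is a minimal sketch, not a conforming parser: it assumes no DTD (so only the five predefined entities are expanded), does not report errors such as a bare "&" or a literal "<", and `normalize_attr` is a hypothetical helper name. Per XML 1.0 (section 3.3.3), trimming and collapsing of spaces applies only to attributes declared with a non-CDATA type; a non-validating processor treats undeclared attributes, such as alt in a DTD-less document, as CDATA. The result is compared against Python's expat-based xml.etree.ElementTree.

```python
import re
import xml.etree.ElementTree as ET

# Only the five entities XML predefines; a real parser would also expand
# entities declared in a DTD, which this sketch ignores.
PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "apos": "'", "quot": '"'}
REF = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def _literal(segment: str) -> str:
    # End-of-line handling turns CR LF and lone CR into LF; then each
    # literal tab or LF becomes a single space.
    segment = segment.replace("\r\n", "\n").replace("\r", "\n")
    return segment.replace("\t", " ").replace("\n", " ")


def normalize_attr(raw: str, is_cdata: bool = True) -> str:
    """Hypothetical helper: normalize an attribute value whose delimiting
    quotes are already stripped. Whitespace produced by a character
    reference (e.g. &#10;) is kept as-is, as XML requires."""
    parts, pos = [], 0
    for m in REF.finditer(raw):
        parts.append(_literal(raw[pos:m.start()]))
        name = m.group(1)
        if name.startswith("#x"):
            parts.append(chr(int(name[2:], 16)))
        elif name.startswith("#"):
            parts.append(chr(int(name[1:])))
        else:
            parts.append(PREDEFINED[name])  # KeyError if not predefined
        pos = m.end()
    parts.append(_literal(raw[pos:]))
    value = "".join(parts)
    if not is_cdata:
        # Tokenized (non-CDATA) types: collapse runs of spaces, then trim.
        value = re.sub(" +", " ", value).strip(" ")
    return value


raw = "AT&amp;T\n  logo"
parsed = ET.fromstring('<img alt="' + raw + '"/>').get("alt")
print(repr(parsed), repr(normalize_attr(raw)))
print(repr(normalize_attr("  AT&amp;T\n  logo ", is_cdata=False)))
```

For the CDATA case the newline becomes one space and the two following spaces survive, so both calls yield "AT&T" followed by three spaces and "logo"; only the non-CDATA call trims and collapses.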

For example, use character entities in attribute values, like

     <img src="attlogo.gif" alt="AT&amp;T logo">
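Kathleen's check for an unescaped "&" can be approximated without a full parser. The following is a rough sketch under stated assumptions: it looks only at quoted attribute values, ignores comments, scripts and unquoted values, and does not verify that a name such as "&foo;" is actually defined. `bare_ampersands` is a hypothetical name, not part of any existing validator.

```python
import re

# Quoted attribute values only; unquoted HTML values are not handled.
ATTR = re.compile(r"""([A-Za-z_:][\w:.-]*)\s*=\s*("[^"]*"|'[^']*')""")
# An '&' that does not begin a named, decimal, or hexadecimal reference.
BARE_AMP = re.compile(r"&(?!#[0-9]+;|#x[0-9A-Fa-f]+;|[A-Za-z][A-Za-z0-9]*;)")


def bare_ampersands(markup: str) -> list[tuple[str, str]]:
    """Hypothetical checker: report (attribute, value) pairs whose value
    contains an '&' that is not part of a character or entity reference."""
    problems = []
    for m in ATTR.finditer(markup):
        name, value = m.group(1), m.group(2)[1:-1]  # drop the quotes
        if BARE_AMP.search(value):
            problems.append((name, value))
    return problems


print(bare_ampersands('<img src="attlogo.gif" alt="AT&T logo">'))
print(bare_ampersands('<img src="attlogo.gif" alt="AT&amp;T logo">'))
```

The first call flags the alt value "AT&T logo"; the corrected markup produces an empty list.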

The minimum set of character entities that XML predefines is:

     &amp;  rather than "&"
     &lt;   rather than "<"
     &gt;   rather than ">"
     &apos; rather than "'" within a string surrounded by single quotes
     &quot; rather than '"' within a string surrounded by double quotes
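A sketch of escaping a value with just those five entities, choosing the quote entity that matches the delimiter; `escape_attr` is a hypothetical helper. Python's standard xml.sax.saxutils.quoteattr does a similar job: it picks the quote character itself and also writes tab, CR and LF as "&#9;", "&#13;" and "&#10;", so those characters survive attribute normalization.

```python
from xml.sax.saxutils import quoteattr


def escape_attr(value: str, quote: str = '"') -> str:
    """Hypothetical helper: escape a value for an attribute delimited by
    `quote`, using only the five XML predefined entities."""
    if quote not in ('"', "'"):
        raise ValueError("quote must be ' or \"")
    value = value.replace("&", "&amp;")  # must come first
    value = value.replace("<", "&lt;").replace(">", "&gt;")
    if quote == '"':
        value = value.replace('"', "&quot;")
    else:
        value = value.replace("'", "&apos;")
    return quote + value + quote


print(escape_attr('AT&T "logo"'))           # "AT&amp;T &quot;logo&quot;"
print(escape_attr("Harvey's <logo>", "'"))  # 'Harvey&apos;s &lt;logo&gt;'
print(quoteattr("AT&T logo"))               # "AT&amp;T logo"
```

One caution: "&apos;" is predefined in XML (and so in XHTML) but not in HTML 4, so for HTML the double-quoted form, or "&#39;", is the safer choice.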

Also use numeric character entities (strictly, character references, which
give the character's Unicode code point) for non-ASCII characters. These
take either of two forms:

     decimal     "&#decimal-value;" or
     hexadecimal "&#xhex-value;"
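A small sketch that produces either form for every non-ASCII character; `to_char_refs` is a hypothetical helper. It handles only non-ASCII characters, so "&", "<" and quotes still need the entity escaping described earlier. Python's built-in "xmlcharrefreplace" error handler produces the decimal form directly.

```python
def to_char_refs(text: str, hexadecimal: bool = False) -> str:
    """Hypothetical helper: replace every non-ASCII character with a
    numeric character reference in decimal or hexadecimal form."""
    out = []
    for ch in text:
        cp = ord(ch)  # Python strings iterate by code point
        if cp < 128:
            out.append(ch)
        elif hexadecimal:
            out.append(f"&#x{cp:x};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)


print(to_char_refs("Café logo"))                    # Caf&#233; logo
print(to_char_refs("Café logo", hexadecimal=True))  # Caf&#xe9; logo
print("Café logo".encode("ascii", "xmlcharrefreplace").decode("ascii"))
```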

For example, the alternatives to "&gt;" (">" is code point 62, hex 3E) are
     "&#62;"     decimal, or
     "&#x3e;"    its hex equivalent.
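A quick check with Python's standard library that the three spellings denote the same character: html.unescape applies HTML reference rules and ElementTree's expat parser applies XML rules, and both agree here.

```python
import html
import xml.etree.ElementTree as ET

forms = ["&gt;", "&#62;", "&#x3e;"]
print([html.unescape(f) for f in forms])  # ['>', '>', '>']
# An XML parser reports the same attribute value for each spelling.
print([ET.fromstring(f'<p title="a {f} b"/>').get("title") for f in forms])
```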

Of course, such character entities should also appear in delivered text
content, where the parser likewise replaces them before passing the text on
to the application.
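With Python's xml.etree.ElementTree, for example, references in element text are replaced just as in attribute values, and serializing the element escapes the characters again:

```python
import xml.etree.ElementTree as ET

p = ET.fromstring("<p>AT&amp;T &#x3e; 0 &lt; 1</p>")
print(p.text)  # AT&T > 0 < 1
# On output, ElementTree re-escapes '&', '<' and '>' in text content.
print(ET.tostring(p, encoding="unicode"))  # <p>AT&amp;T &gt; 0 &lt; 1</p>
```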

Note that whitespace normalization may change an attribute value from
its original form; that change is not supposed to matter for the
interpretation or use of the value.

Also note that the local part of some URIs permits some of those
characters, such as the "&" that separates query parameters. I believe
they still need to be written as character entities when they appear in
attribute values.
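A sketch of building such a link in Python, with a hypothetical "search" page and parameters chosen for illustration: urllib.parse.urlencode percent-encodes an "&" inside a parameter value, but the "&" that separates parameters is a legitimate URI character and must still be written "&amp;" inside the HTML attribute. html.escape does that, and the parser restores the plain "&" before the link is followed.

```python
import html
from urllib.parse import urlencode

# Hypothetical page and parameters, for illustration only.
query = urlencode({"q": "AT&T", "lang": "en"})  # q=AT%26T&lang=en
url = "search?" + query
href = html.escape(url, quote=True)  # the separating '&' becomes &amp;
print(f'<a href="{href}">AT&amp;T search</a>')
```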

Regards/Harvey Bingham
