Re: Doesn't the ALT field of an image support ä?

Daniel W. Connolly (connolly@beach.w3.org)
Fri, 26 Apr 1996 00:33:10 -0400


Message-Id: <m0uCfDS-0002TqC@beach.w3.org>
To: Eva <spencer@algonet.se>
Cc: www-html@w3.org
Subject: Re: Doesn't the ALT field of an image support &auml;? 
In-Reply-To: Your message of "Sat, 20 Apr 1996 11:19:46 +0200."
             <199604200919.LAA17943@hermes.algonet.se> 
Date: Fri, 26 Apr 1996 00:33:10 -0400
From: "Daniel W. Connolly" <connolly@beach.w3.org>

In message <199604200919.LAA17943@hermes.algonet.se>, Eva writes:
>At 10.59 1996-04-20 -0700, you wrote:
>
>>I was just viewing my own homepages with lynx, and it seems that if a
>>&auml; (the a with two dots over it) is in normal text, it is displayed
>>correctly. Inside the ALT field of an IMG, it is displayed as "&auml;".
>>Does this mean that ALT can't contain Finnish (or any other umlaut etc)
>>characters, or is this a defect in Lynx?

Bug in Lynx.

>The specs say:
>"The alt text can contain entities e.g. for accented characters or special
>symbols, but it can't contain markup. The latter is possible, however, with
>the FIG element"

Ummm.. what spec says that? The _expired_ March '95 HTML 3 draft?
Perhaps. Please cite your source.

"can't contain markup" is misleading/wrong -- entity references in
attribute value literals _are_ markup. "can't contain tags" is better,
but still misleading. <img alt="<foo>" src=xxx.gif> is legal, but
<foo> is not treated as a tag.

The HTML 2 spec doesn't explicitly say that entity references are
recognized in attribute value literals, but the SGML spec does, and the
HTML spec does discuss the issue:

http://www.w3.org/pub/WWW/MarkUp/html-spec/html-spec_3.html#SEC3.2.4
=======================
A useful technique for computing an attribute value literal for a given string is to
replace each quote and white space character by an entity reference or numeric
character reference as follows: 

                 ENTITY      NUMERIC
       CHARACTER REFERENCE   CHAR REF     CHARACTER DESCRIPTION
       --------- ----------  -----------  ---------------------
         HT                  &#9;         Tab
         LF                  &#10;        Line Feed
         CR                  &#13;        Carriage Return
         SP                  &#32;        Space
         "       &quot;      &#34;        Quotation mark 
         &       &amp;       &#38;        Ampersand 

For example: 

<IMG SRC="image.jpg" alt="First &quot;real&quot; example">
=======================
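
The technique in that excerpt can be sketched in a few lines of Python.
This is only an illustration, not code from any spec: the function name
and the escape_space switch are made up here. Spaces are left alone by
default because the spec's own example leaves them alone.

```python
# Hypothetical helper following the table quoted above.
# The name and the escape_space option are assumptions for illustration.
_REFS = {
    "\t": "&#9;",     # HT, Tab
    "\n": "&#10;",    # LF, Line Feed
    "\r": "&#13;",    # CR, Carriage Return
    '"': "&quot;",    # Quotation mark
    "&": "&amp;",     # Ampersand
}


def attribute_value_literal(text, escape_space=False):
    """Return text as a double-quoted attribute value literal."""
    refs = dict(_REFS)
    if escape_space:
        refs[" "] = "&#32;"  # SP, Space
    # One pass over the characters, so an "&" introduced by a
    # replacement is never escaped a second time.
    return '"' + "".join(refs.get(ch, ch) for ch in text) + '"'


print("<IMG SRC=\"image.jpg\" alt=" +
      attribute_value_literal('First "real" example') + ">")
```

With the default settings this reproduces the IMG example from the
excerpt.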

>Entities in ALT don't validate.

I'm pretty sure you're mistaken.

See also:

A Lexical Analyzer for HTML and Basic SGML 
http://www.w3.org/pub/WWW/MarkUp/SGML/#sgml-lex

in particular:

http://www.w3.org/pub/WWW/MarkUp/SGML/sgml-lex/sgml-lex#API
====================
Section 7.9.3 of SGML says that an attribute value literal is
interpreted as an attribute value by:

     Removing the quotes 
     Replacing character and entity references 
     Deleting character 10 (ASCII LF) 
     Replacing character 9 and 13 (ASCII HT and CR) with character 32 (SPACE) 
====================
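
For the other direction, here is a rough Python sketch of those four
steps. It is not the sgml-lex code; the function name and the tiny
entity table are assumptions for illustration (a real parser takes the
entities from the DTD). One more assumption: the white space rules are
applied only to characters typed literally in the source, not to
characters produced by references, which is my reading of why the
&#10; technique in the HTML spec works at all.

```python
import re

# Illustrative entity table only; a real parser reads these from the DTD.
ENTITIES = {"quot": '"', "amp": "&", "lt": "<", "gt": ">", "auml": "\u00e4"}

# Numeric character reference, entity reference, literal LF, literal HT/CR.
_TOKEN = re.compile(r"&#([0-9]+);?|&([A-Za-z][A-Za-z0-9.-]*);?|\n|[\t\r]")


def attribute_value(literal):
    """Interpret an attribute value literal as an attribute value."""
    # Remove the quotes.
    if len(literal) >= 2 and literal[0] == literal[-1] and literal[0] in "\"'":
        literal = literal[1:-1]

    def replace(match):
        number, name = match.group(1), match.group(2)
        if number is not None:
            return chr(int(number))                    # character reference
        if name is not None:
            return ENTITIES.get(name, match.group(0))  # entity reference
        if match.group(0) == "\n":
            return ""                                  # delete literal LF
        return " "                                     # literal HT or CR

    # A single pass, so replacement text is never scanned again.
    return _TOKEN.sub(replace, literal)


print(attribute_value('"First &quot;real&quot; example"'))
print(attribute_value('"<foo>"'))
```

Note that "<foo>" comes back unchanged: nothing in these steps looks for
tags, which is why it is not treated as one inside an attribute value.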

Dan