ISO character entities (was Re: Web Magazine featuring Accessibility issues) from Gregory J. Rosmaita on 1999-10-21 (w3c-wai-ig@w3.org from October to December 1999)

From: Gregory J. Rosmaita <unagi69@concentric.net>
Date: Thu, 21 Oct 1999 15:04:06 -0400
To: "Leonard R. Kasday" <kasday@acm.org>
Cc: WAI Interest Group Emailing List <w3c-wai-ig@w3.org>
Message-Id: <4.1.19991021142232.00ae2530@pop3.concentric.net>

Len wrote, in part,
quote:
  In other words, the semicolon is optional here.  However, to avoid browser
  bugs and inaccurate criticisms from people perusing my code, I'm gonna
  leave in the semicolon.  Thanks again for your clear explanation.
unquote

aloha, len!

actually, it is my impression, based upon the ISO 8859-1 and ISO 8879
character sets [reference 1], that the semi-colon is optional only when the
character entity is followed by white space (or, conceivably, the end of a
line).

the character entity is read (as has been pointed out repeatedly in this
thread) from the ampersand to the terminator, which, in the case of a
character entity, is a semi-colon...  if you leave off semi-colons, how is
the UA or parser supposed to know where the character entity ends?  and the
simple fact that UAs can interpret bare ampersands in URLs correctly doesn't
necessarily mean that it is ok to leave off the terminating semi-colon; that
logic teeters upon a slippery slope which i don't think any of us want to
tumble down!
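to make the "where does it end?" question concrete, here is a minimal python sketch of how a parser might read an entity reference. it is a simplified model, not any particular UA's code: it assumes a name is a letter followed by letters, digits, '.' or '-', and it uses a tiny hypothetical table of known entities (a real UA would load the full ISO Latin 1 set). the name runs for as long as name characters continue, so without the semi-colon "AT&ampT" is read as a reference to an entity called "ampT", which doesn't exist.

```python
import re

# Hypothetical, deliberately tiny entity table; a real user agent would load
# the full ISO Latin 1 set (e.g. ISOlat1.ent) plus the HTML-specific entities.
KNOWN_ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "copy": "\u00a9"}

# Simplified name rule (an assumption): a letter, then letters, digits, '.' or '-'.
# Group 2 captures the terminating ';' if it is present.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.\-]*)(;?)")


def expand_entities(text: str) -> str:
    """Expand entity references, reading each one from '&' to its terminator.

    The name is read greedily for as long as name characters continue; a ';'
    directly after the name is consumed as the terminator. Without ';', the
    name simply stops at the first character that cannot be part of a name.
    References to unknown names are left in the output exactly as typed.
    """
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in KNOWN_ENTITIES:
            return KNOWN_ENTITIES[name]
        return match.group(0)  # unknown name: render the reference literally

    return ENTITY_REF.sub(replace, text)


print(expand_entities("AT&amp;T"))  # AT&T
print(expand_entities("AT&amp T"))  # AT& T
print(expand_entities("AT&ampT"))   # AT&ampT (name was read as "ampT")
```

the semi-colon is what lets the parser stop at "amp" even when a letter follows; without it, the only cue available is a character that can't belong to a name.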

which is why i always advise the people i teach the elements of web
authoring to ALWAYS terminate their character entities with a semi-colon.  it
is also why i expressed my hope, earlier in this thread, that authoring tools
that conform to the Authoring Tool Accessibility Guidelines [reference 2] will
automatically replace ampersands that appear in URIs (as well as other
characters that should be represented by a character entity) with the
appropriate character entity code for that particular character...  i also
hope that when authors manually enter a character entity code that doesn't
terminate with a semi-colon, the authoring tool either barks at them (if
configured to alert the author to invalidities as they are input) or, during
the validation process, identifies the missing semi-colon as an invalidity...
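the two authoring-tool behaviours hoped for above could look something like the following python sketch. both helpers, escape_uri_ampersands and find_unterminated_refs, are hypothetical names invented for illustration (they are not from ATAG or any real tool), the name rule is the same simplified one a parser might use, and example.com stands in for a real address.

```python
import re
from typing import List, Tuple

# Simplified name rule (an assumption): a letter, then letters, digits, '.' or '-'.
# Group 2 captures the terminating ';' if it is present.
REF = re.compile(r"&([A-Za-z][A-Za-z0-9.\-]*)(;?)")


def escape_uri_ampersands(uri: str) -> str:
    """Hypothetical helper: rewrite each '&' in a raw URI as '&amp;' so the URI
    can be placed in an HTML attribute value. Assumes the URI is not already
    escaped; running it twice would produce '&amp;amp;'."""
    return uri.replace("&", "&amp;")


def find_unterminated_refs(markup: str) -> List[Tuple[int, str]]:
    """Hypothetical validator check: return (offset, name) for every entity
    reference in the markup that is not closed with a semi-colon."""
    return [(m.start(), m.group(1)) for m in REF.finditer(markup) if not m.group(2)]


href = escape_uri_ampersands("http://example.com/search?q=lynx&lang=en")
print(href)  # http://example.com/search?q=lynx&amp;lang=en
print(find_unterminated_refs("AT&ampT and AT&amp;T"))  # [(2, 'ampT')]
```

a tool that "barks" as the author types could run the second check on each edit; one that waits for validation could run it over the whole document.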

so, why stress the terminal semi-colon?  for an illustration of how the lack of
a semi-colon can affect page rendering, just use lynx to check any page that
employs character entity codes that do not end with semi-colons: if the
character entity code is followed directly by another character (rather than
by white space), most versions of lynx will simply render the character
entity code literally, i.e.

AT&ampT

will be rendered by lynx as "AT&ampT".  for an illustration of this
phenomenon, check
        http://www.hicom.net/~oedipus/temp/charater_entity.html
using lynx...

note that i hesitated to type that "all" versions of lynx will simply render
the character entity code literally, even though every version of lynx that
i've ever used, including 2.8.2.rel.1, renders character entities that lack
semi-colons and aren't terminated by white space this way...

gregory.

References
[1] ISO 8879 Latin 1 character entity set
        http://www.w3.org/MarkUp/Wilbur/ISOlat1.ent
[2] Authoring Tool Accessibility Guidelines
        http://www.w3.org/WAI/AU/WAI-AUTOOLS/
--------------------------------------------------------
He that lives on Hope, dies farting
     -- Benjamin Franklin, Poor Richard's Almanack, 1763
--------------------------------------------------------
Gregory J. Rosmaita <unagi69@concentric.net>
   WebMaster and Minister of Propaganda, VICUG NYC
        <http://www.hicom.net/~oedipus/vicug/index.html>
--------------------------------------------------------

Received on Thursday, 21 October 1999 14:58:04 UTC