
ISO character entities (was Re: Web Magazine featuring Accessibility issues)

From: Gregory J. Rosmaita <unagi69@concentric.net>
Date: Thu, 21 Oct 1999 15:04:06 -0400
Message-Id: <4.1.19991021142232.00ae2530@pop3.concentric.net>
To: "Leonard R. Kasday" <kasday@acm.org>
Cc: WAI Interest Group Emailing List <w3c-wai-ig@w3.org>
Len wrote, in part,
  In other words, the semicolon is optional here.  However, to avoid browser
  bugs and inaccurate criticisms from people perusing my code, I'm gonna
  leave in the semicolon.  Thanks again for your clear explanation.

aloha, len!

actually, it is my impression, based upon the ISO 8859-1 and the ISO 8879
character sets [reference 1] that the semi-colon is only optional when the
character entity is followed by white space (or, conceivably, the end of a

the character entity is read -- as has been pointed out repeatedly in this
thread -- from the ampersand to the terminator, which -- in the case of a
character entity -- is a semi-colon...  if you leave off semi-colons, how is
the UA or parser supposed to know when the character entity ends?  and, the
simple fact that UAs can interpret URL-ampersands correctly doesn't necessarily
mean that it is ok to leave off the terminating semi-colon -- that logic
teeters upon a slippery slope which i don't think any of us want to tumble

which is why i always advise people whom i'm teaching the elements of web
authoring to ALWAYS terminate their character entities with a semi-colon, and
why i expressed my hope earlier in this thread that authoring tools that
conform to the Authoring Tools Accessibility Guidelines [reference 2] will
automatically replace ampersands (as well as other characters that should be
represented by a character entity) that appear in URIs with the appropriate
character entity code for that particular character...  i also hope that when
authors manually enter a character entity code that doesn't terminate with a
semi-colon, that the authoring tool either barks at them (if configured to
alert the author to invalidities as they are input) or, during the validation
process, identifies the lack of a semi-colon as an invalidity...
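
as a rough illustration of the two authoring-tool behaviours described above, here is a minimal python sketch. the function names are hypothetical and are not taken from any real authoring tool or from the guidelines themselves; the sketch only assumes that an entity reference looks like an ampersand followed by a name:

```python
import re
from html import escape

# an ampersand plus a name that is NOT followed by a semi-colon;
# the lookahead also blocks the regex from backtracking into a shorter name
UNTERMINATED = re.compile(r"&([A-Za-z][A-Za-z0-9]*)(?![A-Za-z0-9;])")


def find_unterminated(markup):
    """Return (offset, text) for every entity-like reference lacking a semi-colon."""
    return [(m.start(), m.group(0)) for m in UNTERMINATED.finditer(markup)]


def escape_uri_for_attribute(uri):
    """Replace ampersands (and other markup characters) in a URI with entities."""
    return escape(uri, quote=True)


if __name__ == "__main__":
    # the validation step: bark at the author about the missing semi-colon
    for offset, text in find_unterminated("AT&ampT and caf&eacute;"):
        print(f"offset {offset}: {text!r} is not terminated by a semi-colon")

    # the automatic replacement step for a URI that goes into an href
    print(escape_uri_for_attribute("search?q=lynx&lang=en"))
```

note that the checker reports "&ampT" rather than "&amp", which is exactly the problem: without the semi-colon, there is no way to tell where the name was meant to end.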

so, why stress the terminal semi-colon?  for an illustration of how the lack of
a semi-colon can affect page rendering, just use lynx to check any page that
employs character entity codes that do not end with semi-colons -- if the
character entity code is followed by a character, most versions of lynx will
simply render the character entity code literally -- i.e.

  AT&ampT

will be rendered by Lynx as "AT&ampT" -- for an illustration of this
phenomenon, check
using lynx...

note that i hesitated to type that "all" versions of lynx will simply render
the character entity code literally, despite the fact that every version of
lynx that i've ever used, including 2.8.2.rel.1, renders character entities
that lack semi-colons and that aren't terminated by white space thus...
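
to make the rendering behaviour described above concrete, here is a small python sketch of a strict decoder. it is NOT lynx's actual code; it simply assumes the rule discussed in this message, namely that a reference is decoded only when it ends with a semi-colon or is followed by white space (or the end of the input), and is otherwise left as typed. the entity table and the function name are illustrative only:

```python
import re

# a tiny illustrative table; a real user agent knows many more entities
ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"'}

# name terminated by ";" (consumed), by white space (not consumed), or by end of input
REFERENCE = re.compile(r"&([A-Za-z][A-Za-z0-9]*)(?:;|(?=\s)|$)")


def render(markup):
    """Decode properly terminated references; leave everything else literal."""
    def replace(match):
        name = match.group(1)
        # unknown names (such as "ampT") are echoed back exactly as written
        return ENTITIES.get(name, match.group(0))
    return REFERENCE.sub(replace, markup)


if __name__ == "__main__":
    for sample in ("AT&amp;T", "AT&amp T", "AT&ampT"):
        print(repr(sample), "->", repr(render(sample)))
```

under these assumptions only the first two samples are decoded; the third comes back unchanged, because the decoder reads the name as "ampT" and finds no such entity.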


[1] ISO 8879 character set
[2] Authoring Tool Accessibility Guidelines
He that lives on Hope, dies farting
     -- Benjamin Franklin, Poor Richard's Almanack, 1763
Gregory J. Rosmaita <unagi69@concentric.net>
   WebMaster and Minister of Propaganda, VICUG NYC
Received on Thursday, 21 October 1999 14:58:04 UTC
