ISO 8879 diacritical marks as HTML character entities -Reply

Jim Taylor (JHTaylor@videodiscovery.com)
Thu, 25 Jul 1996 12:21:55 -0800


Message-Id: <s1f76810.018@videodiscovery.com>
Date: Thu, 25 Jul 1996 12:21:55 -0800
From: Jim Taylor <JHTaylor@videodiscovery.com>
To: www-html@w3.org
Subject: ISO 8879 diacritical marks as HTML character entities -Reply

>>> Chung-Chieh Shan <t-chungs@microsoft.com> - 7/25/96 12:05 AM >>>
>I am interested in the list of character entities that are/will be
>included in HTML 3.2.  In particular, I am working on computerization
>of several Taiwanese languages, the romanization of which requires
>diacritics to be placed over letters such as "m" and "n".  Since there
>are already entities like &acute; and &grave; defined in
>ftp://ftp.ifi.uio.no/pub/SGML/ENTITIES/ISOdia, I suppose the only
>question is whether these entities will be included in HTML 3.2 (I'm
>actually not absolutely sure that they haven't been included in
>previous versions; I'd be very happy if they have), and -- if they
>will -- whether any specific rendering behavior is to be specified by
>HTML.  If it is HTML's responsibility to specify rendering behavior
>for these entities, I think the logical way to proceed is to follow
>Unicode's placement of non-spacing marks, i.e., use m&acute; (rather
>than &acute;m) for m with acute above, and so on.

Entities for these diacriticals have not been in any HTML standard,
and are not in the experimental Cougar document[1].  However, the
spacing forms of the most common ones (acute and grave, for example)
are in the ISO 8859-1 repertoire, so you can use those characters
directly, and they should display in any browser that correctly
supports 8859-1. If you want non-spacing diacriticals, you could use
numeric character references to the Unicode combining characters,
but most browsers won't support them.

acute: character 180 (&#180;)
acute, non-spacing: character 769 (&#769;)
grave: character 96 (&#96;)
grave, non-spacing: character 768 (&#768;)
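
To check what the two spacing references above resolve to, here is a minimal Python sketch. The helper name describe is hypothetical; html.unescape and unicodedata.name come from the Python standard library and are used here only as a convenient lookup, not as a statement about what any browser does.

```python
import html
import unicodedata

def describe(ref):
    """Decode one numeric character reference; return (code point, Unicode name)."""
    ch = html.unescape(ref)  # "&#180;" becomes the single character U+00B4
    return ord(ch), unicodedata.name(ch)

# The two spacing diacriticals from the list above.
for ref in ("&#180;", "&#96;"):
    print(ref, describe(ref))
```

Character 96 falls in the ASCII range, which 8859-1 shares, and 180 is in the upper half of 8859-1.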

Unicode also includes precomposed characters such as M with acute
accent (&#7742;), but few browsers are likely to support those
either.
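
The relationship between a base letter followed by a non-spacing mark and a precomposed character can be sketched in Python. This assumes the non-spacing acute is Unicode's COMBINING ACUTE ACCENT (U+0301, decimal 769); the helper name with_acute is hypothetical, and the sketch only inspects code points, saying nothing about how a browser would render them.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"  # assumption: Unicode non-spacing acute, decimal 769

def with_acute(base):
    """Put the mark after the base letter, then compose (NFC) where Unicode allows."""
    sequence = base + COMBINING_ACUTE  # base first, mark second, as in Unicode
    return unicodedata.normalize("NFC", sequence)

# "M" plus the combining acute composes to character 7742 (M with acute).
print(with_acute("M") == chr(7742))

# A letter with no precomposed form stays as two code points.
print(len(with_acute("q")))
```

Letters without a precomposed form can only be written as base plus combining mark, which is why support for non-spacing marks matters for romanizations that accent letters such as "m" and "n".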

You could propose that the SGML entities for diacriticals (ISO
8879:1986//ENTITIES Diacritical Marks//EN) [2] be added to HTML, but
most of these are already included in the 8859-1 set and supported by
decent browsers. I.e., why write &grave; when you can write `?

<!ENTITY acute  SDATA "[acute ]"--=acute accent-->
<!ENTITY breve  SDATA "[breve ]"--=breve-->
<!ENTITY caron  SDATA "[caron ]"--=caron-->
<!ENTITY cedil  SDATA "[cedil ]"--=cedilla-->
<!ENTITY circ   SDATA "[circ  ]"--=circumflex accent-->
<!ENTITY dblac  SDATA "[dblac ]"--=double acute accent-->
<!ENTITY die    SDATA "[die   ]"--=dieresis-->
<!ENTITY dot    SDATA "[dot   ]"--=dot above-->
<!ENTITY grave  SDATA "[grave ]"--=grave accent-->
<!ENTITY macr   SDATA "[macr  ]"--=macron-->
<!ENTITY ogon   SDATA "[ogon  ]"--=ogonek-->
<!ENTITY ring   SDATA "[ring  ]"--=ring-->
<!ENTITY tilde  SDATA "[tilde ]"--=tilde-->
<!ENTITY uml    SDATA "[uml   ]"--=umlaut mark-->

-----

[1] http://www.w3.org/pub/WWW/MarkUp/Cougar/HTML.dtd
[2] ftp://ftp.ifi.uio.no/pub/SGML/ENTITIES/ISOdia