[HTML 4.01] Clarification about hyphen characters (9.3.3)

Hello,

The HTML 4.01 specification says:

------------------------------------------------------------------------
In HTML, there are two types of hyphens: the plain hyphen and the soft
hyphen. [...]

In HTML, the plain hyphen is represented by the "-" character (&#45;
or &#x2D;). The soft hyphen is represented by the character entity
reference &shy; (&#173; or &#xAD;)
------------------------------------------------------------------------
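The character references quoted above can be checked with Python's standard `html` module. This is a minimal sketch. The helper `char_refs` is hypothetical and written only for illustration. It builds the decimal, hexadecimal and (where HTML 4 defines one) named reference for a character, and confirms that each form decodes back to that character:

```python
import html
from html.entities import codepoint2name  # HTML 4 entity names only

def char_refs(ch: str) -> dict:
    """Hypothetical helper: return the decimal, hexadecimal and, if one
    exists, named character reference for a single character."""
    cp = ord(ch)
    refs = {"decimal": f"&#{cp};", "hex": f"&#x{cp:X};"}
    name = codepoint2name.get(cp)  # 'shy' for U+00AD, None for U+002D
    if name is not None:
        refs["named"] = f"&{name};"
    return refs

# Plain hyphen (HYPHEN-MINUS) and soft hyphen, as listed in 9.3.3.
for ch in ("-", "\u00ad"):
    refs = char_refs(ch)
    # Every reference form must decode back to the same character.
    assert all(html.unescape(r) == ch for r in refs.values())
    print(f"U+{ord(ch):04X}", refs)
```

Only the soft hyphen gets a named reference here, because HTML 4.01 defines no entity name for "-".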

But ISO10646/Unicode (the character set for HTML 4.01) contains other
hyphen characters:

2010;HYPHEN;Pd;0;ON;;;;;N;;;;;
2011;NON-BREAKING HYPHEN;Pd;0;ON;<noBreak> 2010;;;;N;;;;;
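For reference, the two UnicodeData.txt records above can be cross-checked against the Unicode database that ships with Python. This is a minimal sketch, assuming these two entries are unchanged in the Unicode version of the Python build. The `parse_record` helper is hypothetical and keeps only the fields used here:

```python
import unicodedata

# The two UnicodeData.txt records quoted in the message.
RECORDS = [
    "2010;HYPHEN;Pd;0;ON;;;;;N;;;;;",
    "2011;NON-BREAKING HYPHEN;Pd;0;ON;<noBreak> 2010;;;;N;;;;;",
]

def parse_record(line: str) -> dict:
    """Hypothetical helper: split a UnicodeData.txt line into a few fields."""
    fields = line.split(";")
    return {
        "char": chr(int(fields[0], 16)),
        "name": fields[1],
        "category": fields[2],       # General_Category; Pd = dash punctuation
        "bidi": fields[4],           # bidirectional class
        "decomposition": fields[5],  # empty if none
    }

for line in RECORDS:
    rec = parse_record(line)
    ch = rec["char"]
    # Each field should agree with Python's bundled Unicode database.
    assert unicodedata.name(ch) == rec["name"]
    assert unicodedata.category(ch) == rec["category"]
    assert unicodedata.bidirectional(ch) == rec["bidi"]
    assert unicodedata.decomposition(ch) == rec["decomposition"]
    print(f"U+{ord(ch):04X} {rec['name']}: &#{ord(ch)}; or &#x{ord(ch):X};")

# The <noBreak> compatibility mapping sends NON-BREAKING HYPHEN to
# HYPHEN (U+2010), not to the ASCII HYPHEN-MINUS.
assert unicodedata.normalize("NFKC", "\u2011") == "\u2010"
```

Note that the ASCII "-" (HYPHEN-MINUS) is also in category Pd, which is part of why it is overloaded.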

Moreover, these are probably better than the overloaded ASCII "-",
in particular if the user wants a hyphen character after which a line
break is allowed (for compound words).

I think that Section 9.3.3 should clarify the use of hyphen
characters: either mention U+2010 and U+2011 (or possibly say that the
current list is not exhaustive) or explicitly forbid hyphen characters
other than U+002D and U+00AD.

Regards,

Vincent Lefèvre.

Received on Thursday, 14 August 2003 02:33:26 UTC