Hyphenation in HTML 4.0 Draft

Section 10.3.4 of the current draft [1] specifies that the `soft hyphen'
(&shy; or #xAD) is a character that is only visible if the corresponding
hyphenation point actually does result in a line break.
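
To illustrate how I read that definition, here is a minimal sketch in
Python.  The names `render_draft' and `room' (the number of character
cells left on the current line) are my own inventions for illustration
and appear nowhere in the draft; a real browser would of course measure
glyph widths rather than count characters.

```python
SHY = "\u00ad"  # SOFT HYPHEN, the character behind &shy;


def render_draft(word, room):
    """Render one word under the draft's reading of the soft hyphen.

    `room` is a hypothetical count of character cells left on the line.
    Returns the list of line fragments the word is split into.
    """
    plain = word.replace(SHY, "")
    if len(plain) <= room:
        # The word fits: no hyphenation point is used, so none is visible.
        return [plain]
    parts = word.split(SHY)
    # Try the hyphenation points from right to left and take the first
    # one whose head, plus the now visible hyphen, still fits.
    for i in range(len(parts) - 1, 0, -1):
        head = "".join(parts[:i])
        if len(head) + 1 <= room:
            return [head + "-", "".join(parts[i:])]
    # No usable hyphenation point: the word stays unbroken.
    return [plain]
```

With the word "hy" + SHY + "phen" + SHY + "ation", a generous `room'
yields the unbroken word with no hyphen at all, while a `room' of 8
yields "hyphen-" followed by "ation".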

This, at least, does not reflect the current practice of established
browsers, which always display the shy character and do not perform any
line breaking.  To me, it is unclear whether the 4.0 Draft definition is
in accordance with the intended interpretation of the ISO 8859 Standard
and the Unicode Standard.  Generally, there seems to be quite a bit of
confusion about the interpretation of the concept of hyphens and
dashes in various character sets [2,3].

In my opinion, a complete and usable hyphenation concept should at least
include and unambiguously describe the following concepts:

    1. an optional hyphen which is only visible if the
       corresponding hyphenation point is activated at the end of a line
    
    2. a hyphen which is always visible and allows a following linebreak
    
    3. a hyphen which is always visible and allows no following linebreak
    
    4. a construct which explicitly specifies the behaviour in the
       hyphenated and non-hyphenated variant (for words like the German
       `Zucker' [`sugar'], which is hyphenated `Zuk-ker' and
       for which TeX provides `Zu\discretionary{k-}{}{c}ker')
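
One possible way to see that all four concepts fit a single model is to
describe each hyphen by the same three texts that TeX's \discretionary
takes: what ends the line if a break is taken, what starts the next
line, and what appears if no break is taken.  The following Python
sketch is only that, a sketch; the names `Hyphen', `OPTIONAL',
`BREAKING', `NON_BREAKING' and `render' are hypothetical, and using
None to mean "no break allowed here" is my own convention.

```python
from typing import NamedTuple, Optional


class Hyphen(NamedTuple):
    pre_break: Optional[str]  # ends the line if broken; None forbids a break
    post_break: str           # starts the next line if broken
    no_break: str             # shown if the line is not broken here


OPTIONAL = Hyphen("-", "", "")        # concept 1: visible only at a break
BREAKING = Hyphen("-", "", "-")       # concept 2: always visible, may break
NON_BREAKING = Hyphen(None, "", "-")  # concept 3: always visible, no break
# Concept 4 is simply a Hyphen with freely chosen texts.


def render(pieces, break_at=None):
    """Join strings and Hyphens into lines.

    `break_at` is the index in `pieces` of the Hyphen at which the line
    is broken, or None if the whole sequence stays on one line.
    """
    lines = [""]
    for i, piece in enumerate(pieces):
        if isinstance(piece, str):
            lines[-1] += piece
        elif i == break_at:
            if piece.pre_break is None:
                raise ValueError("no line break allowed at this hyphen")
            lines[-1] += piece.pre_break
            lines.append(piece.post_break)
        else:
            lines[-1] += piece.no_break
    return lines


zucker = ["Zu", Hyphen("k-", "", "c"), "ker"]
```

Here render(zucker) gives the single line "Zucker", whereas
render(zucker, break_at=1) gives "Zuk-" followed by "ker".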

In a system which uses automatic rule- or dictionary-based hyphenation
(which is *not* HTML's intention) there are more variations one can
think of, especially variants that allow or disallow automatic
hyphenation in words where explicit hyphenation points appear (like the
"= construct in (La)TeX's german.sty).

Are there any plans to change or extend Section 10.3.4 in the light of
standards conformance and established browser practice?

Best Regards,

	J. Schwarze

--

  Jochen Schwarze       E-Mail: schwarze@isa.de
  ISA Systems GmbH      WWW:    http://www.isa.de/~schwarze
                        PGP:    D3 43 A9 E7 9A BC 2E 0F 58 FE 10 A4 A4 C5 F6 CC


References:

[1] <http://www.w3.org/TR/WD-html40/struct/text.html#h-10.3.4>
[2] <http://www.hut.fi/~jkorpela/shy.html>
[3] <http://wwwwbs.cs.tu-berlin.de/~mfx/h/space.html>

Received on Thursday, 20 November 1997 06:03:20 UTC