- From: Jukka Korpela <jkorpela@cc.hut.fi>
- Date: Thu, 21 Jan 1999 10:13:02 +0200 (EET)
- To: www-html@w3.org

[Aristeu, please fix the settings of your Outlook so that it sends
plain text only, not text and (pseudo-)HTML, less than 80 chars per
line and please use ASCII characters only if possible, e.g. no
"smart quotes" in Windows-specific encoding.]

On Mon, 18 Jan 1999, Aristeu E B da Silva wrote:
[reformatted for readability]

> It is clear at ‘HTML 4.0 Specification’, item ‘9.3.3 Hyphenon’
> and because ‘CSS2 Specification’ doesn’t defines any "Hyphenated"
> attribute, that hyphenation is an author’s concern

I would say that hyphenation, being a presentational thing, is
_basically_ a user agent's concern, but authors may need or wish
to make their contributions.

In principle, giving adequate information about the natural
language(s) used, via LANG attributes, is the most essential thing
to do. Although user agents currently ignore those attributes, they
are certainly the way to go. Additional information from the author
might be needed in special cases, e.g. as hyphenation hints or
prohibitions. A user agent can hardly be expected to analyze e.g.
whether "record" is being used as a noun (to be hyphenated rec-ord)
or as a verb (re-cord)!

Fundamentally, hyphenation, if applied, needs to be done according
to language-specific rules, possibly applying some exceptions
indicated in the document itself in some notation. High-quality
software might apply quite complicated methods which give different
weights to possible hyphenation points (preferring e.g. a division
of a compound word at the compound boundary).

I don't think item 9.3.3 should be read as suggesting that authors
should generally include soft hyphens to indicate possible
hyphenation points. Rather, it says that they _may_ do so and that
user agents _may_ use them in hyphenation.

> It looks fine to me, because -- as an author -- I can tell, not only,
> what should and what should not be hyphenated, but also, tell how this
> hyphenation should be performed.
> No matter the user agent’s language or
> hypothetic hyphenation algorithm, which should not exist at all.

The user agent's language, in the sense that its _user interface_
may use some natural language (in menus, error messages, help files,
etc), should of course have nothing to do with hyphenation. What
matters is the language used in the _document_.

The idea of prehyphenation -- running the document through a utility
which determines the possible hyphenation points in the text and
includes some hyphenation hints, before putting the document onto
the Web -- would significantly increase document size and transfer
time, especially if soft hyphens are used as entities (and not as
raw octets). Admittedly it would simplify the user agent's task.

In practical terms, the soft hyphen is not supported by browsers,
and using it would result in serious problems in current browsers.
Moreover, I don't think the soft hyphen would be a good solution at
all. It seems obvious to me that using it as a hyphenation hint
would not comply with the _definition_ of soft hyphen in ISO 8859
standards. See
http://www.hut.fi/u/jkorpela/shy.html

> I would do that by inserting the ‘soft hyphen’ character entity
> (decimal 173) everywhere in my paragraphs where I want to allow
> hyphenation to be performed, and not doing so where I want not. I
> already have a software of my own which is able to do it in Portuguese.
>
> But, unfortunately, both IE4 and Netscape 4.5 does exactly what
> should not be done with decimal 173, that is, show them as plain
> hyphens. By doing so they’re preventing us on using hyphenated texts

I don't expect this to change. From the ISO 8859 viewpoint, a soft
hyphen anywhere but at the end of a line is an anomaly, so any
processing in other contexts can be classified as error recovery.
Not displaying it at all might be more reasonable error recovery,
but this would imply ignoring actual data in a document.
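
To make the soft hyphen points above a bit more concrete, here is a
rough Python sketch. It is only an illustration: the function names and
the sample word (with hyphenation points hy-phen-ation) are my own
choices, not anything defined by HTML or ISO 8859. It compares the size
of a prehyphenated word when the soft hyphen is sent as a raw ISO 8859-1
octet versus as the entity reference &shy;, and shows the two
error-recovery choices for a soft hyphen that is not at the end of a
line.

```python
SOFT_HYPHEN = "\u00ad"  # decimal 173 in ISO 8859-1


def as_raw_octets(text: str) -> bytes:
    """Encode the text with each soft hyphen as one ISO 8859-1 octet."""
    return text.encode("iso-8859-1")


def as_entities(text: str) -> bytes:
    """Encode the text with each soft hyphen written as '&shy;'."""
    return text.replace(SOFT_HYPHEN, "&shy;").encode("iso-8859-1")


def show_as_hyphen(text: str) -> str:
    """Error recovery that displays every soft hyphen as a plain hyphen."""
    return text.replace(SOFT_HYPHEN, "-")


def drop_soft_hyphens(text: str) -> str:
    """Error recovery that does not display soft hyphens at all."""
    return text.replace(SOFT_HYPHEN, "")


# Hypothetical prehyphenated word, used only for illustration.
word = "hy" + SOFT_HYPHEN + "phen" + SOFT_HYPHEN + "ation"

print(len(drop_soft_hyphens(word)))  # 11 octets without any hints
print(len(as_raw_octets(word)))      # 13 octets with raw soft hyphens
print(len(as_entities(word)))        # 21 octets with &shy; entities
print(show_as_hyphen(word))          # hy-phen-ation
print(drop_soft_hyphens(word))       # hyphenation
```

For this one word the entity form nearly doubles the size, while the raw
octet adds one octet per hint; real text with fewer hints per word would
of course grow less.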

Naturally an HTML definition might assign special meanings to
characters (just as space characters have special semantics, not to
mention characters like <>&). It could define that a soft hyphen, or
a normal hyphen, or the letter h is to be treated as a hyphenation
hint, not as a normal data character. But no HTML specification has
_really_ defined a special meaning for the soft hyphen. The older
specs were written so that the soft hyphen _as defined by character
set standards_ was assumed to have the semantics of a "discretionary
hyphen". The HTML 4.0 specification tries to be more explicit, but
the current formulation imposes requirements on "those browsers that
interpret soft hyphens" without requiring that browsers must
"interpret" them. So a conforming browser should go on displaying
soft hyphens as hyphens.

> In my opinion, hyphenation is not only an important lay-out feature,
> but it’s also ‘an cultural issue’, in Portuguese it’s ‘strange’ when the
> body text isn’t justified and hyphenated.

I agree on the importance of hyphenation. Justifying text is a
different thing -- text justified on both sides very often looks
very odd when the window is narrow -- but naturally _if_ text is
justified it should normally be hyphenated to get a decent result,
especially when very long words may occur.

Hyphenation needs to be programmed into browsers. Given the fact
that popular browsers are mammoths which do miscellaneous things
with very little to do with what a Web browser should really do, it
would be just decent to include some basic hyphenation rules into
them.

For author's hyphenation hints or prohibitions, there is really no
single _character_ which could logically be assigned to the job.
It's more like a job for tags. For prohibitions, most browsers seem
to support the <nobr> tag. It should probably be promoted to a real
element, defined as text-level markup in a future HTML
specification.

An obvious solution would be to introduce an empty element, say
<hy>, for the purpose, so one could write rec<hy>ord. Unlike &shy;,
this would degrade gracefully on browsers which do not support it.
Moreover, the element might take an attribute with a numeric value
indicating the level of acceptability of word division at that
point, ranging from a value indicating a most preferred point (such
as between the constituents of a compound word) to a value which
suggests that hyphenation should be applied only if absolutely
necessary.

Alternatively, hyphenation hints (and perhaps hyphenation
prohibitions too) could be regarded as purely presentational, to be
handled in style sheets. But I'd say it would be less practical to
write <span id="someid">record</span> and then a CSS rule for that
particular occurrence of the word "record". And since hyphenation is
sometimes related to the _meaning_ of a word in a natural language,
hyphenation hints can be regarded as part of the structure of a
document in a sense.

(The same applies to pronunciation hints/information. One might say
that _ideally_ an author should be able to specify, in HTML markup,
the _meaning_ of a word like "record", by referring to a dictionary
entry in a specific format, useful both for hyphenation and
pronunciation, as well as automatic analysis of the document for
translation or other purposes; and a user agent might make it an
implicit link, so that the user may request a definition of the
word from the dictionary.)

> What is W3C’s position about it, will this approach be changed
> or should we wait the User Agents to change? Will they?
>
> Aristeu Escobar Branco da Silva
>
> São Paulo, Brasil.

Yucca, http://www.hut.fi/u/jkorpela/ or http://yucca.hut.fi/yucca.html
Received on Thursday, 21 January 1999 03:13:18 UTC