- From: <Jukka.Korpela@hut.fi>
- Date: Fri, 14 Apr 2000 15:44:40 +0300 (EET DST)
- cc: www-html@w3.org
On Fri, 14 Apr 2000, Daniel Glazman wrote:

> Just FYI and information of the other readers : one of my best friends is
> a latinist. He had to put on an intranet some weeks ago the exact copy of
> a romanian wall inscription where words are separated by a colon.
> Not by whitespaces.

This raises an interesting question about division of text into lines. The current HTML specifications are very vague about it. (Even HTML 2.0 was more explicit, I'd say.) I think the general - and correct - idea is that the problem is inherently dependent on the natural language used in the document, to be solved among other "i18n" issues.

But that's not all. Lots of problems are language-independent, at least mostly. It would be somewhat artificial to approach the particular problem mentioned above by adding a specifier to the language code (say lang="la-inscriptions"). (By the way, I'd say that most inscriptions don't use a colon but a character that can be identified with the middle dot character, ·, which has rather mixed usage; see http://www.hut.fi/u/jkorpela/latin1/3.html#B7 )

It would be desirable to impose _some_ requirements or at least recommendations about division of text into lines by browsers. They could try to cover some small but important subset of the line breaking default rules in Unicode; see http://www.unicode.org/unicode/reports/tr14/

But even if both such rules and language-sensitive word division methods are applied, and _especially_ since it will take a long time before they are widely usable, some methods for preventing line breaks _and_ for explicitly allowing line breaks are needed.
(The objection that they should be handled in style sheets would be theoretical at present, and not very good theory in my opinion; such issues are difficult to handle in CSS even in principle, since the natural way to handle them is _interspersed_ markup; and it's questionable whether the inherent indivisibility or divisibility of a string is a purely presentational issue.)

Well, such methods actually exist and are widely supported, though not defined in HTML specifications: WBR and NOBR. They could be defined simply as phrase level markup and as applicable to textual data only. It's obviously desirable to be able to specify allowable break points within long "words". The need for NOBR is not that obvious, but see some notes at http://www.hut.fi/u/jkorpela/html/nobr.html or regard it as just a logical counterpart to WBR. :-)

In a case like the one discussed, WBR would not make (or does not make; it can actually be used at present, it's just not standardized) things that simple, but at least one could programmatically insert <wbr> (or <wbr />) after each colon or middle dot.

-- 
Yucca, http://www.hut.fi/u/jkorpela/ or http://yucca.hut.fi/yucca.html
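
The programmatic insertion of <wbr> suggested in the message can be sketched roughly as follows. This is only an illustration in Python, assuming the inscription is available as plain text; the names insert_wbr and SEPARATORS are hypothetical, and the choice of colon and middle dot (U+00B7) as the only separators is an assumption taken from the cases discussed in the message.

```python
import html
import re

# Characters after which a line break is allowed. Colon and middle dot
# (U+00B7) are the two separators discussed in the message; this set is
# an assumption made for illustration.
SEPARATORS = re.compile("([:\u00b7])")


def insert_wbr(text: str) -> str:
    """Escape plain text for HTML, then add <wbr> after each separator."""
    # Escape &, < and > first so the only markup in the result is <wbr>.
    escaped = html.escape(text, quote=False)
    return SEPARATORS.sub(r"\1<wbr>", escaped)


print(insert_wbr("SENATVS:POPVLVSQVE:ROMANVS"))
# prints SENATVS:<wbr>POPVLVSQVE:<wbr>ROMANVS
```

Escaping before the substitution keeps any "<" or "&" in the source text from being read as markup; for XHTML-style output the replacement string could be changed to r"\1<wbr />".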
Received on Friday, 14 April 2000 08:44:43 UTC