- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Sat, 6 Aug 2005 08:07:25 +0300 (EEST)
- To: XHTML-Liste <www-html@w3.org>
On Fri, 5 Aug 2005, Devin Bayer wrote:

> On Aug 5, 2005, at 15:26, Jukka K. Korpela wrote:
>
>> Thus, Bundesࠌregierung would differ from Bundesregierung only by
>> disallowing a line break - at the most preferable point of division!
>
> Words are not supposed to be divided in the middle.

They are, when needed typographically. Web browsers traditionally break
words, but this is just a symptom of their being primitive in handling
text. Consult suitable general and language-specific manuals on
typography and orthography if you disagree.

>> Indicating the internal structure of a word, such as its being a compound
>> word, would most logically belong to markup, not character level.
>
> I disagree. The internal structure of a word is the job for character data.
> Markup is much better at the external structure, at the word level and above.

Characters, as defined by Unicode and other character standards,
correspond to units of written text, such as letters, digits,
punctuation, syllabic characters, ideographic characters, and
mathematical symbols. Characters are not meant to act as invisible
descriptors of structure. There are deviations from this, mainly for
historical reasons, but you would find it impossible to persuade the
relevant standardization bodies to make a fundamental change towards
using characters to indicate the internal structure of a word.

>> As a rule, empty elements indicate design flaws in markup language.
>> The logical markup would be something like
>>
>> <compound><part>Bund<case>es</case></part><part>regierung</part></compound>
>
> This solution is way over-engineered.

I was simply drawing a logical conclusion. If you wish to have the
internal structure of a word indicated, that would be about the
_simplest_ engineering that is consistent with modern design principles.

> I can't imagine you want to work with
> that kind of mess.

Does anyone work with the XML mess "by hand"? Besides, my fictitious
markup is very simple and logical; consider MathML, which is much more
messy, yet taken seriously by many.

> Do you support using markup on the first letter of a
> sentence instead of having capital characters?

No, but I could well imagine using markup for a sentence, with a style
sheet used to capitalize the first character of the first word of a
sentence, if that's the style.

The real question is what kind of structures should be indicated in
markup. (Just having some markup element does not mean one would need to
use it for every occasion where it might be used. So this is partly a
matter of markup language design, partly a matter of authors' choices.)

I would need a good reason to indicate the "fine structure" of text in
the first place. But once you have decided on such a matter, the rest is
rather obvious, except perhaps for the choice of minor details like
element and attribute names. (You might prefer write-only markup with
cryptic single-letter names, or long and descriptive names, or something
in between - HTML is currently an awkward mixture in this respect.)

> I do agree I had the semantics a little off.

No, you had it completely wrong. The Unicode standard says that the word
joiner is for line break control only.

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Saturday, 6 August 2005 05:07:32 UTC