- From: Devin Bayer <devin.bayer@rochester.edu>
- Date: Fri, 05 Aug 2005 21:17:10 -0700
- To: "Jukka K. Korpela" <jkorpela@cs.tut.fi>
- Cc: XHTML-Liste <www-html@w3.org>
On Aug 5, 2005, at 15:26, Jukka K. Korpela wrote:

> Thus, Bundesࠌregierung would differ from Bundesregierung only
> by disallowing a line break - at the most preferable point of
> division! Words are not supposed to be divided in the middle.
> Indicating the internal structure of a word, such as its being a
> compound word, would most logically belong to markup, not character
> level.

I disagree. The internal structure of a word is a job for character data. Markup is much better at the external structure, at the word level and above.

> As a rule, empty elements indicate design flaws in markup language.
> The logical markup would be something like
>
> <compound><part>Bund<case>es</case></part><part>regierung</part></compound>

This solution is way over-engineered. I can't imagine you want to work with that kind of mess. Do you support using markup on the first letter of a sentence instead of having capital characters?

I do agree I had the semantics a little off. At first I wanted to go with the Zero Width No-Break Space (U+FEFF), because that implies two things:

* Normally the adjacent characters should be considered part of two words.
* The word should never be broken when wrapping.

However, that use of the character is deprecated, and it has been replaced by the Word Joiner (U+2060).

Unicode has no notion of compound words. Even the hyphen is completely undefined semantically. So I recommended the character with the closest semantics, even though it isn't perfect. It joins two words. That's almost what we need.

-- 
Devin Bayer
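
For illustration only, here is a minimal Python sketch of the character-level approach argued for above: the parts of a compound are joined with U+2060 WORD JOINER, which is invisible and prohibits a line break at that position. The helper names `join_compound` and `strip_joiners` are hypothetical and not part of the original discussion. The sketch only shows how the character sits in the text. It does not claim that any particular renderer or search tool handles it well.

```python
import unicodedata

WORD_JOINER = "\u2060"  # U+2060 WORD JOINER, the recommended replacement
ZWNBSP = "\ufeff"       # U+FEFF, whose use as a no-break joiner is deprecated


def join_compound(parts):
    """Join the parts of a compound word with WORD JOINER (hypothetical helper)."""
    return WORD_JOINER.join(parts)


def strip_joiners(text):
    """Remove WORD JOINER characters, e.g. before comparing or searching."""
    return text.replace(WORD_JOINER, "")


word = join_compound(["Bundes", "regierung"])

# The joined word carries one extra, invisible code point.
print(len(word), len(strip_joiners(word)))

# Both characters are format controls (general category Cf).
for ch in (WORD_JOINER, ZWNBSP):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
```

Because the joiner is an ordinary code point in the character data, plain string comparison treats the marked and unmarked spellings as different. Any tool that should treat them as the same word has to strip or ignore the joiner explicitly, as `strip_joiners` does here.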
Received on Saturday, 6 August 2005 04:18:03 UTC