Re: tag for notion and compound indication

On Aug 5, 2005, at 15:26, Jukka K. Korpela wrote:

> Thus, Bundes<U+2060>regierung would differ from Bundesregierung only  
> by disallowing a line break - at the most preferable point of  
> division!

Words are not supposed to be divided in the middle.

> Indicating the internal structure of a word, such as its being a  
> compound word, would most logically belong to markup, not character  
> level.

I disagree.  The internal structure of a word is a job for  
character data.  Markup is much better suited to external structure,  
at the word level and above.

> As a rule, empty elements indicate design flaws in markup language.
> The logical markup would be something like
>
> <compound><part>Bund<case>es</case></part><part>regierung</part></compound>

This solution is way over-engineered.  I can't imagine you would want  
to work with that kind of mess.  Do you support using markup on the  
first letter of a sentence instead of using capital letters?
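To make the comparison concrete, here is a minimal sketch (Python chosen purely for illustration; the function names are hypothetical) that recovers the parts of "Bundesregierung" from both representations: the quoted <compound> markup, and a plain string with U+2060 WORD JOINER between the parts.

```python
import xml.etree.ElementTree as ET

WORD_JOINER = "\u2060"  # U+2060 WORD JOINER

def parts_from_markup(xml_text: str) -> list[str]:
    """Return the text of each <part> child of a <compound> element."""
    root = ET.fromstring(xml_text)
    # itertext() also collects text nested inside <case>.
    return ["".join(part.itertext()) for part in root.findall("part")]

def parts_from_characters(word: str) -> list[str]:
    """Split a word at each U+2060 WORD JOINER (hypothetical convention)."""
    return word.split(WORD_JOINER)

def plain_word(parts: list[str]) -> str:
    """Rejoin the parts into the ordinary spelling of the word."""
    return "".join(parts)

markup = "<compound><part>Bund<case>es</case></part><part>regierung</part></compound>"
joined = "Bundes" + WORD_JOINER + "regierung"

print(parts_from_markup(markup))
print(parts_from_characters(joined))
print(plain_word(parts_from_characters(joined)))
```

Both yield the same two parts.  The markup version additionally records where the case ending "es" sits, which the character version cannot express; the character version keeps the text a single ordinary string, though it no longer compares equal to "Bundesregierung" until the joiner is stripped.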

I do agree that I had the semantics a little off.  At first I wanted  
to go with the Zero Width No-Break Space (U+FEFF), because it implies  
two things:

     * Normally the adjacent characters should be considered part of two words
     * The word should never be broken when wrapping.

However, using that character this way is deprecated, and the Word  
Joiner (U+2060) replaces it.  Unicode has no notion of compound words.  
Even the hyphen is completely undefined semantically.  So I recommended  
the character with the closest semantics, even though it isn't  
perfect: it joins two words.  That's almost what we need.
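For reference, a short sketch using Python's standard `unicodedata` module shows the two characters under discussion by their official names; both are format characters (general category Cf), which is why neither is rendered visibly.  The `describe` helper is hypothetical, written only for this illustration.

```python
import unicodedata

def describe(cp: int) -> str:
    """Return 'U+XXXX NAME (category)' for a code point."""
    ch = chr(cp)
    return f"U+{cp:04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"

# U+FEFF: formerly used as the zero-width no-break character.
# U+2060: added in Unicode 3.2 to take over that role.
for cp in (0xFEFF, 0x2060):
    print(describe(cp))
# U+FEFF ZERO WIDTH NO-BREAK SPACE (Cf)
# U+2060 WORD JOINER (Cf)
```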

-- Devin Bayer

Received on Saturday, 6 August 2005 04:18:03 UTC