Re: tag for notion and compound indication

From: Devin Bayer <devin.bayer@rochester.edu>
Date: Fri, 05 Aug 2005 21:17:10 -0700
To: "Jukka K. Korpela" <jkorpela@cs.tut.fi>
Cc: XHTML-Liste <www-html@w3.org>
Message-id: <0EA158A7-6E81-465A-9899-B1117D3C2EF1@rochester.edu>

On Aug 5, 2005, at 15:26, Jukka K. Korpela wrote:

> Thus, Bundes&#x2060;regierung would differ from Bundesregierung only  
> by disallowing a line break - at the most preferable point of  
> division!

Words are not supposed to be divided in the middle.

> Indicating the internal structure of a word, such as its being a  
> compound word, would most logically belong to markup, not character  
> level.

I disagree.  The internal structure of a word is a job for  
character data.  Markup is much better suited to external structure,  
at the word level and above.

> As a rule, empty elements indicate design flaws in markup language.
> The logical markup would be something like
> <compound><part>Bund<case>es</case></part><part>regierung</part></compound>

This solution is way over-engineered.  I can't imagine you want to  
work with that kind of mess.  Do you support using markup on the  
first letter of a sentence instead of having capital characters?

I do agree I had the semantics a little off.  At first I wanted to go  
with the Zero Width No-Break Space (U+FEFF), because it implies two  
things:

     * Normally the adjacent characters should be considered part of
       two words.
     * The word should never be broken when wrapping.

However, that use of U+FEFF is deprecated in favor of the word joiner  
(U+2060).  Unicode has no notion of compound words; even the hyphen is  
completely undefined semantically.  So I recommended the character  
with the closest semantics, even though it isn't perfect: it joins  
two words, which is almost what we need.
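
To make the character-level approach concrete, here is a minimal Python
sketch. It assumes the parts of a compound are marked with U+2060 in the
text itself; the helper names (join_compound, split_compound,
strip_joiners) are made up for illustration and are not part of any
standard or proposal.

```python
import unicodedata

# U+2060 WORD JOINER: zero width, forbids a line break at its position.
WORD_JOINER = "\u2060"
# U+FEFF: the older character whose no-break use is deprecated.
ZWNBSP = "\ufeff"


def join_compound(*parts):
    """Hypothetical helper: glue word parts together with U+2060."""
    return WORD_JOINER.join(parts)


def split_compound(word):
    """Hypothetical helper: recover the parts of a marked compound."""
    return word.split(WORD_JOINER)


def strip_joiners(text):
    """Drop the invisible markers, e.g. before searching or comparing."""
    return text.replace(WORD_JOINER, "")


word = join_compound("Bundes", "regierung")

print(unicodedata.name(WORD_JOINER))   # official Unicode character name
print(unicodedata.name(ZWNBSP))
print(split_compound(word))
print(strip_joiners(word) == "Bundesregierung")
```

Note that the marked and unmarked strings are not equal, so anything
that searches or compares text would have to strip or ignore the joiner
first.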

-- Devin Bayer
Received on Saturday, 6 August 2005 04:18:03 UTC
