- From: fantasai <fantasai.lists@inkedblade.net>
- Date: Sat, 11 Jun 2005 12:27:31 -0400
- To: Unicode Mailing List <unicode@unicode.org>
- CC: www-style@w3.org, www-internationalization@w3.org
Andreas Prilop wrote on the Unicode mailing list[1]:
> Does the Unicode standard only deal with plain text or
> does it also deal with text in markup languages like SGML/HTML?
>
> I wonder whether Arabic letters should join when they are
> separated by markup. Here's an example:
>
> http://www.unics.uni-hannover.de/nhtcapri/temp/nastaliq.html
>
> Current programs display the letters separated by markup
> differently: Internet Explorer 6 and StarOffice 7 join the
> letters, but Mozilla 1.7 does not.
>
> Is it left to the rules of SGML/HTML to decide or
> has the Unicode standard any opinion about this?

In semantic markup languages like HTML, it's really the domain of the
formatting system used to process the markup, not the markup system
itself.[2] So, for web pages, this behavior would be governed by the
Unicode and CSS specs. I haven't read the Unicode book cover to cover,
but since there's an argument here, I'm guessing it's not covered by
Unicode quite yet. :)

Like many other people here, I think that the goal should be to make
the text as readable as possible, even if it means ignoring some of the
styling. Therefore, these are the rules I suggest:

For characters within the same inline sequence:

  1. Shaping and joining behavior MUST NOT be affected by element
     boundaries.

  2. Ligatures, including obligatory ligatures, MUST be broken if the
     formatting rules introduce extra space between the affected
     characters (e.g. by putting a border and margin around one of
     the characters).

  3. Optional ligatures SHOULD be broken if the formatting rules
     cannot otherwise be accommodated.

  4. Obligatory ligatures MUST NOT be broken if the formatting rules
     introduce no extra space between the affected characters, even
     if this means some of the characters are rendered in the wrong
     font or as part of the wrong visual element.

  5. Combining characters MUST be rendered as the combined grapheme
     cluster if the system is capable of rendering the combination,
     even if this means some of the characters are rendered in the
     wrong font or as part of the wrong visual element. The combined
     grapheme cluster SHOULD be rendered as part of the base
     character's element, or, in the case of combining jamos, the
     initial character's element.

I'm quite certain of #1, but as I don't have extensive background in
this stuff, I am not so certain of the others. Comments are
appreciated. I can ask the CSS Working Group to consider adding a
recommendation to the next revision of CSS2.1, and/or a reference to
the relevant parts of the Unicode standard, if there seems to be a
consensus around a particular set of rules.

~fantasai

[1] http://www.unicode.org/mail-arch/unicode-ml/y2005-m06/0110.html
    username: unicode-ml ; pass: unicode

[2] CSS determines whether an element visually behaves as a block or
    an inline or a table cell. Given the CSS rule

      * { display: inline; }

    both

      <div>ARA</div><div>BIC</div>

    and

      <span>ARA</span><span>BIC</span>

    would result in the exact same rendering.
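
To make rule 5 a little more concrete, here is a rough Python sketch of how a formatter might attach combining marks to the element of their base character, even when the mark sits in a different element in the markup. Everything in it is an assumption for illustration: the function name assign_clusters and the (element, text) run representation are made up, and the test for "combining" is only the canonical combining class from Python's unicodedata module. It does not handle combining jamos or full grapheme cluster segmentation.

```python
import unicodedata


def assign_clusters(runs):
    """Group characters into clusters owned by the base character's element.

    runs: list of (element_name, text) pairs in document order
          (a hypothetical stand-in for an inline sequence).
    Returns a list of (element_name, cluster) pairs.
    """
    clusters = []
    for element, text in runs:
        for ch in text:
            # A nonzero canonical combining class marks a combining character.
            # Simplification: marks with class 0 and jamos are not handled.
            if unicodedata.combining(ch) and clusters:
                owner, cluster = clusters[-1]
                # Attach the mark to the previous cluster, keeping the
                # base character's element as the owner (rule 5).
                clusters[-1] = (owner, cluster + ch)
            else:
                clusters.append((element, ch))
    return clusters


# "e" is in one element, the combining acute accent starts the next one.
example = assign_clusters([("span1", "e"), ("span2", "\u0301x")])
```

In this example the accent ends up in the cluster owned by span1 and only "x" stays with span2, which is the "wrong visual element" trade-off that rule 5 accepts for the sake of readability.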
Received on Saturday, 11 June 2005 20:29:51 UTC