Re: Words and spaces

From: Chris Lilley <chris@w3.org>
Date: Fri, 25 Aug 2006 09:57:30 +0200
Message-ID: <1109782640.20060825095730@w3.org>
To: "Glenn A. Adams" <gadams@xfsi.com>
Cc: public-tt@w3.org

On Thursday, July 27, 2006, 5:07:43 PM, Glenn wrote:

GAA> Chris,

GAA> Thanks for your comment. The TT WG has reviewed this comment and has
GAA> agreed upon the following response:

GAA> Regarding your question, it depends upon whether the language or writing
GAA> system is unknown or unspecified. If either of these cases holds, then,
GAA> according to rule 2 above, each of your examples except the last would
GAA> be interpreted as one word. The last would be interpreted as two words,
GAA> presuming that the ' ' between "Masayasu" and "Ishikawa" is represented
GAA> as #x20. In contrast, if the language or writing system is known, e.g.,
GAA> if xml:lang="en" is specified on the root element (and no override
GAA> appears), then a word unit is specified in accordance with the rules of
GAA> that language or writing system. DFXP does not specify these latter
GAA> rules in an interoperable manner, and neither does Unicode.

Thank you for the clarification. This response is satisfactory to me.
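
For anyone following the thread, here is a rough Python sketch of the
fallback interpretation (language or writing system unknown or
unspecified), using the two rules quoted further down. It is only an
illustration, not normative DFXP behaviour. The function names are
hypothetical, and two points are assumptions of mine because the text
does not define them: "mostly" is taken to mean more than half of the
non-whitespace characters, and only the CJK Unified Ideographs block
U+4E00..U+9FFF is checked, not the "similar treatment" blocks.

```python
import re

# XML 1.0 whitespace characters: space, tab, carriage return, line feed.
XML_WHITESPACE = "\x20\x09\x0D\x0A"

# Rule 2: a word is a run of one or more non-XML-whitespace characters.
_WORD_RE = re.compile(r"[^\x20\x09\x0D\x0A]+")


def is_mostly_cjk(text):
    """Assumption: 'mostly' means more than half of the non-whitespace
    characters fall in the CJK Unified Ideographs block."""
    chars = [c for c in text if c not in XML_WHITESPACE]
    if not chars:
        return False
    cjk = sum(1 for c in chars if 0x4E00 <= ord(c) <= 0x9FFF)
    return cjk * 2 > len(chars)


def fallback_word_units(text):
    """Split text into 'word' units when the language is unknown."""
    if is_mostly_cjk(text):
        # Rule 1: treat 'word' as if 'character' were specified.
        # Assumption: every character, whitespace included, is a unit.
        return list(text)
    # Rule 2: split on XML whitespace only. U+3000 and U+2004 are not
    # XML whitespace, so they do not separate words here.
    return _WORD_RE.findall(text)


for sample in ("Hello\u3000World", "Hello\u2004World", "Masayasu Ishikawa"):
    print(repr(sample), len(fallback_word_units(sample)))
```

Under these assumptions the first two samples count as one word each and
the last as two, matching the response above. Once xml:lang is known, none
of this applies and the result depends on language-specific rules that
DFXP leaves unspecified.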

GAA> Regards,
GAA> Glenn


GAA> -----Original Message-----
GAA> From: public-tt-request@w3.org [mailto:public-tt-request@w3.org] On
GAA> Behalf Of Chris Lilley
GAA> Sent: Saturday, June 03, 2006 2:04 AM
GAA> To: public-tt@w3.org
GAA> Subject: Words and spaces


GAA> Hello public-tt,

GAA> In section 8.3.7 <flowFunction>

GAA>  The dynamic flow unit word must be interpreted as being dependent upon
GAA>  the language or writing system of the affected content. If the language
GAA>  or writing system is unknown or unspecified, then word is interpreted
GAA>  as follows:

GAA>    1. If the affected content consists solely or mostly of Unified CJK
GAA>    Ideographic characters or of characters of another Unicode character
GAA>    block that are afforded similar treatment to that of Unified CJK
GAA>    Ideographic characters, then word is to be interpreted as if
GAA>    character were specified.
GAA>    
GAA>    2. Otherwise, word is to be interpreted as denoting a sequence of one
GAA>    or more characters that are not interpreted as an XML whitespace
GAA>    character.

GAA> Noting the "must" which is a testable conformance requirement, do the
GAA> following paragraphs contain one word or two?

GAA> <p>Hello&#x3000;World</p>
GAA> <p xml:lang="en">Hello&#x3000;World</p>
GAA> <p xml:lang="en">Hello&#x2004;World</p>
GAA> <p xml:lang="ja">Hello&#x3000;World</p>
GAA> <p xml:lang="ja">Hello&#x2004;World</p>
GAA> <p xml:lang="ja">Masayasu Ishikawa</p>

GAA> For a list of Unicode space characters, see for example
GAA> http://www.cs.tut.fi/~jkorpela/chars/spaces.html






-- 
 Chris Lilley                    mailto:chris@w3.org
 Interaction Domain Leader
 Co-Chair, W3C SVG Working Group
 W3C Graphics Activity Lead
 Co-Chair, W3C Hypertext CG
Received on Friday, 25 August 2006 07:58:08 GMT
