RE: Words and spaces

Chris,

Thanks for your comment. The TT WG has reviewed this comment and has
agreed upon the following response:

Regarding your question, it depends upon whether the language or writing
system is unknown or unspecified. If either of these cases holds, then,
according to rule 2 of the text you quote, each of your examples except
the last would be interpreted as one word, since neither #x3000 nor
#x2004 is an XML whitespace character. The last would be interpreted as
two words, presuming that the ' ' between "Masayasu" and "Ishikawa" is
represented as #x20. In contrast, if the language or writing system is
known, e.g., if xml:lang="en" is specified on the root element (and no
override appears), then a word unit is determined in accordance with the
rules of that language or writing system. DFXP does not specify these
latter rules in an interoperable manner (nor does Unicode).
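
As an illustration only, here is a minimal Python sketch of rule 2 for
the unknown or unspecified language case. It assumes that "XML
whitespace" means the four characters #x20, #x9, #xD and #xA from XML
1.0; the function name count_words_rule2 is hypothetical and is not
part of DFXP.

```python
import re

# XML 1.0 whitespace: space, tab, carriage return, line feed.
# U+3000 and U+2004 are deliberately NOT in this set.
XML_WHITESPACE = re.compile(r"[\x20\x09\x0D\x0A]+")


def count_words_rule2(text: str) -> int:
    """Count maximal runs of characters that are not XML whitespace.

    Hypothetical helper sketching rule 2 only; it ignores xml:lang
    and the CJK case covered by rule 1.
    """
    return len([run for run in XML_WHITESPACE.split(text) if run])


if __name__ == "__main__":
    for sample in ("Hello\u3000World", "Hello\u2004World", "Masayasu Ishikawa"):
        print(repr(sample), count_words_rule2(sample))
```

The explicit character class matters: Python's str.split() with no
arguments also splits on U+3000 and U+2004, which would not match the
rule 2 definition.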

Regards,
Glenn


-----Original Message-----
From: public-tt-request@w3.org [mailto:public-tt-request@w3.org] On
Behalf Of Chris Lilley
Sent: Saturday, June 03, 2006 2:04 AM
To: public-tt@w3.org
Subject: Words and spaces


Hello public-tt,

In section 8.3.7 <flowFunction>

 The dynamic flow unit word must be interpreted as being dependent upon
 the language or writing system of the affected content. If the language
 or writing system is unknown or unspecified, then word is interpreted
 as follows:

   1. If the affected content consists solely or mostly of Unified CJK
   Ideographic characters or of characters of another Unicode character
   block that are afforded similar treatment to that of Unified CJK
   Ideographic characters, then word is to be interpreted as if
   character were specified.
   
   2. Otherwise, word is to be interpreted as denoting a sequence of one
   or more characters that are not interpreted as an XML whitespace
   character.

Noting the "must" which is a testable conformance requirement, do the
following paragraphs contain one word or two?

<p>Hello&#x3000;World</p>
<p xml:lang="en">Hello&#x3000;World</p>
<p xml:lang="en">Hello&#x2004;World</p>
<p xml:lang="ja">Hello&#x3000;World</p>
<p xml:lang="ja">Hello&#x2004;World</p>
<p xml:lang="ja">Masayasu Ishikawa</p>

For a list of Unicode space characters, see for example
http://www.cs.tut.fi/~jkorpela/chars/spaces.html


-- 
 Chris Lilley                    mailto:chris@w3.org
 Interaction Domain Leader
 Co-Chair, W3C SVG Working Group
 W3C Graphics Activity Lead
 Co-Chair, W3C Hypertext CG

Received on Thursday, 27 July 2006 15:08:05 UTC