- From: <mahmoudbahaa.eg@gmail.com>
- Date: Thu, 28 May 2009 06:07:07 -0500
- To: public-xhtml2@w3.org
- CC: voyager-issues@mn.aptest.com
In the 2nd edition of XHTML 1.0 (http://www.w3.org/TR/2002/REC-xhtml1-20020801/#uaconf), the following paragraph from the 1st edition (http://www.w3.org/TR/2000/REC-xhtml1-20000126/#uaconf) was removed from section 3.2, and I seriously don't know why:

> The XHTML user agent in addition, must treat the following characters as whitespace:
>
> Form feed (&#x000C;)
> Zero-width space (&#x200B;)
>
> In elements where the 'xml:space' attribute is set to 'preserve', the user agent must leave all whitespace characters intact (with the exception of leading and trailing whitespace characters, which should be removed). Otherwise, whitespace is handled according to the following rules:
>
> All whitespace surrounding block elements should be removed.
> Comments are removed entirely and do not affect whitespace handling. One whitespace character on either side of a comment is treated as two white space characters.
> Leading and trailing whitespace inside a block element must be removed.
> Line feed characters within a block element must be converted into a space (except when the 'xml:space' attribute is set to 'preserve').
> A sequence of white space characters must be reduced to a single space character (except when the 'xml:space' attribute is set to 'preserve').
> With regard to rendition, the User Agent should render the content in a manner appropriate to the language in which the content is written. In languages whose primary script is Latinate, the ASCII space character is typically used to encode both grammatical word boundaries and typographic whitespace; in languages whose script is related to Nagari (e.g., Sanskrit, Thai, etc.), grammatical boundaries may be encoded using the ZW 'space' character, but will not typically be represented by typographic whitespace in rendered output; languages using Arabiform scripts may encode typographic whitespace using a space character, but may also use the ZW space character to delimit 'internal' grammatical boundaries (what look like words in Arabic to an English eye frequently encode several words, e.g. 'kitAbuhum' = 'kitAbu-hum' = 'book them' == their book); and languages in the Chinese script tradition typically neither encode such delimiters nor use typographic whitespace in this way.

First off, the first part of the removed text, which treats form feed (&#x000C;) and zero-width space (&#x200B;) as whitespace as well, seems consistent with the HTML 4.01 spec, particularly section 9.1 on white space (http://www.w3.org/TR/html401/struct/text.html#h-9.1), where it says:

> In HTML, only the following characters are defined as white space characters:
>
> ASCII space (&#x0020;)
> ASCII tab (&#x0009;)
> ASCII form feed (&#x000C;)
> Zero-width space (&#x200B;)

As for the rest, it actually explains how conforming user agents should handle whitespace, which is what this part was all about: the first line of the paragraph says "White space is handled according to the following rules", and with these rules removed the paragraph now seems to be missing an important part.

The behavior described in the 1st edition seems consistent with that of HTML 4.01 user agents. So does this mean that XHTML 1.0 2nd edition defines no specific behavior for user agents handling whitespace, or was the removal of this entire paragraph unintentional?
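
To make the behavior in question concrete, here is a minimal Python sketch of how I read the 1st edition rules for text inside a single block element. The names XHTML_WHITESPACE and normalize_block_text are my own, not from any spec or library, and the sketch covers neither comment handling, nor whitespace surrounding block elements, nor the rendition guidance. It follows the quoted text literally, so a zero-width space collapses to an ordinary space; a real user agent may well behave differently.

```python
import re

# Assumption: the four XML whitespace characters (space, tab, CR, LF) plus
# the two extra characters named in the 1st edition text (form feed and
# zero-width space).
XHTML_WHITESPACE = "\u0020\u0009\u000D\u000A\u000C\u200B"

# Matches any run of one or more of the characters above.
_WS_RUN = re.compile("[\u0020\u0009\u000D\u000A\u000C\u200B]+")


def normalize_block_text(text: str, preserve: bool = False) -> str:
    """Hypothetical helper: apply the quoted rules to one block's text."""
    if preserve:
        # xml:space="preserve": keep inner whitespace intact, but remove
        # leading and trailing whitespace, as the quoted text says.
        return text.strip(XHTML_WHITESPACE)
    # Line feeds become spaces and every whitespace run is reduced to a
    # single space; both happen in one substitution.
    collapsed = _WS_RUN.sub(" ", text)
    # Leading and trailing whitespace inside the block is removed.
    return collapsed.strip(" ")


print(repr(normalize_block_text("  Hello\n\tworld\u200b ")))        # 'Hello world'
print(repr(normalize_block_text(" a\n b ", preserve=True)))         # 'a\n b'
```

If the 2nd edition really defines no such behavior, then nothing in XHTML 1.0 itself would say which of these two branches a user agent has to implement, which is the point of my question.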
Received on Thursday, 28 May 2009 11:11:35 UTC