Whitespace in various specs

From: Steven Pemberton <steven.pemberton@cwi.nl>
Date: Wed, 30 Mar 2016 12:54:06 +0200
To: "public-xformsusers@w3.org" <public-xformsusers@w3.org>
Message-ID: <op.ye4wkgt6smjzpq@steven-aspire-s7>

HTML5

The space characters, for the purposes of this specification, are U+0020  
SPACE, "tab" (U+0009), "LF" (U+000A), "FF" (U+000C), and "CR" (U+000D).

The White_Space characters are those that have the Unicode property  
"White_Space" in the Unicode PropList.txt data file. [UNICODE]

https://www.w3.org/TR/html5/infrastructure.html#space-separated-tokens
https://www.w3.org/TR/html5/infrastructure.html#space-character
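
As a rough illustration of the HTML5 definition, here is a minimal Python sketch that splits a string on exactly the five space characters listed above. The function name split_on_html5_spaces is made up for this example and does not come from the spec.

```python
from typing import List

# The five HTML5 "space characters": SPACE, tab, LF, FF, CR.
HTML5_SPACE_CHARACTERS = "\u0020\u0009\u000A\u000C\u000D"


def split_on_html5_spaces(value: str) -> List[str]:
    """Split a string into tokens separated by HTML5 space characters only."""
    tokens = []
    current = []
    for ch in value:
        if ch in HTML5_SPACE_CHARACTERS:
            # A separator ends the current token, if there is one.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


print(split_on_html5_spaces("a\tb\u00a0c  d"))
```

U+00A0 NO-BREAK SPACE has the White_Space property but is not an HTML5 space character, so "b\u00a0c" stays a single token here. Python's own str.split() with no argument would split there as well, which is why the sketch does not use it.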

XML

spaces, tabs, and blank lines

	S ::= (#x20 | #x9 | #xD | #xA)+

https://www.w3.org/TR/REC-xml/#NT-S
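
The S production translates almost directly into a regular expression. A small Python sketch follows; the names XML_S and is_xml_whitespace are illustrative only.

```python
import re

# XML 1.0 production: S ::= (#x20 | #x9 | #xD | #xA)+
XML_S = re.compile(r"[\x20\x09\x0D\x0A]+")


def is_xml_whitespace(text: str) -> bool:
    """Return True if the whole string matches the XML S production."""
    return XML_S.fullmatch(text) is not None


print(is_xml_whitespace(" \t\r\n"))  # only characters from S
print(is_xml_whitespace("\f"))       # form feed: an HTML5 space character, not in S
```

Note that the production requires at least one character, so the empty string does not match.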

XPATH

  same as XML

SCHEMA

All occurrences of #x9 (tab), #xA (line feed) and #xD (carriage return)  
are replaced with #x20 (space).

https://www.w3.org/TR/2012/REC-xmlschema11-2-20120405/datatypes.html#rf-whiteSpace
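
A minimal Python sketch of the "replace" behaviour quoted above. It also shows "collapse", which is not quoted here but is defined by the same whiteSpace facet: after replacement, runs of #x20 are reduced to one and leading and trailing #x20 are removed. The function names are invented for this example.

```python
# Map tab, line feed and carriage return to a single space each.
_REPLACE_TABLE = str.maketrans({"\t": " ", "\n": " ", "\r": " "})


def xsd_replace(value: str) -> str:
    """whiteSpace="replace": #x9, #xA and #xD each become #x20."""
    return value.translate(_REPLACE_TABLE)


def xsd_collapse(value: str) -> str:
    """whiteSpace="collapse": replace, then squeeze and trim #x20 only."""
    parts = xsd_replace(value).split(" ")
    return " ".join(part for part in parts if part)


print(repr(xsd_replace("a\tb\r\nc")))
print(repr(xsd_collapse("  a\tb\r\nc ")))
```

Splitting on a literal " " rather than calling split() with no argument is deliberate: characters outside the four named by the spec, such as form feed or U+00A0, are left untouched.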

CSS

Space, tab, carriage return, linefeed
"UAs may additionally treat other forced break characters as newline  
characters per UAX14." [the UAX14 reference is left undefined]
https://www.w3.org/TR/CSS2/text.html#white-space-prop

UNICODE

17 characters in the Space Separator (Zs) general category

http://www.fileformat.info/info/unicode/category/Zs/list.htm
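
The Zs count can be checked with Python's unicodedata module, as sketched below. The result depends on the Unicode version bundled with the interpreter; it should be 17 for Unicode 6.3 and later, but treat that as an assumption about your Python build.

```python
import sys
import unicodedata

# Collect every code point whose general category is Zs (Space Separator).
zs = [
    chr(cp)
    for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Zs"
]

for ch in zs:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
print(len(zs), "characters, Unicode", unicodedata.unidata_version)
```

Tab, LF, FF and CR do not appear in this list: they are control characters (category Cc), so Zs on its own is not a superset of the HTML5 or XML definitions.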

"Spaces, separator characters and other control characters which should be  
treated by programming languages as "white space" for the purpose of  
parsing elements. See also Line_Break, Grapheme_Cluster_Break,  
Sentence_Break, and Word_Break, which classify space characters and  
related controls somewhat differently for particular text segmentation  
contexts."

http://www.unicode.org/reports/tr44/#White_Space
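
Python's unicodedata module does not expose the White_Space property, so the sketch below hard-codes the code points as I read them from PropList.txt (25 code points in recent Unicode versions; check against the data file for the version you care about) and compares them with the HTML5 and XML sets from earlier in this message.

```python
# White_Space code points, transcribed by hand from PropList.txt (assumption:
# Unicode 6.3 or later, where U+180E is no longer included).
WHITE_SPACE = (
    set(range(0x0009, 0x000D + 1))        # tab, LF, VT, FF, CR
    | {0x0020, 0x0085, 0x00A0, 0x1680}
    | set(range(0x2000, 0x200A + 1))      # EN QUAD .. HAIR SPACE
    | {0x2028, 0x2029, 0x202F, 0x205F, 0x3000}
)

HTML5_SPACE = {0x0020, 0x0009, 0x000A, 0x000C, 0x000D}
XML_S = {0x0020, 0x0009, 0x000D, 0x000A}


def fmt(code_points):
    """Format a set of code points as a sorted list of U+XXXX strings."""
    return [f"U+{cp:04X}" for cp in sorted(code_points)]


print("White_Space size:", len(WHITE_SPACE))
print("XML S within HTML5:", XML_S <= HTML5_SPACE)
print("HTML5 within White_Space:", HTML5_SPACE <= WHITE_SPACE)
print("White_Space only:", fmt(WHITE_SPACE - HTML5_SPACE))
```

Under these assumptions the sets nest: XML's S is HTML5's set minus form feed, and both sit inside White_Space.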

Steven
Received on Wednesday, 30 March 2016 10:54:43 UTC
