Whitespace in various specs

HTML5

The space characters, for the purposes of this specification, are U+0020  
SPACE, "tab" (U+0009), "LF" (U+000A), "FF" (U+000C), and "CR" (U+000D).

The White_Space characters are those that have the Unicode property  
"White_Space" in the Unicode PropList.txt data file. [UNICODE]

https://www.w3.org/TR/html5/infrastructure.html#space-separated-tokens
https://www.w3.org/TR/html5/infrastructure.html#space-character
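
As a rough illustration of the HTML5 definition, here is a small Python sketch that splits a space-separated token list on exactly those five space characters. The name split_on_html_spaces is made up for this example and comes from no specification or library.

```python
import re

# The five HTML5 "space characters":
# U+0020 SPACE, U+0009 tab, U+000A LF, U+000C FF, U+000D CR.
HTML_SPACES = re.compile(r"[ \t\n\f\r]+")

def split_on_html_spaces(value):
    """Split on runs of HTML5 space characters, dropping empty tokens."""
    return [token for token in HTML_SPACES.split(value) if token]
```

Python's own str.split() with no arguments is not a substitute here, since it uses a wider, Unicode-aware notion of whitespace and would also split on characters such as U+00A0.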

XML

spaces, tabs, and blank lines

 S    ::=    (#x20 | #x9 | #xD | #xA)+

https://www.w3.org/TR/REC-xml/#NT-S
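
The S production translates almost directly into a regular expression. The following is only a sketch; the helper name is_xml_whitespace is an assumption made for illustration.

```python
import re

# XML: S ::= (#x20 | #x9 | #xD | #xA)+
XML_S = re.compile(r"[\x20\x09\x0D\x0A]+")

def is_xml_whitespace(text):
    """True if text is non-empty and consists only of XML white space."""
    return XML_S.fullmatch(text) is not None
```

Unlike the HTML5 set, this one does not include form feed (U+000C), so is_xml_whitespace("\f") is False.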

XPATH

  same as XML

SCHEMA

For the whiteSpace facet value "replace": all occurrences of #x9 (tab),
#xA (line feed) and #xD (carriage return) are replaced with #x20 (space).

https://www.w3.org/TR/2012/REC-xmlschema11-2-20120405/datatypes.html#rf-whiteSpace
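
The text quoted above describes the "replace" value of the whiteSpace facet. The same section of the spec also defines "collapse", which first applies "replace" and then collapses runs of #x20 to a single space and removes leading and trailing #x20. A minimal sketch of both, with the function names ws_replace and ws_collapse invented for this example:

```python
import re

def ws_replace(value):
    """whiteSpace = replace: turn each #x9, #xA and #xD into #x20."""
    return re.sub(r"[\t\n\r]", " ", value)

def ws_collapse(value):
    """whiteSpace = collapse: replace, then squeeze and trim #x20 only."""
    squeezed = re.sub(r" +", " ", ws_replace(value))
    return squeezed.strip(" ")
```

Only the four XML white space characters are involved; other Unicode spaces such as U+00A0 are left untouched by either function.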

CSS

Space, tab, carriage return, linefeed
"UAs may additionally treat other forced break characters as newline  
characters per UAX14." [UAX14 undefined]
https://www.w3.org/TR/CSS2/text.html#white-space-prop

UNICODE

17 characters in general category Zs (Space_Separator)

http://www.fileformat.info/info/unicode/category/Zs/list.htm

"Spaces, separator characters and other control characters which should be  
treated by programming languages as "white space" for the purpose of  
parsing elements. See also Line_Break, Grapheme_Cluster_Break,  
Sentence_Break, and Word_Break, which classify space characters and  
related controls somewhat differently for particular text segmentation  
contexts."

http://www.unicode.org/reports/tr44/#White_Space
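
Note that the fileformat.info list covers general category Zs, while the TR44 text describes the White_Space property, which is a larger set that also includes controls such as tab and line feed. As far as I know, Python's standard unicodedata module exposes the general category but not the White_Space property, so the sketch below only enumerates Zs. The helper name chars_in_category is an assumption for this example, and the count printed depends on the Unicode version bundled with the Python build.

```python
import sys
import unicodedata

def chars_in_category(category):
    """Return every character whose general category equals `category`."""
    return [chr(cp) for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)) == category]

zs = chars_in_category("Zs")
for ch in zs:
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>")))
print(len(zs), "characters in Zs, per Unicode", unicodedata.unidata_version)
```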

Steven

Received on Wednesday, 30 March 2016 10:54:43 UTC