- From: Nathan <nathan@webr3.org>
- Date: Tue, 05 Apr 2011 01:47:59 +0100
- To: RDF WG <public-rdf-wg@w3.org>
- CC: RDFA Working Group <public-rdfa-wg@w3.org>
Hi All,

I've always heard of an "IRIs can contain whitespace" issue, so I thought I'd take a closer look.

From what I can tell, IRI extends the class of unreserved characters by adding the characters of the UCS beyond U+007F.

Here's a chart of all the whitespace characters defined in Unicode, showing whether each one must be percent-encoded or can be included as is:

                    ----------------------------------------
                    | U+0009 \t
                    | U+000A \n
                    | U+000B \v
    % encoded -->   | U+000C \f
                    | U+000D \r
                    | U+0020 SPACE
                    | U+0085 NEL (NEXT LINE)
                    ----------------------------------------
                    | U+00A0 NBSP (NO-BREAK SPACE)
                    | U+1680 OGHAM SPACE MARK
                    | U+180E MONGOLIAN VOWEL SEPARATOR
                    | U+2000 EN QUAD
                    | U+2001 EM QUAD
                    | U+2002 EN SPACE
  allowed in IRI -->| U+2003 EM SPACE
                    | U+2004 THREE-PER-EM SPACE
                    | U+2005 FOUR-PER-EM SPACE
                    | U+2006 SIX-PER-EM SPACE
                    | U+2007 FIGURE SPACE
                    | U+2008 PUNCTUATION SPACE
                    | U+2009 THIN SPACE
                    | U+200A HAIR SPACE
                    | U+2028 LINE SEPARATOR
                    | U+2029 PARAGRAPH SEPARATOR
                    | U+202F NARROW NO-BREAK SPACE
                    | U+205F MEDIUM MATHEMATICAL SPACE
                    | U+3000 IDEOGRAPHIC SPACE
                    ----------------------------------------

In Turtle, SPARQL, RDFa 1.1 Core (and XML 5th edition), whitespace is defined as:

    U+0009
    U+000A
    U+000D
    U+0020

So where's the collision / issue? I'm a little confused now.

Best,

Nathan
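The chart above can be reproduced mechanically. The following is a minimal sketch, assuming the `ucschar` ranges from the ABNF in RFC 3987 section 2.2 (which start at U+00A0, so U+0085 falls outside them). The names `UCSCHAR_RANGES`, `UNICODE_WHITESPACE`, `SYNTAX_WHITESPACE` and `is_ucschar` are made up for illustration and are not taken from any specification or library.

```python
# Assumption: ucschar ranges copied from the ABNF in RFC 3987, section 2.2.
UCSCHAR_RANGES = [
    (0x00A0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF),
    (0x10000, 0x1FFFD), (0x20000, 0x2FFFD), (0x30000, 0x3FFFD),
    (0x40000, 0x4FFFD), (0x50000, 0x5FFFD), (0x60000, 0x6FFFD),
    (0x70000, 0x7FFFD), (0x80000, 0x8FFFD), (0x90000, 0x9FFFD),
    (0xA0000, 0xAFFFD), (0xB0000, 0xBFFFD), (0xC0000, 0xCFFFD),
    (0xD0000, 0xDFFFD), (0xE1000, 0xEFFFD),
]

# The whitespace code points listed in the chart.
UNICODE_WHITESPACE = (
    [0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020, 0x0085, 0x00A0,
     0x1680, 0x180E]
    + list(range(0x2000, 0x200B))  # U+2000 .. U+200A
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)

# Whitespace as defined by Turtle, SPARQL, RDFa 1.1 Core and XML.
SYNTAX_WHITESPACE = {0x0009, 0x000A, 0x000D, 0x0020}


def is_ucschar(code_point: int) -> bool:
    """Return True if the code point may appear unencoded in an IRI."""
    return any(lo <= code_point <= hi for lo, hi in UCSCHAR_RANGES)


# Split the chart into its two groups.
allowed_in_iri = {cp for cp in UNICODE_WHITESPACE if is_ucschar(cp)}
percent_encoded = {cp for cp in UNICODE_WHITESPACE if not is_ucschar(cp)}

# Whitespace that is both legal in an IRI and treated as whitespace
# by the syntaxes named above.
overlap = allowed_in_iri & SYNTAX_WHITESPACE

if __name__ == "__main__":
    for cp in UNICODE_WHITESPACE:
        group = "allowed in IRI" if is_ucschar(cp) else "% encoded"
        print(f"U+{cp:04X}  {group}")
    print("overlap with syntax whitespace:", sorted(overlap))
```

Under these assumptions `overlap` is empty: every character that the syntaxes treat as whitespace lands in the percent-encoded group, which matches the question raised above.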
Received on Tuesday, 5 April 2011 00:48:49 UTC