- From: Nathan <nathan@webr3.org>
- Date: Tue, 05 Apr 2011 01:47:59 +0100
- To: RDF WG <public-rdf-wg@w3.org>
- CC: RDFA Working Group <public-rdfa-wg@w3.org>
Hi All,
I've always heard of an "IRIs can contain whitespace" issue. So I
thought I'd take a closer look.
From what I can tell, IRI extends the class of unreserved
characters by adding the characters of the UCS beyond U+007F.
Here's a chart of all the white space chars defined in unicode, and
whether they need to be percent encoded, or whether they can be included
as is:
                   ----------------------------------------
                   | U+0009  \t
                   | U+000A  \n
                   | U+000B  \v
% encoded -->      | U+000C  \f
                   | U+000D  \r
                   | U+0020  SPACE
                   | U+0085  NEL (NEXT LINE)
                   ----------------------------------------
                   | U+00A0  NBSP (NO-BREAK SPACE)
                   | U+1680  OGHAM SPACE MARK
                   | U+180E  MONGOLIAN VOWEL SEPARATOR
                   | U+2000  EN QUAD
                   | U+2001  EM QUAD
                   | U+2002  EN SPACE
allowed in IRI --> | U+2003  EM SPACE
                   | U+2004  THREE-PER-EM SPACE
                   | U+2005  FOUR-PER-EM SPACE
                   | U+2006  SIX-PER-EM SPACE
                   | U+2007  FIGURE SPACE
                   | U+2008  PUNCTUATION SPACE
                   | U+2009  THIN SPACE
                   | U+200A  HAIR SPACE
                   | U+2028  LINE SEPARATOR
                   | U+2029  PARAGRAPH SEPARATOR
                   | U+202F  NARROW NO-BREAK SPACE
                   | U+205F  MEDIUM MATHEMATICAL SPACE
                   | U+3000  IDEOGRAPHIC SPACE
                   ----------------------------------------
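For anyone who wants to reproduce the chart, here is a rough Python sketch. It assumes the split is decided purely by the ucschar production of RFC 3987 (section 2.2), which starts at U+00A0; that is why NEL (U+0085) ends up in the percent-encoded group even though it is beyond U+007F. The names UCSCHAR_RANGES, WHITESPACE, is_ucschar and classify are mine, made up for illustration, not taken from any spec or library.

```python
# ucschar ranges as given in the RFC 3987 ABNF (inclusive code point bounds).
UCSCHAR_RANGES = [
    (0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF),
    (0x10000, 0x1FFFD), (0x20000, 0x2FFFD), (0x30000, 0x3FFFD),
    (0x40000, 0x4FFFD), (0x50000, 0x5FFFD), (0x60000, 0x6FFFD),
    (0x70000, 0x7FFFD), (0x80000, 0x8FFFD), (0x90000, 0x9FFFD),
    (0xA0000, 0xAFFFD), (0xB0000, 0xBFFFD), (0xC0000, 0xCFFFD),
    (0xD0000, 0xDFFFD), (0xE1000, 0xEFFFD),
]

# The white space code points listed in the chart above.
WHITESPACE = [
    0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020, 0x0085,
    0x00A0, 0x1680, 0x180E,
    0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005,
    0x2006, 0x2007, 0x2008, 0x2009, 0x200A,
    0x2028, 0x2029, 0x202F, 0x205F, 0x3000,
]


def is_ucschar(cp: int) -> bool:
    """True if the code point falls in one of the ucschar ranges."""
    return any(lo <= cp <= hi for lo, hi in UCSCHAR_RANGES)


def classify(cp: int) -> str:
    """Label a white space code point the same way the chart does."""
    return "allowed in IRI" if is_ucschar(cp) else "% encoded"


for cp in WHITESPACE:
    print(f"U+{cp:04X}  {classify(cp)}")
```

This only looks at the ucschar production; it does not check iprivate or any scheme-specific rules, so treat it as a cross-check of the chart rather than an IRI validator.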
In Turtle, SPARQL, RDFa 1.1 Core (and XML 5th edition) whitespace is
defined as:
U+0009 U+000A U+000D U+0020
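As a quick illustration, and only a sketch, this is what those four characters look like once percent-encoded, using Python's standard urllib.parse.quote. The names GRAMMAR_WHITESPACE, percent_encode and encoded are just mine for the example. All four sit below U+00A0, so none of them can appear raw in an IRI.

```python
from urllib.parse import quote

# The four characters the grammars above treat as whitespace.
GRAMMAR_WHITESPACE = ["\t", "\n", "\r", " "]


def percent_encode(ch: str) -> str:
    """Percent-encode a single character (UTF-8 bytes, no extra safe chars)."""
    return quote(ch, safe="")


# Map each code point label to its percent-encoded form.
encoded = {f"U+{ord(ch):04X}": percent_encode(ch) for ch in GRAMMAR_WHITESPACE}

for label, value in encoded.items():
    print(label, value)
```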
So where's the collision / issue? I'm a little confused now.
Best,
Nathan
Received on Tuesday, 5 April 2011 00:48:49 UTC