IRI Whitespace?

Hi All,

I've always heard of an "IRIs can contain whitespace" issue. So I 
thought I'd take a closer look.

From what I can tell, IRI extends the class of unreserved 
characters by adding the characters of the UCS beyond U+007F.

Here's a chart of all the whitespace characters defined in Unicode, and 
whether they need to be percent-encoded or can be included 
as is:

                    ----------------------------------------
                   |  U+0009 \t
                   |  U+000A \n
                   |  U+000B \v
     % encoded --> |  U+000C \f
                   |  U+000D \r
                   |  U+0020 SPACE
                   |  U+0085 NEL (NEXT LINE)
                    ----------------------------------------
                   |  U+00A0 NBSP (NO-BREAK SPACE)
                   |  U+1680 OGHAM SPACE MARK
                   |  U+180E MONGOLIAN VOWEL SEPARATOR
                   |  U+2000 EN QUAD
                   |  U+2001 EM QUAD
                   |  U+2002 EN SPACE
allowed in IRI --> |  U+2003 EM SPACE
                   |  U+2004 THREE-PER-EM SPACE
                   |  U+2005 FOUR-PER-EM SPACE
                   |  U+2006 SIX-PER-EM SPACE
                   |  U+2007 FIGURE SPACE
                   |  U+2008 PUNCTUATION SPACE
                   |  U+2009 THIN SPACE
                   |  U+200A HAIR SPACE
                   |  U+2028 LINE SEPARATOR
                   |  U+2029 PARAGRAPH SEPARATOR
                   |  U+202F NARROW NO-BREAK SPACE
                   |  U+205F MEDIUM MATHEMATICAL SPACE
                   |  U+3000 IDEOGRAPHIC SPACE
                    ----------------------------------------
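
For anyone who wants to re-check the split in the chart, here is a rough Python sketch. It assumes the ucschar ranges as I read them from RFC 3987 (section 2.2), so please verify them against the spec. The names UCSCHAR_RANGES, WHITESPACE and allowed_raw_in_iri are my own. None of these whitespace characters are ASCII unreserved characters, so testing against ucschar alone should be enough here.

```python
# ucschar ranges as I read them from RFC 3987, section 2.2 (assumption:
# double-check against the spec before relying on this).
UCSCHAR_RANGES = [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
# Planes 1 to 13: %x10000-1FFFD through %xD0000-DFFFD.
UCSCHAR_RANGES += [(p << 16, (p << 16) | 0xFFFD) for p in range(1, 14)]
# Plane 14 starts at E1000 rather than E0000.
UCSCHAR_RANGES.append((0xE1000, 0xEFFFD))

# The whitespace code points listed in the chart above.
WHITESPACE = [
    0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20, 0x85,
    0xA0, 0x1680, 0x180E,
    *range(0x2000, 0x200B),          # U+2000 .. U+200A
    0x2028, 0x2029, 0x202F, 0x205F, 0x3000,
]


def allowed_raw_in_iri(cp: int) -> bool:
    """True if the code point falls inside one of the ucschar ranges."""
    return any(lo <= cp <= hi for lo, hi in UCSCHAR_RANGES)


must_encode = [cp for cp in WHITESPACE if not allowed_raw_in_iri(cp)]
allowed = [cp for cp in WHITESPACE if allowed_raw_in_iri(cp)]

for cp in WHITESPACE:
    status = "allowed in IRI" if allowed_raw_in_iri(cp) else "% encoded"
    print(f"U+{cp:04X}  {status}")
```

Note that U+0085 lands in the "% encoded" group because ucschar only starts at U+00A0.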

In Turtle, SPARQL, RDFa 1.1 Core (and XML 1.0 Fifth Edition) whitespace is 
defined as:

   U+0009 U+000A U+000D U+0020
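
A quick way to compare the two lists is to intersect them. This is only a sketch in Python that takes the sets exactly as listed in this message. The names GRAMMAR_WS, IRI_RAW_WS and MUST_ENCODE_WS are made up for illustration, and it says nothing about how any particular parser actually tokenises its input.

```python
# Whitespace as defined by the grammars mentioned above.
GRAMMAR_WS = {0x09, 0x0A, 0x0D, 0x20}

# Whitespace that the chart says must be percent-encoded in an IRI.
MUST_ENCODE_WS = {0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20, 0x85}

# Whitespace that the chart says may appear as is in an IRI.
IRI_RAW_WS = {
    0xA0, 0x1680, 0x180E,
    *range(0x2000, 0x200B),          # U+2000 .. U+200A
    0x2028, 0x2029, 0x202F, 0x205F, 0x3000,
}

# Characters that are both grammar whitespace and legal unencoded in an IRI.
overlap = GRAMMAR_WS & IRI_RAW_WS
print("overlap:", sorted(f"U+{cp:04X}" for cp in overlap))

# Every grammar whitespace character sits in the "must encode" group.
print("grammar whitespace all needs encoding:", GRAMMAR_WS <= MUST_ENCODE_WS)
```

Going by these lists alone, the overlap comes out empty.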

So where's the collision / issue? I'm a little confused now.

Best,

Nathan

Received on Tuesday, 5 April 2011 00:49:47 UTC