Re: RDF 1.1 IRIs and %-escaping

From: Graham Klyne <GK@ninebynine.org>
Date: Mon, 06 Aug 2012 12:01:26 +0100
Message-ID: <501FA406.5070409@ninebynine.org>
To: David Booth <david@dbooth.org>
CC: public-rdf-comments@w3.org
David,

I too think it is all a bit confusing ;)

I think it has, in part, to do with the fact that a URI can be considered as a 
sequence of characters OR as a sequence of octets.  My understanding is that the 
RFC3986 syntax refers to URIs encoded as octets via ASCII codepoints (cf. 
http://tools.ietf.org/html/rfc3986#section-2).

When URIs are encoded in XML, such as in RDF/XML, then they are presented as 
*character* sequences rather than octet sequences. The confusion arises, I 
think, because %-escaping is used in two distinct ways:  (1) to encode 
out-of-range characters as ASCII code-points (which here includes spaces), and 
(2) to encode characters that would otherwise be interpreted as delimiters.  In 
case (1), the encoding is not necessarily needed when the URI is considered as a 
sequence of characters, but is needed when the URI is considered as a sequence 
of octets.  For case (2), the %-encoding is needed whether the URI is considered 
as characters or octets.  By this reading, a space in a URI in an XML document 
need not be %-escaped.
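
To make the two cases concrete, here is a rough sketch using Python's standard 
urllib.parse module. The example strings are invented for illustration, and 
this is only my reading of the distinction, not something RFC3986 spells out 
in these terms.

```python
from urllib.parse import quote, unquote

# Case (1): characters outside the URI repertoire (a space, a non-ASCII
# letter) are escaped only so the URI can be written as ASCII octets.
path_chars = "/my documents/café"          # hypothetical example path
path_octets = quote(path_chars, safe="/")  # keep "/" as a real delimiter
print(path_octets)

# Case (2): a "/" that is data, not a delimiter, must be escaped in
# either view, otherwise it would split the path segment in two.
segment = "a/b"                            # hypothetical single segment
escaped_segment = quote(segment, safe="")
print(escaped_segment)

# Undoing case (1) recovers the same character sequence.
assert unquote(path_octets) == path_chars

# Undoing case (2) changes how the path is split into segments.
assert len(("/x/" + escaped_segment).split("/")) == 3
assert len(("/x/" + unquote(escaped_segment)).split("/")) == 4
```

The point of the last two checks is that removing a case (2) escape changes the 
structure of the URI, whereas removing a case (1) escape only changes how the 
same characters are written down.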

The above reading of RFC3986 is highly questionable, and somewhat at odds with 
section 2.4.  Mostly, it is my (weak) attempt to rationalize the behaviour of 
existing software which clearly *does* allow spaces in URIs in RDF/XML 
(including the W3C RDF validator). So, irrespective of how this is squared with 
RFC3986, I think you'll find a significant body of existing data and/or apps 
that *do* allow spaces in URIs transported as character sequences.

Another way to read this is that RDF/XML conveys not URIs (per RFC3986), but 
character sequences that conform to URI syntax when the %-encoding rules are 
applied (cf. http://tools.ietf.org/html/rfc3986#section-1.2.1).  I think (from 
memory) this is pretty much what the current RDF spec says, and also the XML 
schema datatype text describing AnyURI.
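
Under that second reading, a receiver could map the character sequence to a 
URI by escaping only the extra characters listed in the RDF 1.1 note and 
leaving any existing "%" alone. The following is a minimal sketch of that idea, 
again in Python; the function name escape_legacy_chars and the example URI are 
hypothetical, and I am not claiming this is what rdflib, Jena or the validator 
actually do.

```python
from urllib.parse import quote

# The extra characters named in the RDF 1.1 note: < > { } | \ ^ ` " space
LEGACY_CHARS = '<>{}|\\^`" '

def escape_legacy_chars(ref: str) -> str:
    """Percent-encode only the legacy characters; never touch '%'."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in LEGACY_CHARS else ch
        for ch in ref
    )

raw = "http://example.org/my file"      # hypothetical RDF 1.0 style input
once = escape_legacy_chars(raw)
twice = escape_legacy_chars(once)       # e.g. input already escaped upstream

# Applying the mapping to already-escaped input changes nothing.
assert once == twice

# By contrast, blindly escaping everything outside a safe set re-encodes
# the "%" itself, which is the double-escaping problem.
naive = quote(once, safe=":/")
assert naive != once
print(once)
print(naive)
```

Because "%" is never re-encoded, the mapping is idempotent. The cost is that a 
literal "%" in the input can never be distinguished from the start of an 
escape, so this only works if senders agree that "%" always introduces an 
escape.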

#g
--

On 19/07/2012 14:52, David Booth wrote:
> Graham,
>
> I'm confused about this, because to my knowledge, a URI has *never* been
> allowed to contain an unescaped space.  Unless I'm misreading RFC 3986
> grammar
> http://www.ietf.org/rfc/rfc3986.txt
> (and the grammar of the older URI spec RFC 2396),
> http://www.ietf.org/rfc/rfc2396.txt
> spaces in a URI *must* be percent-encoded.  In fact, I am dismayed to
> see that some recent browsers are now incorrectly displaying URIs as
> containing spaces, instead of %20's, thus misleading people into
> thinking that URIs can contain spaces.
>
> Can you clarify?
>
> David
>
> On Thu, 2012-07-19 at 11:57 +0100, Graham Klyne wrote:
>> With reference to http://www.w3.org/TR/2012/WD-rdf11-concepts-20120605/#section-IRIs
>>
>> And in particular to the note:
>> [[
>> Previous versions of RDF used the term “RDF URI Reference” instead of “IRI” and
>> allowed additional characters: “<”, “>”, “{”, “}”, “|”, “\”, “^”, “`”, ‘“’
>> (double quote), and “ ” (space). In IRIs, these characters must be
>> percent-encoded as described in section 2.1 of [URI].
>> ]]
>>
>> I have a concern that this change may lead to incompatibility with deployed
>> software, and consequent failure of interoperability.
>>
>> Currently, the W3C RDF validator, Python rdflib and Jena libraries all allow
>> and/or generate RDF with URIs that contain unescaped spaces (and presumably
>> other characters).
>>
>> This note suggests that spaces (and other characters) must be %-escaped before
>> being serialized into RDF 1.1, where current practice, as far as I can discern,
>> is to assume that RDF carries character-encoded URIs that are not %-escaped.
>>
>> For the following, I assume that RDF 1.1 is intending to say that spaces MUST be
>> %-escaped in URIs used as RDF node identifiers.
>>
>> Suppose I implement a service that accepts RDF from its clients.  What is it to
>> do about URIs containing unescaped spaces?
>>
>> If it rejects them as ill-formed, then it fails compatibility with existing
>> clients that provide RDF 1.0 compatible data.
>>
>> If it applies %-escaping to non-URI-valid characters, this will result in
>> double-escaping of RDF data from RDF 1.1 clients, something that RFC3986 says
>> must be guarded against (http://tools.ietf.org/html/rfc3986#section-2.4), and
>> may fail to recognize as equal URIs that should be equal provided by RDF 1.0 and
>> RDF 1.1 clients.
>>
>> ****
>>
>> And a nit: the [IRI] reference in this document actually links to the URI spec.
>>
>> #g
>>
>>
>>
>>
>
Received on Tuesday, 7 August 2012 06:58:17 UTC