
Re: RDF 1.1 IRIs and %-escaping

From: David Booth <david@dbooth.org>
Date: Thu, 19 Jul 2012 09:52:30 -0400
To: Graham Klyne <GK@ninebynine.org>
Cc: public-rdf-comments@w3.org
Message-ID: <1342705950.3486.9397.camel@dbooth-laptop>

I'm confused about this, because to my knowledge, a URI has *never* been
allowed to contain an unescaped space.  Unless I'm misreading the grammar
in RFC 3986 (and the older URI spec, RFC 2396),
spaces in a URI *must* be percent-encoded.  In fact, I am dismayed to
see that some recent browsers are now incorrectly displaying URIs as
containing spaces, instead of %20's, thus misleading people into
thinking that URIs can contain spaces.
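
As a rough illustration of the point about RFC 3986, the sketch below
lists the characters the RFC permits to appear literally in a URI
(unreserved, gen-delims, sub-delims, plus "%" as the start of a
percent-encoded triplet) and reports any character of a string that
falls outside that set.  The function name is made up for this example,
and the check covers only the character repertoire, not the full URI
grammar.

```python
import string

# Character classes from RFC 3986, sections 2.2 and 2.3.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
# "%" may appear only as the start of a percent-encoded triplet;
# this sketch does not verify that two hex digits follow it.
URI_CHARS = UNRESERVED | GEN_DELIMS | SUB_DELIMS | {"%"}


def chars_outside_uri_repertoire(ref: str) -> list:
    """Return the distinct characters of ref that RFC 3986 does not allow literally."""
    return sorted({ch for ch in ref if ch not in URI_CHARS})


print(chars_outside_uri_repertoire("http://example.org/a b"))
print(chars_outside_uri_repertoire("http://example.org/a%20b"))
```

The first call reports the space; the second reports nothing, because
the space has been percent-encoded.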

Can you clarify?


On Thu, 2012-07-19 at 11:57 +0100, Graham Klyne wrote:
> With reference to http://www.w3.org/TR/2012/WD-rdf11-concepts-20120605/#section-IRIs
> And in particular to the note:
> [[
> Previous versions of RDF used the term “RDF URI Reference” instead of “IRI” and 
> allowed additional characters: “<”, “>”, “{”, “}”, “|”, “\”, “^”, “`”, ‘“’ 
> (double quote), and “ ” (space). In IRIs, these characters must be 
> percent-encoded as described in section 2.1 of [URI].
> ]]
> I have a concern that this change may lead to incompatibility with deployed 
> software, and consequent failure of interoperability.
> Currently, the W3C RDF validator, Python rdflib and Jena libraries all allow 
> and/or generate RDF with URIs that contain unescaped spaces (and presumably 
> other characters).
> This note suggests that spaces (and other characters) must be %-escaped before 
> being serialized into RDF 1.1, where current practice, as far as I can discern, 
> is to assume that RDF carries character-encoded URIs that are not %-escaped.
> For the following, I assume that RDF 1.1 is intending to say that spaces MUST be 
> %-escaped in URIs used as RDF node identifiers.
> Suppose I implement a service that accepts RDF from its clients.  What is it to 
> do about URIs containing unescaped spaces?
> If it rejects them as ill-formed, then it fails compatibility with existing 
> clients that provide RDF 1.0 compatible data.
> If it applies %-escaping to non-URI-valid characters, this will result in 
> double-escaping of RDF data from RDF 1.1 clients, something that RFC3986 says 
> must be guarded against (http://tools.ietf.org/html/rfc3986#section-2.4), and 
> may fail to recognize as equal URIs that should be equal provided by RDF 1.0 and 
> RDF 1.1 clients.
> ****
> And a nit: the [IRI] reference in this document actually links to the URI spec.
> #g
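
To make the double-escaping concern in the quoted message concrete,
here is a minimal Python sketch of the two escaping strategies a
receiving service might try.  Both function names are hypothetical, and
the character list is taken from the RDF 1.1 note quoted above.  This
is not what any of the libraries mentioned actually do; it only shows
how the two approaches differ on input that is already percent-encoded.

```python
from urllib.parse import quote

# Characters the quoted RDF 1.1 note lists as formerly allowed in
# RDF URI References: < > { } | \ ^ ` " and space.
FORMERLY_ALLOWED = '<>{}|\\^`" '


def escape_legacy_chars(ref: str) -> str:
    """Percent-encode only the listed characters; leave '%' untouched."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in FORMERLY_ALLOWED else ch
        for ch in ref
    )


def escape_everything(ref: str) -> str:
    """Naive approach: also encodes '%', so encoded input is double-escaped."""
    return quote(ref, safe=":/?#[]@!$&'()*+,;=")


raw = "http://example.org/a b"        # as an RDF 1.0 style client might send it
encoded = "http://example.org/a%20b"  # as an RDF 1.1 style client might send it

for ref in (raw, encoded):
    print(escape_legacy_chars(ref), escape_everything(ref))
```

The naive version turns the already-encoded "%20" into "%2520", which
is the double-escaping that RFC 3986 section 2.4 warns about.  The
selective version maps both inputs to the same string, but at the cost
of treating a literal "%20" and a raw space as the same identifier,
so it does not by itself settle the compatibility question.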

David Booth, Ph.D.

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.
Received on Thursday, 19 July 2012 13:53:04 UTC
