RDF 1.1 IRIs and %-escaping

With reference to http://www.w3.org/TR/2012/WD-rdf11-concepts-20120605/#section-IRIs

And in particular to the note:
[[
Previous versions of RDF used the term “RDF URI Reference” instead of “IRI” and 
allowed additional characters: “<”, “>”, “{”, “}”, “|”, “\”, “^”, “`”, ‘“’ 
(double quote), and “ ” (space). In IRIs, these characters must be 
percent-encoded as described in section 2.1 of [URI].
]]

I have a concern that this change may lead to incompatibility with deployed 
software, and consequent failure of interoperability.

Currently, the W3C RDF validator, Python rdflib and Jena libraries all allow 
and/or generate RDF with URIs that contain unescaped spaces (and presumably 
other characters).

This note suggests that spaces (and other characters) must be %-escaped before 
being serialized into RDF 1.1, whereas current practice, as far as I can 
discern, is to assume that RDF carries character-encoded URIs that are not 
%-escaped.

For the following, I assume that RDF 1.1 intends to say that spaces MUST be 
%-escaped in URIs used as RDF node identifiers.

Suppose I implement a service that accepts RDF from its clients.  What is it to 
do about URIs containing unescaped spaces?

If it rejects them as ill-formed, then it fails compatibility with existing 
clients that provide RDF 1.0 compatible data.
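
To make the rejection option concrete, here is a minimal sketch in Python of 
such a strict check. The function name and the example.org URIs are 
hypothetical, and the character set is simply the one listed in the note quoted 
above; it is not a full IRI validator.

```python
# Characters the quoted note says were allowed in RDF URI References
# but must be percent-encoded in IRIs.
LEGACY_ONLY_CHARS = set('<>{}|\\^`" ')


def is_acceptable_rdf11_iri(ref: str) -> bool:
    """Hypothetical strict check: reject any legacy-only character."""
    return not (set(ref) & LEGACY_ONLY_CHARS)


# As an RDF 1.0 client might send it (unescaped space).
legacy = "http://example.org/my document"
# As an RDF 1.1 client would be required to send it.
escaped = "http://example.org/my%20document"

print(is_acceptable_rdf11_iri(legacy))   # False: existing data is refused
print(is_acceptable_rdf11_iri(escaped))  # True
```

Under this policy the first form, which existing tools reportedly accept or 
generate today, is turned away.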

If it applies %-escaping to non-URI-valid characters, this will result in 
double-escaping of RDF data from RDF 1.1 clients, something that RFC3986 says 
must be guarded against (http://tools.ietf.org/html/rfc3986#section-2.4), and 
it may then fail to recognize as equal those URIs, supplied by RDF 1.0 and 
RDF 1.1 clients, that should be equal.
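
The double-escaping hazard can be sketched as follows, assuming a service that 
escapes every incoming URI with Python's urllib.parse.quote and does not treat 
"%" as safe. The function name, the choice of safe characters and the 
example.org URIs are assumptions made for illustration only.

```python
from urllib.parse import quote

# RFC 3986 reserved characters are left alone; "%" is deliberately NOT
# in this set, so an existing escape such as "%20" gets escaped again.
SAFE = ":/?#[]@!$&'()*+,;="


def naive_escape(ref: str) -> str:
    """Hypothetical service-side escaping applied to every incoming URI."""
    return quote(ref, safe=SAFE)


from_rdf10_client = "http://example.org/my document"    # unescaped space
from_rdf11_client = "http://example.org/my%20document"  # already escaped

a = naive_escape(from_rdf10_client)
b = naive_escape(from_rdf11_client)

print(a)       # http://example.org/my%20document
print(b)       # http://example.org/my%2520document
print(a == b)  # False: the two clients' URIs no longer compare equal
```

An escaper that leaves "%" untouched avoids the second result, but then it 
cannot distinguish a literal "%" from the start of an escape, so the ambiguity 
moves rather than disappears.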

****

And a nit: the [IRI] reference in this document actually links to the URI spec.

#g

Received on Thursday, 19 July 2012 10:59:47 UTC