Re: RDF 1.1 IRIs and %-escaping from Andy Seaborne on 2012-07-19 (public-rdf-comments@w3.org from July 2012)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Thu, 19 Jul 2012 15:58:34 +0100
To: public-rdf-comments@w3.org
Message-ID: <5008209A.1070609@epimorphics.com>

On 19/07/12 11:57, Graham Klyne wrote:
> With reference to
> http://www.w3.org/TR/2012/WD-rdf11-concepts-20120605/#section-IRIs
>
> And in particular to the note:
> [[
> Previous versions of RDF used the term “RDF URI Reference” instead of
> “IRI” and allowed additional characters: “<”, “>”, “{”, “}”, “|”, “\”,
> “^”, “`”, ‘“’ (double quote), and “ ” (space). In IRIs, these characters
> must be percent-encoded as described in section 2.1 of [URI].
> ]]
>
> I have a concern that this change may lead to incompatibility with
> deployed software, and consequent failure of interoperability.
>
> Currently, the W3C RDF validator, Python rdflib and Jena libraries all
> allow and/or generate RDF with URIs that contain unescaped spaces (and
> presumably other characters).

You can create technically legal unserializable graphs in RDF-2004 
(spaces in properties) and restrict serialization possibilities (clases 
in class names).

Jena does not make any guarantees for RDF URI References with spaces in 
them - specifically, writing then reading back in again may generate 
lots of warnings.

We *strongly* discourage their use.

They are not legal in SPARQL 1.0 queries.

They were an unfortunate effect of RDF-2004 going ahead before IRIs were 
finalised (or so I'm told - I wasn't there - Graham was).

The only place that allows them in any spec is in "RDF URI Reference" - 
it is time to fix on using IRIs in all semantic web specs and drop "RDF 
URI Reference".

> This note suggests that spaces (and other characters) must be %-escaped
> before being serialized into RDF 1.1, where current practice, as far as
> I can discern, is to assume that RDF carries character-encoded URIs that
> are not %-escaped.

Nit pick: encode != escape.

The string "abc\ndef" really does have a raw LF in it and is 7 chars 
long. The string "abc%0Adef" does not and is 9 chars long.
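
A minimal sketch of that distinction, assuming Python string literals 
stand in for whatever syntax is doing the escaping. The variable names 
are illustrative only:

```python
from urllib.parse import unquote

# "\n" is an escape in the source syntax: the string itself holds a raw LF.
escaped = "abc\ndef"

# Percent-encoding is part of the data: '%', '0' and 'A' are real characters.
encoded = "abc%0Adef"

print(len(escaped))            # 7
print(len(encoded))            # 9
print("\n" in encoded)         # False: no raw LF until something decodes it

# Only an explicit decoding step turns one string into the other.
print(unquote(encoded) == escaped)   # True
```

The two strings are different sequences of characters, so as identifiers 
they only compare equal if something decodes one of them first.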

> For the following, I assume that RDF 1.1 is intending to say that spaces
> MUST be %-escaped in URIs used as RDF node identifiers.
>
> Suppose I implement a service that accepts RDF from its clients.  What
> is it to do about URIs containing unescaped spaces?
>
> If it rejects them as ill-formed, then it fails compatibility with
> existing clients that provide RDF 1.0 compatible data.

But do they occur intentionally in the wild?

I only ever see them occurring by accident.

> If it applies %-escaping to non-URI-valid characters, this will result
> in double-escaping of RDF data from RDF 1.1 clients, something that
> RFC3986 says must be guarded against
> (http://tools.ietf.org/html/rfc3986#section-2.4), and may fail to
> recognize as equal URIs that should be equal provided by RDF 1.0 and RDF
> 1.1 clients.
>
> ****
>
> And a nit: the [IRI] reference in this document actually links to the
> URI spec.
>
> #g
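
The double-escaping concern quoted above can be sketched as follows. 
This assumes Python's urllib.parse.quote as the encoder; the example 
URI and the choice of "safe" characters are assumptions made for 
illustration, not anything a spec or library mandates:

```python
from urllib.parse import quote

# Hypothetical identifier from a client that sends an unescaped space.
raw = "http://example.org/my file"

# First pass: the space becomes %20.
once = quote(raw, safe=":/")
print(once)     # http://example.org/my%20file

# Second pass over already-encoded input: '%' itself becomes %25.
twice = quote(once, safe=":/")
print(twice)    # http://example.org/my%2520file

# One crude guard: treat '%' as safe so existing escapes pass through.
# Assumption: this also lets a stray, invalid '%' through unchanged.
guarded = quote(once, safe=":/%")
print(guarded)  # http://example.org/my%20file
```

A service that blindly encodes every incoming identifier produces the 
"twice" form for clients that already encoded, and that string no longer 
matches the "once" form.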

	Andy

Received on Thursday, 19 July 2012 14:59:08 UTC