Re: RDF 1.1 IRIs and %-escaping

+cc: Gio

On 19 July 2012 16:58, Andy Seaborne <andy.seaborne@epimorphics.com> wrote:
>
>
> On 19/07/12 11:57, Graham Klyne wrote:
>>
>> With reference to
>> http://www.w3.org/TR/2012/WD-rdf11-concepts-20120605/#section-IRIs
>>
>> And in particular to the note:
>> [[
>> Previous versions of RDF used the term “RDF URI Reference” instead of
>> “IRI” and allowed additional characters: “<”, “>”, “{”, “}”, “|”, “\”,
>> “^”, “`”, ‘“’ (double quote), and “ ” (space). In IRIs, these characters
>> must be percent-encoded as described in section 2.1 of [URI].
>> ]]
>>
>> I have a concern that this change may lead to incompatibility with
>> deployed software, and consequent failure of interoperability.
>>
>> Currently, the W3C RDF validator, Python rdflib and Jena libraries all
>> allow and/or generate RDF with URIs that contain unescaped spaces (and
>> presumably other characters).
>
>
> You can create technically legal unserializable graphs in RDF-2004 (spaces
> in properties) and restrict serialization possibilities (clases in class
> names).
>
> Jena does not make any guarantees for RDF URI Reference with spaces in -
> specifically writing then reading in again may generate lots of warnings.
>
> We *strongly* discourage their use.

Anecdotally, in my experience, the RDF community has avoided spaces in
URIs, even if the 2004 specs allow them as a theoretical possibility.
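For reference, the RDF 1.1 note quoted above requires the ten characters
that 2004 "RDF URI References" tolerated to be percent-encoded (RFC 3986,
section 2.1). A minimal sketch of that mapping, assuming the input is an
otherwise valid IRI string; the function name is hypothetical and not part
of any RDF library:

```python
# Characters that 2004 "RDF URI References" allowed but RDF 1.1 IRIs do not,
# as listed in the RDF 1.1 Concepts draft note (assumption: this list only).
LEGACY_ONLY_CHARS = set('<>{}|\\^`" ')


def escape_legacy_rdf_uri(ref: str) -> str:
    """Hypothetical helper: percent-encode the legacy-only characters.

    All ten characters are ASCII, so each maps to a single %XX triplet.
    Uppercase hex is used, as RFC 3986 recommends.
    """
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in LEGACY_ONLY_CHARS else ch
        for ch in ref
    )


print(escape_legacy_rdf_uri("http://example.org/my thing"))
# prints http://example.org/my%20thing
```

Note this leaves any existing "%" untouched, so it is only a one-way
migration aid, not a general IRI normaliser.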

I wonder if we could substantiate this, e.g. with crawled LOD/RDF
data. I've Cc:'d Giovanni from Sindice here. Could Sindice be used to
check how many RDF triples/documents/datasets use spaces in their
URIs? Perhaps, though, such information is lost during
parsing/normalisation.
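As a rough illustration of the kind of check meant here, a sketch that
counts N-Triples lines with at least one IRI containing a space. It is a
heuristic under stated assumptions: input is line-oriented N-Triples,
quoted literals are stripped by a simple regex before looking for <...>
terms, and the function name is hypothetical (not a Sindice API):

```python
import re
from typing import Iterable

# Heuristic patterns (assumption: well-formed, one-triple-per-line N-Triples).
LITERAL_RE = re.compile(r'"(?:[^"\\]|\\.)*"')  # quoted literal, with escapes
IRI_RE = re.compile(r"<([^>]*)>")  # contents of an <...> term


def count_triples_with_space_iris(lines: Iterable[str]) -> int:
    """Count triples (lines) where some IRI term contains a space."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        # Blank out literals so "<...>" text inside them is not mistaken for an IRI.
        without_literals = LITERAL_RE.sub('""', line)
        if any(" " in iri for iri in IRI_RE.findall(without_literals)):
            count += 1
    return count


sample = [
    '<http://example.org/a b> <http://example.org/p> "x" .',
    '<http://example.org/a> <http://example.org/p> "text with <not an iri> in it" .',
    "<http://example.org/a> <http://example.org/p> <http://example.org/c> .",
]
print(count_triples_with_space_iris(sample))  # 1
```

If a crawler normalises or re-escapes IRIs on ingest, a check like this
would have to run on the raw documents instead.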

Dan

Received on Tuesday, 7 August 2012 11:54:58 UTC