Re: RDF 1.1 IRIs and %-escaping

Dan,

Sindice uses any23, which automatically converts spaces in URIs to %20. (I believe it's actually Sesame doing that.) So Sindice never sees URIs with spaces.
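
For illustration only, here is a rough Python sketch of the kind of escaping the draft note asks for. It percent-encodes just the ten characters the note lists and leaves everything else alone. The function name escape_for_iri and the example URI are made up for this message; this is not what any23 or Sesame actually run, only an assumption about the general shape of such a conversion.

```python
# Characters that the 2004 "RDF URI Reference" allowed but IRIs do not,
# as listed in the RDF 1.1 Concepts draft note quoted in this thread.
DISALLOWED = '<>{}|\\^`" '


def escape_for_iri(ref: str) -> str:
    """Percent-encode only the characters in DISALLOWED (hypothetical helper).

    Every listed character is ASCII, so one %XX triplet per character
    is enough. All other characters pass through unchanged.
    """
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in DISALLOWED else ch
        for ch in ref
    )


print(escape_for_iri("http://example.org/my file"))
```

Note that a consumer sitting behind a step like this cannot tell whether the published data had a literal space or already had %20, which is why Sindice cannot answer the question directly.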

Best,
Richard


On 7 Aug 2012, at 12:54, Dan Brickley wrote:

> +cc: Gio
> 
> On 19 July 2012 16:58, Andy Seaborne <andy.seaborne@epimorphics.com> wrote:
>> 
>> 
>> On 19/07/12 11:57, Graham Klyne wrote:
>>> 
>>> With reference to
>>> http://www.w3.org/TR/2012/WD-rdf11-concepts-20120605/#section-IRIs
>>> 
>>> And in particular to the note:
>>> [[
>>> Previous versions of RDF used the term “RDF URI Reference” instead of
>>> “IRI” and allowed additional characters: “<”, “>”, “{”, “}”, “|”, “\”,
>>> “^”, “`”, ‘“’ (double quote), and “ ” (space). In IRIs, these characters
>>> must be percent-encoded as described in section 2.1 of [URI].
>>> ]]
>>> 
>>> I have a concern that this change may lead to incompatibility with
>>> deployed software, and consequent failure of interoperability.
>>> 
>>> Currently, the W3C RDF validator, Python rdflib and Jena libraries all
>>> allow and/or generate RDF with URIs that contain unescaped spaces (and
>>> presumably other characters).
>> 
>> 
>> You can create technically legal unserializable graphs in RDF-2004 (spaces
>> in properties) and restrict serialization possibilities (clases in class
>> names).
>> 
>> Jena does not make any guarantees for RDF URI Reference with spaces in -
>> specifically writing then reading in again may generate lots of warnings.
>> 
>> We *strongly* discourage their use.
> 
> Anecdotally, in my experience, the RDF community have avoided spaces
> in URIs, even if the 2004 specs allow them as a theoretical
> possibility.
> 
> I wonder if we could substantiate this - e.g. with crawled LOD/RDF
> data. I've Cc:'d Giovanni from Sindice here. Could Sindice be used to
> check how many deployed RDF triples/documents/datasets have spaces in
> their URIs? Although perhaps such information is lost during
> parsing/normalisation?
> 
> Dan
> 

Received on Tuesday, 7 August 2012 21:14:55 UTC