- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Thu, 19 Jul 2012 15:58:34 +0100
- To: public-rdf-comments@w3.org
On 19/07/12 11:57, Graham Klyne wrote:
> With reference to
> http://www.w3.org/TR/2012/WD-rdf11-concepts-20120605/#section-IRIs
>
> And in particular to the note:
> [[
> Previous versions of RDF used the term RDF URI Reference instead of
> IRI and allowed additional characters: <, >, {, }, |, \,
> ^, `, " (double quote), and (space). In IRIs, these characters
> must be percent-encoded as described in section 2.1 of [URI].
> ]]
>
> I have a concern that this change may lead to incompatibility with
> deployed software, and consequent failure of interoperability.
>
> Currently, the W3C RDF validator, Python rdflib and Jena libraries all
> allow and/or generate RDF with URIs that contain unescaped spaces (and
> presumably other characters).

You can create technically legal but unserializable graphs in RDF-2004
(spaces in properties) and restrict serialization possibilities (spaces
in class names).

Jena does not make any guarantees for RDF URI References with spaces in
them; specifically, writing them out and then reading them in again may
generate lots of warnings. We *strongly* discourage their use. They are
not legal in SPARQL 1.0 queries.

They were an unfortunate effect of RDF-2004 going ahead before IRIs were
finalised (or so I'm told; I wasn't there, Graham was).

The only place that allows them in any spec is "RDF URI Reference". It
is time to settle on using IRIs in all semantic web specs and drop "RDF
URI Reference".

> This note suggests that spaces (and other characters) must be %-escaped
> before being serialized into RDF 1.1, where current practice, as far as
> I can discern, is to assume that RDF carries character-encoded URIs that
> are not %-escaped.

Nit pick: encode != escape.

The string "abc\ndef" really does have a raw LF in it and is 7 chars
long. The string "abc%0Adef" does not and is 9 chars long.

> For the following, I assume that RDF 1.1 is intending to say that spaces
> MUST be %-escaped in URIs used as RDF node identifiers.
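To make the "encode != escape" nit concrete, here is a minimal Python sketch using only the standard library (an editorial illustration, not code from the thread). It checks the two string lengths given above and computes the RFC 3986 percent-encoded form of each character listed in the RDF 1.1 note; the names `NOTE_CHARS` and `ENCODED` are hypothetical.

```python
from urllib.parse import quote, unquote

# "abc\ndef" holds one raw LF character; "abc%0Adef" spells it as three.
raw = "abc\ndef"
escaped = "abc%0Adef"
print(len(raw), len(escaped))   # 7 9

# Percent-decoding the escaped form recovers the raw string.
print(unquote(escaped) == raw)  # True

# Characters RDF URI References allowed but IRIs do not (from the RDF 1.1 note).
NOTE_CHARS = ['<', '>', '{', '}', '|', '\\', '^', '`', '"', ' ']

# quote() percent-encodes each character's UTF-8 bytes, e.g. ' ' -> '%20'.
ENCODED = {c: quote(c) for c in NOTE_CHARS}

for c, enc in ENCODED.items():
    print(repr(c), "->", enc)
```

Escaping changes the character sequence itself; a character encoding such as UTF-8 only changes how the same characters are stored as bytes.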
>
> Suppose I implement a service that accepts RDF from its clients. What
> is it to do about URIs containing unescaped spaces?
>
> If it rejects them as ill-formed, then it fails compatibility with
> existing clients that provide RDF 1.0 compatible data.

But do they occur intentionally in the wild? I only ever see them
occurring by accident.

> If it applies %-escaping to non-URI-valid characters, this will result
> in double-escaping of RDF data from RDF 1.1 clients, something that
> RFC3986 says must be guarded against
> (http://tools.ietf.org/html/rfc3986#section-2.4), and may fail to
> recognize as equal URIs that should be equal provided by RDF 1.0 and RDF
> 1.1 clients.
>
> ****
>
> And a nit: the [IRI] reference in this document actually links to the
> URI spec.
>
> #g

Andy
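The double-escaping concern raised above (RFC 3986 section 2.4) can be sketched in Python with the standard library. A general-purpose escaper such as `urllib.parse.quote` also escapes "%", so an IRI that is already escaped has "%20" turned into "%2520", and identifiers from the two kinds of client no longer compare equal. The example IRIs and the helper `escape_note_chars` are hypothetical; the helper escapes only the characters listed in the RDF 1.1 note and leaves "%" alone, which is one assumed way to avoid compounding escapes, not an approach proposed in the thread.

```python
from urllib.parse import quote

# Hypothetical identifiers for the same resource from two kinds of client.
from_rdf10_client = "http://example.org/my doc"    # unescaped space (RDF 2004 style)
from_rdf11_client = "http://example.org/my%20doc"  # already %-escaped (RDF 1.1 style)

# quote() escapes '%' too, so already-escaped input gets escaped a second time.
SAFE = ":/"
a = quote(from_rdf10_client, safe=SAFE)  # "http://example.org/my%20doc"
b = quote(from_rdf11_client, safe=SAFE)  # "http://example.org/my%2520doc"
print(a == b)  # False: double-escaping breaks equality

# Characters RDF URI References allowed but IRIs do not (from the RDF 1.1 note).
NOTE_CHARS = set('<>{}|\\^`" ')

def escape_note_chars(s: str) -> str:
    """Hypothetical helper: percent-encode only NOTE_CHARS, leaving '%' as-is."""
    return "".join(quote(c) if c in NOTE_CHARS else c for c in s)

print(escape_note_chars(from_rdf10_client) == escape_note_chars(from_rdf11_client))  # True
```

The targeted helper assumes any "%" already present begins a valid escape and does not check that. Because RDF 2004 compared RDF URI References character by character, it also maps two distinct RDF 2004 identifiers onto one IRI, which a given service may or may not want.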
Received on Thursday, 19 July 2012 14:59:08 UTC