Re: RDF 1.1 IRIs and %-escaping

Graham,

On 21 Aug 2012, at 12:39, Graham Klyne wrote:
> While I support the thrust and direction of the change to %-escaping of spaces, I think you're being too dismissive of the (possible) legacy base.  

Where is this legacy base? So far, it is purely hypothetical.

> Given that the current RDF specification does allow spaces in RDF URI references, I think it would be appropriate to invoke the Postel Principle ("be conservative in what you do, be liberal in what you accept from others").

The Robustness Principle is a guideline for implementations, not standards.

> In this case, I'd suggest something like:
> [[
> RDF applications SHOULD generate IRIs in which spaces are %-encoded, but MAY accept RDF containing RDF URI references in which spaces are not %-escaped.
> ]]

The effect would be that some implementations accept unescaped spaces and pass them on to the next system in the chain, which may or may not halt and catch fire. It just makes it harder to predict at what point in the processing chain things blow up. This seems unwise.
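To illustrate the alternative of failing at a predictable point, here is a minimal sketch of a check applied where data enters a system. It is an illustration only: the function name `require_clean_iri` is hypothetical, and the character set is taken from the IRIREF production of the SPARQL 1.1 grammar as a rough stand-in, not a full RFC 3987 validation.

```python
import re

# Characters excluded by the IRIREF production in the SPARQL 1.1 grammar:
# U+0000 through U+0020 (which includes the space) and < > " { } | ^ ` \
_FORBIDDEN = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def require_clean_iri(iri: str) -> str:
    """Return the IRI unchanged, or raise ValueError at the first forbidden character."""
    match = _FORBIDDEN.search(iri)
    if match:
        raise ValueError(
            f"character {match.group()!r} at offset {match.start()} "
            f"is not allowed in an IRI: {iri!r}"
        )
    return iri
```

With a check like this at the boundary, an unescaped space is reported where it enters the chain rather than somewhere downstream.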

By the way, RDF Concepts doesn't say that applications have to reject IRIs with spaces. It only says that data with IRIs that contain spaces is not RDF. RDF Concepts does not prescribe how applications handle non-RDF (or broken RDF) data. How to make broken RDF interoperable is a fascinating and important topic (I've spent half a year building parsers and extractors that ingest HTML tag soup), but in no way a topic that's ripe for standardization, and certainly not in this WG.

>> 1. RDF 2004 is the only standard on the planet that allows spaces in URIs/IRIs
> 
> http://www.w3.org/TR/xmlschema-2/#anyURI

You mean this bit?

[[
Note:  Spaces are, in principle, allowed in the ·lexical space· of anyURI, however, their use is highly discouraged (unless they are encoded by %20).
]]

>> 2. This is the result of historical accident and WG timing (the IRI spec was still in draft status, and the 2004 RDF-WG thought it important to remain compatible with early IRI drafts)
> 
> It may be, but there is now code that is based on this historical accident.  I came across this as a problem precisely because I was working with an RDF library, trying to follow specifications in good faith, and ended up generating URIs with spaces that are rejected by other applications.

How is this relevant? This other application is surely not an RDF 1.1 application but an RDF 2004 application. We already know that interoperability for URIs with spaces is poor in RDF 2004 implementations.

Your proposed course of action -- continue to allow spaces in URIs -- will not fix your interoperability problem, as we already know that this design has never been consistently implemented, and is unlikely to be in the future.

The WG's intended course of action -- following the IRI standard -- may actually improve things in the future by moving us towards a situation where spaces in IRIs are consistently rejected or not produced.

>> 5. Not a single instance where spaces in URIrefs are used intentionally has been shown
> 
> I did this in a mapping from file names to URIs.  (I can change my code, but it counts as a single instance, and others may have done something similar.)

Ok, that's one instance.

And yes, I've seen file names used verbatim as URIs more than once, but that is generally not intentional, and such URIs will fail for other reasons (e.g., backslashes).
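For the file-name case, a minimal sketch of a mapping that %-encodes spaces (and backslashes) rather than copying the name verbatim. The function name `file_name_to_iri` and the example.org base are assumptions for illustration; `urllib.parse.quote` is from the Python standard library.

```python
from urllib.parse import quote


def file_name_to_iri(base: str, file_name: str) -> str:
    """Append a %-encoded file name to a base IRI (hypothetical helper)."""
    # quote() leaves letters, digits, "_.-~" and the characters in `safe`
    # untouched; everything else, including spaces and backslashes, is
    # percent-encoded as UTF-8 bytes.
    return base + quote(file_name, safe="/")


print(file_name_to_iri("http://example.org/files/", "my report.txt"))
# prints http://example.org/files/my%20report.txt
```

A backslash in the name comes out as %5C, so the result contains no characters that the IRI syntax forbids.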

>> 7. The RDF-WG charter asks the WG to update RDF to follow SPARQL with regard to IRIs
> 
> The charter also says: "any valid RDF graphs (in terms of the RDF 2004 version) should remain valid in terms of a new version of RDF" and "Care should be taken to not jeopardize existing RDF deployment efforts and adoption. In case of doubt, the guideline should be not to include a feature in the set of additions if doing so might raise backward compatibility issues."

I don't read this as forbidding fixes of corner cases, especially where these corner cases already won't work with the SPARQL Recommendations.

Best,
Richard

Received on Wednesday, 22 August 2012 20:17:00 UTC