Re: RDF-ISSUE-8 (IRI vs URI): Incorporate IRI-s into the RDF documents [Cleanup tasks] from David Wood on 2011-03-09 (public-rdf-wg@w3.org from March 2011)

From: David Wood <david.wood@talis.com>
Date: Wed, 9 Mar 2011 12:32:48 -0500
To: Mischa Tuffield <Mischa.tuffield@garlik.com>
Cc: Alex Hall <alexhall@revelytix.com>, RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <02141D9E-916D-4E95-8B94-385F4DEC54EF@talis.com>
On Mar 9, 2011, at 11:00, Mischa Tuffield wrote:

> 
> On 7 Mar 2011, at 20:35, Alex Hall wrote:
> 
>> On Mon, Mar 7, 2011 at 11:12 AM, Mischa Tuffield <mischa.tuffield@garlik.com> wrote:
>> Hello, 
>> 
>> <snip/>
>> 
>> On 5 Mar 2011, at 15:26, Pat Hayes wrote:
>> 
>>> 
>>> On Mar 5, 2011, at 5:19 AM, RDF Working Group Issue Tracker wrote:
>>> 
>>>> 
>>>> RDF-ISSUE-8 (IRI vs URI): Incorporate IRI-s into the RDF documents [Cleanup tasks]
>>>> 
>>>> http://www.w3.org/2011/rdf-wg/track/issues/8
>>>> 
>>>> Raised by: Ivan Herman
>>>> On product: Cleanup tasks
>>>> 
>>>> The IRI Spec[1] is from 2005, and it may be necessary to retrofit it to RDF. Eg, what is the relationship between "http://résumé.example.org" and "http://xn--rsum-bpad.example.org"? Are they the same resource or not? Note that SPARQL has something on that[2]...
>> 
>> Context matters here.  "http://xn--rsum-bpad.example.org" is the URI mapped from the IRI "http://résumé.example.org" but it is also a valid IRI in its own right (I think -- correct me if I'm wrong).  If you're dereferencing the resource to fetch its representation then I think you can safely conclude that those represent the same resource, but that decision is up to your application.
>> 
>> However, from the perspective of RDF semantics I think it would be wrong to put the burden on the implementer to consider normalization when computing term equality, graph equivalence, etc.  This is already an issue to some extent; see the note in RDF Concepts [1] that says: "Because of the risk of confusion between RDF URI references that would be equivalent if derefenced, the use of %-escaped characters in RDF URI references is strongly discouraged. See also the URI equivalence issue of the Technical Architecture Group."
>> 
>> Nowhere in either the RDF or SPARQL specs do I see anything that implies applications should normalize URIRefs when comparing them; they all seem to specify a simple string comparison of the URIRefs.  Likewise, I think that "http://xn--rsum-bpad.example.org" and "http://résumé.example.org" when taken as IRIs should be considered different terms/nodes/resources/whatever you want to call them.
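
To make the distinction above concrete, here is a minimal sketch in Python. It assumes the standard library's built-in "idna" codec (IDNA 2003) is an acceptable stand-in for the IRI-to-URI host mapping; the helper names same_term and host_as_ascii are hypothetical and not taken from any RDF or SPARQL spec. It contrasts plain string comparison of the two terms with a comparison made after mapping the host to its ASCII form, which is the kind of decision an application might make when dereferencing.

```python
from urllib.parse import urlsplit


def same_term(a: str, b: str) -> bool:
    """Term equality as plain character-by-character string comparison."""
    return a == b


def host_as_ascii(iri: str) -> str:
    """Map the host of an IRI to its ASCII (Punycode) form.

    Assumption: Python's built-in 'idna' codec (IDNA 2003) is used here
    purely for illustration.
    """
    host = urlsplit(iri).hostname or ""
    return host.encode("idna").decode("ascii")


iri = "http://résumé.example.org"
uri = "http://xn--rsum-bpad.example.org"

# As RDF terms, the two strings differ.
print(same_term(iri, uri))                       # False

# After mapping the host to ASCII, an application may treat them alike.
print(host_as_ascii(iri) == host_as_ascii(uri))  # True
```

Under this sketch the two strings are distinct terms, while an application that dereferences them remains free to treat the hosts as equivalent.
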
> 
> Personally I don't think that the burden of normalising URIs should be on applications. What is key here from my POV is the ability to round-trip RDF; let me explain what I mean by this. If I generate new triples in my triplestore using a SPARQL Update query, I would like to be certain that I can then generate valid RDF including those triples using the CONSTRUCT verb. Otherwise things just get too confusing.
> 
> Given that SPARQL is currently in last call, it would be good to be able to unify what URI definitions are used in both the standard serialisations and in the query language. As a developer I would like to use only one library for generating URIs in my application, regardless of whether I am writing SPARQL or RDF. 


+1


> 
> I do have one question on the matter though, which I will look into if need be. In terms of backwards compatibility, are all URIRefs valid IRIs? And on this note, is this WG going to update RDF/XML's definition of what a URI is?


We cannot update RDF/XML according to the charter, although changes to RDF semantics could theoretically impact its interpretation.

Regards,
Dave



> I hope that IRIs are a subset of URIRefs; that would make backwards compatibility a non-issue. Otherwise, if we are in a world where RDF/XML uses URIRefs and Turtle uses IRIs, I expect that developers would never use the RDF/XML and SPARQL combination, as, again, it would require the use of two different URI encoding libraries.
> 
>>> 
>>> SPARQL says "IRI (corresponds to the Concepts and Abstract Syntax term "RDF URI reference")"  
>> 
>> As far as I am aware, the URIRef definition came out before the RFC defining IRIs. They are "pretty similar" insofar as the URIRef work was second-guessing what IRIs would be, but they didn't manage to get it 100% correct.
>> 
>>> 
>>> Is this strictly correct? That is, are IRIs in fact just URI references by another name? If not (as I suspect) can anyone briefly outline the points of difference?
>> 
>> No, they are not the same thing; the differences lie in which characters get encoded and which don't. One example is the backtick character `, which doesn't need to be %-encoded when creating an IRI but does when generating a URIRef. I sent an email to the SWIG mailing list about this a while back [1], where people pointed out the history and some of the subtle differences between the two.
>> 
>> In addition to the encoding differences, note that the RFC defining IRIs (RFC3987) is based on a more recent URI definition (RFC3986).  However, RDF Concepts calls out an old definition of URI (RFC2396) when defining URIRefs.  Among other differences, this old definition does not allow percent-encoded characters in the host component, while IRIs and new-style URIs do allow internationalized domain names.  So there seems to be a whole class of IRIs that, strictly speaking, are not representable as RDF URIRefs under the current definition.  (My apologies if this has been re-hashed elsewhere, I'm somewhat new to this discussion.)
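
A rough sketch of the host-component point above, in Python. The regular expression is only a simplified approximation of the RFC 2396 "hostname" production (dot-separated labels of ASCII letters, digits and hyphens); it does not model the rest of the authority grammar. The names RFC2396_HOSTNAME and percent_encode_host are hypothetical. It shows that a percent-encoded internationalized host fails such a check, while the Punycode form of the same name passes.

```python
import re
from urllib.parse import quote

# Simplified approximation of the RFC 2396 "hostname" production.
# Assumption: the real grammar has further constraints (e.g. on the
# top-level label) that are not modelled here.
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
RFC2396_HOSTNAME = re.compile(rf"^{_LABEL}(?:\.{_LABEL})*\.?$")


def percent_encode_host(host: str) -> str:
    """Percent-encode the UTF-8 bytes of each label's non-ASCII characters."""
    return ".".join(quote(label, safe="") for label in host.split("."))


encoded = percent_encode_host("résumé.example.org")
print(encoded)                                # r%C3%A9sum%C3%A9.example.org

# The percent-encoded host is rejected by the old-style hostname check...
print(bool(RFC2396_HOSTNAME.match(encoded)))  # False

# ...whereas the Punycode form of the same name is accepted.
print(bool(RFC2396_HOSTNAME.match("xn--rsum-bpad.example.org")))  # True
```
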
> 
> Exactly, and thanks for pointing to the relevant RFCs. 
> 
> Cheers, 
> 
> Mischa
> 
> 
>> 
>> -Alex
>> 
>> [1] http://www.w3.org/TR/rdf-concepts/#section-Graph-URIref
>> 
>>  
>> 
>> Mischa
>> 
>> [1] http://lists.w3.org/Archives/Public/semantic-web/2010Jul/0426.html 
>> 
>>> 
>>> Pat
>>> 
>>>> 
>>>> [1] http://www.ietf.org/rfc/rfc3987.txt
>>>> [2] http://www.w3.org/TR/rdf-sparql-query/#docTerminology
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> ------------------------------------------------------------
>>> IHMC                                     (850)434 8903 or (650)494 3973   
>>> 40 South Alcaniz St.           (850)202 4416   office
>>> Pensacola                            (850)202 4440   fax
>>> FL 32502                              (850)291 0667   mobile
>>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> ___________________________________
>> Mischa Tuffield PhD
>> Email: mischa.tuffield@garlik.com
>> Homepage - http://mmt.me.uk/
>> Garlik Limited, 1-3 Halford Road, Richmond, TW10 6AW
>> +44(0)845 652 2824  http://www.garlik.com/
>> Registered in England and Wales 535 7233 VAT # 849 0517 11
>> Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
>> 
>> 
> 
>
Received on Wednesday, 9 March 2011 17:33:25 UTC