Re: IRIs from Dave Reynolds on 2007-04-17 (public-rif-wg@w3.org from April 2007)

From: Dave Reynolds <der@hplb.hpl.hp.com>
Date: Tue, 17 Apr 2007 13:56:09 +0100
To: Michael Kifer <kifer@cs.sunysb.edu>
CC: Sandro Hawke <sandro@w3.org>, Jeremy Carroll <jjc@hpl.hp.com>, Christian de Sainte Marie <csma@ilog.fr>, RIF WG <public-rif-wg@w3.org>
Message-ID: <4624C3E9.8040609@hplb.hpl.hp.com>

Michael Kifer wrote:
>> Michael Kifer wrote:
>>>> Michael Kifer wrote:
>>>>> Thanks. I think this answers my question.
>>>>> My concern was that there might be an IRI, x, such that its encoding as a URI,
>>>>> f(x), is not equivalent to x *as an IRI*.
>>>>> You seem to be saying that this is not possible.
>>>> Sandro's Kanji example illustrates that this is possible. If an IRI i 
>>>> isn't itself a URI then the URI encoding of it must be different. Unless 
>>>> you specify some normalization, f(i) and i are different.
>>> Of course they are different. I was talking of them being *equivalent*.
>> RFC3987 uses the term "different" as the negation of "equivalent":
>>
>> [[[ (Section 5.1)
>>     For this reason, determination
>>     of equivalence or difference of IRIs is based on string comparison,
>>     perhaps augmented by reference to additional rules provided by URI
>>     scheme definitions.  We use the terms "different" and "equivalent" to
>>     describe the possible outcomes of such comparisons, but there are
>>     many application-dependent versions of equivalence.
>> ]]]
>>
>> and goes on to say:
>>
>> [[[
>>     Applications using IRIs as identity tokens with no relationship to a
>>     protocol MUST use the Simple String Comparison (see section 5.3.1).
>> ]]]
>>
>> I claim RIF usage, like RDF usage, would fall under this clause and so 
>> we would not specify any additional normalization step or RIF-specific 
>> notion of equivalence. Hence the properties described in Jeremy's 
>> semi-parallel email apply.
> 
> OK. But it wasn't clear to me from Sandro's email that encoded URIs when
> viewed as IRIs are non-equivalent to the original. I think he said just the
> opposite. 

Clearly this depends on the definition of equivalence, which in the 
specs comes down to which level of the "comparison ladder" you use. The 
specs leave this choice application-dependent in general, except for the 
MUST clause that I noted above. This freedom is a potential source of 
confusion.

If we were to choose a different normalization level, e.g. Syntax-Based 
Normalization (RFC3987#5.3.2 and RFC3986#6.2.2, including the 
percent-encoding normalization), then you could arrange that enc(x) ~ x 
holds, while the URI/IRI nesting properties that Jeremy lists would 
still hold.

However, that would mean more work for implementers and more work for us 
in writing the spec, would diverge from RDF (causing potential 
interoperation problems), and would violate the quoted MUST clause.
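
To make the two ladder levels concrete, here is a minimal Python sketch. 
It assumes a simplified enc that percent-encodes only non-ASCII 
characters as UTF-8 octets (it ignores the internationalized host 
handling in RFC3987#3.1). The function names and the example.org IRI are 
illustrative only, and normalized_equivalent is a rough stand-in for a 
higher comparison level, not the full Syntax-Based Normalization.

```python
from urllib.parse import quote


def enc(iri: str) -> str:
    """Simplified IRI-to-URI mapping (assumption: non-ASCII characters
    are percent-encoded as UTF-8 octets, ASCII is left untouched)."""
    return "".join(c if ord(c) < 128 else quote(c, safe="") for c in iri)


def simple_equivalent(a: str, b: str) -> bool:
    """Simple String Comparison: character-by-character equality."""
    return a == b


def normalized_equivalent(a: str, b: str) -> bool:
    """Hypothetical looser comparison: map both sides to URI form first."""
    return enc(a) == enc(b)


x = "http://example.org/caf\u00e9"  # a non-URI IRI (contains U+00E9)

print(enc(x))                            # http://example.org/caf%C3%A9
print(simple_equivalent(x, enc(x)))      # False
print(normalized_equivalent(x, enc(x)))  # True
```

Under Simple String Comparison x and enc(x) are different; only the 
looser comparison makes enc(x) ~ x hold.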

> And so did Jeremy as far as I understand.
> 
> In any case, are you proposing that if we have an IRI, x, and its encoded
> form, enc(x), then x and enc(x) would be allowed to point to different
> resources? I am slightly uncomfortable with this, but could live with it.

Yes, for some value of "point to".

In RDF terms if I have a non-URI IRI x and enc is the IRI to URI mapping 
function then x and enc(x) are different RDF resources. So for example:
    x p v .
does not entail:
    enc(x) p v .

I'm proposing that this should carry over into RIF.
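
The non-entailment above can be sketched in Python by treating a graph 
as a set of ground triples whose names are compared as plain strings. 
This is only an illustration under that assumption: the example.org IRI, 
the hand-written URI form and the holds helper are hypothetical, and 
real RDF entailment covers more than set membership.

```python
# A graph is modelled as a set of (subject, predicate, object) tuples.
# Names are plain strings, so they are compared character by character.
x = "http://example.org/caf\u00e9"      # non-URI IRI (contains U+00E9)
enc_x = "http://example.org/caf%C3%A9"  # its URI form, written out by hand

graph = {(x, "p", "v")}


def holds(g: set, s: str, p: str, o: str) -> bool:
    """A ground triple holds in this toy model only if it is in the set."""
    return (s, p, o) in g


print(holds(graph, x, "p", "v"))      # True
print(holds(graph, enc_x, "p", "v"))  # False
```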

However, if x is an http: scheme IRI and I dereference it on the web 
(e.g. do an HTTP GET) then x and enc(x) should resolve to the same 
address and return the same representation. Clearly in general it is 
possible for two different URIs to resolve to the same resource; there's 
no syntactic way of avoiding this.

Dave
-- 
Hewlett-Packard Limited
Registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England

Received on Tuesday, 17 April 2007 12:56:38 UTC