Re: IRIs

Michael Kifer wrote:
> Thanks. I think this answers my question.
> My concern was that there might be an IRI, x, such that its encoding as a URI,
> f(x), is not equivalent to x *as an IRI*.
> You seem to be saying that this is not possible.

Sandro's Kanji example illustrates that this is possible. If an IRI i 
isn't itself a URI then its URI encoding f(i) must be a different string. 
Unless you specify some normalization, f(i) and i are different.

We are free to specify some normalization step N, such as the URI->IRI 
mapping, which would remove such aliases so that:
    N(i) == N(f(i))

I don't think we should do this. As Sandro points out, RDF and the 
related specs require only the minimal amount of normalization, for ease 
of implementation, and I think we should be compatible. We simply want to 
use IRIs as identifiers and be able to write them in source files such as 
XML in a convenient way.
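
For concreteness, here is a rough Python sketch of f and N as used above. 
The example IRI is made up (it is not Sandro's actual example), the names 
iri_to_uri and normalize are mine, and normalize is only a crude stand-in 
for the URI->IRI mapping: it decodes every %-escape, which the real 
mapping in RFC 3987 does not do.

```python
from urllib.parse import quote, unquote


def iri_to_uri(iri: str) -> str:
    """f: percent-encode the UTF-8 bytes of each non-ASCII character."""
    return "".join(
        ch if ord(ch) < 128 else quote(ch, safe="")
        for ch in iri
    )


def normalize(identifier: str) -> str:
    """N (hypothetical): undo percent-encoding so aliases collapse.

    Simplification: this decodes *all* %-escapes, including reserved
    characters, so it is not a faithful URI->IRI mapping.
    """
    return unquote(identifier)


i = "http://example.org/\u65e5\u672c"   # made-up IRI with two Kanji characters
u = iri_to_uri(i)                       # its URI encoding, f(i)

print(u)                                # http://example.org/%E6%97%A5%E6%9C%AC
print(i == u)                           # False: without N they are different strings
print(normalize(i) == normalize(u))     # True: N(i) == N(f(i))
```

Without normalize the two strings simply differ, which is the behaviour 
I am arguing we should keep.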

As for practical experience, Jena has supported IRIs for many 
years (thanks to Jeremy), which it had to do to meet the RDF specs. As 
someone who spends much time on our support list I do see them being 
used. Whilst there are sometimes support issues with the XML input side 
(specifying the character encoding), and with whether spaces are 
allowed, I have never seen a case where someone %-encoded their IRI to 
make it look like a URI and then expected it to compare to a 
non-%-encoded "equivalent".
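
To show what comparing identifiers as opaque strings means in practice, 
here is a minimal Python sketch. It is not Jena code; the function name 
same_identifier and the example.org IRI are assumptions for illustration 
only.

```python
def same_identifier(a: str, b: str) -> bool:
    """Compare two identifiers as opaque strings, with no normalization."""
    return a == b


# Case is not folded, even in the host part.
print(same_identifier("http://www.w3.org", "http://WWW.W3.ORG"))   # False

# A %-encoded form is not treated as an alias of the IRI it encodes.
print(same_identifier("http://example.org/\u65e5\u672c",
                      "http://example.org/%E6%97%A5%E6%9C%AC"))     # False

# Only character-for-character identical strings match.
print(same_identifier("http://www.w3.org", "http://www.w3.org"))   # True
```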

Dave
-- 
Hewlett-Packard Limited
Registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England

> In this case I indeed see no reason why we shouldn't be using IRIs.
> 
> 
> 	--michael  
> 
> 
>>> The following might be a naive question due to my inadequate familiarity
>>> with RFCs.
>>>
>>> Are symbols like ~ allowed in IRIs? My understanding is that only
>>> a-z, A-Z, 0-9, ., -, *, and _ are allowed as is and the rest are encoded.
>>> So, since ~ is supposed to be encoded, something like
>>>
>>>     http://www.cs.sunysb.edu/~kifer/
>> Tilde (~) is allowed in URIs.   
>>
>>       unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
>>
>> [ http://www.ietf.org/rfc/rfc3986.txt ] 
>>
>> I'm not quite sure what you're getting at.   If I don't address it
>> below, maybe try an example other than "~".
>>
>>> Or, am I wrong and the %-encodings mean the same in IRIs as they do in
>>> URIs?
>> Percent-encodings mean the same things in IRIs and URIs.
>>
>> Percent-encodings are one of several things that can potentially
>> complicate using URIs as identifiers.  Another is case.  Are these two
>> URIs the same?
>>
>>      http://www.w3.org
>>      http://WWW.W3.ORG 
>>
>> The domain name system is defined to be case-insensitive, so in some
>> sense those two URIs have to mean the same thing.  But if all Semantic
>> Web software was supposed to know all the rules like that, it would be
>> crazy.
>>
>> RFC 3986 (URIs) says:
>>
>> | 6.  Normalization and Comparison
>> | 
>> |    One of the most common operations on URIs is simple comparison:
>> |    determining whether two URIs are equivalent without using the URIs to
>> |    access their respective resource(s).  A comparison is performed every
>> |    time a response cache is accessed, a browser checks its history to
>> |    color a link, or an XML parser processes tags within a namespace.
>> |    Extensive normalization prior to comparison of URIs is often used by
>> |    spiders and indexing engines to prune a search space or to reduce
>> |    duplication of request actions and response storage.
>> | 
>> |    URI comparison is performed for some particular purpose.  Protocols
>> |    or implementations that compare URIs for different purposes will
>> |    often be subject to differing design trade-offs in regards to how
>> |    much effort should be spent in reducing aliased identifiers.  This
>> |    section describes various methods that may be used to compare URIs,
>> |    the trade-offs between them, and the types of applications that might
>> |    use them.
>>
>>
>> It then talks about a "Comparison Ladder" from simple string comparison
>> on to more and more sophisticated ways one might be able to tell two
>> URIs are equivalent.  In RDF and related specifications, the choice has
>> been to stay on the bottom rung and just treat the identifiers as opaque
>> strings.
>>
>>      -- Sandro
>>
> 
> 
> 

Received on Tuesday, 17 April 2007 08:09:15 UTC