Re: IRIs from Michael Kifer on 2007-04-17 (public-rif-wg@w3.org from April 2007)

From: Michael Kifer <kifer@cs.sunysb.edu>
Date: Tue, 17 Apr 2007 04:30:50 -0400
To: Dave Reynolds <der@hplb.hpl.hp.com>
Cc: Sandro Hawke <sandro@w3.org>, Jeremy Carroll <jjc@hpl.hp.com>, Christian de Sainte Marie <csma@ilog.fr>, RIF WG <public-rif-wg@w3.org>
Message-ID: <30556.1176798650@cs.sunysb.edu>
> Michael Kifer wrote:
> > Thanks. I think this answers my question.
> > My concern was that there might be an IRI, x, such that its encoding as a URI,
> > f(x), is not equivalent to x *as an IRI*.
> > You seem to be saying that this is not possible.
> 
> Sandro's Kanji example illustrates that this is possible. If an IRI i 
> isn't itself a URI then the URI encoding of it must be different. Unless 
> you specify some normalization, f(i) and i are different.

Of course they are different. I was talking of them being *equivalent*.

We may or may not want to introduce an equality relation on such equivalent
IRIs (note: equality != identity), but regardless of that if
a ~ b as URIs and not as IRIs then using IRIs as an extension of URIs would
be problematic. If this doesn't happen then I see no problem.
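
To make the distinction between "different" and "equivalent" concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: iri_to_uri stands in for the mapping f, the example.org IRI is made up, and using unquote as the normalization N is a simplification (a proper URI->IRI mapping per RFC 3987 would not decode reserved characters, and host names would need separate handling).

```python
from urllib.parse import quote, unquote

# Hypothetical stand-in for the IRI -> URI mapping f discussed in this thread.
# Non-ASCII characters are percent-encoded as UTF-8 octets; the ASCII
# characters listed in `safe` (reserved, unreserved, and '%') are left alone.
def iri_to_uri(iri: str) -> str:
    return quote(iri, safe="%:/?#[]@!$&'()*+,;=-._~")

# Made-up IRI containing two Kanji characters (U+6F22, U+5B57).
iri = "http://example.org/\u6f22\u5b57"
uri = iri_to_uri(iri)

# As plain strings, i and f(i) are different identifiers.
strings_differ = (iri != uri)

# Simplified normalization N: undo the percent-encoding before comparing.
# With this N we get N(i) == N(f(i)), i.e. the two are "equivalent".
equivalent_under_n = (unquote(iri) == unquote(uri))

print(uri)
print(strings_differ, equivalent_under_n)
```

Without such an N, a system that compares identifiers as opaque strings treats the two forms as unrelated names.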


	--michael  


> We are free to specify some normalization step (N) such as the URI->IRI 
> mapping which will remove such aliases so that:
>     N(i) == N(f(i))
> 
> I don't think we should do this. As Sandro points out, in RDF et al the 
> minimal amount of normalization is specified for ease of implementation 
> and I think we should be compatible. We simply want to use IRIs as 
> identifiers and be able to write them in source files such as XML in a 
> convenient way.
> 
> In terms of experience in practice then Jena has supported IRIs for many 
> years (thanks to Jeremy), which it had to do to meet the RDF specs. As 
> someone who spends much time on our support list I do see them being 
> used. Whilst there are sometimes support issues with the XML input side 
> (specifying the character encoding), and with whether spaces are 
> allowed, I have never seen a case where someone %-encoded their IRI to 
> make it look like a URI and then expected it to compare to a 
> non-%-encoded "equivalent".
> 
> Dave
> -- 
> Hewlett-Packard Limited
> Registered Office: Cain Road, Bracknell, Berks RG12 1HN
> Registered No: 690597 England
> 
> > In this case I indeed see no reason why we shouldn't be using IRIs.
> > 
> > 
> > 	--michael  
> > 
> > 
> >>> The following might be a naive question due to my inadequate familiarity
> >>> with RFCs.
> >>>
> >>> Are symbols like ~ allowed in IRIs? My understanding is that only
> >>> a-z, A-Z, 0-9, ., -, *, and _ are allowed as is and the rest are encoded.
> >>> So, since ~ is supposed to be encoded, something like
> >>>
> >>>     http://www.cs.sunysb.edu/~kifer/
> >> Tilde (~) is allowed in URIs.   
> >>
> >>       unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
> >>
> >> [ http://www.ietf.org/rfc/rfc3986.txt ] 
> >>
> >> I'm not quite sure what you're getting at.   If I don't address it
> >> below, maybe try an example other than "~".
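
The unreserved rule quoted above can be checked mechanically. A small Python sketch, assuming nothing beyond that ABNF line; the names UNRESERVED and is_unreserved are made up for illustration:

```python
import string

# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"   (RFC 3986)
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

def is_unreserved(ch: str) -> bool:
    """True if the single character ch may appear unencoded as unreserved."""
    return ch in UNRESERVED

# Tilde is unreserved, so the path /~kifer/ needs no percent-encoding.
tilde_ok = all(is_unreserved(c) for c in "~kifer")

# "*" is not in the unreserved set; RFC 3986 lists it among the sub-delims.
star_unreserved = is_unreserved("*")

print(tilde_ok, star_unreserved)
```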
> >>
> >>> Or, am I wrong and the %-encodings mean the same in IRIs as they do in URIs?
> >> Percent-encodings mean the same things in IRIs and URIs.
> >>
> >> Percent-encodings are one of several things that can potentially
> >> complicate using URIs as identifiers.  Another is case.  Are these two
> >> URIs the same?
> >>
> >>      http://www.w3.org
> >>      http://WWW.W3.ORG 
> >>
> >> The domain name system is defined to be case-insensitive, so in some
> >> sense those two URIs have to mean the same thing.  But if all Semantic
> >> Web software was supposed to know all the rules like that, it would be
> >> crazy.
> >>
> >> RFC 3986 (URIs) says:
> >>
> >> | 6.  Normalization and Comparison
> >> | 
> >> |    One of the most common operations on URIs is simple comparison:
> >> |    determining whether two URIs are equivalent without using the URIs to
> >> |    access their respective resource(s).  A comparison is performed every
> >> |    time a response cache is accessed, a browser checks its history to
> >> |    color a link, or an XML parser processes tags within a namespace.
> >> |    Extensive normalization prior to comparison of URIs is often used by
> >> |    spiders and indexing engines to prune a search space or to reduce
> >> |    duplication of request actions and response storage.
> >> | 
> >> |    URI comparison is performed for some particular purpose.  Protocols
> >> |    or implementations that compare URIs for different purposes will
> >> |    often be subject to differing design trade-offs in regards to how
> >> |    much effort should be spent in reducing aliased identifiers.  This
> >> |    section describes various methods that may be used to compare URIs,
> >> |    the trade-offs between them, and the types of applications that might
> >> |    use them.
> >>
> >>
> >> It then talks about a "Comparison Ladder" from simple string comparison
> >> on to more and more sophisticated ways one might be able to tell two
> >> URIs are equivalent.  In RDF and related specifications, the choice has
> >> been to stay on the bottom rung and just treat the identifiers as opaque
> >> strings.
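
The two lowest rungs can be contrasted with the www.w3.org example from above. This is a hedged Python sketch, not the RFC's algorithm: simple_equal and case_normalize are made-up names, and lower-casing the whole authority is a simplification (a userinfo part, if present, is case-sensitive).

```python
from urllib.parse import urlsplit, urlunsplit

def simple_equal(a: str, b: str) -> bool:
    """Bottom rung: identifiers are opaque strings, compared as-is."""
    return a == b

def case_normalize(uri: str) -> str:
    """One rung up: lower-case the scheme and authority only (simplified)."""
    parts = urlsplit(uri)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path, parts.query, parts.fragment))

a = "http://www.w3.org"
b = "http://WWW.W3.ORG"

same_as_strings = simple_equal(a, b)
same_after_case = simple_equal(case_normalize(a), case_normalize(b))

print(same_as_strings, same_after_case)
```

Under the opaque-string choice only the first comparison applies, so the two spellings count as distinct identifiers.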
> >>
> >>      -- Sandro
> >>
Received on Tuesday, 17 April 2007 08:38:17 UTC