- From: Michael Kifer <kifer@cs.sunysb.edu>
- Date: Tue, 17 Apr 2007 04:30:50 -0400
- To: Dave Reynolds <der@hplb.hpl.hp.com>
- Cc: Sandro Hawke <sandro@w3.org>, Jeremy Carroll <jjc@hpl.hp.com>, Christian de Sainte Marie <csma@ilog.fr>, RIF WG <public-rif-wg@w3.org>
> Michael Kifer wrote:
> > Thanks. I think this answers my question.
> > My concern was that there might be an IRI, x, such that its encoding as a URI,
> > f(x), is not equivalent to x *as an IRI*.
> > You seem to be saying that this is not possible.
>
> Sandro's Kanji example illustrates that this is possible. If an IRI i
> isn't itself a URI then the URI encoding of it must be different. Unless
> you specify some normalization, f(i) and i are different.

Of course they are different. I was talking of them being *equivalent*.
We may or may not want to introduce an equality relation on such equivalent
IRIs (note: equality != identity), but regardless of that, if a ~ b as URIs
and not as IRIs, then using IRIs as an extension of URIs would be problematic.
If this doesn't happen then I see no problem.

	--michael

> We are free to specify some normalization step (N) such as the URI->IRI
> mapping which will remove such aliases so that:
>     N(i) == N(f(i))
>
> I don't think we should do this. As Sandro points out, in RDF et al the
> minimal amount of normalization is specified for ease of implementation
> and I think we should be compatible. We simply want to use IRIs as
> identifiers and be able to write them in source files such as XML in a
> convenient way.
>
> In terms of experience in practice then Jena has supported IRIs for many
> years (thanks to Jeremy), which it had to, to meet the RDF specs. As
> someone who spends much time on our support list I do see them being
> used. Whilst there are sometimes support issues with the XML input side
> (specifying the character encoding), and with whether spaces are
> allowed, I have never seen a case where someone %-encoded their IRI to
> make it look like a URI and then expected it to compare to a
> non-%-encoded "equivalent".
>
> Dave
> --
> Hewlett-Packard Limited
> Registered Office: Cain Road, Bracknell, Berks RG12 1HN
> Registered No: 690597 England
>
> > In this case I indeed see no reason why we shouldn't be using IRIs.
> >
> > 	--michael
> >
> >>> The following might be a naive question due to my inadequate familiarity
> >>> with RFCs.
> >>>
> >>> Are symbols like ~ allowed in IRIs? My understanding is that only
> >>> a-z, A-Z, 0-9, ., -, *, and _ are allowed as is and the rest are encoded.
> >>> So, since ~ is supposed to be encoded, something like
> >>>
> >>> http://www.cs.sunysb.edu/~kifer/
> >> Tilde (~) is allowed in URIs.
> >>
> >>     unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
> >>
> >>     [ http://www.ietf.org/rfc/rfc3986.txt ]
> >>
> >> I'm not quite sure what you're getting at. If I don't address it
> >> below, maybe try an example other than "~".
> >>
> >>> Or, am I wrong and the %-encodings mean the same in IRIs as they do in
> >>> URIs?
> >> Percent-encodings mean the same things in IRIs and URIs.
> >>
> >> Percent-encodings are one of several things that can potentially
> >> complicate using URIs as identifiers. Another is case. Are these two
> >> URIs the same?
> >>
> >>     http://www.w3.org
> >>     http://WWW.W3.ORG
> >>
> >> The domain name system is defined to be case-insensitive, so in some
> >> sense those two URIs have to mean the same thing. But if all Semantic
> >> Web software was supposed to know all the rules like that, it would be
> >> crazy.
> >>
> >> RFC 3986 (URIs) says:
> >>
> >> | 6. Normalization and Comparison
> >> |
> >> | One of the most common operations on URIs is simple comparison:
> >> | determining whether two URIs are equivalent without using the URIs to
> >> | access their respective resource(s). A comparison is performed every
> >> | time a response cache is accessed, a browser checks its history to
> >> | color a link, or an XML parser processes tags within a namespace.
> >> | Extensive normalization prior to comparison of URIs is often used by
> >> | spiders and indexing engines to prune a search space or to reduce
> >> | duplication of request actions and response storage.
> >> |
> >> | URI comparison is performed for some particular purpose. Protocols
> >> | or implementations that compare URIs for different purposes will
> >> | often be subject to differing design trade-offs in regards to how
> >> | much effort should be spent in reducing aliased identifiers. This
> >> | section describes various methods that may be used to compare URIs,
> >> | the trade-offs between them, and the types of applications that might
> >> | use them.
> >>
> >> It then talks about a "Comparison Ladder" from simple string comparison
> >> on to more and more sophisticated ways one might be able to tell two
> >> URIs are equivalent. In RDF and related specifications, the choice has
> >> been to stay on the bottom rung and just treat the identifiers as opaque
> >> strings.
> >>
> >> -- Sandro
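The distinction discussed above (f(i) differs from i as a string, but a normalization N could make N(i) == N(f(i))) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names iri_to_uri and normalize are made up for this sketch, the example.org IRI is hypothetical, f here only percent-encodes non-ASCII characters as UTF-8 (it ignores internationalized host names), and N is one possible URI->IRI style mapping, not something RDF or RIF specifies.

```python
import re
from urllib.parse import quote


def iri_to_uri(iri: str) -> str:
    """Sketch of f(i): percent-encode the UTF-8 bytes of each non-ASCII character.

    ASCII characters (including "~" and any existing %-escapes) are left as is.
    """
    return "".join(ch if ord(ch) < 128 else quote(ch, safe="") for ch in iri)


# Runs of %XX escapes whose byte value is >= 0x80, i.e. candidate UTF-8 sequences.
_NON_ASCII_ESCAPES = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def normalize(identifier: str) -> str:
    """Sketch of N: turn %-escaped non-ASCII UTF-8 sequences back into characters."""

    def decode(match: re.Match) -> str:
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # Not valid UTF-8: keep the escapes untouched.
            return match.group(0)

    return _NON_ASCII_ESCAPES.sub(decode, identifier)


iri = "http://example.org/漢字"  # hypothetical IRI with Kanji in the path
uri = iri_to_uri(iri)

# Bottom rung of the comparison ladder: opaque string comparison.
print(iri == uri)                                  # the two strings differ
print("http://www.w3.org" == "http://WWW.W3.ORG")  # case differs, so these differ too

# With the optional normalization step, the alias disappears.
print(normalize(iri) == normalize(uri))
```

Staying on the bottom rung, as RDF does, means only the first two comparisons are ever made; the normalize step is shown purely to make the N(i) == N(f(i)) remark concrete.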
Received on Tuesday, 17 April 2007 08:38:17 UTC