Re: IRIs from Sandro Hawke on 2007-04-16 (public-rif-wg@w3.org from April 2007)

From: Sandro Hawke <sandro@w3.org>
Date: Mon, 16 Apr 2007 15:10:03 -0400
To: kifer@cs.sunysb.edu (Michael Kifer)
Cc: Jeremy Carroll <jjc@hpl.hp.com>, Dave Reynolds <der@hplb.hpl.hp.com>, Christian de Sainte Marie <csma@ilog.fr>, RIF WG <public-rif-wg@w3.org>
Message-Id: <20070416191031.41D664EFDE@homer.w3.org>

> The following might be a naive question due to my inadequate familiarity
> with RFCs.
> 
> Are symbols like ~ allowed in IRIs? My understanding is that only
> a-z, A-Z, 0-9, ., -, *, and _ are allowed as is and the rest are encoded.
> So, since ~ is supposed to be encoded, something like
> 
>     http://www.cs.sunysb.edu/~kifer/

Tilde (~) is allowed in URIs (and in IRIs).  It is one of the unreserved
characters:

      unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

[ http://www.ietf.org/rfc/rfc3986.txt ] 
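
To make that concrete, here is a minimal sketch in Python that checks
characters against the unreserved set quoted above.  The helper name
is_unreserved is my own, not something from the RFC or a library.

```python
import string

# The "unreserved" production from RFC 3986:
#   unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

def is_unreserved(ch: str) -> bool:
    """Hypothetical helper: True if ch is in the RFC 3986 unreserved set."""
    return ch in UNRESERVED

print(is_unreserved("~"))                          # tilde is unreserved
print(is_unreserved("*"))                          # "*" is a reserved sub-delim
print(all(is_unreserved(c) for c in "~kifer"))     # no encoding needed here
```

Note that "*" is not in the unreserved set; RFC 3986 lists it among the
reserved sub-delims, although it may still appear literally in a path.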

I'm not quite sure what you're getting at.   If I don't address it
below, maybe try an example other than "~".

> Or, am I wrong and the %-encodings mean the same as in IRIs as they do in
> URIs?

Percent-encodings mean the same things in IRIs and URIs.

Percent-encodings are one of several things that can potentially
complicate using URIs as identifiers.  Another is case.  Are these two
URIs the same?

     http://www.w3.org
     http://WWW.W3.ORG 

The domain name system is defined to be case-insensitive, so in some
sense those two URIs have to mean the same thing.  But it would be
crazy to expect all Semantic Web software to know every rule like
that.
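
A small Python illustration of the difference, using only the standard
library.  This is just a sketch of the two views, not a recommendation
for either one:

```python
from urllib.parse import urlsplit

a = "http://www.w3.org"
b = "http://WWW.W3.ORG"

# As plain strings, these are two different identifiers.
same_as_strings = (a == b)

# urlsplit(...).hostname returns the host in lower case, so a
# comparison that knows about DNS case-insensitivity sees one host.
same_host = (urlsplit(a).hostname == urlsplit(b).hostname)

print(same_as_strings, same_host)
```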

RFC 3986 (URIs) says:

| 6.  Normalization and Comparison
| 
|    One of the most common operations on URIs is simple comparison:
|    determining whether two URIs are equivalent without using the URIs to
|    access their respective resource(s).  A comparison is performed every
|    time a response cache is accessed, a browser checks its history to
|    color a link, or an XML parser processes tags within a namespace.
|    Extensive normalization prior to comparison of URIs is often used by
|    spiders and indexing engines to prune a search space or to reduce
|    duplication of request actions and response storage.
| 
|    URI comparison is performed for some particular purpose.  Protocols
|    or implementations that compare URIs for different purposes will
|    often be subject to differing design trade-offs in regards to how
|    much effort should be spent in reducing aliased identifiers.  This
|    section describes various methods that may be used to compare URIs,
|    the trade-offs between them, and the types of applications that might
|    use them.


It then describes a "Comparison Ladder" (section 6.2), which runs from
simple string comparison up through increasingly sophisticated ways of
telling that two URIs are equivalent.  In RDF and related specifications,
the choice has been to stay on the bottom rung and just treat the
identifiers as opaque strings.
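
Here is a rough Python sketch of the bottom rung versus one step up.
The normalize function is a hypothetical helper of my own and covers
only part of what RFC 3986 calls syntax-based normalization: it
lowercases the scheme and authority and decodes percent-encoded
unreserved characters in the path.  It is a simplification (for
example, it would also lowercase any userinfo in the authority).

```python
import re
import string
from urllib.parse import urlsplit, urlunsplit

UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

def _fix_escape(match):
    """Decode %XX if it encodes an unreserved character, else uppercase the hex."""
    ch = chr(int(match.group(1), 16))
    return ch if ch in UNRESERVED else "%" + match.group(1).upper()

def normalize(uri: str) -> str:
    """Hypothetical helper: partial syntax-based normalization of a URI."""
    parts = urlsplit(uri)
    netloc = parts.netloc.lower()  # simplification: lowercases userinfo too
    path = re.sub(r"%([0-9A-Fa-f]{2})", _fix_escape, parts.path)
    return urlunsplit((parts.scheme.lower(), netloc, path,
                       parts.query, parts.fragment))

a = "http://www.cs.sunysb.edu/~kifer/"
b = "http://WWW.CS.SUNYSB.EDU/%7Ekifer/"

# Bottom rung (what RDF does): opaque strings, so these differ.
print(a == b)

# One rung up: after normalization they compare equal.
print(normalize(a) == normalize(b))
```

Under the RDF approach only the first comparison matters, so anyone
minting identifiers has to spell them consistently.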

     -- Sandro

Received on Monday, 16 April 2007 19:10:33 UTC