- From: Jeremy Carroll <jjc@hpl.hp.com>
- Date: Mon, 16 Apr 2007 12:48:59 +0100
- To: Michael Kifer <kifer@cs.sunysb.edu>
- CC: Dave Reynolds <der@hplb.hpl.hp.com>, Christian de Sainte Marie <csma@ilog.fr>, RIF WG <public-rif-wg@w3.org>
Yes: IRIs are a superset of URIs. Supporting text below.

On the question of character sets the difference is as follows:

[[
A Uniform Resource Identifier (URI) is a compact sequence of characters
]] [1]

and

[[
A URI is a sequence of characters from a very limited set: the letters of
the basic Latin alphabet, digits, and a few special characters.
]] [1]

versus

[[
An IRI is a sequence of characters from the Universal Character Set
(Unicode/ISO 10646).
]] [2]

That is, both are simply sequences of characters (i.e. abstract letters).
The definition of 'character' is given in BCP 19:

[[
A member of a set of elements used for the organization, control, or
representation of data.
]] [3]

The set of letters used for URIs is a subset of that used for IRIs (and a
small subset!).

Neither specification (RFC 3986 URIs, or RFC 3987 IRIs) requires any
specific encoding of such characters. As it happens, any sequence of
characters from the URI set, when encoded in US-ASCII, comes to a sequence
of bytes. When the same sequence is encoded as UTF-8 it comes to the same
sequence of bytes. So even at the binary level, the typical use of both
specifications is compatible.

On the more general question of the relationship between the two:

Supporting text:
=================

1.1
[[
This document defines a new protocol element called Internationalized
Resource Identifier (IRI) by extending the syntax of URIs to a much wider
repertoire of characters.
]] [2]

[[
2.1. Summary of IRI Syntax

IRIs are defined similarly to URIs in [RFC 3986], but the class of
unreserved characters is extended by adding the characters of the UCS
(Universal Character Set, [ISO10646]) beyond U+007F, subject to the
limitations given in the syntax rules below and in section 6.1.

Otherwise, the syntax and use of components and reserved characters is
the same as that in [RFC 3986].
]] [2]

A detailed study of the rules in section 2.2 shows that this goal is
achieved, and the "limitations" do not contradict the fact that all URIs
are IRIs.
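To illustrate the encoding point above, here is a minimal Python sketch. The helper name encode_both and the example.org strings are made up for illustration and do not come from either RFC or from any RIF draft. It only shows that a string drawn from the URI character set yields identical bytes in US-ASCII and UTF-8, while a string containing a character beyond U+007F has no US-ASCII encoding at all.

```python
def encode_both(text):
    """Return (ascii_bytes, utf8_bytes) for text.

    ascii_bytes is None when text contains a character outside US-ASCII,
    since such a string cannot be encoded in US-ASCII.
    """
    try:
        ascii_bytes = text.encode("ascii")
    except UnicodeEncodeError:
        ascii_bytes = None
    return ascii_bytes, text.encode("utf-8")


# Hypothetical identifier using only characters from the URI set.
uri = "http://example.org/rules#r1"
uri_ascii, uri_utf8 = encode_both(uri)
print("URI bytes identical in both encodings:", uri_ascii == uri_utf8)

# Hypothetical identifier with one character beyond U+007F
# (LATIN SMALL LETTER E WITH GRAVE, written as an escape).
iri = "http://example.org/r\u00e8gles#r1"
iri_ascii, iri_utf8 = encode_both(iri)
print("IRI encodable as US-ASCII:", iri_ascii is not None)
print("IRI encoded as UTF-8:", iri_utf8)
```

The sketch says nothing about the syntax rules of either specification; it only checks the byte-level observation.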
Jeremy

[1] http://rfc.net/rfc3986.txt
[2] http://rfc.net/rfc3987.txt
[3] http://rfc.net/bcp19.html

Note: I understand that the chairs are minded to not yet table this issue
for discussion. If it is contentious then that is understandable. I expect
Dave will prod me again when they do. I strongly support the use of IRIs.

Michael Kifer wrote:
>> 1. They are a superset of URIs and specifying the superset seems like
>> the safe default course. If someone especially wanted a dialect with
>> syntactic restriction to URIs then they could add that restriction in
>> the dialect.
>
> Can somebody give a synopsis of URI vs. IRI?
> On the surface, it seems that IRIs are a superset, but
> in the last telecon I asked if this is true and somebody (forgot who) said
> that they aren't because IRIs use unicode and uris ascii.
>
> In any case, I made some small changes along the lines of what was
> discussed, which states that rif:uri can be a uri or a iri. Also, I
> proposed to the chairs (I think somebody also mentioned this at the
> telecon) to call this thing rif:resource. The issue whether it will be a
> uri or an iri can be decided later. If uris are a subset of iris then
> deciding either way for now (provided that we call it rif:resource) will be
> acceptable and can be changed later. If one is not a subset of the other
> then still the decision can be changed later without major consequences.
>
>
> --michael
>

--
Hewlett-Packard Limited
registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England
Received on Monday, 16 April 2007 11:49:46 UTC