Re: What standards and implementations use IRIs?

Larry Masinter writes:

> I understand there are widespread implementations of URIs and 
> URI processing, but what other systems implement IRIs
> according to RFC 3987 terms?

In case it's of interest, the XML Schema 1.1 anyURI datatype [1] is 
(intended as) an IRI.  The pertinent parts of the CR specification 
include the following:

[Definition:]   anyURI represents an Internationalized Resource Identifier 
Reference (IRI).  An anyURI value can be absolute or relative, and may 
have an optional fragment identifier (i.e., it may be an IRI Reference). 
This type should be used when the value fulfills the role of an IRI, as 
defined in [RFC 3987] or its successor(s) in the IETF Standards Track.


The ·lexical space· of anyURI is the set of finite-length sequences of 
zero or more characters (as defined in [XML]) that ·match· the Char 
production from [XML].
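
To make the lexical-space rule above concrete, here is a minimal sketch of the only check the datatype itself requires. It assumes the XML 1.0 Char production (XML 1.1 defines Char slightly differently), and the function name in_anyuri_lexical_space is made up for illustration; it is not taken from the specification.

```python
import re

# XML 1.0 Char production (an assumption; XML 1.1 differs slightly):
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_XML_CHARS = re.compile(
    r"[\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]*"
)


def in_anyuri_lexical_space(value: str) -> bool:
    """Return True if every character of value matches the XML Char production.

    Hypothetical helper: this is the only constraint the lexical space
    imposes, so nothing here checks IRI or URI syntax.
    """
    return _XML_CHARS.fullmatch(value) is not None
```

Under this sketch the empty string, a string containing spaces, and a string containing non-ASCII letters are all accepted, while a string containing U+0000 or U+FFFE is rejected.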

Note: For an anyURI value to be usable in practice as an IRI, the result 
of applying to it the algorithm defined in Section 3.1 of [RFC 3987] 
should be a string which is a legal URI according to [RFC 3986]. (This is 
true at the time this document is published; if in the future [RFC 3987] 
and [RFC 3986] are replaced by other specifications in the IETF Standards 
Track, the relevant constraints will be those imposed by those successor 
specifications.)

Each URI scheme imposes specialized syntax rules for URIs in that scheme, 
including restrictions on the syntax of allowed fragment identifiers. 
Because it is impractical for processors to check that a value is a 
context-appropriate URI reference, neither the syntactic constraints 
defined by the definitions of individual schemes nor the generic syntactic 
constraints defined by [RFC 3987] and [RFC 3986] and their successors are 
part of this datatype as defined here. Applications which depend on anyURI 
values being legal according to the rules of the relevant specifications 
should make arrangements to check values against the appropriate 
definitions of IRI, URI, and specific schemes.
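
For applications that do want the extra checking described above, a rough sketch of the two steps might look like the following. It simplifies Section 3.1 of RFC 3987 by percent-encoding the UTF-8 octets of every non-ASCII character (the RFC restricts this step to the ucschar and iprivate ranges and has additional provisions for host names), and the character test is only a necessary condition for RFC 3986 legality, not a full parse. Both function names are hypothetical.

```python
import re


def iri_to_uri(iri: str) -> str:
    """Simplified IRI-to-URI mapping (assumption: encode all non-ASCII).

    Each non-ASCII character is replaced by the percent-encoded octets of
    its UTF-8 encoding; ASCII characters, including spaces, pass through
    unchanged. Lone surrogates would raise UnicodeEncodeError here.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


# Characters RFC 3986 permits somewhere in a URI reference: unreserved,
# reserved (gen-delims and sub-delims), and well-formed percent-encodings.
_URI_CHARS = re.compile(
    r"(?:[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=]|%[0-9A-Fa-f]{2})*"
)


def uses_only_uri_characters(candidate: str) -> bool:
    """Necessary (not sufficient) test that candidate could be a legal URI."""
    return _URI_CHARS.fullmatch(candidate) is not None
```

With these definitions, iri_to_uri("http://example.org/ros\u00e9") gives "http://example.org/ros%C3%A9", which passes the character test, whereas a value containing a literal space is left unchanged by the mapping and fails it.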

Also, there is the following note about space characters.  Before getting 
too upset about it, please note that using this type for IRIs at all is a 
"should", and there is in fact no required normative content checking 
except that the characters match the XML Char production.  So, the 
following is not a backhanded way of saying that space characters >should< 
be allowed in IRIs;  rather it is an acknowledgement that, with no 
normative prohibition, a health warning is in order. 

Note: Spaces are, in principle, allowed in the ·lexical space· of anyURI, 
however, their use is highly discouraged (unless they are encoded by 



Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142

Larry Masinter
12/30/2009 02:02 PM
        To:     "Roy T. Fielding"
        Subject:        What standards and implementations use IRIs?

I think it would be helpful if we could be more explicit about
which standards and implementations of IRIs are better served
by the current normative definition of IRIs vs. a more liberal
specification, closer to what browsers, operating systems, and
common URL-parsing libraries accept and process.

Outside of XML's LEIRI, which is itself an extension of what
RFC 3987 allows, or URL-parsing libraries, which seem to have
parameters or options letting the caller determine which
syntax they want to process against?

I understand there are widespread implementations of URIs
and URI processing, but what other systems implement IRIs
according to RFC 3987 terms?


Received on Monday, 11 January 2010 16:36:56 UTC