
Re: escaping % in RDF URI references

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Thu, 11 Sep 2003 16:29:23 +0200
To: <w3c-rdfcore-wg@w3.org>
Cc: <w3c-i18n-ig@w3.org>

> Personally my preference would be to follow Martin Durst's advice ...
> at least :) ].

> Are you suggesting soliciting further advice?

Yes. Martin, any comments?

Would it be better to go with our current text
6.4 RDF URI References
A URI reference within an RDF graph (an RDF URI reference) is a Unicode
string [UNICODE] that would produce a valid URI character sequence (per
RFC2396 [URI], section 2.1) representing an absolute URI with optional
fragment identifier when subjected to the encoding described below.

The encoding consists of:

1. encoding the Unicode string as UTF-8 [RFC-2279], giving a sequence of
octet values.
2. %-escaping octets that do not correspond to permitted US-ASCII characters.

The disallowed octets that must be %-escaped include all those that do
not correspond to US-ASCII characters, and the excluded characters listed in
Section 2.4 of [URI], except for the number sign (#), percent sign (%), and
the square bracket characters re-allowed in [RFC-2732].

Disallowed octets must be escaped with the URI escaping mechanism (that is,
converted to %HH, where HH is the 2-digit hexadecimal numeral corresponding
to the octet value).
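
To make the two steps above concrete, here is a minimal Python sketch of
the encoding. It is an illustration only, not part of either candidate
text. The function name encode_rdf_uri_ref is hypothetical, and the set of
escaped ASCII octets is my reading of the excluded characters in RFC 2396
section 2.4.3 (control characters, space, delims, unwise) with #, %, [ and ]
taken back out. It does not check that the result is an absolute URI.

```python
# Assumption: excluded US-ASCII octets per RFC 2396 section 2.4.3,
# minus '#', '%', '[' and ']' as the draft text allows.
_EXCLUDED_ASCII = (
    set(range(0x00, 0x21))   # control characters 0x00-0x1F and space 0x20
    | {0x7F}                 # DEL
    | set(b'<>"')            # delims, without '#' and '%'
    | set(b'{}|\\^`')        # unwise, without '[' and ']'
)


def encode_rdf_uri_ref(ref: str) -> str:
    """Hypothetical helper: apply the two-step encoding to a Unicode string."""
    out = []
    # Step 1: encode the Unicode string as UTF-8, giving octet values.
    for octet in ref.encode("utf-8"):
        # Step 2: %-escape non-ASCII octets and excluded ASCII octets.
        if octet >= 0x80 or octet in _EXCLUDED_ASCII:
            out.append(f"%{octet:02X}")
        else:
            out.append(chr(octet))
    return "".join(out)


print(encode_rdf_uri_ref("http://example.org/a b#\u00e9"))
print(encode_rdf_uri_ref("http://example.org/%7Euser"))
```

Note that under this reading an existing % is left alone, so a string that
already contains %7E is passed through unchanged rather than becoming %257E.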

Two RDF URI references are equal if and only if they compare as equal,
character by character, as Unicode strings.
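
As a small illustration of what character-by-character equality implies,
the sketch below compares a few pairs in Python, whose string equality
compares code points. The helper name rdf_uri_refs_equal and the
example.org strings are mine, not from the draft.

```python
def rdf_uri_refs_equal(a: str, b: str) -> bool:
    """Hypothetical helper: equality as plain Unicode string comparison."""
    # No %-unescaping, case folding, or Unicode normalization is applied.
    return a == b


# An escaped and an unescaped tilde give different RDF URI references.
print(rdf_uri_refs_equal("http://example.org/%7Euser",
                         "http://example.org/~user"))
# The hex digits of an escape are compared as written.
print(rdf_uri_refs_equal("http://example.org/%7euser",
                         "http://example.org/%7Euser"))
# Precomposed and decomposed forms of the same letter also differ.
print(rdf_uri_refs_equal("http://example.org/caf\u00e9",
                         "http://example.org/cafe\u0301"))
```

All three comparisons print False, which is why the treatment of % matters
for equality.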

Note: RDF URI references are compatible with the anyURI datatype as defined
by XML schema datatypes [XML-SCHEMA2], constrained to be an absolute rather
than a relative URI reference.

Note: RDF URI references are compatible with International Resource
Identifiers as defined by [XML Namespaces 1.1].

Note: The restriction to absolute URI references is found in this abstract
syntax. When there is a well-defined base URI, concrete syntaxes, such as
RDF/XML, may permit relative URIs as a shorthand for such absolute URI

or text based on

Work is currently in progress to produce an RFC defining Internationalized
Resource Identifiers (IRIs). Since this work is not yet complete, in this
section we give a syntactic definition of IRIs for the purposes of this
specification. We expect to issue an erratum replacing this section with a
reference to the RFC when it is published. Users defining namespaces are
advised to restrict namespace names to URIs until software supporting IRIs
is in common use.

For a more general definition and discussion of IRIs see [IRI draft] (work
in progress).

URI references are restricted to a subset of the ASCII characters; IRI
references allow some of the disallowed ASCII characters as well as most
Unicode characters from #xA0 onwards.

[Definition: The additional characters allowed in IRIs are: ]

+ space #x20

+ the delimiters < #x3C, > #x3E and " #x22

+ the unwise characters { #x7B, } #x7D, | #x7C, \ #x5C, ^ #x5E and ` #x60

+ the Unicode plane 0 characters #xA0 - #xD7FF, #xF900-#xFDCF, #xFDF0-#xFFEF

+ the Unicode plane 1-14 characters #x10000-#x1FFFD ... #xE0000-#xEFFFD

[Definition: An IRI reference is a string that can be converted to a URI
reference by escaping all additional characters as follows: ]

1. Each additional character is converted to UTF-8 [Unicode 3.2] as one or
more bytes.

2. The resulting bytes are escaped with the URI escaping mechanism (that is,
converted to %HH, where HH is the hexadecimal notation of the byte value).

The original character is replaced by the resulting character sequence.
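
For comparison, the IRI-to-URI conversion just described can be sketched in
Python as follows. The names is_additional_char and iri_to_uri are
hypothetical. The plane 1-14 ranges are elided in the text quoted above, so
the code assumes each plane N from 1 to 14 contributes #xN0000-#xNFFFD;
treat that as my assumption rather than a quotation.

```python
_ASCII_EXTRA = set(' <>"{}|\\^`')   # space, delimiters, unwise characters
_PLANE0 = [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]


def is_additional_char(ch: str) -> bool:
    """Hypothetical helper: is ch one of the additional IRI characters?"""
    if ch in _ASCII_EXTRA:
        return True
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in _PLANE0):
        return True
    # Assumption: planes 1-14, each from offset 0x0000 to 0xFFFD.
    plane, offset = divmod(cp, 0x10000)
    return 1 <= plane <= 14 and offset <= 0xFFFD


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: escape only the additional characters."""
    parts = []
    for ch in iri:
        if is_additional_char(ch):
            # Steps 1 and 2: UTF-8 bytes, each written as %HH.
            parts.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            parts.append(ch)
    return "".join(parts)


print(iri_to_uri("http://example.org/a b/\u00e9"))
```

In this sketch control characters such as tab are not in the additional
set, so they pass through unescaped; whether they are legal at all is a
separate question that this definition leaves to the URI syntax.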


Noting that RDF Core WG has declined a comment suggesting using the term IRI
throughout, so that the definition would remain a definition of "RDF URI

A specific question is control characters: should they be allowed or not?

Received on Thursday, 11 September 2003 10:38:10 UTC

