Re: escaping % in RDF URI references

Jeremy:
> Personally my preference would be to follow Martin Durst's advice ... [here
> at least :) ].

Brian:
> Are you suggesting soliciting further advice?

Yes. Martin, any comments?

Would it be better to go with our current text
[[
6.4 RDF URI References
A URI reference within an RDF graph (an RDF URI reference) is a Unicode
string [UNICODE] that would produce a valid URI character sequence (per
RFC2396 [URI], sections 2.1) representing an absolute URI with optional
fragment identifier when subjected to the encoding described below.

The encoding consists of:

1. encoding the Unicode string as UTF-8 [RFC-2279], giving a sequence of
octet values.
2. %-escaping octets that do not correspond to permitted US-ASCII characters.

The disallowed octets that must be %-escaped include all those that do
not correspond to US-ASCII characters, and the excluded characters listed in
Section 2.4 of [URI], except for the number sign (#), percent sign (%), and
the square bracket characters re-allowed in [RFC-2732].

Disallowed octets must be escaped with the URI escaping mechanism (that is,
converted to %HH, where HH is the 2-digit hexadecimal numeral corresponding
to the octet value).

Two RDF URI references are equal if and only if they compare as equal,
character by character, as Unicode strings.

Note: RDF URI references are compatible with the anyURI datatype as defined
by XML schema datatypes [XML-SCHEMA2], constrained to be an absolute rather
than a relative URI reference.

Note: RDF URI references are compatible with International Resource
Identifiers as defined by [XML Namespaces 1.1].

Note: The restriction to absolute URI references is found in this abstract
syntax. When there is a well-defined base URI, concrete syntaxes, such as
RDF/XML, may permit relative URIs as a shorthand for such absolute URI
references.
]]

or text based on
http://www.w3.org/TR/xml-names11/#IRIs

[[
Work is currently in progress to produce an RFC defining Internationalized
Resource Identifiers (IRIs). Since this work is not yet complete, in this
section we give a syntactic definition of IRIs for the purposes of this
specification. We expect to issue an erratum replacing this section with a
reference to the RFC when it is published. Users defining namespaces are
advised to restrict namespace names to URIs until software supporting IRIs
is in common use.

For a more general definition and discussion of IRIs see [IRI draft] (work
in progress).

URI references are restricted to a subset of the ASCII characters; IRI
references allow some of the disallowed ASCII characters as well as most
Unicode characters from #xA0 onwards.

[Definition: The additional characters allowed in IRIs are: ]

+ space #x20

+ the delimiters < #x3C, > #x3E and " #x22

+ the unwise characters { #x7B, } #x7D, | #x7C, \ #x5C, ^ #x5E and ` #x60

+ the Unicode plane 0 characters #xA0 - #xD7FF, #xF900-#xFDCF, #xFDF0-#xFFEF

+ the Unicode plane 1-14 characters #x10000-#x1FFFD ... #xE0000-#xEFFFD

[Definition: An IRI reference is a string that can be converted to a URI
reference by escaping all additional characters as follows: ]

1. Each additional character is converted to UTF-8 [Unicode 3.2] as one or
more bytes.

2. The resulting bytes are escaped with the URI escaping mechanism (that is,
converted to %HH, where HH is the hexadecimal notation of the byte value).

The original character is replaced by the resulting character sequence.


]]
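
To make the comparison concrete, here is a rough Python sketch of the two
escaping procedures quoted above. The function names are made up for
illustration, and the character sets are my reading of RFC 2396 section 2.4.3
and of the XML Namespaces 1.1 list, not text from either spec. In particular,
treating the "..." in the plane 1-14 item as "the same range in every plane
from 1 to 14" is an assumption, as is the choice of uppercase hex digits.

```python
# Illustrative sketch only; names and set definitions are assumptions.

# RFC 2396 section 2.4.3 excluded US-ASCII characters (controls, space,
# delims, unwise), minus '#', '%', '[' and ']' as in the current RDF text.
RDF_ESCAPED_ASCII = set(range(0x00, 0x21)) | {0x7F} | set(b'<>"{}|\\^`')

# The ASCII "additional characters" from the XML Namespaces 1.1 definition.
IRI_ADDITIONAL_ASCII = set(b' <>"{}|\\^`')


def escape_octets(octets: bytes) -> str:
    """Apply the URI escaping mechanism (%HH) to every octet given."""
    return "".join(f"%{octet:02X}" for octet in octets)


def rdf_uri_ref_encode(ref: str) -> str:
    """Current RDF text: UTF-8 encode, then %-escape disallowed octets."""
    out = []
    for octet in ref.encode("utf-8"):
        if octet >= 0x80 or octet in RDF_ESCAPED_ASCII:
            out.append(escape_octets(bytes([octet])))
        else:
            out.append(chr(octet))
    return "".join(out)


def is_additional_char(cp: int) -> bool:
    """True if the code point is an 'additional character' allowed in IRIs."""
    if cp in IRI_ADDITIONAL_ASCII:
        return True
    if 0xA0 <= cp <= 0xD7FF or 0xF900 <= cp <= 0xFDCF or 0xFDF0 <= cp <= 0xFFEF:
        return True
    # Assumption: planes 1 to 14, excluding the last two code points of each.
    return 0x10000 <= cp <= 0xEFFFD and (cp & 0xFFFF) <= 0xFFFD


def iri_ref_to_uri_ref(iri: str) -> str:
    """XML Namespaces 1.1 text: escape only the additional characters."""
    return "".join(
        escape_octets(ch.encode("utf-8")) if is_additional_char(ord(ch)) else ch
        for ch in iri
    )
```

Under these assumptions both functions agree on spaces and non-ASCII
characters, and both leave % alone. They differ on control characters: the
first escapes a tab as %09, while the second passes it through unescaped, so
the result is not a valid URI reference and the input is not an IRI reference.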

Note that the RDF Core WG has declined a comment suggesting using the term IRI
throughout, so the definition would remain a definition of "RDF URI
references".

A specific question concerns control characters: should they be allowed or not?

Jeremy

Received on Thursday, 11 September 2003 10:38:10 UTC