Use of '[' and ']' in URIs [was RE: issues for tomorrow]

From: Mark Birbeck <mark.birbeck@x-port.net>
Date: Tue, 22 Nov 2005 12:38:27 -0000
Message-ID: <12E2E288-1815-40E6-A1FD-4097B10C5816@s15.mail.x-port.net>
To: "'Ben Adida'" <ben@MIT.EDU>
Cc: "'public-rdf-in-xhtml task force'" <public-rdf-in-xhtml-tf@w3.org>

Hi Ben,

> If you have time before the telecon to prepare these, this is 
> just a reminder of your two pending actions:
> 
> [NEW] ACTION: Mark investigate authoritative specifications for '['  
> as a URI character

Yes, I posted some comments in IRC at the end of the last call, and
Steven and I carried on talking about it afterwards.

The story is this: when I originally suggested using '[' and ']' to escape
CURIEs, I was reading an old URI specification [1]. There, '[' and ']' are
classed as "unwise" and excluded from use (see page 10):

  Other characters are excluded because gateways and other transport
  agents are known to sometimes modify such characters, or they are
  used as delimiters.

  unwise      = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"


However, Steven pointed out to me that I should be looking at a newer spec
[2], and in this one both characters are allowed [3]:

  reserved    = gen-delims / sub-delims

  gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

  sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

The general approach is that these characters may act as delimiters in
*any* URI scheme, so wherever data could be confused with a delimiter it
should be percent-encoded.
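
As a rough illustration of that percent-encoding rule, here is a minimal
sketch using Python's standard urllib.parse module. The sample string is
made up for the example; it is not taken from either specification.

```python
from urllib.parse import quote, unquote

# Hypothetical data value that happens to contain square brackets.
raw = "item[1]"

# safe="" asks quote() to escape every reserved character,
# including "[" (%5B) and "]" (%5D).
encoded = quote(raw, safe="")
print(encoded)           # item%5B1%5D

# Decoding restores the original data.
print(unquote(encoded))  # item[1]
```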

However, on closer inspection, the only reason they are allowed is to
permit IPv6 literal addresses, so the only legal place they can appear is
in the host component of the authority part. This means that you will
never get a valid URI with a square bracket at the beginning. This is
discussed in section 3.2.2, Host [4]:

  A host identified by an Internet Protocol literal address, version 6
  [RFC3513] or later, is distinguished by enclosing the IP literal within
  square brackets ("[" and "]"). This is the only place where square
  bracket characters are allowed in the URI syntax. 
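
To make the consequence concrete: because a valid URI can never start with
'[', a leading bracket could serve as an unambiguous marker for an escaped
CURIE. The sketch below assumes a '[prefix:reference]' form; the helper
name is_bracketed_curie and the sample values are hypothetical, not part of
any specification.

```python
from urllib.parse import urlsplit

def is_bracketed_curie(value: str) -> bool:
    """Hypothetical check: treat '[...]' as an escaped CURIE, not a URI."""
    return len(value) > 2 and value.startswith("[") and value.endswith("]")

# An escaped CURIE starts with '[', which no valid URI can do.
print(is_bracketed_curie("[dc:title]"))

# In a URI, brackets only ever wrap an IPv6 literal in the host component.
uri = "http://[2001:db8::1]:8080/path"
print(is_bracketed_curie(uri))
parts = urlsplit(uri)
print(parts.hostname, parts.port)
```

Checking only the first and last character is deliberately simple; a real
processor would also need to validate the prefix and reference inside the
brackets.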

Regards,

Mark

[1] http://www.ietf.org/rfc/rfc2396.txt
[2] http://www.gbiv.com/protocols/uri/rfc/rfc3986.html
[3] http://www.gbiv.com/protocols/uri/rfc/rfc3986.html#reserved
[4] http://www.gbiv.com/protocols/uri/rfc/rfc3986.html#host

Mark Birbeck
CEO
x-port.net Ltd.

e: Mark.Birbeck@x-port.net
t: +44 (0) 20 7689 9232
w: http://www.formsPlayer.com/

Download our XForms processor from
http://www.formsPlayer.com/
Received on Tuesday, 22 November 2005 12:38:58 GMT
