Re: Use of '[' and ']' in URIs [was RE: issues for tomorrow]

In general, if a URI has a ":" in it, and there is no "/" before the ":",
then the left-hand side of the ":" is the scheme name, whose syntax is
highly restricted. Thus there is a range of characters that, when used on
the left-hand side of the ":", force a CURIE not to be a URI.
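
As a rough illustration of that rule, here is a minimal Python sketch. The
function names are hypothetical, and the pattern is my reading of the
scheme production (a letter followed by letters, digits, "+", "-" or "."),
not a full URI parser:

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# (assumed here from the scheme production in the URI specification)
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*\Z")


def prefix_before_colon(text):
    """Return the text before the first ':' if no '/' precedes it, else None."""
    colon = text.find(":")
    if colon == -1:
        return None
    head = text[:colon]
    if "/" in head:
        return None
    return head


def could_be_absolute_uri(text):
    """True if the part before the first ':' is a syntactically valid scheme."""
    head = prefix_before_colon(text)
    return head is not None and SCHEME_RE.match(head) is not None


if __name__ == "__main__":
    for candidate in ["dc:title", "_x:title", "[dc:title]", "http://example.org/"]:
        print(candidate, could_be_absolute_uri(candidate))
```

With this check, "dc:title" remains ambiguous (its prefix is a legal scheme
name), whereas "_x:title" and "[dc:title]" cannot be absolute URIs because
"_" and "[" are not permitted in a scheme name.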

Jeremy


Mark Birbeck wrote:
> Hi Ben,
> 
>> If you have time before the telecon to prepare these, this is 
>> just a reminder of your two pending actions:
>>
>> [NEW] ACTION: Mark investigate authoritative specifications for '['  
>> as a URI character
> 
> Yes, I posted some comments on IRC at the end of the last call, and
> then Steven and I carried on talking about it afterwards.
> 
> The story is this: when I originally suggested their use for escaping
> CURIEs, I was reading an old URI specification [1]. '[' and ']' used to be
> discouraged (see page 10):
> 
>   Other characters are excluded because gateways and other transport
>   agents are known to sometimes modify such characters, or they are
>   used as delimiters.
> 
>   unwise      = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"
> 
> 
> However, Steven pointed out to me that I should be looking at a newer spec
> [2], and in this one both characters are allowed [3]:
> 
>   reserved    = gen-delims / sub-delims
> 
>   gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
> 
>   sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
>               / "*" / "+" / "," / ";" / "="
> 
> The general approach is that these characters play the role of delimiters in
> *all* URI schemes, and if there is any possibility of confusion they should
> be percent-encoded.
> 
> However, on closer inspection, they are allowed only so that IPv6
> addresses can be specified, so the only legal place they can appear is
> in the authority part. This means that you will never get a valid URI with a
> square bracket at the beginning. This is discussed in section 3.2.2, Host
> [4]:
> 
>   A host identified by an Internet Protocol literal address, version 6
>   [RFC3513] or later, is distinguished by enclosing the IP literal within
>   square brackets ("[" and "]"). This is the only place where square
>   bracket characters are allowed in the URI syntax. 
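
To make the quoted rule concrete, here is a small sketch using Python's
standard urllib.parse. The example address and the helper name is_bracketed
are illustrative assumptions, not anything taken from the specification:

```python
from urllib.parse import urlsplit


def is_bracketed(text):
    """True for a token wrapped as '[...]', the escape proposed for CURIEs."""
    return len(text) >= 2 and text.startswith("[") and text.endswith("]")


# A URI with an IPv6 literal: the brackets sit inside the authority part.
ipv6_uri = "http://[2001:db8::1]:8080/path"
parts = urlsplit(ipv6_uri)

if __name__ == "__main__":
    print(parts.netloc)    # authority, brackets included
    print(parts.hostname)  # host with the brackets stripped
    print(is_bracketed(ipv6_uri), is_bracketed("[dc:title]"))
```

The brackets only ever appear after the "//" that introduces the authority,
so a token that starts with "[" can be told apart from a URI by inspecting
its first character alone.
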
> 
> Regards,
> 
> Mark
> 
> [1] http://www.ietf.org/rfc/rfc2396.txt
> [2] http://www.gbiv.com/protocols/uri/rfc/rfc3986.html
> [3] http://www.gbiv.com/protocols/uri/rfc/rfc3986.html#reserved
> [4] http://www.gbiv.com/protocols/uri/rfc/rfc3986.html#host
> 
> Mark Birbeck
> CEO
> x-port.net Ltd.
> 
> e: Mark.Birbeck@x-port.net
> t: +44 (0) 20 7689 9232
> w: http://www.formsPlayer.com/

Received on Tuesday, 22 November 2005 16:43:10 UTC