- From: Jeremy Carroll <jjc@hpl.hp.com>
- Date: Tue, 22 Nov 2005 16:41:44 +0000
- To: Mark Birbeck <mark.birbeck@x-port.net>
- CC: "'Ben Adida'" <ben@MIT.EDU>, "'public-rdf-in-xhtml task force'" <public-rdf-in-xhtml-tf@w3.org>
In general, if a URI has a ":" in it, and there is no "/" before the ":" then the left hand side of the ":" is the scheme name, which is highly restricted. Thus we have a range of characters that could be used on the left hand side of ":" which force a CURIE to not be a URI.

Jeremy

Mark Birbeck wrote:
> Hi Ben,
>
>> If you have time before the telecon to prepare these, this is
>> just a reminder of your two pending actions:
>>
>> [NEW] ACTION: Mark investigate authoritative specifications for '['
>> as a URI character
>
> Yes, I posted some comments into the IRC at the end of the last call, and
> then myself and Steven carried on talking about it afterwards.
>
> The story is this: When I originally suggested their use for escaping
> CURIEs, I was reading an old URI specification [1]. '[' and ']' used to be
> discouraged from use (see Page 10):
>
>    Other characters are excluded because gateways and other transport
>    agents are known to sometimes modify such characters, or they are
>    used as delimiters.
>
>    unwise = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"
>
> However, Steven pointed out to me that I should be looking at a newer spec
> [2], and in this one both characters are allowed [3]:
>
>    reserved = gen-delims / sub-delims
>
>    gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
>
>    sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
>               / "*" / "+" / "," / ";" / "="
>
> The general approach is that these characters play the role of delimiters in
> *all* URI schemes, and if there is any possibility of confusion they should
> be percent-encoded.
>
> However, on closer inspection, the only reason they are allowed is to allow
> IPv6 values to be specified, so the only legal place that they can appear is
> in the authority part. This means that you will never get a valid URI with a
> square bracket at the beginning. This is discussed in section 3.2.2, Host
> [4]:
>
>    A host identified by an Internet Protocol literal address, version 6
>    [RFC3513] or later, is distinguished by enclosing the IP literal within
>    square brackets ("[" and "]"). This is the only place where square
>    bracket characters are allowed in the URI syntax.
>
> Regards,
>
> Mark
>
> [1] http://www.ietf.org/rfc/rfc2396.txt
> [2] http://www.gbiv.com/protocols/uri/rfc/rfc3986.html
> [3] http://www.gbiv.com/protocols/uri/rfc/rfc3986.html#reserved
> [4] http://www.gbiv.com/protocols/uri/rfc/rfc3986.html#host
>
> Mark Birbeck
> CEO
> x-port.net Ltd.
>
> e: Mark.Birbeck@x-port.net
> t: +44 (0) 20 7689 9232
> w: http://www.formsPlayer.com/
>
> Download our XForms processor from
> http://www.formsPlayer.com/
>
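
As a rough illustration of the scheme-name argument at the top of this message, the check could be sketched in Python as below. This is only a sketch under stated assumptions: the function names `leading_prefix` and `prefix_rules_out_uri` are made up for the example, the scheme pattern is taken from the RFC 3986 grammar (`scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )`), and the check for "?" and "#" before the ":" is an addition beyond the "/" rule stated above, since those characters also end the first path segment.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )   (RFC 3986, section 3.1)
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*\Z")


def leading_prefix(text):
    """Return the text before the first ":" when no "/", "?" or "#"
    precedes that ":", otherwise None (hypothetical helper)."""
    head, sep, _ = text.partition(":")
    if not sep or any(ch in head for ch in "/?#"):
        return None
    return head


def prefix_rules_out_uri(text):
    """True when the left hand side of the ":" would have to be a scheme
    name but is not a legal one, so the string cannot be a URI."""
    prefix = leading_prefix(text)
    return prefix is not None and SCHEME_RE.match(prefix) is None


if __name__ == "__main__":
    for candidate in ("http://example.org/", "dc:title", "[dc:title]", "foo/bar:baz"):
        print(candidate, prefix_rules_out_uri(candidate))
```

Under these assumptions a bracketed form such as `[dc:title]` is ruled out as a URI because `[dc` is not a legal scheme name, while a bare `dc:title` is not ruled out, since `dc` is syntactically a valid scheme.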
Received on Tuesday, 22 November 2005 16:43:10 UTC