- From: Richard Tobin <richard@inf.ed.ac.uk>
- Date: Tue, 26 Jun 2007 10:31:50 +0100 (BST)
- To: public-xml-core-wg@w3.org
As Martin quite rightly points out, many of the characters allowed in HRRIs but not in IRIs are really poor choices, and though we have to allow them for compatibility with the existing specs, we should certainly discourage their use.

We also have to add various characters pointed out by Martin to the list of characters that must be escaped.

We should perhaps also list explicitly the non-characters, such as surrogates, that cannot occur in HRRIs. These are of course not allowed by XML, but we don't want to make the definition of HRRI depend on the definition of XML.

I suggest we add three productions, "reasonable", "unreasonable", and "disallowed", listing these characters. What follows should replace the current list of characters to be %-encoded.

  reasonable   = #x20 | "<" | ">" | #x22 | "{" | "}" | "|" | "\" | "^" | "`"

  unreasonable = #x0 - #x1F |            /* C0 controls */
                 #x7F - #x9F |           /* DEL and C1 controls */
                 #xE000 - #xF8FF |       /* private use */
                 #xFDD0 - #xFDEF |       /* non-characters */
                 #x1FFFE - #x1FFFF |     /* non-characters */
                 ...
                 #x10FFFE - #x10FFFF |   /* non-characters */
                 #xE0000 - #xE0FFF |     /* tags - I don't understand these */
                 #xF0000 - #xFFFFD |     /* private use */
                 #x100000 - #x10FFFD     /* private use */

  disallowed   = #xD800 - #xDFFF |       /* surrogates */
                 #xFFFE | #xFFFF

The disallowed characters must not occur in HRRIs. The reasonable and unreasonable characters may, though they may be unavailable for other reasons - for example, #x0 is not allowed in XML. The use of the unreasonable characters is discouraged, and their use may have security implications.

To convert an HRRI to an IRI reference, the reasonable and unreasonable characters must be %-encoded, except for private use characters appearing in the query part.

-- Richard
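A rough Python sketch of the proposal above, to make the three classes and the conversion rule concrete. It is not part of any spec text. The names `classify` and `hrri_to_iri_reference` are hypothetical, the ranges elided by "..." in the "unreasonable" production are deliberately left out rather than guessed, and two points are assumptions: that %-encoding means encoding each byte of the character's UTF-8 form, and that the query part runs from the first "?" to the first "#".

```python
# Character classes from the proposed productions (illustrative sketch only).
REASONABLE = frozenset(' <>"{}|\\^`')

PRIVATE_USE = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]

UNREASONABLE = [
    (0x0, 0x1F),           # C0 controls
    (0x7F, 0x9F),          # DEL and C1 controls
    (0xFDD0, 0xFDEF),      # non-characters
    (0x1FFFE, 0x1FFFF),    # non-characters
    # The ranges elided by "..." in the message are not filled in here.
    (0x10FFFE, 0x10FFFF),  # non-characters
    (0xE0000, 0xE0FFF),    # tags
] + PRIVATE_USE

DISALLOWED = [(0xD800, 0xDFFF), (0xFFFE, 0xFFFF)]


def _in_ranges(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)


def classify(ch):
    """Return 'disallowed', 'reasonable', 'unreasonable', or 'other'."""
    cp = ord(ch)
    if _in_ranges(cp, DISALLOWED):
        return "disallowed"
    if ch in REASONABLE:
        return "reasonable"
    if _in_ranges(cp, UNREASONABLE):
        return "unreasonable"
    return "other"


def hrri_to_iri_reference(hrri):
    """%-encode reasonable and unreasonable characters, leaving private
    use characters in the query part alone (assumed query boundaries:
    after the first '?' and before the first '#')."""
    hash_at = hrri.find("#")
    end = len(hrri) if hash_at < 0 else hash_at
    query_at = hrri.find("?", 0, end)
    out = []
    for i, ch in enumerate(hrri):
        kind = classify(ch)
        if kind == "disallowed":
            raise ValueError("disallowed character U+%04X" % ord(ch))
        in_query = query_at >= 0 and query_at < i < end
        if kind == "other" or (in_query and _in_ranges(ord(ch), PRIVATE_USE)):
            out.append(ch)
        else:
            # Assumption: %-encode the UTF-8 bytes of the character.
            out.append("".join("%%%02X" % b for b in ch.encode("utf-8")))
    return "".join(out)
```

For example, `hrri_to_iri_reference("a b<c>")` gives `"a%20b%3Cc%3E"`, while a private use character such as U+E000 is left as-is after a "?" but %-encoded in the path or fragment.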
Received on Tuesday, 26 June 2007 09:32:04 UTC