- From: Adam M. Costello BOGUS address, see signature <BOGUS@BOGUS.nicemice.net>
- Date: Thu, 19 Feb 2004 11:50:16 +0000
- To: uri@w3.org, public-iri@w3.org
Graham Klyne <GK@ninebynine.org> wrote: > > The rule can be: percent-encoding is allowed everywhere except the > > scheme, and individual schemes cannot make exceptions to this rule. > > I think that's pretty close to what we have (if permitted > normalization is taken into account - section 6.2.2.2). I was talking about IRIs there, not URIs. [This thread was cross-posted to both mailing lists early on, when it was talking about both URIs and IRIs, but it has progressively shifted attention more toward IRIs. I forgot to manually fix the Reply-To: field in my last message to point to both lists, so your message went only to the URI list. I've restored the former To: and Reply-To: fields for this message, but someone please tell me if it's time to stop the cross-posting.] URIs already have a legacy of schemes that prohibit percent-encoding in some components (all schemes that cite RFC-2396 and contain a hostname component prohibit percent-encoding in the hostname). The main point I was trying to make in the quotation above is that the IRI spec should avoid that pitfall by explicitly requiring all IRI consumers to expect percent-encoding in all components (except the scheme component) of all schemes. Individual schemes should not be able to ban percent-encoding anywhere in IRIs. > > If an individual scheme restricts a component to contain only ASCII > > characters, then scheme-specific IRI consumers would be required > > to check the component before using it, and fail gracefully if any > > non-ASCII characters are found. > > > > That's much simpler, requiring only one bit of knowledge about the > > syntax of the component (whether it allows non-ASCII). > > this is about *generic* URI syntax, and I'm currently implementing a > *generic* URI parser. How am I supposed to know whether a particular > scheme restricts a particular component in any particular way? Please note the phrase "scheme-specific IRI consumers" in the quoted passage. The proposed rule is not about URI parsers at all, nor is it about generic IRI parsers; it is about scheme-specific IRI parsers. (It's too late to add a similar requirement for scheme-specific URI consumers, because there's already a huge installed base.) The proposed rule, in other words, is that if you're going to use scheme-specific knowledge to use a component in a scheme-specific way, then you must first use your scheme-specific knowledge to check for the occurrence of non-ASCII characters where they don't belong. On the other hand, if you have no scheme-specific knowledge then you're incapable of performing that check, but that's okay because you're also incapable of doing what the check prevents: feeding the component to a scheme-specific ASCII-assuming operation. The only operations you know are generic IRI-component operations, all of which are designed to handle non-ASCII. AMC http://www.nicemice.net/amc/
Received on Thursday, 19 February 2004 06:50:18 UTC