- From: Martin Duerst <duerst@it.aoyama.ac.jp>
- Date: Tue, 24 Jan 2006 10:36:37 +0900
- To: Bjoern Hoehrmann <derhoermi@gmx.net>, Jeremy Carroll <jjc@hpl.hp.com>
- Cc: public-iri@w3.org
[btw, what's the original list this quiz was published on? In the future, if something is IRI-related, please copy public-iri@w3.org from the start.]

At 18:58 06/01/23, Bjoern Hoehrmann wrote:
>
>* Jeremy Carroll wrote:
>>It seems to me that there are many other constraints in RFC 3987 that
>>are not captured by this regex. e.g. no bidi control chars. e.g.
>>constraints on r-t-l chars. e.g. the IDNA area where correct
>>implementation seems to require scheme specific knowledge ....
>
>It's a straight translation from the BNF with the lower-case hex digit
>error added. At least, I think it's a straight translation, there might
>well be bugs in the translator. If there are any constraints that could
>be expressed in ABNF but aren't part of the spec, that's a flaw in the
>specification really. One such flaw seems to be that %xx escapes in the
>(i)reg-name component are not constrained to be legal UTF-8 sequences.

Modularity and readability are often important for a specification. If done well, this may also lead to better implementations. Of course, there are always tradeoffs, and any comment or suggestion on this topic (ideally with actual ABNF rules and/or text) is always welcome.

In the specific case at hand, repeating the definition of UTF-8 byte sequences, cooked down to %HH form, in the URI spec would in my view only have been a waste of space. The IETF has its definition of UTF-8, which is referenced. Trying to stuff everything and anything into the ABNF would also create the impression that the rest of the text is irrelevant. That would be dangerous indeed.

>Things that might change can't easily be captured here, and regarding
>the requirements for specific schemes, well, I know RFC 3987 requires
>that these are met, but as most schemes do not allow non-ascii
>characters, I'm not sure what the actual requirement might be. Perhaps
>RFC 3987 defines this by now though.

Yes, it does.
You cite the relevant paragraph yourself:

>>>>
Date: Mon, 23 Jan 2006 16:28:16 +0100
Message-ID: <ses9t19pce9pcbvcdhobnu3gmmdvetfqfb@hive.bjoern.hoehrmann.de>
References: <mnlcs1hm9vabepsmndkddnqadq6vb4v2nb@hive.bjoern.hoehrmann.de> <43D4A42B.2070104@hpl.hp.com> <ta99t1t5oiq5a7cq8jupdugt47k31a57a9@hive.bjoern.hoehrmann.de> <43D4E484.27D9@xyzzy.claranet.de>
In-Reply-To: <43D4E484.27D9@xyzzy.claranet.de>

It also says

   Scheme-specific restrictions are applied to IRIs by converting IRIs
   to URIs and checking the URIs against the scheme-specific
   restrictions.
>>>>

>For things that could be expressed in the ABNF of RFC 3987 but are not
>currently, I would appreciate if a proposal is made to change the ABNF
>to fully express the constraints.

Why don't you go ahead and make such a proposal? Ideally, it would be modularized so that it addresses different issues separately. And please make sure you send it to public-iri@w3.org, the list for such discussions.

Regards,
Martin.
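To make the two mechanisms in this exchange concrete, here is a minimal Python sketch. It is not from the thread and is not the normative algorithm of RFC 3987. `iri_to_uri` follows the general idea of the IRI-to-URI mapping (encode a character as UTF-8, then write each octet as %HH), under the simplifying assumption that every non-ASCII code point is mapped, with no normalization step and no IDNA handling of the host. `pct_escapes_are_utf8` checks the constraint Bjoern raised, that %HH escapes in a registered name decode to well-formed UTF-8: a rule that can be enforced by reference to the UTF-8 definition rather than spelled out octet by octet in ABNF. Both function names are hypothetical. A scheme-specific validator (not shown) would then be run on the converted URI.

```python
import re

# One or more consecutive %HH escapes; hex digits may be upper- or lower-case.
_PCT_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def iri_to_uri(iri: str) -> str:
    """Hypothetical sketch: percent-encode the UTF-8 octets of non-ASCII chars.

    Assumption: every non-ASCII code point is mapped; normalization and
    IDNA processing of the host are deliberately omitted.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII characters pass through unchanged
        else:
            # Each UTF-8 octet becomes %HH with uppercase hex digits,
            # as RFC 3986 recommends for percent-encodings.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


def pct_escapes_are_utf8(reg_name: str) -> bool:
    """Hypothetical sketch: True if every run of %HH escapes is valid UTF-8.

    A character's UTF-8 octets must be adjacent, so each contiguous run of
    escapes is decoded on its own.
    """
    for match in _PCT_RUN.finditer(reg_name):
        octets = bytes.fromhex(match.group().replace("%", ""))
        try:
            # Python's strict decoder rejects truncated sequences,
            # overlong forms, and encoded surrogates.
            octets.decode("utf-8")
        except UnicodeDecodeError:
            return False
    return True
```

For example, `iri_to_uri("http://example.org/r\u00e9sum\u00e9")` returns `"http://example.org/r%C3%A9sum%C3%A9"`, and `pct_escapes_are_utf8("%E9cole.example")` returns `False` because a lone 0xE9 octet is not a complete UTF-8 sequence.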
Received on Tuesday, 24 January 2006 04:49:24 UTC