Re: IRI regex quiz!

[btw, on which list was this quiz originally published?
In the future, if something is IRI-related, please copy
public-iri@w3.org from the start.]

At 18:58 06/01/23, Bjoern Hoehrmann wrote:
 >
 >* Jeremy Carroll wrote:
 >>It seems to me that there are many other constraints in RFC 3987 that
 >>are not captured by this regex. e.g. no bidi control chars. e.g.
 >>constraints on r-t-l chars. e.g. the IDNA area where correct
 >>implementation seems to require scheme specific knowledge ....
 >
 >It's a straight translation from the BNF with the lower-case hex digit
 >error added. At least, I think it's a straight translation; there might
 >well be bugs in the translator. If there are any constraints that could
 >be expressed in ABNF but aren't part of the spec, that's a flaw in the
 >specification really. One such flaw seems to be that %xx escapes in the
 >(i)reg-name component are not constrained to be legal UTF-8 sequences.
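Two of the constraints mentioned in the quoted text can be checked outside any ABNF-derived regex. The following is a minimal Python sketch, not a reference implementation. RFC 3987 (section 4.1) forbids the bidi formatting characters LRM, RLM, LRE, RLE, LRO, RLO and PDF, even though the ucschar ranges in its ABNF include them. For ireg-name, it checks that the percent-escapes decode as UTF-8. Because ABNF quoted strings are case-insensitive, HEXDIG also covers lower-case a-f. The helper names has_bidi_formatting and ireg_name_ok are hypothetical. The ireg-name pattern approximates ucschar as "any non-surrogate code point from U+00A0 up", which is looser than the exact ranges in the RFC.

```python
import re
from urllib.parse import unquote_to_bytes

# RFC 3987, section 4.1: bidi formatting characters MUST NOT appear in IRIs.
BIDI_FORMATTING = {
    "\u200E",  # LRM
    "\u200F",  # RLM
    "\u202A",  # LRE
    "\u202B",  # RLE
    "\u202C",  # PDF
    "\u202D",  # LRO
    "\u202E",  # RLO
}

def has_bidi_formatting(iri: str) -> bool:
    """True if the IRI contains any forbidden bidi formatting character."""
    return any(ch in BIDI_FORMATTING for ch in iri)

# ABNF strings are case-insensitive, so HEXDIG matches a-f as well as A-F.
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"

# Simplified ireg-name = *( iunreserved / pct-encoded / sub-delims ).
# ASSUMPTION: ucschar approximated by U+00A0..U+10FFFF minus surrogates.
IREG_NAME = re.compile(
    r"(?:[A-Za-z0-9\-._~!$&'()*+,;=\u00A0-\uD7FF\uE000-\U0010FFFF]|"
    + PCT_ENCODED
    + r")*"
)

def ireg_name_ok(host: str) -> bool:
    """Syntax check plus the check the ABNF omits: escapes must form UTF-8."""
    if IREG_NAME.fullmatch(host) is None:
        return False
    try:
        # unquote_to_bytes turns each %HH into one octet and UTF-8 encodes
        # literal characters, so any invalid sequence involves an escape.
        unquote_to_bytes(host).decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

For example, "r%C3%A9sum%C3%A9.example" passes, but "%C3.example" is rejected. Its first escape starts a two-octet sequence that "." cannot complete.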

Modularity and readability are often important for a specification.
If done well, this may also lead to better implementations. Of course,
there are always tradeoffs, and any comment or suggestion on this
topic (ideally with actual ABNF rules and/or text) is always
welcome.

In the specific case at hand, repeating the definition of UTF-8 byte
sequences, cooked down to %HH form, in the URI spec, would in my view
only have been a waste of space. The IETF has its own definition of
UTF-8 (RFC 3629), which is referenced.
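To show what "cooked down to %HH form" would involve, the following Python sketch expands the UTF8-char rules of RFC 3629 into a regex over percent-escapes. It uses one alternative per line of those rules and treats the hex digits case-insensitively. It is an illustration only. The names esc, TAIL, UTF8_CHAR and PCT_UTF8 are hypothetical.

```python
import re

def esc(lo: int, hi: int) -> str:
    """Alternation of every %HH escape whose octet lies in [lo, hi]."""
    return "(?:" + "|".join(f"%{b:02X}" for b in range(lo, hi + 1)) + ")"

TAIL = esc(0x80, 0xBF)  # UTF8-tail = %x80-BF

# One alternative per line of the UTF8-char rules in RFC 3629.
UTF8_CHAR = "(?:" + "|".join([
    esc(0x00, 0x7F),                                  # UTF8-1
    esc(0xC2, 0xDF) + TAIL,                           # UTF8-2
    esc(0xE0, 0xE0) + esc(0xA0, 0xBF) + TAIL,         # UTF8-3
    esc(0xE1, 0xEC) + TAIL + TAIL,
    esc(0xED, 0xED) + esc(0x80, 0x9F) + TAIL,         # excludes surrogates
    esc(0xEE, 0xEF) + TAIL + TAIL,
    esc(0xF0, 0xF0) + esc(0x90, 0xBF) + TAIL + TAIL,  # UTF8-4
    esc(0xF1, 0xF3) + TAIL + TAIL + TAIL,
    esc(0xF4, 0xF4) + esc(0x80, 0x8F) + TAIL + TAIL,  # caps at U+10FFFF
]) + ")"

# IGNORECASE because hex digits in %HH escapes are case-insensitive.
PCT_UTF8 = re.compile(f"(?:{UTF8_CHAR})*", re.IGNORECASE)
```

Printing len(PCT_UTF8.pattern) shows that the expanded pattern runs to thousands of characters. That is the kind of bulk a reference to RFC 3629 avoids.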

Trying to stuff everything and anything into the ABNF would also
create the impression that the rest of the text is irrelevant.
That would be dangerous indeed.


 >Things that might change can't easily be captured here, and regarding
 >the requirements for specific schemes, well, I know RFC 3987 requires
 >that these are met, but as most schemes do not allow non-ascii
 >characters, I'm not sure what the actual requirement might be. Perhaps
 >RFC 3987 defines this by now though.

Yes, it does. You cite the relevant paragraph yourself:

 >>>>
Date: Mon, 23 Jan 2006 16:28:16 +0100
Message-ID: <ses9t19pce9pcbvcdhobnu3gmmdvetfqfb@hive.bjoern.hoehrmann.de>

It also says

   Scheme-specific restrictions are applied to IRIs by converting
   IRIs to URIs and checking the URIs against the scheme-specific
   restrictions.
 >>>>
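The quoted rule means that scheme-specific checks never see non-ASCII characters. The IRI is mapped to a URI first, and the scheme's own rules are then applied to that URI. The following is a minimal Python sketch of this flow under stated assumptions. The conversion is simplified to percent-encoding the UTF-8 octets of every non-ASCII character, roughly step 2 of RFC 3987 section 3.1. It skips the normalization of non-UCS input and the optional IDNA handling of host names. The "x-demo" scheme and its rule are entirely hypothetical, as are the names iri_to_uri, SCHEME_RULES and check_scheme_restrictions.

```python
import re

def iri_to_uri(iri: str) -> str:
    """Percent-encode the UTF-8 octets of each non-ASCII character
    (simplified version of RFC 3987, section 3.1, step 2)."""
    parts = []
    for ch in iri:
        if ord(ch) > 0x7F:
            # e.g. U+00FC -> UTF-8 C3 BC -> "%C3%BC"
            parts.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            parts.append(ch)
    return "".join(parts)

# HYPOTHETICAL rule for a made-up "x-demo" scheme: the part after the
# colon may contain only unreserved characters and %HH escapes.
SCHEME_RULES = {
    "x-demo": re.compile(r"(?:[A-Za-z0-9\-._~]|%[0-9A-Fa-f]{2})*"),
}

def check_scheme_restrictions(iri: str) -> bool:
    """Convert the IRI to a URI, then check the URI (not the IRI)
    against the hypothetical rule registered for its scheme."""
    uri = iri_to_uri(iri)
    scheme, sep, rest = uri.partition(":")
    if not sep:
        return False  # no scheme at all
    rule = SCHEME_RULES.get(scheme.lower())
    if rule is None:
        return True  # this sketch knows no rule for the scheme
    return rule.fullmatch(rest) is not None
```

With this sketch, "x-demo:caf\u00E9" passes because the converted URI contains only "%C3%A9". "x-demo:a b" fails on the space.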


 >For things that could be expressed in the ABNF of RFC 3987 but are not
 >currently, I would appreciate it if a proposal is made to change the ABNF
 >to fully express the constraints.

Why don't you go ahead and make such a proposal? Ideally, it would be
modularized so that it addresses different issues separately. And please
make sure you send it to public-iri@w3.org, the list for such discussions.


Regards,   Martin. 

Received on Tuesday, 24 January 2006 04:49:24 UTC