Re: IRI regex quiz!

* Jeremy Carroll wrote:
>It seems to me that there are many other constraints in RFC 3987 that 
>are not captured by this regex. e.g. no bidi control chars. e.g. 
>constraints on r-t-l chars. e.g. the IDNA area where correct 
>implementation seems to require scheme specific knowledge ....

It's a straight translation from the ABNF with the lower-case hex digit
error added. At least, I think it's a straight translation; there might
well be bugs in the translator. If there are any constraints that could
be expressed in ABNF but aren't part of the spec, that's a flaw in the
specification really. One such flaw seems to be that %xx escapes in the
(i)reg-name component are not constrained to be legal UTF-8 sequences.
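
To illustrate that last point, here is a minimal sketch in Python of the
kind of check the grammar does not make. The function name
pct_escapes_are_utf8 and the pattern _PCT_RUN are hypothetical, made up
for this example; they are not taken from RFC 3987 or from the regex
under discussion. It assumes the input is an already-extracted
(i)reg-name string and only looks at runs of consecutive %xx escapes.

```python
import re

# Hypothetical helper for illustration; not part of RFC 3987.
# Matches a maximal run of consecutive %xx escapes.
_PCT_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def pct_escapes_are_utf8(reg_name: str) -> bool:
    """Return True if every run of %xx escapes decodes as strict UTF-8."""
    for run in _PCT_RUN.finditer(reg_name):
        # Turn "%C3%B6" into the raw bytes b"\xc3\xb6".
        raw = bytes.fromhex(run.group(0).replace("%", ""))
        try:
            # Strict decoding rejects truncated, overlong, and
            # otherwise ill-formed sequences.
            raw.decode("utf-8")
        except UnicodeDecodeError:
            return False
    return True
```

Checking each maximal run on its own should be enough, since any literal
character next to a run is a complete character by itself, so a
multi-byte sequence cannot continue across it.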

Things that might change can't easily be captured here. As for the
requirements of specific schemes: I know RFC 3987 requires that these
be met, but since most schemes do not allow non-ASCII characters, I'm
not sure what the actual requirement would be. Perhaps RFC 3987
defines this by now, though.

For things that could be expressed in the ABNF of RFC 3987 but
currently are not, I would appreciate a proposal to change the ABNF so
that it fully expresses the constraints. That would help a lot with
conformance testing of resource identifiers, e.g. in the Markup
Validator at <http://validator.w3.org>.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 

Received on Monday, 23 January 2006 09:58:18 UTC