Re: IRI regex quiz!

* Frank Ellermann wrote:
>Bjoern Hoehrmann wrote:
>
>> I'm not sure what the actual requirement might be. Perhaps
>> RFC 3987 defines this by now though.
>
>You lost me here.  3987 explains how to transform an IRI into
>an URI.  Something like (legacy ->) NFC -> UTF-8 followed by
>further processing for the "authority" part using IDNA.
>
>But it does not say "any URI with %C0 is invalid, because %C0
>can't be UTF-8".  

There are two issues here:

  http://bj%f6rn.example.org/
  http://example.org/~björn/

The former is not allowed by RFC 3986 or RFC 3987, although it matches
the ABNF grammar of both. The latter is not allowed by RFC 2396, RFC
2616, or RFC 3986; the ABNF and prose of RFC 3987 allow it, except that
the prose of RFC 3987 also requires meeting the constraints in RFC 2616,
e.g.

  When stored or transmitted in digital representation, bidirectional
  IRIs MUST be in full logical order and MUST conform to the IRI syntax
  rules (which includes the rules relevant to their scheme).
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It also says

  Scheme-specific restrictions are applied to IRIs by converting
  IRIs to URIs and checking the URIs against the scheme-specific
  restrictions. 

This goes along with several occurrences of the term "IRI scheme" in RFC
3987, one of which says 'there is no such thing as an "IRI scheme"',
which makes the other occurrences of this term look odd. I'm not sure
yet what to make of this. I agree, though, that at the moment
http://example.org/%C0 is not illegal per any RFC.
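
To make the %C0 case concrete, here is a minimal Python sketch that
separates the two questions: whether a path matches the generic path
characters of RFC 3986, and whether its percent-encoded octets decode
as UTF-8. The function names and the simplified path pattern are
assumptions made for illustration; they are not taken from any RFC.

```python
import re
from urllib.parse import urlsplit, unquote_to_bytes

# Simplified RFC 3986 path: pchar characters plus "/", where any
# pct-encoded triplet is accepted regardless of the octet it encodes.
PATH_RE = re.compile(r"(?:[A-Za-z0-9._~!$&'()*+,;=:@/-]|%[0-9A-Fa-f]{2})*")


def path_matches_generic_syntax(uri):
    """True if the path uses only characters the generic syntax allows."""
    return PATH_RE.fullmatch(urlsplit(uri).path) is not None


def path_is_percent_encoded_utf8(uri):
    """True if the percent-decoded path octets form valid UTF-8.

    This models a hypothetical scheme-level restriction, not a rule
    of the generic syntax.
    """
    try:
        unquote_to_bytes(urlsplit(uri).path).decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

With these definitions, http://example.org/%C0 passes the generic
syntax check but fails the UTF-8 check, while
http://example.org/~björn/ fails the generic syntax check because of
the raw non-ASCII character.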

>> I would appreciate if a proposal is made to change the ABNF
>> to fully express the constraints.
>
>There are no constraints on general URIs in addition to STD 66,
>anything more depends on the scheme.  A scheme could restrict
>e.g. the path to "MUST be percent-encoded UTF-8", and then any
>%C0 is an error.  I don't see how a 3987bis DS could do more
>than it does now.  Did I drop a ball or miss a clue somewhere ?

I said the ABNF, not the specification. The ABNF does not capture that
http://bj%f6rn.example.org/ is not allowed. It is not allowed, though,
and it seems this could be expressed in the ABNF. I'm not sure about the
other issues Jeremy mentioned.
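
As a rough sketch of how such a constraint might be expressed in a
grammar, the following Python regex replaces the generic pct-encoded
alternative in reg-name with percent-encoded forms of the UTF-8 octet
sequences from the RFC 3629 ABNF. All names here (PCT_UTF8,
REG_NAME_UTF8, and so on) are hypothetical, and this is only one
possible formulation, not text from RFC 3986 or RFC 3987.

```python
import re

HEX = "[0-9A-Fa-f]"
TAIL = "%[89ABab]" + HEX                                # octets 80-BF

UTF8_1 = "%[0-7]" + HEX                                 # 00-7F
UTF8_2 = "%(?:[Cc][2-9A-Fa-f]|[Dd]" + HEX + ")" + TAIL  # C2-DF tail
UTF8_3 = ("(?:%[Ee]0%[ABab]" + HEX                      # E0 A0-BF
          + "|%[Ee][1-9A-Ca-c]" + TAIL                  # E1-EC tail
          + "|%[Ee][Dd]%[89]" + HEX                     # ED 80-9F
          + "|%[Ee][EFef]" + TAIL                       # EE-EF tail
          + ")" + TAIL)                                 # final tail
UTF8_4 = ("(?:%[Ff]0%[9ABab]" + HEX                     # F0 90-BF
          + "|%[Ff][1-3]" + TAIL                        # F1-F3 tail
          + "|%[Ff]4%8" + HEX                           # F4 80-8F
          + ")" + TAIL + TAIL)                          # two more tails
PCT_UTF8 = "(?:" + "|".join([UTF8_1, UTF8_2, UTF8_3, UTF8_4]) + ")"

# unreserved / sub-delims characters of RFC 3986
CHARS = "[A-Za-z0-9._~!$&'()*+,;=-]"

# reg-name as in the RFC 3986 ABNF: any pct-encoded triplet is accepted.
REG_NAME_3986 = re.compile("(?:" + CHARS + "|%" + HEX + HEX + ")*")

# Hypothetical stricter reg-name: pct-encoded triplets must form UTF-8.
REG_NAME_UTF8 = re.compile("(?:" + CHARS + "|" + PCT_UTF8 + ")*")
```

Used with fullmatch(), REG_NAME_3986 accepts both bj%f6rn.example.org
and bj%C3%B6rn.example.org, while REG_NAME_UTF8 accepts only the
second.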
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 

Received on Monday, 23 January 2006 15:27:45 UTC