- From: Jeremy Carroll <jjc@hpl.hp.com>
- Date: Mon, 23 Jan 2006 13:37:00 -0800
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- Cc: public-iri@w3.org
Bjoern Hoehrmann wrote: > Hi, > > I made the following regular expression for the IRI token as defined > in RFC 3987 but I think I made a mistake? Who can spot it? It seems to me that there are many other constraints in RFC 3987 that are not captured by this regex. e.g. no bidi control chars. e.g. constraints on r-t-l chars. e.g. the IDNA area where correct implementation seems to require scheme specific knowledge .... As to your question, an obvious mistake in the first line is %[0-9A-F]{2} should read as %[0-9A-Fa-f]{2} or possibly as (?:%[0-9A-F]{2}|(%[0-9A-Fa-f]{2})) with some sort of warning condition if the capturing group matches. A further area of doubt in the first line is allowing the "-" in the scheme name. RFC 2717 reserves "-" in scheme names for non-IANA trees, and as far as I can tell none have been registered, and until there is such a registration - should not be found in a scheme name. (No provision is made in RFC 3986 for the use of schemes that do not conform with RFC 2717). Registration of "x-" as an alternative private use tree would seem to make sense to me. Jeremy > (You have to > remove all white-space to make it a legal expression of course). > > [A-Za-z][-+.0-9A-Za-z]*:(?:/(?:(?:%[0-9A-F]{2}|[^\x00-\x20"#%/<>?[- > ^`{-}\x7F-\x9F\x{D800}-\x{F8FF}\x{FDD0}-\x{FDEF}\x{FFF0}-\x{FFFF}\x > {1FFFE}\x{1FFFF}\x{2FFFE}\x{2FFFF}\x{3FFFE}\x{3FFFF}\x{4FFFE}\x{4FF > FF}\x{5FFFE}\x{5FFFF}\x{6FFFE}\x{6FFFF}\x{7FFFE}\x{7FFFF}\x{8FFFE}\ > x{8FFFF}\x{9FFFE}\x{9FFFF}\x{AFFFE}\x{AFFFF}\x{BFFFE}\x{BFFFF}\x{CF > FFE}\x{CFFFF}\x{DFFFE}-\x{E0FFF}\x{EFFFE}-\x{10FFFF}])+(?:/(?:%[0-9 > A-F]{2}|[^\x00-\x20"#%/<>?[-^`{-}\x7F-\x9F\x{D800}-\x{F8FF}\x{FDD0} > -\x{FDEF}\x{FFF0}-\x{FFFF}\x{1FFFE}\x{1FFFF}\x{2FFFE}\x{2FFFF}\x{3F > FFE}\x{3FFFF}\x{4FFFE}\x{4FFFF}\x{5FFFE}\x{5FFFF}\x{6FFFE}\x{6FFFF} > \x{7FFFE}\x{7FFFF}\x{8FFFE}\x{8FFFF}\x{9FFFE}\x{9FFFF}\x{AFFFE}\x{A > FFFF}\x{BFFFE}\x{BFFFF}\x{CFFFE}\x{CFFFF}\x{DFFFE}-\x{E0FFF}\x{EFFF > E}-\x{10FFFF}])*)*)?|(?:%[0-9A-F]{2}|[^\x00-\x20"#%/<>?[-^`{-}\x7F- > \x9F\x{D800}-\x{F8FF}\x{FDD0}-\x{FDEF}\x{FFF0}-\x{FFFF}\x{1FFFE}\x{ > 1FFFF}\x{2FFFE}\x{2FFFF}\x{3FFFE}\x{3FFFF}\x{4FFFE}\x{4FFFF}\x{5FFF > E}\x{5FFFF}\x{6FFFE}\x{6FFFF}\x{7FFFE}\x{7FFFF}\x{8FFFE}\x{8FFFF}\x > {9FFFE}\x{9FFFF}\x{AFFFE}\x{AFFFF}\x{BFFFE}\x{BFFFF}\x{CFFFE}\x{CFF > FF}\x{DFFFE}-\x{E0FFF}\x{EFFFE}-\x{10FFFF}])*|(?://(?:(?:%[0-9A-F]{ > 2}|[^\x00-\x20"#%/<>-@[-^`{-}\x7F-\x9F\x{D800}-\x{F8FF}\x{FDD0}-\x{ > FDEF}\x{FFF0}-\x{FFFF}\x{1FFFE}\x{1FFFF}\x{2FFFE}\x{2FFFF}\x{3FFFE} > \x{3FFFF}\x{4FFFE}\x{4FFFF}\x{5FFFE}\x{5FFFF}\x{6FFFE}\x{6FFFF}\x{7 > FFFE}\x{7FFFF}\x{8FFFE}\x{8FFFF}\x{9FFFE}\x{9FFFF}\x{AFFFE}\x{AFFFF > }\x{BFFFE}\x{BFFFF}\x{CFFFE}\x{CFFFF}\x{DFFFE}-\x{E0FFF}\x{EFFFE}-\ > x{10FFFF}])*@)?(?:\[(?:(?:(?:[0-9A-F]{1,4}:){,5}[0-9A-F]{1,4})?::[0 > -9A-F]{1,4}|(?:(?:[0-9A-F]{1,4}:){,6}[0-9A-F]{1,4})?::|(?:(?:[0-9A- > F]{1,4}:){6}|::(?:[0-9A-F]{1,4}:){5}|(?:[0-9A-F]{1,4})?::(?:[0-9A-F > ]{1,4}:){4}|(?:(?:[0-9A-F]{1,4}:)?[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4} > :){3}|(?:(?:[0-9A-F]{1,4}:){,2}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:){ > 2}|(?:(?:[0-9A-F]{1,4}:){,3}[0-9A-F]{1,4})?::[0-9A-F]{1,4}:|(?:(?:[ > 0-9A-F]{1,4}:){,4}[0-9A-F]{1,4})?::)(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}| > (?:(?:1[0-9]{2}|2(?:[0-4][0-9]|5[0-5])|(?:|[1-9])[0-9])\.){3}(?:1[0 > -9]{2}|2(?:[0-4][0-9]|5[0-5])|(?:|[1-9])[0-9]))|v[0-9A-F]+\.[!$&-.0 > -;=A-Z_a-z~]+)]|(?:(?:1[0-9]{2}|2(?:[0-4][0-9]|5[0-5])|(?:|[1-9])[0 > -9])\.){3}(?:1[0-9]{2}|2(?:[0-4][0-9]|5[0-5])|(?:|[1-9])[0-9])|(?:% > [0-9A-F]{2}|[^\x00-\x20"#%/:<>-@[-^`{-}\x7F-\x9F\x{D800}-\x{F8FF}\x > {FDD0}-\x{FDEF}\x{FFF0}-\x{FFFF}\x{1FFFE}\x{1FFFF}\x{2FFFE}\x{2FFFF > }\x{3FFFE}\x{3FFFF}\x{4FFFE}\x{4FFFF}\x{5FFFE}\x{5FFFF}\x{6FFFE}\x{ > 6FFFF}\x{7FFFE}\x{7FFFF}\x{8FFFE}\x{8FFFF}\x{9FFFE}\x{9FFFF}\x{AFFF > E}\x{AFFFF}\x{BFFFE}\x{BFFFF}\x{CFFFE}\x{CFFFF}\x{DFFFE}-\x{E0FFF}\ > x{EFFFE}-\x{10FFFF}])*)(?::[0-9]*)?|(?:%[0-9A-F]{2}|[^\x00-\x20"#%/ > <>?[-^`{-}\x7F-\x9F\x{D800}-\x{F8FF}\x{FDD0}-\x{FDEF}\x{FFF0}-\x{FF > FF}\x{1FFFE}\x{1FFFF}\x{2FFFE}\x{2FFFF}\x{3FFFE}\x{3FFFF}\x{4FFFE}\ > x{4FFFF}\x{5FFFE}\x{5FFFF}\x{6FFFE}\x{6FFFF}\x{7FFFE}\x{7FFFF}\x{8F > FFE}\x{8FFFF}\x{9FFFE}\x{9FFFF}\x{AFFFE}\x{AFFFF}\x{BFFFE}\x{BFFFF} > \x{CFFFE}\x{CFFFF}\x{DFFFE}-\x{E0FFF}\x{EFFFE}-\x{10FFFF}])+)(?:/(? > :%[0-9A-F]{2}|[^\x00-\x20"#%/<>?[-^`{-}\x7F-\x9F\x{D800}-\x{F8FF}\x > {FDD0}-\x{FDEF}\x{FFF0}-\x{FFFF}\x{1FFFE}\x{1FFFF}\x{2FFFE}\x{2FFFF > }\x{3FFFE}\x{3FFFF}\x{4FFFE}\x{4FFFF}\x{5FFFE}\x{5FFFF}\x{6FFFE}\x{ > 6FFFF}\x{7FFFE}\x{7FFFF}\x{8FFFE}\x{8FFFF}\x{9FFFE}\x{9FFFF}\x{AFFF > E}\x{AFFFF}\x{BFFFE}\x{BFFFF}\x{CFFFE}\x{CFFFF}\x{DFFFE}-\x{E0FFF}\ > x{EFFFE}-\x{10FFFF}])*)*)(?:\?(?:%[0-9A-F]{2}|[^\x00-\x20"#%<>[-^`{ > -}\x7F-\x9F\x{D800}-\x{DFFF}\x{FDD0}-\x{FDEF}\x{FFF0}-\x{FFFF}\x{1F > FFE}\x{1FFFF}\x{2FFFE}\x{2FFFF}\x{3FFFE}\x{3FFFF}\x{4FFFE}\x{4FFFF} > \x{5FFFE}\x{5FFFF}\x{6FFFE}\x{6FFFF}\x{7FFFE}\x{7FFFF}\x{8FFFE}\x{8 > FFFF}\x{9FFFE}\x{9FFFF}\x{AFFFE}\x{AFFFF}\x{BFFFE}\x{BFFFF}\x{CFFFE > }\x{CFFFF}\x{DFFFE}-\x{E0FFF}\x{EFFFE}\x{EFFFF}\x{FFFFE}\x{FFFFF}\x > {10FFFE}\x{10FFFF}])*)?(?:#(?:%[0-9A-F]{2}|[^\x00-\x20"#%<>[-^`{-}\ > x7F-\x9F\x{D800}-\x{F8FF}\x{FDD0}-\x{FDEF}\x{FFF0}-\x{FFFF}\x{1FFFE > }\x{1FFFF}\x{2FFFE}\x{2FFFF}\x{3FFFE}\x{3FFFF}\x{4FFFE}\x{4FFFF}\x{ > 5FFFE}\x{5FFFF}\x{6FFFE}\x{6FFFF}\x{7FFFE}\x{7FFFF}\x{8FFFE}\x{8FFF > F}\x{9FFFE}\x{9FFFF}\x{AFFFE}\x{AFFFF}\x{BFFFE}\x{BFFFF}\x{CFFFE}\x > {CFFFF}\x{DFFFE}-\x{E0FFF}\x{EFFFE}-\x{10FFFF}])*)? > > Thanks,
Received on Monday, 23 January 2006 21:37:05 UTC