- From: Mark Nottingham <mnot@mnot.net>
- Date: Tue, 23 Dec 2014 14:07:52 -0500
- To: Sam Ruby <rubys@intertwingly.net>
- Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, "public-ietf-w3c@w3.org" <public-ietf-w3c@w3.org>
Interesting. Happy to update the script wherever it doesn't match the ABNF, of course.
At first glance, it looks like a lot of the valid URI/invalid URL outcomes arise because the URL LS is doing scheme-specific processing; is that the case? (Currently working with limited net access + heavy jet lag.)
> On 23 Dec 2014, at 1:48 pm, Sam Ruby <rubys@intertwingly.net> wrote:
>
>> On 12/23/2014 09:42 AM, Bjoern Hoehrmann wrote:
>> * Sam Ruby wrote:
>>>> On 12/22/2014 06:36 PM, Mark Nottingham wrote:
>>>> See also
>>>> https://gist.github.com/mnot/138549
>>>
>>> Thanks!
>>>
>>> Here's a result I didn't expect:
>>>
>>>> python uri_validate.py "http://user:password@example.com/"
>>>> testing: "http://user:password@example.com/"
>>>> URI: no
>>>> URI reference: no
>>>> Absolute URI: no
>>>
>>> Am I doing something wrong?
>>
>> Given
>>
>> # reg-name = *( unreserved / pct-encoded / sub-delims )
>> reg_name = r"(?: %(unreserved)s | %(pct_encoded)s | %(sub_delims)s )*" % locals()
>>
>> # userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
>> userinfo = r"(?: %(unreserved)s | %(pct_encoded)s | %(sub_delims)s | : )" % locals()
>>
>> I would say there is a missing `*` at the end of `userinfo`.
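The effect of the missing `*` can be reproduced in isolation. The following is a minimal sketch, not the gist's actual code: the character classes are simplified stand-ins assumed to mirror the RFC 3986 rules, and the names `parts`, `userinfo_as_quoted`, `userinfo_repeated`, and `matches` are hypothetical.

```python
import re

# Simplified building blocks (assumed to mirror the RFC 3986 character classes).
parts = {
    "unreserved": r"[A-Za-z0-9\-._~]",
    "pct_encoded": r"%[0-9A-Fa-f]{2}",
    "sub_delims": r"[!$&'()*+,;=]",
}

# As quoted above: no trailing "*", so the group matches exactly one character.
userinfo_as_quoted = (
    r"(?: %(unreserved)s | %(pct_encoded)s | %(sub_delims)s | : )" % parts
)

# With the repetition from the ABNF rule *( ... ) restored.
userinfo_repeated = (
    r"(?: %(unreserved)s | %(pct_encoded)s | %(sub_delims)s | : )*" % parts
)


def matches(pattern, text):
    """Return True if the whole of `text` matches the verbose-mode `pattern`."""
    return re.fullmatch(pattern, text, re.VERBOSE) is not None


if __name__ == "__main__":
    for candidate in ("u", "user:password", ""):
        print(repr(candidate),
              matches(userinfo_as_quoted, candidate),
              matches(userinfo_repeated, candidate))
```

Without the repetition, only a single-character userinfo passes, which would explain why `user:password` makes the whole URI fail to validate.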
>
> Thanks!
>
> I believe I found a second problem with the following:
>
>> # IPv6address = 6( h16 ":" ) ls32
>> # / "::" 5( h16 ":" ) ls32
>> # / [ h16 ] "::" 4( h16 ":" ) ls32
>> # / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
>> # / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
>> # / [ *3( h16 ":" ) h16 ] "::" h16 ":" ls32
>> # / [ *4( h16 ":" ) h16 ] "::" ls32
>> # / [ *5( h16 ":" ) h16 ] "::" h16
>> # / [ *6( h16 ":" ) h16 ] "::"
>> IPv6address = r"""(?: (?: %(h16)s : ){6} %(ls32)s |
>> :: (?: %(h16)s : ){5} %(ls32)s |
>> %(h16)s :: (?: %(h16)s : ){4} %(ls32)s |
>> (?: %(h16)s : ) %(h16)s :: (?: %(h16)s : ){3} %(ls32)s |
>> (?: %(h16)s : ){2} %(h16)s :: (?: %(h16)s : ){2} %(ls32)s |
>> (?: %(h16)s : ){3} %(h16)s :: %(h16)s : %(ls32)s |
>> (?: %(h16)s : ){4} %(h16)s :: %(ls32)s |
>> (?: %(h16)s : ){5} %(h16)s :: %(h16)s |
>> (?: %(h16)s : ){6} %(h16)s ::
>> )
>> """ % locals()
>
> I believe the *n notation in ABNF corresponds to {0,n} in regular expressions. A corrected version:
>
>> IPv6address = r"""(?: (?: %(h16)s : ){6} %(ls32)s |
>> :: (?: %(h16)s : ){5} %(ls32)s |
>> %(h16)s :: (?: %(h16)s : ){4} %(ls32)s |
>> (?: %(h16)s : ){0,1} %(h16)s :: (?: %(h16)s : ){3} %(ls32)s |
>> (?: %(h16)s : ){0,2} %(h16)s :: (?: %(h16)s : ){2} %(ls32)s |
>> (?: %(h16)s : ){0,3} %(h16)s :: %(h16)s : %(ls32)s |
>> (?: %(h16)s : ){0,4} %(h16)s :: %(ls32)s |
>> (?: %(h16)s : ){0,5} %(h16)s :: %(h16)s |
>> (?: %(h16)s : ){0,6} %(h16)s ::
>> )
>> """ % locals()
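The `*n` to `{0,n}` mapping can be exercised on its own. The sketch below is an illustration under stated assumptions rather than the gist's code: `h16`, `dec_octet`, `ipv4address`, and `ls32` are re-derived here from the RFC 3986 rules, and `optional_prefix` and `is_ipv6address` are hypothetical helper names. It also goes one step beyond the correction quoted above by translating the ABNF square brackets `[ ... ]` as an optional group, since ABNF brackets mark an optional sequence.

```python
import re

h16 = r"[0-9A-Fa-f]{1,4}"
dec_octet = r"(?: 25[0-5] | 2[0-4][0-9] | 1[0-9]{2} | [1-9][0-9] | [0-9] )"
ipv4address = r"(?: %s (?: \. %s ){3} )" % (dec_octet, dec_octet)
ls32 = r"(?: %s : %s | %s )" % (h16, h16, ipv4address)


def optional_prefix(n):
    """Translate ABNF  [ *n( h16 ":" ) h16 ]  into a regex fragment.

    *n( ... ) becomes {0,n}; the surrounding [ ... ] becomes (?: ... )?.
    """
    return r"(?: (?: %s : ){0,%d} %s )?" % (h16, n, h16)


# One entry per alternative of the IPv6address rule, in the same order.
alternatives = [
    r"(?: %s : ){6} %s" % (h16, ls32),
    r":: (?: %s : ){5} %s" % (h16, ls32),
    optional_prefix(0) + r" :: (?: %s : ){4} %s" % (h16, ls32),
    optional_prefix(1) + r" :: (?: %s : ){3} %s" % (h16, ls32),
    optional_prefix(2) + r" :: (?: %s : ){2} %s" % (h16, ls32),
    optional_prefix(3) + r" :: %s : %s" % (h16, ls32),
    optional_prefix(4) + r" :: %s" % ls32,
    optional_prefix(5) + r" :: %s" % h16,
    optional_prefix(6) + r" ::",
]

ipv6address = re.compile("(?: " + " | ".join(alternatives) + " )", re.VERBOSE)


def is_ipv6address(text):
    """Return True if the whole of `text` matches the IPv6address sketch."""
    return ipv6address.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ("::", "::1", "2001:db8::1", "1:2:3:4:5:6:7:8", "1::2::3"):
        print(candidate, is_ipv6address(candidate))
```

With the prefix written as a mandatory `h16`, addresses that begin with `::` (such as `::1`) would be rejected, so the optional group matters as much as the `{0,n}` bound.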
>
> I've made these changes and updated the results:
>
> http://intertwingly.net/tmp/urlvsuri.html
>
> Looking at the first section, I believe that the URL Standard should change so that the conformance criteria for URLs become a strict subset of the URI validity criteria. Accordingly, I've opened the following bug report:
>
> https://www.w3.org/Bugs/Public/show_bug.cgi?id=27687
>
> - Sam Ruby
Received on Tuesday, 23 December 2014 19:08:25 UTC