Re: [url] Requests for Feedback (was Feedback from TPAC) from Mark Nottingham on 2014-12-25 (public-ietf-w3c@w3.org from December 2014)

From: Mark Nottingham <mnot@mnot.net>
Date: Thu, 25 Dec 2014 05:00:41 -0500
To: Sam Ruby <rubys@intertwingly.net>
Cc: "public-ietf-w3c@w3.org" <public-ietf-w3c@w3.org>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, "Julian F. Reschke" <julian.reschke@gmx.de>
Message-Id: <D76CBE0D-3D34-4671-96B0-9A872881D13C@mnot.net>

I’ve added http_URI and https_URI, sourced from RFC7230, to filter out some of these false positives.

https://gist.github.com/mnot/138549
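
As a rough illustration of the kind of scheme-specific filtering meant here (this is not the code in the gist; the function name and the use of Python's urllib are assumptions made for the sketch), RFC 7230 defines http-URI as "http:" "//" authority path-abempty, and its prose additionally forbids an empty host. A string can therefore be a valid generic URI and still fail the http/https rules:

```python
from urllib.parse import urlsplit


def passes_http_rules(uri: str) -> bool:
    """Hypothetical helper: apply an RFC 7230-style check to http/https URIs.

    Other schemes are passed through untouched, since this sketch only
    models the http_URI / https_URI rules.
    """
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https"):
        return True  # no scheme-specific rule applied in this sketch
    # http-URI requires "//" authority, and the RFC's prose forbids an
    # empty host, so a missing or empty host is rejected here.
    return bool(parts.hostname)


if __name__ == "__main__":
    for candidate in ("http://example.com/", "http:foo", "http:///path",
                      "mailto:someone@example.com"):
        print(candidate, passes_http_rules(candidate))
```

The empty-host rule is stated in prose rather than in the ABNF, so a check like this has to go beyond what the grammar alone expresses.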


> On 23 Dec 2014, at 2:47 pm, Sam Ruby <rubys@intertwingly.net> wrote:
> 
> On 12/23/2014 02:07 PM, Mark Nottingham wrote:
>> 
>> At first glance, it appears that a lot of the valid URI/invalid URL
>> outcomes are because the URL LS is doing scheme-specific processing; is
>> that the case? (Currently working with limited net access + heavy jet
>> lag)
> 
> That certainly explains a number of differences.  Additionally:
> 
> 1) There are cases that ABNF can't capture.  I tend to agree with Julian[1] that the ABNF should be treated as rough syntax only, and that additional constraints should be specified in prose.  That's effectively how the webplatform URL draft is structured[2].
> 
> 2) The URL LS is more IDNA- and Unicode-aware than RFC 3986 is.  Clearly, this is by design, but I will suggest that there is an important lesson to be learned from the effort to split out RFC 3987 into a separate RFC: I think that unintentionally had the effect of "ghettoizing" IRIs.  I might be misreading Martin, but perhaps that's why he suggested RFC 3986 errata as the way to handle bidi?[3]
> 
> - Sam Ruby
> 
> [1] http://lists.w3.org/Archives/Public/public-ietf-w3c/2014Dec/0079.html
> [2] https://specs.webplatform.org/url/webspecs/develop/#parsing-rules
> [3] http://www.ietf.org/mail-archive/web/apps-discuss/current/msg13516.html

--
Mark Nottingham   http://www.mnot.net/

Received on Thursday, 25 December 2014 10:01:09 UTC