Re: [url] Requests for Feedback (was Feedback from TPAC)

See also
  https://gist.github.com/mnot/138549

FWIW

> On 23 Dec 2014, at 2:01 am, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
> 
> * Julian Reschke wrote:
>> On 2014-12-22 14:43, Sam Ruby wrote:
>>> If there is a program I can use to mechanically check for RFC 3986
>>> compliance and that shows how a given URI is to be interpreted (scheme,
>>> host, path, query, fragment, etc.), I'll gladly update my results.
>> 
>> RFC 3986 has a regexp that's expected to parse valid URIs consistent 
>> with the ABNF; see 
>> <http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.B>.
>> 
>> To change that to a validity checker, we probably just need to restrict 
>> the character classes so that non-ASCII characters never match.
> 
> http://www.websitedev.de/temp/rfc3986-check.html.gz has a mechanically
> generated DFA for all productions used in RFC 3986 that should be easy
> to extract. https://github.com/hoehrmann/demo-parselov has one too, and
> it also produces a correct parse tree: if `example` is a UTF-8-encoded
> file containing `http://:?`, then
> 
>  % node demo-parselov.js rfc3986.URI.json.gz example -json
> 
> would print
> 
>  ["URI", [
>    ["scheme", [
>      ["ALPHA", [], 0, 1],
>      ["ALPHA", [], 1, 2],
>      ["ALPHA", [], 2, 3],
>      ["ALPHA", [], 3, 4]], 0, 4],
>    ["hier-part", [
>      ["authority", [
>        ["host", [
>          ["reg-name", [], 7, 7]], 7, 7],
>        ["port", [], 8, 8]], 7, 8],
>      ["path-abempty", [], 8, 8]], 5, 8],
>    ["query", [], 9, 9]], 0, 9]
> 
> I can make a data file for `URI-reference` and IRIs if anyone wants
> to use this in its current work-in-progress form.
> -- 
> Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
> D-10243 Berlin · PGP Pub. KeyID: 0xA4357E78 · http://www.bjoernsworld.de
> Available for hire in Berlin (early 2015)  · http://www.websitedev.de/ 
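
Julian's pointer above can be made concrete with a minimal Python sketch; it is not code from anyone in this thread. `APPENDIX_B` is the regular expression from RFC 3986, Appendix B: it only splits a string into scheme, authority, path, query and fragment, and accepts almost any input. `ASCII_ONLY` is a hypothetical tightening along the lines Julian suggests, where every character class also excludes non-ASCII characters and the whole string must match. That rejects strings with raw non-ASCII characters (for example IRIs), but it is still far from a full validity check: it accepts spaces, unchecked percent-encodings and malformed authorities, which is where a grammar-derived checker such as the DFA Björn mentions comes in. The names `APPENDIX_B`, `ASCII_ONLY` and `split_uri` are illustrative only.

```python
import re
from typing import Optional, Tuple

# Regular expression from RFC 3986, Appendix B. It splits a string into its
# five components; it does NOT check the string against the ABNF.
APPENDIX_B = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

# Hypothetical tightening: each class also excludes U+0080..U+10FFFF, and
# "." is replaced by such a class, so no non-ASCII character can match.
# Group numbers are unchanged; fullmatch() below requires the whole string.
ASCII_ONLY = re.compile(
    r"(([^:/?#\x80-\U0010FFFF]+):)?"  # groups 1, 2: scheme
    r"(//([^/?#\x80-\U0010FFFF]*))?"  # groups 3, 4: authority
    r"([^?#\x80-\U0010FFFF]*)"        # group 5:     path
    r"(\?([^#\x80-\U0010FFFF]*))?"    # groups 6, 7: query
    r"(#([^\x80-\U0010FFFF]*))?"      # groups 8, 9: fragment
)

Components = Tuple[Optional[str], Optional[str], str, Optional[str], Optional[str]]


def split_uri(s: str, ascii_only: bool = False) -> Optional[Components]:
    """Return (scheme, authority, path, query, fragment), where None marks an
    undefined component. Returns None only if the ASCII-only variant fails."""
    m = ASCII_ONLY.fullmatch(s) if ascii_only else APPENDIX_B.match(s)
    if m is None:
        return None
    # Groups 2, 4, 5, 7 and 9 hold the components, as listed in Appendix B.
    return m.group(2, 4, 5, 7, 9)


if __name__ == "__main__":
    for candidate in ["http://:?", "https://example.com/a?b#c", "http://例え.jp/"]:
        print(candidate, split_uri(candidate), split_uri(candidate, ascii_only=True))
```

For `http://:?` both variants return `('http', ':', '', '', None)`: the Appendix B regex reports the authority `:` as a whole because it does not separate host and port, and it keeps the empty query (`''`) distinct from the absent fragment (`None`).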

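Judging from the output above, each node in the `-json` parse tree appears to be a four-element array `[name, children, start, end]`, where `start` and `end` delimit the matched text. That reading is an inference from this one example, not documented behaviour of demo-parselov, and it is unclear from an ASCII-only input whether the offsets count bytes or characters. Under that assumption, a short Python sketch with a hypothetical helper `components` recovers the pieces Sam asked about (scheme, host, query, and so on) from the tree:

```python
import json

# The parse tree printed above for the input "http://:?", re-indented.
TREE = json.loads("""
["URI", [
  ["scheme", [
    ["ALPHA", [], 0, 1], ["ALPHA", [], 1, 2],
    ["ALPHA", [], 2, 3], ["ALPHA", [], 3, 4]], 0, 4],
  ["hier-part", [
    ["authority", [
      ["host", [["reg-name", [], 7, 7]], 7, 7],
      ["port", [], 8, 8]], 7, 8],
    ["path-abempty", [], 8, 8]], 5, 8],
  ["query", [], 9, 9]], 0, 9]
""")


def components(node: list, text: str, wanted: set) -> list:
    """Return (name, substring) pairs, in tree order, for every node whose
    production name is in `wanted`. Assumes node = [name, children, start, end]
    with offsets that index `text` directly."""
    name, children, start, end = node
    found = [(name, text[start:end])] if name in wanted else []
    for child in children:
        found.extend(components(child, text, wanted))
    return found


if __name__ == "__main__":
    wanted = {"scheme", "authority", "host", "port", "path-abempty", "query", "fragment"}
    for name, value in components(TREE, "http://:?", wanted):
        print(f"{name:13} {value!r}")
```

The tree contains an empty `query` node but no `fragment` node, so it keeps the distinction RFC 3986 draws between an empty component and an undefined one.
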
Received on Monday, 22 December 2014 23:37:19 UTC