Re: [url] Requests for Feedback (was Feedback from TPAC)

* Julian Reschke wrote:
>On 2014-12-22 14:43, Sam Ruby wrote:
>> If there is a program I can use to mechanically check for RFC 3986
>> compliance and shows how a given URI is to be interpreted (scheme, host,
>> path, query, fragment, etc.), I'll gladly update my results.
>
>RFC 3986 has a regexp that's expected to parse valid URIs consistent 
>with the ABNF; see 
><http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.B>.
>
>To change that to a validity checker, we probably just need to restrict 
>the character classes so that non-ASCII characters never match.
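
For reference, a minimal sketch of the Appendix B approach in JavaScript.
The regular expression is the one given in RFC 3986, Appendix B; the
function names `splitUriReference` and `hasOnlyAscii` are made up for
this example. Note that the expression only splits a string into its
components: it accepts any input, so even with an ASCII restriction
added it is not a full check against the ABNF.

```javascript
// Regular expression from RFC 3986, Appendix B. Every group is optional,
// so it matches (a prefix of) any string and exec() never returns null.
const RFC3986_SPLIT =
  /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical helper: split a string into the five generic components.
// A component that is absent is `undefined`; one that is present but
// empty is the empty string.
function splitUriReference(input) {
  const m = RFC3986_SPLIT.exec(input);
  return {
    scheme: m[2],
    authority: m[4],
    path: m[5],
    query: m[7],
    fragment: m[9],
  };
}

// Hypothetical helper: the ASCII restriction mentioned above, done as a
// separate test instead of by editing the character classes.
function hasOnlyAscii(input) {
  return /^[\x00-\x7F]*$/.test(input);
}

console.log(splitUriReference("http://:?"), hasOnlyAscii("http://:?"));
```

For `http://:?` this reports an authority of `:`, an empty path, an
empty query and no fragment, but it does not say how the authority
divides into host and port.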

http://www.websitedev.de/temp/rfc3986-check.html.gz has a mechanically
generated DFA for all productions used in RFC 3986, which should be easy
to extract. https://github.com/hoehrmann/demo-parselov has one, too, and
also gives you a correct parse tree. If `example` is a UTF-8 encoded
file containing `http://:?`, then

  % node demo-parselov.js rfc3986.URI.json.gz example -json

would print

  ["URI", [
    ["scheme", [
      ["ALPHA", [], 0, 1],
      ["ALPHA", [], 1, 2],
      ["ALPHA", [], 2, 3],
      ["ALPHA", [], 3, 4]], 0, 4],
    ["hier-part", [
      ["authority", [
        ["host", [
          ["reg-name", [], 7, 7]], 7, 7],
        ["port", [], 8, 8]], 7, 8],
      ["path-abempty", [], 8, 8]], 5, 8],
    ["query", [], 9, 9]], 0, 9]
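
Each node in that output has the shape [name, children, start, end].
As a rough sketch of how one might map the offsets back onto the input
(this is not part of demo-parselov; `collectSpans` is a made-up name,
and treating the offsets as string indices is an assumption that holds
for this ASCII-only input):

```javascript
const input = "http://:?";

// The parse tree exactly as printed above.
const tree = ["URI", [
  ["scheme", [
    ["ALPHA", [], 0, 1],
    ["ALPHA", [], 1, 2],
    ["ALPHA", [], 2, 3],
    ["ALPHA", [], 3, 4]], 0, 4],
  ["hier-part", [
    ["authority", [
      ["host", [
        ["reg-name", [], 7, 7]], 7, 7],
      ["port", [], 8, 8]], 7, 8],
    ["path-abempty", [], 8, 8]], 5, 8],
  ["query", [], 9, 9]], 0, 9];

// Hypothetical helper: walk the tree in pre-order and record the text
// covered by every node whose name is in `wanted`.
function collectSpans(node, text, wanted, out = []) {
  const [name, children, start, end] = node;
  if (wanted.has(name)) {
    out.push({ name, start, end, text: text.slice(start, end) });
  }
  for (const child of children) {
    collectSpans(child, text, wanted, out);
  }
  return out;
}

const spans = collectSpans(tree, input, new Set(
  ["scheme", "authority", "host", "port", "path-abempty", "query"]));

for (const s of spans) {
  console.log(`${s.name} [${s.start},${s.end}) ${JSON.stringify(s.text)}`);
}
```

Here the scheme covers `http`, the authority covers `:`, and host,
port, path and query are all present but empty.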

I can make a data file for `URI-reference` and IRIs if anyone wants
to use this in its current work-in-progress form.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
D-10243 Berlin · PGP Pub. KeyID: 0xA4357E78 · http://www.bjoernsworld.de
 Available for hire in Berlin (early 2015)  · http://www.websitedev.de/ 

Received on Monday, 22 December 2014 15:01:47 UTC