- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Mon, 22 Dec 2014 16:01:04 +0100
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: Sam Ruby <rubys@intertwingly.net>, Mark Nottingham <mnot@mnot.net>, Larry Masinter <masinter@adobe.com>, "public-ietf-w3c@w3.org" <public-ietf-w3c@w3.org>, Wendy Seltzer <wseltzer@w3.org>, Philippe Le Hégaret <plh@w3.org>, Barry Leiba <barryleiba@computer.org>, Pete Resnick <presnick@qualcomm.com>, Daniel Appelquist <appelquist@gmail.com>
* Julian Reschke wrote:
>On 2014-12-22 14:43, Sam Ruby wrote:
>> If there is a program I can use to mechanically check for RFC 3986
>> compliance and shows how a given URI is to be interpreted (scheme, host,
>> path, query, fragment, etc.), I'll gladly update my results.
>
>RFC 3986 has a regexp that's expected to parse valid URIs consistent
>with the ABNF; see
><http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.B>.
>
>To change that to a validity checker, we probably just need to restrict
>the character classes so that non-ASCII characters never match.

http://www.websitedev.de/temp/rfc3986-check.html.gz has a mechanically
generated DFA for all productions used in RFC 3986 that should be easy
to extract. https://github.com/hoehrmann/demo-parselov does, too, and
also gives you a correct parse tree; if `example` is a UTF-8 encoded
file containing `http://:?`, then

  % node demo-parselov.js rfc3986.URI.json.gz example -json

would print

  ["URI", [
    ["scheme", [
      ["ALPHA", [], 0, 1],
      ["ALPHA", [], 1, 2],
      ["ALPHA", [], 2, 3],
      ["ALPHA", [], 3, 4]], 0, 4],
    ["hier-part", [
      ["authority", [
        ["host", [
          ["reg-name", [], 7, 7]], 7, 7],
        ["port", [], 8, 8]], 7, 8],
      ["path-abempty", [], 8, 8]], 5, 8],
    ["query", [], 9, 9]], 0, 9]

I can make a data file for `URI-reference` and IRIs if anyone wants to
use this in its current work-in-progress form.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
D-10243 Berlin · PGP Pub. KeyID: 0xA4357E78 · http://www.bjoernsworld.de
Available for hire in Berlin (early 2015) · http://www.websitedev.de/
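
For comparison with the parse tree in the message, here is a minimal Python sketch of the RFC 3986 Appendix B regular expression mentioned in the quoted text. The names `URI_SPLIT` and `split_uri_reference` are hypothetical and are not part of demo-parselov or any tool named in this thread. The expression only splits a string into components; it does not check validity against the ABNF, so this should not be read as the validity checker discussed above.

```python
import re

# Regular expression from RFC 3986, Appendix B. It splits a URI reference
# into components but does NOT validate it against the ABNF.
URI_SPLIT = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri_reference(text):
    """Hypothetical helper: return the five Appendix B components.

    A component that is absent is None; a component that is present
    but empty is the empty string.
    """
    m = URI_SPLIT.match(text)  # every group is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


if __name__ == "__main__":
    print(split_uri_reference("http://:?"))
```

For the input `http://:?` used in the message, the sketch reports scheme `http`, authority `:`, an empty path, an empty (but present) query, and no fragment, which lines up with the offsets in the parse tree: authority at 7-8, path-abempty at 8-8, query at 9-9.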
Received on Monday, 22 December 2014 15:01:47 UTC