- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Mon, 22 Dec 2014 16:01:04 +0100
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: Sam Ruby <rubys@intertwingly.net>, Mark Nottingham <mnot@mnot.net>, Larry Masinter <masinter@adobe.com>, "public-ietf-w3c@w3.org" <public-ietf-w3c@w3.org>, Wendy Seltzer <wseltzer@w3.org>, Philippe Le Hégaret <plh@w3.org>, Barry Leiba <barryleiba@computer.org>, Pete Resnick <presnick@qualcomm.com>, Daniel Appelquist <appelquist@gmail.com>
* Julian Reschke wrote:
>On 2014-12-22 14:43, Sam Ruby wrote:
>> If there is a program that mechanically checks for RFC 3986
>> compliance and shows how a given URI is to be interpreted (scheme, host,
>> path, query, fragment, etc.), I'll gladly update my results.
>
>RFC 3986 has a regexp that's expected to parse valid URIs consistent
>with the ABNF; see
><http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.B>.
>
>To change that to a validity checker, we probably just need to restrict
>the character classes so that non-ASCII characters never match.
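
As a rough illustration of the Appendix B approach quoted above, the sketch below applies that regular expression in Python and reports the five components it captures. The names `split_uri` and `is_ascii_only` are hypothetical, and the ASCII pre-check is only one simple reading of "restrict the character classes"; it is an assumption for illustration, not a full validity check against the ABNF.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(text):
    """Split a URI reference into the five Appendix B components.

    A component that is absent is None; one that is present but
    empty is the empty string.
    """
    # Every part of the pattern is optional, so a match always exists.
    m = URI_RE.match(text)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


def is_ascii_only(text):
    # Hypothetical pre-check (an assumption, not taken from the RFC):
    # reject any input that contains a non-ASCII character.
    return all(ord(ch) < 128 for ch in text)


if __name__ == "__main__":
    print(is_ascii_only("http://:?"), split_uri("http://:?"))
```

For the input `http://:?` this reports the scheme `http`, the authority `:`, an empty path, an empty query, and no fragment.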

http://www.websitedev.de/temp/rfc3986-check.html.gz has a mechanically
generated DFA for all productions used in RFC 3986; it should be easy
to extract. https://github.com/hoehrmann/demo-parselov has one, too, and
also gives you a correct parse tree. If `example` is a UTF-8 encoded
file containing `http://:?`, then

  % node demo-parselov.js rfc3986.URI.json.gz example -json

would print

  ["URI", [
    ["scheme", [
      ["ALPHA", [], 0, 1],
      ["ALPHA", [], 1, 2],
      ["ALPHA", [], 2, 3],
      ["ALPHA", [], 3, 4]], 0, 4],
    ["hier-part", [
      ["authority", [
        ["host", [
          ["reg-name", [], 7, 7]], 7, 7],
        ["port", [], 8, 8]], 7, 8],
      ["path-abempty", [], 8, 8]], 5, 8],
    ["query", [], 9, 9]], 0, 9]

I can make a data file for `URI-reference` and IRIs if anyone wants
to use this in its current work-in-progress form.
--
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
D-10243 Berlin · PGP Pub. KeyID: 0xA4357E78 · http://www.bjoernsworld.de
Available for hire in Berlin (early 2015) · http://www.websitedev.de/
Received on Monday, 22 December 2014 15:01:47 UTC