- From: Mark Nottingham <mnot@mnot.net>
- Date: Tue, 23 Dec 2014 10:36:53 +1100
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- Cc: Julian Reschke <julian.reschke@gmx.de>, Sam Ruby <rubys@intertwingly.net>, Larry Masinter <masinter@adobe.com>, "public-ietf-w3c@w3.org" <public-ietf-w3c@w3.org>, Wendy Seltzer <wseltzer@w3.org>, Philippe Le Hégaret <plh@w3.org>, Barry Leiba <barryleiba@computer.org>, Pete Resnick <presnick@qualcomm.com>, Daniel Appelquist <appelquist@gmail.com>
See also https://gist.github.com/mnot/138549

Fwiw

Sent from my iPhone

> On 23 Dec 2014, at 2:01 am, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
>
> * Julian Reschke wrote:
>>> On 2014-12-22 14:43, Sam Ruby wrote:
>>> If there is a program I can use to mechanically check for RFC 3986
>>> compliance and shows how a given URI is to be interpreted (scheme, host,
>>> path, query, fragment, etc.), I'll gladly update my results.
>>
>> RFC 3986 has a regexp that's expected to parse valid URIs consistent
>> with the ABNF; see
>> <http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.B>.
>>
>> To change that to a validity checker, we probably just need to restrict
>> the character classes so that non-ASCII characters never match.
>
> http://www.websitedev.de/temp/rfc3986-check.html.gz has a mechanically
> generated DFA for all productions used in RFC 3986 that should be easy
> to extract. https://github.com/hoehrmann/demo-parselov does, too, and
> also gives you a correct parse tree; if `example` is a UTF-8 encoded
> file containing `http://:?`, then
>
>   % node demo-parselov.js rfc3986.URI.json.gz example -json
>
> would print
>
>   ["URI", [
>     ["scheme", [
>       ["ALPHA", [], 0, 1],
>       ["ALPHA", [], 1, 2],
>       ["ALPHA", [], 2, 3],
>       ["ALPHA", [], 3, 4]], 0, 4],
>     ["hier-part", [
>       ["authority", [
>         ["host", [
>           ["reg-name", [], 7, 7]], 7, 7],
>         ["port", [], 8, 8]], 7, 8],
>       ["path-abempty", [], 8, 8]], 5, 8],
>     ["query", [], 9, 9]], 0, 9]
>
> I can make a data file for `URI-reference` and IRIs if anyone wants
> to use this in its current work-in-progress form.
> --
> Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
> D-10243 Berlin · PGP Pub. KeyID: 0xA4357E78 · http://www.bjoernsworld.de
> Available for hire in Berlin (early 2015) · http://www.websitedev.de/
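
For anyone who wants to try the Appendix B regexp Julian mentions, here is a minimal Python sketch. The regular expression is the one given in RFC 3986, Appendix B; the names `URI_RE` and `split_uri` are made up for this illustration and do not come from any of the tools linked above. It only splits a reference into its five components. It is not a validity checker, since the character classes are not restricted in the way Julian suggests.

```python
import re

# Regular expression from RFC 3986, Appendix B. Every group is optional,
# so it matches any string; it splits components rather than validating.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(text):
    """Split a URI reference into components (hypothetical helper).

    A component that is absent is returned as None; a component that is
    present but empty is returned as the empty string.
    """
    m = URI_RE.match(text)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


if __name__ == "__main__":
    for name, value in split_uri("http://:?").items():
        print(name, repr(value))
```

For the `http://:?` example from Björn's message this gives scheme `http`, authority `:`, an empty path, an empty (but present) query, and no fragment, which agrees with the offsets in the parse tree quoted above.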
Received on Monday, 22 December 2014 23:37:19 UTC