- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Sat, 18 Jun 2011 13:56:39 +0200
- To: "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
Hi,

some time ago I started working on a sample implementation of the RFC 3986 algorithms for parsing and resolving references. The results are over here (incl. source files for people who want to play around with it, or add more tests):

http://greenbytes.de/tech/tc/uris/

Note that the Regular Expression in <http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.B> works with any kind of input, not just valid URIs. Also, the resolution algorithm in <http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.5> does not depend on valid components.

I believe this can be a basis for the algorithms the HTML5 people are looking for. What's missing is:

- optional preprocessing (strip leading/trailing whitespace)
- optional postprocessing (fix non-ASCII characters in query parameter when not originating from UTF-8 encoded document; maybe scheme-specific cleanup).

What's also missing is a way to uniquely identify a test case; the obvious answer is to assign a unique identifier for each of them -- does anybody have a better idea that requires less work???

Feedback welcome; in particular with respect to interesting additional tests (I don't have any non-URI tests yet).

Best regards, Julian
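
For illustration, here is a minimal Python sketch of the splitting step, using the regular expression from RFC 3986, Appendix B. It is not the sample implementation linked above; the names `split_reference` and `strip_whitespace` are hypothetical, and the whitespace stripping is only one possible reading of the "optional preprocessing" item (Python's `str.strip()` is assumed to be good enough here).

```python
import re

# Regular expression from RFC 3986, Appendix B. Every part is optional,
# so it matches any input string, not just valid URIs.
APPENDIX_B = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_reference(ref, strip_whitespace=False):
    """Split a reference into the five RFC 3986 components.

    Components that are absent come back as None; the path is always
    a string (possibly empty). `strip_whitespace` is a hypothetical
    preprocessing switch, not something RFC 3986 defines.
    """
    if strip_whitespace:
        ref = ref.strip()
    m = APPENDIX_B.match(ref)
    # Group numbers follow Appendix B: $2, $4, $5, $7, $9.
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


if __name__ == "__main__":
    # Example reference taken from RFC 3986, Appendix B.
    print(split_reference("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
    # A string that is not a valid URI still splits without error.
    print(split_reference("  not a uri%% ", strip_whitespace=True))
```

Returning None for absent components keeps "undefined" distinct from "empty" (for example `http://example.com/?` has an empty query, not a missing one), which is the distinction the resolution algorithm in Section 5 relies on.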
Received on Saturday, 18 June 2011 11:57:18 UTC