Re: parsing URI (references) according to RFC 3986

How does your implementation compare with existing browsers on this test suite?

http://trac.webkit.org/browser/trunk/LayoutTests/fast/url/

In particular, it would be helpful to add entries for your
implementation to the following table so that we can see whether it
makes desirable trade-offs in situations where browsers differ in
behavior:

https://raw.github.com/abarth/url-spec/master/tests/gurl-results/by-browser.txt

Thanks,
Adam


On Sat, Jun 18, 2011 at 4:56 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
> Hi,
>
> Some time ago I started working on a sample implementation of the RFC 3986
> algorithms for parsing and resolving references. The results are over here
> (incl. source files for people who want to play around with it, or add more
> tests):
>
>        http://greenbytes.de/tech/tc/uris/
>
> Note that the Regular Expression in
> <http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.B> works with any
> kind of input, not just valid URIs. Also, the resolution algorithm in
> <http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.5> does not
> depend on valid components.
>
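To illustrate the point about arbitrary input, here is a minimal Python
sketch built on the Appendix B regular expression. The name
split_reference and the dictionary layout are assumptions made for this
example; this is not the code behind the test page above.

```python
import re

# Regular expression from RFC 3986, Appendix B. Every group is optional
# and the path group may be empty, so the pattern matches any string.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)

def split_reference(ref):
    """Split a reference into its five components (hypothetical helper).

    Components that are absent come back as None; the path is always a
    string, possibly empty.
    """
    m = URI_RE.match(ref)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }
```

A string that is not a valid URI still splits: for "not a uri %%" the
whole input ends up in the path, and the other components are None.
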
> I believe this can be a basis for the algorithms the HTML5 people are
> looking for. What's missing is:
>
> - optional preprocessing (strip leading/trailing whitespace)
>
> - optional postprocessing (fix non-ASCII characters in query parameters when
> the reference does not originate from a UTF-8 encoded document; maybe
> scheme-specific cleanup).
>
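The preprocessing step could be layered in front of any resolver. The
sketch below is only an illustration: it uses Python's
urllib.parse.urljoin as a stand-in for the RFC 3986 section 5 algorithm,
and the helper names and the exact set of stripped characters (space,
tab, LF, FF, CR) are assumptions, not something taken from the
implementation discussed here.

```python
from urllib.parse import urljoin

# Assumed whitespace set for the optional preprocessing step.
STRIP_CHARS = " \t\n\f\r"

def preprocess(ref):
    """Strip leading and trailing whitespace from a reference."""
    return ref.strip(STRIP_CHARS)

def resolve(base, ref):
    """Preprocess the reference, then resolve it against the base URI.

    urljoin stands in for the resolution algorithm of RFC 3986,
    section 5; a stricter resolver could be dropped in here instead.
    """
    return urljoin(base, preprocess(ref))
```

With the base URI from RFC 3986 section 5.4, "http://a/b/c/d;p?q", the
padded reference "  ../g\n" resolves to "http://a/b/g".
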
> What's also missing is a way to uniquely identify a test case; the obvious
> answer is to assign a unique identifier to each of them. Does anybody
> have a better idea that requires less work?
>
> Feedback welcome; in particular with respect to interesting additional tests
> (I don't have any non-URI tests yet).
>
> Best regards, Julian
>
>

Received on Saturday, 18 June 2011 13:10:42 UTC