Re: parsing URI (references) according to RFC 3986 from Adam Barth on 2011-06-18 (public-iri@w3.org from June 2011)

From: Adam Barth <ietf@adambarth.com>
Date: Sat, 18 Jun 2011 06:09:40 -0700
To: Julian Reschke <julian.reschke@gmx.de>
Cc: "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
Message-ID: <BANLkTi=Jz_PJ8Fsa8WL6btT3XRmjWXRpDQ@mail.gmail.com>

How does your implementation compare to existing browsers on this test suite:

http://trac.webkit.org/browser/trunk/LayoutTests/fast/url/

In particular, it would be helpful to add entries for your
implementation to the following table so that we can see whether it
makes desirable trade-offs in situations where browsers differ in
behavior:

https://raw.github.com/abarth/url-spec/master/tests/gurl-results/by-browser.txt

Thanks,
Adam


On Sat, Jun 18, 2011 at 4:56 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
> Hi,
>
> some time ago I started working on a sample implementation of the RFC 3986
> algorithms for parsing and resolving references. The results are over here
> (incl. source files for people who want to play around with it, or add more
> tests):
>
>        http://greenbytes.de/tech/tc/uris/
>
> Note that the regular expression in
> <http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.B> works with any
> kind of input, not just valid URIs. Also, the resolution algorithm in
> <http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.5> does not
> depend on valid components.
>
> I believe this can be a basis for the algorithms the HTML5 people are
> looking for. What's missing is:
>
> - optional preprocessing (strip leading/trailing whitespace)
>
> - optional postprocessing (fix non-ASCII characters in query parameters when
> the reference does not originate from a UTF-8 encoded document; maybe
> scheme-specific cleanup).
>
> What's also missing is a way to uniquely identify a test case; the obvious
> answer is to assign a unique identifier for each of them -- does anybody
> have a better idea that requires less work?
>
> Feedback is welcome, in particular suggestions for interesting additional
> tests (I don't have any non-URI tests yet).
>
> Best regards, Julian
>
>
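
For readers following the quoted message: a minimal sketch of how the regular expression from RFC 3986, Appendix B can split arbitrary input into components, with the optional whitespace-stripping preprocessing step applied first. The names `URI_REFERENCE` and `split_reference` are hypothetical and not taken from Julian's implementation; the use of `str.strip()` for preprocessing and of `re.DOTALL` (so that the fragment group also accepts newlines) are assumptions of this sketch.

```python
import re

# Regular expression from RFC 3986, Appendix B. Every part is optional,
# so it matches any input string, not only valid URIs.
# re.DOTALL is an assumption of this sketch, not part of the RFC.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?",
    re.DOTALL,
)


def split_reference(text):
    """Split a URI reference into its five components (hypothetical helper).

    Components that are absent are returned as None; the path is always
    a string, possibly empty.
    """
    # Optional preprocessing: strip leading/trailing whitespace.
    match = URI_REFERENCE.match(text.strip())
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }


if __name__ == "__main__":
    print(split_reference("  http://www.ics.uci.edu/pub/ietf/uri/#Related "))
    print(split_reference("not a uri"))
```

Input that is not a URI at all simply ends up in the path component, which illustrates why the expression never fails to match.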

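The reference resolution algorithm of RFC 3986, Section 5.2 that the quoted message refers to can be sketched in the same way. This is a strict-mode sketch that follows the RFC pseudocode; the function names `resolve`, `merge`, and `_split` are hypothetical, and it does not validate components, in line with the observation that the algorithm does not depend on them being valid.

```python
import re

# RFC 3986, Appendix B (re.DOTALL is an assumption of this sketch).
_URI = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?", re.DOTALL
)


def _split(ref):
    """Return (scheme, authority, path, query, fragment); absent parts are None."""
    m = _URI.match(ref)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def remove_dot_segments(path):
    """RFC 3986, Section 5.2.4."""
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]
            if out:
                out.pop()
        elif path == "/..":
            path = "/"
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            end = path.find("/", 1)
            if end == -1:
                end = len(path)
            out.append(path[:end])
            path = path[end:]
    return "".join(out)


def merge(base_authority, base_path, ref_path):
    """RFC 3986, Section 5.2.3."""
    if base_authority is not None and base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path


def resolve(base, ref):
    """Resolve ref against base (RFC 3986, Sections 5.2.2 and 5.3, strict mode)."""
    b_scheme, b_auth, b_path, b_query, _ = _split(base)
    scheme, auth, path, query, fragment = _split(ref)
    if scheme is not None:
        path = remove_dot_segments(path)
    else:
        scheme = b_scheme
        if auth is not None:
            path = remove_dot_segments(path)
        else:
            auth = b_auth
            if path == "":
                path = b_path
                if query is None:
                    query = b_query
            elif path.startswith("/"):
                path = remove_dot_segments(path)
            else:
                path = remove_dot_segments(merge(b_auth, b_path, path))
    # Component recomposition, Section 5.3.
    result = ""
    if scheme is not None:
        result += scheme + ":"
    if auth is not None:
        result += "//" + auth
    result += path
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result
```

For example, `resolve("http://a/b/c/d;p?q", "../../g")` follows the merge and dot-segment removal steps and yields `http://a/g`, matching the corresponding example in RFC 3986, Section 5.4.
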
Received on Saturday, 18 June 2011 13:10:42 UTC