Re: parsing URI (references) according to RFC 3986 from Julian Reschke on 2011-06-19 (public-iri@w3.org from June 2011)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Sun, 19 Jun 2011 19:33:38 +0200
To: Chris Weber <chris@lookout.net>
CC: "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
Message-ID: <4DFE32F2.5080006@gmx.de>

On 2011-06-18 19:15, Chris Weber wrote:
> On 6/18/2011 4:56 AM, Julian Reschke wrote:
>> What's also missing is a way to uniquely identify a test case; the
>> obvious answer is to assign a unique identifier for each of them -- does
>> anybody have a better idea that requires less work???
>>
>> Feedback welcome; in particular with respect to interesting additional
>> tests (I don't have any non-URI tests yet).
>>
>> Best regards, Julian
>>
>
> For my own testing I took your original test case format and added an
> <id>NNNN</id> element. You can see these at
> https://raw.github.com/cweb/iri-tests/master/tests.xml. A single test
> ends up including the test id in a sub-domain label of the host name.
>
> <uri>http://0002.iris.test.ing/foo/bar?query#frag</uri>
>
> My test setup requires some overhead though - I use a database, Web
> server, and a DNS server with a wildcard alias. At runtime I also
> prepend a GUID as another sub-domain label to the host name to make sure
> each generated test case instance can be uniquely identified. The end
> result is an href and img src like:
>
> http://40f72247-9ce0-41a9-bddf-1afb2a9745b9.0023.iris.test.ing/
>
> This whacky setup allows me to capture the parsing results not only from
> the DOM, but also from the raw HTTP request and the DNS query which I
> sniff off the wire. Results can be correlated by the GUIDs. Limitations
> of using this format include being a bit constrained to the http scheme
> right now - I haven't thought of a way to fit this approach to other
> schemes.

That looks like a cool approach for testing host name parsing and 
resolution!
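
To make the scheme quoted above concrete, here is a minimal sketch of
building and decoding such host names. The base domain and the label
layout (GUID, then a four-digit test id) are taken from Chris's examples;
the function names are hypothetical and this is not his actual code.

```python
import uuid

# Base domain as it appears in the examples quoted above.
BASE_DOMAIN = "iris.test.ing"


def instance_host(test_id, guid=None):
    """Build '<guid>.<NNNN>.<base domain>' for one generated test instance.

    A fresh random GUID is used unless one is passed in (hypothetical helper).
    """
    guid = guid or str(uuid.uuid4())
    return f"{guid}.{test_id:04d}.{BASE_DOMAIN}"


def parse_instance_host(host):
    """Recover (guid, test_id) from a host name seen in the DOM, an HTTP
    request, or a DNS query; return None if the name does not fit the layout.
    """
    suffix = "." + BASE_DOMAIN
    if not host.endswith(suffix):
        return None
    labels = host[: -len(suffix)].split(".")
    if len(labels) != 2 or not labels[1].isdigit():
        return None
    guid, test_id = labels
    return guid, int(test_id)
```

Results captured from the different sources could then be grouped by the
GUID that parse_instance_host returns.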

> In regards to your question about uniquely identifying test cases - do
> you think including an id as a sub-domain label in 'http' tests would
> work? To me it seemed like the one place that had the least effect on
> constraining what could be tested (e.g. I can still test scheme, host,
> port, path, query, and fragment components as well as surrounding
> whitespace).
>
> Otherwise, what are your thoughts on including an easily identifiable
> token that could be placed anywhere in the test string?

Good question.

For generic parsing tests, there really isn't a single component whose 
presence we can rely on. For instance, we may have relative 
references (no authority), empty paths, missing queries, or missing 
fragment identifiers.
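
As an illustration of the point above, the following sketch splits a
reference with the regular expression from RFC 3986, Appendix B. The
function name and the dictionary keys are my own choice; only the
expression itself comes from the RFC.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_reference(ref):
    """Split a URI reference into the five generic components.

    An absent component is reported as None. The path is always present,
    but it may be the empty string.
    """
    m = URI_REFERENCE.match(ref)  # every part is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }
```

For "../x" this reports only a path, and for "#frag" only an empty path
and a fragment, so an identifying token cannot be tied to any one
component.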

For now my plan is to organize the tests into groups (like "RFC3986", 
"RFC2397"...), and build a test id by numbering within each group; then, 
for the more "interesting" tests we can assign a unique identifier 
within that group.
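
A rough sketch of that numbering follows. The id format
("<group>-<NNNN>") and the function name are assumptions made for
illustration only; nothing has been decided yet.

```python
from collections import defaultdict


def assign_test_ids(tests):
    """Number tests within their group, in the order they are given.

    `tests` is a list of (group, reference) pairs. The '<group>-<NNNN>'
    format is a placeholder, not an agreed convention.
    """
    counters = defaultdict(int)  # one running counter per group
    numbered = []
    for group, reference in tests:
        counters[group] += 1
        numbered.append((f"{group}-{counters[group]:04d}", reference))
    return numbered
```

Note that ids assigned this way depend on the order of the tests within a
group, so inserting a test in the middle would renumber the ones after
it.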

Best regards, Julian

Received on Sunday, 19 June 2011 17:34:24 UTC