Re: respecting IETF customs?

On 12/12/2014 03:26 PM, Roy T. Fielding wrote:
> On Dec 12, 2014, at 9:18 AM, Sam Ruby wrote:
>> I hope that you find the following page to be easier to digest:
>>
>> https://url.spec.whatwg.org/interop/test-results/
>>
>> With this page, you can do more than simply compare user agents
>> against the reference implementation of the URL Standard.  You can
>> compare one browser against other browsers.  You can compare Perl
>> against Python. If you feel that there is a RFC 3986 compliant
>> application in the set, you can compare it against the reference
>> implementation.
>
> Nice, but it would be a lot better if abnormal URL references were
> grouped separately from normal references.  Many of the "test
> failures" are decisions by one or more of the implementations to
> reject a reference due to potential security problems (e.g., TCP
> well-known ports [0-53] that might be explicitly forbidden regardless
> of parsing) or syntax that is specifically forbidden by the scheme.
> Those should not be considered parser differences.

Here is the master set of test data:

https://github.com/w3c/web-platform-tests/blob/master/url/urltestdata.txt

If reverse engineering undocumented JavaScript isn't your thing (I know
it isn't what I was happy with), here is the data that the parser produces:

https://url.spec.whatwg.org/interop/urltestdata.json

Seeing that data expanded helped me "grok" the original format, which
isn't too bad.  Just be aware that two spaces after the first (i.e.,
input) field mean that the base from the line above is reused.
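To make that rule concrete, here is a minimal sketch of a parser for
that line format.  The field layout (space-separated, input first, base
second) is my reading of the file, not an authoritative description:

```python
def parse_lines(lines):
    """Yield (input, base) pairs, reusing the previous base when the
    base field is empty (i.e., two spaces after the input field)."""
    base = None
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(" ")
        url = fields[0]
        # An empty second field (two consecutive spaces) means:
        # carry the base over from the previous entry.
        if len(fields) > 1 and fields[1]:
            base = fields[1]
        yield url, base

# Hypothetical sample lines, for illustration only:
sample = [
    "path/to/page http://example.org/dir/file",
    "another/page  s:http",  # two spaces: base carries over
]
print(list(parse_lines(sample)))
```

Running this prints both inputs paired with the same base, since the
second line's base field is empty.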

I encourage you to submit a pull request that sorts or splits the data
to your taste.  Additions are also welcome, and even encouraged!

As to whether "forbidden" syntaxes should be considered parser
differences, that's a subject for honest debate, the key question being
how likely the input is to be encountered in practice. If it is common
enough (and user-facing tools are more exposed to this issue than
back-end servers), then the differences are an issue even if the input
may be considered forbidden.

The current draft of the URL Standard is intentionally very unforgiving
in some cases, for example, when presented with a malformed IPv6
address, yet very tolerant of backslashes.  I would be very amenable to
rules that make well-known port numbers explicitly disallowed for
security reasons.
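To illustrate the backslash divergence: an RFC 3986-leaning parser, such
as Python's urllib (used here only as a convenient stand-in, not as one
of the tested implementations), does not treat "\" as "/", so the
authority is never recognized:

```python
from urllib.parse import urlparse

# RFC 3986 only recognizes an authority after "//", so backslashes
# here leave the host empty and push everything into the path.
u = urlparse(r"http:\\example.com\foo")
print(repr(u.netloc))  # '' -- no host recognized
print(repr(u.path))    # '\\\\example.com\\foo' -- all path

# A parser following the URL Standard would instead normalize the
# backslashes and produce http://example.com/foo for the same input.
```

That one input yielding two structurally different results is exactly
the kind of difference the test-results page surfaces.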

> What are you using to extract the result? Beware that some
> implementations will parse and provide one URL in a javascript API,
> but will actually fix or reject that URL before using it via HTTP.
> RFC3986 only defines what would be sent.

The code used to extract the results can be found here:

https://github.com/webspecs/url/tree/develop/evaluate

The actual data collected can be found here:

https://github.com/webspecs/url/tree/develop/evaluate/useragent-results

Again, pull requests are welcome!

> Also, please feel free to include my RFC test cases, located at
>
> https://svn.apache.org/repos/asf/labs/webarch/trunk/uri/test/

If you are willing to do a pull request, would you consider adding some
or all of these?

If you do, I'll capture results for all of the user agents I've covered
to date (and any others that people might suggest -- preferably in the
form of a pull request :-)), and update my results page to include a
suitable separation.

> ....Roy

- Sam Ruby

Received on Friday, 12 December 2014 21:11:23 UTC