- From: Sam Ruby <rubys@intertwingly.net>
- Date: Sat, 06 Dec 2014 23:09:40 -0500
- To: public-ietf-w3c@w3.org
On 12/06/2014 01:40 PM, Sam Ruby wrote:
>
> If you take a survey of implementations, you will find that in addition
> to the outliers, there are two families of implementations. One that
> collect around RFC 3986 are precise (in that they tend to produce the
> same results) but not necessary accurate in the face of IDNA and Unicode
> considerations. And another that collect around browser results. The
> latter is less precise (in that there are variations), but tend overall
> to be more accurate with respect to other applicable standards.

I've added Perl to my test results using the following program:

https://github.com/webspecs/url/blob/develop/evaluate/testuri.pl

It has been a while since I've programmed in Perl. If there are things
I missed, bugs in general, or even simply better ways of doing things,
please let me know.

- - -

I then took a look at the results, and I now believe that the idea of
two families of implementations is more a matter of conventional wisdom,
whereas reality isn't quite so clean. Here's an example:

https://url.spec.whatwg.org/interop/urltest-results/683ac9869d

Looking at this, it doesn't look like addressable or rust do IDNA
processing. Rust at least fesses up to this. :-)

Node.js and Perl perform fewer IDNA processing steps than other
implementations. In particular, they skip step 1, but do steps 2 and 3
of the following page:

http://www.unicode.org/reports/tr46/#ToASCII

Everybody else does all 3 steps. Note: this isn't necessarily because
Node.js and Perl skipped a step; it may very well be that they implement
an entirely different version of IDNA than everybody else does[1].

Chrome goes an extra step: it recognizes that the result is an IPv4
address, albeit one expressed in an uncommon way, and canonicalizes it.
On the theory that canonical URIs should round-trip, the current draft
of the WebPlatform URL Specification aligns with Chrome on this, even
though Chrome is the only browser that exhibits this behavior.
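To make the Chrome behavior concrete, here is a minimal Python sketch of
the two stages involved: a mapping step followed by IPv4 recognition and
canonicalization. Everything in it is an assumption for illustration:
the function names map_host, parse_number and canonicalize_ipv4 are
hypothetical, the sample host is made up rather than taken from the test
result linked above, and NFKC normalization plus lowercasing is only a
rough stand-in for the mapping step of UTS #46, not the real mapping
table.

```python
import unicodedata


def map_host(host):
    # Rough approximation of a mapping step (assumption): NFKC folds
    # fullwidth digits, letters and dots to ASCII; lower() folds case.
    return unicodedata.normalize("NFKC", host).lower()


def parse_number(part):
    # Interpret one dot-separated part as hex (0x...), octal (leading 0)
    # or decimal. Returns None if the part is not a number in that base.
    if part.startswith("0x"):
        digits, base = part[2:], 16
    elif len(part) > 1 and part.startswith("0"):
        digits, base = part[1:], 8
    else:
        digits, base = part, 10
    allowed = "0123456789abcdef"[:base]
    if not digits or any(ch not in allowed for ch in digits):
        return None
    return int(digits, base)


def canonicalize_ipv4(host):
    # Expects an already mapped (lowercase ASCII) host. Returns dotted
    # decimal if the host reads as an IPv4 address, otherwise None.
    parts = host.split(".")
    if not 1 <= len(parts) <= 4:
        return None
    numbers = [parse_number(p) for p in parts]
    if any(n is None for n in numbers):
        return None
    *leading, last = numbers
    # Leading parts are single bytes; the last part fills what remains.
    if any(n > 255 for n in leading) or last >= 256 ** (5 - len(numbers)):
        return None
    value = last
    for index, n in enumerate(leading):
        value += n * 256 ** (3 - index)
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


# Hypothetical input written with fullwidth characters.
print(canonicalize_ipv4(map_host("０Ｘｃ０．０２５０．０１")))  # 192.168.0.1
```

An implementation that skips the mapping step never sees ASCII digits
here, so it cannot recognize the address; one that maps but does not
canonicalize would report 0xc0.0250.01 as the host.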
- - -

This is an example of the type of issue I'd like to explore with those
interested in the topic of interoperable parsing behavior.

- Sam Ruby

[1] https://annevankesteren.nl/2012/11/idna-hell
Received on Sunday, 7 December 2014 04:10:03 UTC