Re: [whatwg] URL: spec review - basic_parser

From: Sam Ruby <rubys@intertwingly.net>
Date: Mon, 13 Oct 2014 19:05:47 -0400
Message-ID: <543C5ACB.2030207@intertwingly.net>
To: Anne van Kesteren <annevk@annevk.nl>
Cc: whatWG <whatwg@whatwg.org>

On 10/13/2014 10:05 AM, Anne van Kesteren wrote:
>> Not yet.  I'm still seeing a large set of differences between what I am
>> producing and what is in urltestdata.txt and need to track down whether the
>> problems are in my implementation, the spec, or in the test results.
>> Once those three are in sync; I'll try to look at the bigger picture.
> Cool. Sounds great.

New test results:


The fourth column ("Notes") indicates which properties differ between 
what my software produces and what the testdata indicates should be the 
expected results.  These fall into three basic categories:

1) rows where the notes merely say "href" are cases where parse errors 
are thrown and failure is returned.  The expected results are an object 
that returns the original href, but empty values for all other 
properties.  I don't see this behavior in the spec:


2) rows that contain "href hostname" appear to be ones where the 
expected results have not been updated to include the host to IDNA 
conversion.

3) rows that contain "href protocol hostname pathname" need further 
investigation.  I suspect that these result from my using a library to 
normalize the IDNA mapping, which "helpfully" cleans up other problems, 
such as by removing U+0000 characters from the input.
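
To make the "Notes" column and category 1 concrete, here is a minimal 
sketch of how such a comparison could be computed.  It is not taken from 
the implementation referenced in this message: the property list, the 
method names (failure_record, notes), and the use of plain hashes are 
all assumptions made for illustration.

```ruby
# Hypothetical sketch; the property list and method names are illustrative
# and are not taken from the implementation discussed in this message.
PROPERTIES = %w[href protocol hostname port pathname search hash].freeze

# The record the test data appears to expect when parsing fails:
# the original input as href, and empty strings for everything else.
def failure_record(input)
  PROPERTIES.map { |name| [name, name == 'href' ? input : ''] }.to_h
end

# Space-separated names of the properties whose values differ between
# the expected record and the actual one; missing values count as empty.
def notes(expected, actual)
  PROPERTIES.select { |name| expected[name].to_s != actual[name].to_s }.join(' ')
end
```

Under these assumptions, a parser that returns failure with no properties 
set would be compared as notes(failure_record(input), {}), which yields 
"href", matching the rows described in category 1.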

My implementation can be found here:


Note the comments linking back to spec sections, and comments that 
identify step numbers.

- Sam Ruby

P.S.  I haven't updated to the latest test data yet, but from what I can 
see the changes wouldn't materially affect the results, so I am 
publishing now.

P.P.S.  A preview of what is yet to come: ruby2js run against my 
implementation produces:


This will need some additional work to get running; for example, lines 
54, 65, 82, 85, and 267 call out to libraries that aren't available to 
JavaScript.  Lines 275 to 277 are debugging lines that will be removed.

Received on Monday, 13 October 2014 23:06:19 UTC
