- From: Sam Ruby <rubys@intertwingly.net>
- Date: Fri, 31 Oct 2014 01:08:08 -0700
- To: "www-archive@w3.org" <www-archive@w3.org>
First attempt apparently didn't make it to the online archives.

- Sam Ruby

-------- Forwarded Message --------
Subject: Re: URL Spec rewrite (bug 25946) and galimatias test results
Date: Wed, 29 Oct 2014 01:14:18 +0100
From: Santiago M. Mola <santi@mola.io>
To: Sam Ruby <rubys@intertwingly.net>
CC: Michael(tm) Smith <mike@w3.org>, www-archive@w3.org

Hi,

2014-10-25 2:36 GMT+02:00 Sam Ruby <rubys@intertwingly.net>:

> He suggested that I ask you for feedback on the following:
>
> http://intertwingly.net/projects/pegurl/url.html

It's definitely a useful resource. I think that the parser defined in the current spec is easier to follow if you want to implement it as-is. Your approach gives a better idea of what the parser should do at a higher level. After implementing the parser following the current spec, I had a hard time determining what the parsing output should be for some cases. These new diagrams would solve that problem.

> I also said that I would test galimatias for compatibility. I've posted the results here:
>
> http://intertwingly.net/stories/2014/10/24/urltest-results/

Thank you for taking the time to include Galimatias!

> A few notes: it doesn't appear to me that galimatias reports any recoverable parse errors (for example, including a tab or a linefeed inside a path).

Galimatias checks every defined error, both recoverable and fatal. It provides a customizable ErrorHandler interface. The core provides a DefaultErrorHandler, which just ignores any recoverable error, and a StrictErrorHandler, which fails on any error. The user could implement a LoggingErrorHandler that logs recoverable parsing errors, a CollectorErrorHandler that collects every error for later analysis/validation, etc. I will add more error handlers to the core if common patterns of use emerge among the user community.
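As an illustration of the error-handler pattern described above, here is a minimal, self-contained sketch. It does not use the Galimatias API: the `ErrorHandler` interface, its method signatures, the three handler classes, and the toy `stripTabsAndNewlines` step are all hypothetical stand-ins chosen for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ErrorHandlerSketch {

    /** Hypothetical callback interface; not the actual Galimatias ErrorHandler. */
    public interface ErrorHandler {
        void error(String message);       // recoverable parse error
        void fatalError(String message);  // parse error that aborts parsing
    }

    /** Ignores recoverable errors and fails only on fatal ones. */
    public static final class IgnoringErrorHandler implements ErrorHandler {
        @Override public void error(String message) { /* deliberately ignored */ }
        @Override public void fatalError(String message) {
            throw new IllegalArgumentException(message);
        }
    }

    /** Fails on any error, recoverable or fatal. */
    public static final class StrictErrorHandler implements ErrorHandler {
        @Override public void error(String message) {
            throw new IllegalArgumentException(message);
        }
        @Override public void fatalError(String message) {
            throw new IllegalArgumentException(message);
        }
    }

    /** Records every error for later analysis; still aborts on fatal errors. */
    public static final class CollectorErrorHandler implements ErrorHandler {
        private final List<String> errors = new ArrayList<>();
        @Override public void error(String message) { errors.add(message); }
        @Override public void fatalError(String message) {
            errors.add(message);
            throw new IllegalArgumentException(message);
        }
        public List<String> errors() { return Collections.unmodifiableList(errors); }
    }

    /**
     * Toy parsing step (an assumption for this sketch): drops tabs and
     * newlines from the input and reports each one as a recoverable error.
     */
    public static String stripTabsAndNewlines(String input, ErrorHandler handler) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\t' || c == '\n' || c == '\r') {
                handler.error("Illegal whitespace at index " + i);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With the collecting handler, a caller gets the cleaned-up result and can inspect `errors()` afterwards; with the strict handler, the same input is rejected at the first tab or newline.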

> Also galimatias doesn't provide the interfaces that the URL Standard defines, for example to get the portname - an interface that is supposed to return null if the port matches the default port for the scheme.

Right. Galimatias does not implement the URLUtils interface. I have opened an issue to keep a reference for it:

https://github.com/smola/galimatias/issues/44

I'm still not sure I want to provide the URLUtils interface as is. It's a browser-centric API that I don't find particularly useful outside the JavaScript-in-a-browser scope. Maybe it makes sense for standards validation code such as validator.nu?

> Even with that accounted for, there still are a number of notable results:
>
> Null pointer exceptions, some examples:
> http://intertwingly.net/stories/2014/10/24/urltest-results/bf8630587b,
> http://intertwingly.net/stories/2014/10/24/urltest-results/4038fcfa6d,
> http://intertwingly.net/stories/2014/10/24/urltest-results/275612041a,
> http://intertwingly.net/stories/2014/10/24/urltest-results/2f33177681,
> http://intertwingly.net/stories/2014/10/24/urltest-results/e630bf59c6,
> http://intertwingly.net/stories/2014/10/24/urltest-results/a16d100f37

Some of these seem to be caused by the call to url.host().toString() in your test case. In these cases, host is null. This is the intended behaviour.
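
One way a test harness could tolerate a null host is to convert it with a null-safe helper before storing it. The sketch below is an assumption about how such a harness might look, not the actual urltest.java code; the class and method names are hypothetical, and the host is typed as a plain `Object` so that the example does not depend on Galimatias.

```java
import java.util.Objects;

public class NullSafeHost {

    /**
     * Returns the host's string form, or the empty string when the URL has
     * no host (host == null), instead of throwing a NullPointerException.
     */
    public static String hostString(Object host) {
        return Objects.toString(host, "");
    }
}
```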

> Returning an empty string instead of null for fragment:
> http://intertwingly.net/stories/2014/10/24/urltest-results/1b77231365

AFAIK this is consistent with the standard (fragment, not hash). You can change your testing class to:

    result.put("hash", (url.fragment()!=null && !url.fragment().isEmpty()) ? "#"+url.fragment() : "");

> Returning an empty string instead of null for query:
> http://intertwingly.net/stories/2014/10/24/urltest-results/24f081633d

Again, this is the standard (query, not search). You can change your testing class to:

    result.put("search", (url.query()!=null && !url.query().isEmpty()) ? "?"+url.query() : "");

> ipv6 addresses not wrapped in []:
> http://intertwingly.net/stories/2014/10/24/urltest-results/54f86d22f2

Right. IPv6 addresses are wrapped in [] when serialized as part of a URL, but they are not wrapped when printed as standalone entities. I'll fix it:

https://github.com/smola/galimatias/issues/45

> difference in case:
> http://intertwingly.net/stories/2014/10/24/urltest-results/e40dedda84,
> http://intertwingly.net/stories/2014/10/24/urltest-results/9a4e54b1c3

Galimatias is biased towards URL normalization. It tries to minimize the creation of distinct URLs that are equivalent according to the standard. A setting to disable this percent-encoding normalization behaviour will be provided if there is a real-world use case for it.
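
The fragment-to-hash and query-to-search conversions suggested above follow the same rule, so they can be factored into two small helpers. This is a sketch only: the class and method names are hypothetical, and the inputs are plain strings rather than values read from a Galimatias URL object.

```java
public class HashSearchSketch {

    /** Maps a URL fragment to the URLUtils-style "hash" value. */
    public static String toHash(String fragment) {
        return prefixIfPresent("#", fragment);
    }

    /** Maps a URL query to the URLUtils-style "search" value. */
    public static String toSearch(String query) {
        return prefixIfPresent("?", query);
    }

    // A null or empty component yields ""; otherwise the prefix is prepended.
    private static String prefixIfPresent(String prefix, String component) {
        return (component != null && !component.isEmpty()) ? prefix + component : "";
    }
}
```

In a test harness this would be used as, for example, `result.put("hash", HashSearchSketch.toHash(url.fragment()))`, which treats a null fragment and an empty fragment the same way.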

> If you want to review how I captured these results, the program I used can be found here:
>
> http://intertwingly.net/stories/2014/10/24/urltest.java
>
> Please let me know if you identify any problems with that program, and I will be glad to rerun the tests.

I have added a test to check against http://intertwingly.net/stories/2014/10/05/urltestdata.json

Do you have more up-to-date data?

I found that Galimatias failed with this:

http://intertwingly.net/stories/2014/10/24/urltest-results/bc6ea8bdf8

    {"input":"http://%30%78%63%30%2e%30%32%35%30.01%2e","base":"http://other.com/","scheme":"","username":"","password":null,"host":"","port":"","path":"","query":"","fragment":""}

My latest code parses this URL as http://0xc0.0250.01./

This seems in line with the standard, since the standard does not perform sanity checks for DNS rules. See: https://github.com/smola/galimatias/issues/26

It also fails with this one:

    {"input":"http://192.168.0.257","base":"http://other.com/","scheme":"","username":"","password":null,"host":"","port":"","path":"","query":"","fragment":""}

But 192.168.0.257 is a valid domain name at the DNS level, and it's not forbidden by the URL standard.

Apart from these two cases, Galimatias passes all test cases once the query/search and fragment/hash differences are taken into account.

Please let me know if there is any failure in the future or if you have any feedback.

Thank you again!

Best,
Santiago

Received on Friday, 31 October 2014 08:08:39 UTC