- From: Sam Ruby <rubys@intertwingly.net>
- Date: Tue, 28 Oct 2014 19:29:05 -0700
- To: "Santiago M. Mola" <santi@mola.io>
- CC: "Michael(tm) Smith" <mike@w3.org>, "www-archive@w3.org" <www-archive@w3.org>
On 10/28/14 5:14 PM, Santiago M. Mola wrote: > Hi, > > 2014-10-25 2:36 GMT+02:00 Sam Ruby <rubys@intertwingly.net > <mailto:rubys@intertwingly.net>>: > > > He suggested that I ask you for feedback on the following: > > http://intertwingly.net/__projects/pegurl/url.html > <http://intertwingly.net/projects/pegurl/url.html> > > > It's definitely a useful resource. > > I think that the parser defined in the current spec is easier to follow > if you want to implement it as-is. Having implemented it both ways, I respectfully disagree. :-) http://intertwingly.net/stories/2014/10/13/url_rb.html https://github.com/rubys/url#readme I'll also note that the spec isn't intended to be implemented as is, at least not for code intended to be run in production environments. > Your approach gives a better idea about what should the parser do on a > higher level. After implementing the parser following the current spec, > I had a hard-time determining what should be the parsing output for some > cases, These new diagrams would solve that problem. I also had a hard time, and that's why I assert that the current spec isn't exactly easy to follow. In fact, I will openly confess that I have looked into your implementation a number of times to see how the current spec should be interpreted, and attempted with my specification to make such areas clearer. > I also said that I would test galimatias for compatibility. I've > posted the results here: > > http://intertwingly.net/__stories/2014/10/24/urltest-__results/ > <http://intertwingly.net/stories/2014/10/24/urltest-results/> > > Thank you for taking the time to include Galimatias! > > A few notes: it doesn't appear to me that galimatias reports any > recoverable parse errors (for example, including a tab or a linefeed > inside a path). It does now. :-) http://intertwingly.net/projects/pegurl/urltest-results/ > Galimatias checks every defined error, both recoverable and fatal. It > provides a customizable ErrorHandler interface. The core provides a > DefaultErrorHandler that just ignores any recoverable error and > StrictErrorHandler, which fails on any error. The user could implement a > LoggingErrorHandler that logs recoverable parsing errors, a > CollectorErrorHandler that collects every error for later > analysis/validation, etc. I will add more error handlers to the core if > common patterns of use emerge among the user community. And I believe that any new recoverable errors you define should find their way back into the specification. As near as I can tell, the following is an example of one: http://intertwingly.net/projects/pegurl/urltest-results/bc6ea8bdf8 I say this because the error isn't defined here: https://url.spec.whatwg.org/#host-state And the following only defines fatal errors (e.g. step 5); https://url.spec.whatwg.org/#concept-host-parser But as I said, I have found the current specification to be hard to follow, so I may have missed something. > Also galimatias doesn't provide the interfaces that the URL Standard > defines, for example to get the portname - an interface that is > supposed to return null if the port matches the default port for the > scheme. > > Right. Galimatias does not implement the URLUtils interface. I have > opened an issue to keep a reference for it: > https://github.com/smola/galimatias/issues/44 > > I'm still not sure I want to provide URLUtils interface as is. It's a > browser-centric API that I don't find particularly useful outside the > JavaScript-in-a-browser scope. Maybe it makes sense for standards > validation code such as validator.nu <http://validator.nu>? As a point of reference, node.js came to the opposite conclusion: http://nodejs.org/api/url.html > Even with that accounted for, there still are a number of notable > results: > > Null pointer exceptions, some examples: > http://intertwingly.net/__stories/2014/10/24/urltest-__results/bf8630587b > <http://intertwingly.net/stories/2014/10/24/urltest-results/bf8630587b>, > http://intertwingly.net/__stories/2014/10/24/urltest-__results/4038fcfa6d > <http://intertwingly.net/stories/2014/10/24/urltest-results/4038fcfa6d>, > http://intertwingly.net/__stories/2014/10/24/urltest-__results/275612041a > <http://intertwingly.net/stories/2014/10/24/urltest-results/275612041a>, > http://intertwingly.net/__stories/2014/10/24/urltest-__results/2f33177681 > <http://intertwingly.net/stories/2014/10/24/urltest-results/2f33177681>, > http://intertwingly.net/__stories/2014/10/24/urltest-__results/e630bf59c6 > <http://intertwingly.net/stories/2014/10/24/urltest-results/e630bf59c6>, > http://intertwingly.net/__stories/2014/10/24/urltest-__results/a16d100f3 > <http://intertwingly.net/stories/2014/10/24/urltest-results/a16d100f37> > > Some of these seem caused because the call to url.host().toString() in > your test case. In these cases, host is null. This was the intended > behaviour. Yes, all were my fault. :-) > Returning an empty string instead of null for fragment: > http://intertwingly.net/__stories/2014/10/24/urltest-__results/1b77231365 > <http://intertwingly.net/stories/2014/10/24/urltest-results/1b77231365> > > AFAIK this is consistent with the standard (fragment, not hash). You can > change your testing class to: > > result.put("hash", (url.fragment()!=null && !url.fragment().isEmpty()) ? "#"+url.fragment() : ""); Here is what I am currently using: https://github.com/rubys/url/blob/peg.js/evaluate/testgalimatias.java > Returning an empty string instead of null for query: > http://intertwingly.net/__stories/2014/10/24/urltest-__results/24f081633d > <http://intertwingly.net/stories/2014/10/24/urltest-results/24f081633d> > > Again, this is the standard (query, not search). You can change your > testing class to: > > result.put("search", (url.query()!=null && !url.query().isEmpty()) ? "?"+url.query() : ""); Again, see above. > ipv6 addresses not wrapped in []: > http://intertwingly.net/__stories/2014/10/24/urltest-__results/54f86d22f2 > <http://intertwingly.net/stories/2014/10/24/urltest-results/54f86d22f2> > > Right. IPv6 addresses are wrapped in [] when serialized as part of an > URL, but they are not wrapped when printed as standalone entities. I'll > fix it: > https://github.com/smola/galimatias/issues/45 Actually, toHumanString() does what I need. > difference in case: > http://intertwingly.net/__stories/2014/10/24/urltest-__results/e40dedda84 > <http://intertwingly.net/stories/2014/10/24/urltest-results/e40dedda84>, > http://intertwingly.net/__stories/2014/10/24/urltest-__results/9a4e54b1c3 > <http://intertwingly.net/stories/2014/10/24/urltest-results/9a4e54b1c3> > > Galimatias is biased towards URL normalization. It tries to minimize the > creation of URLs that are equivalent according to the standard. A > setting to disable this percent-encoding normalization behaviour will be > provided if there is a real world use case for it. The URL Standard is also biased towards URL normalization. It is also very specific about the case of the output. My point is that you are not conforming to the standard here. Either your implementation should be updated, or the standard should be updated, or you should explicitly mark this as an area where you don't intend to comply with the standard. > If you want to review how I captured these results, the program I > used can be found here: > > http://intertwingly.net/__stories/2014/10/24/urltest.__java > <http://intertwingly.net/stories/2014/10/24/urltest.java> > > Please let me know if you identify any problems with that program, > and I will be glad to rerun the tests. > > I have added a test to check with > http://intertwingly.net/stories/2014/10/05/urltestdata.json > > Do you have more up-to-date data? Yes: http://intertwingly.net/projects/pegurl/urltestdata.json > I found Galimatias failed with this: > > http://intertwingly.net/stories/2014/10/24/urltest-results/bc6ea8bdf8 > {"input":"http://%30%78%63%30%2e%30%32%35%30.01%2e","base":"http://other.com/","scheme":"","username":"","password":null,"host":"","port":"","path":"","query":"","fragment":""} > > My latest code parses this URL as http://0xc0.0250.01./ > This seems in line with the standard, since it does not perform sanity > checks for DNS rules. See: https://github.com/smola/galimatias/issues/26 > It also fails with this one: > {"input":"http://192.168.0.257","base":"http://other.com/","scheme":"","username":"","password":null,"host":"","port":"","path":"","query":"","fragment":""} > > But 192.168.0.257 is a valid domain name at DNS-level and it's not > forbidden by the URL standard. I got Anne to fix the test data: https://github.com/w3c/web-platform-tests/commit/fe078df8f3ab7d6c9df1b6d80b119e3216179a0e#diff-7d84adcee757bea416deedfb7007e090 And you can find a newer version of the test data in JSON format at the URL mentioned above. > Apart from these two cases, Galimatias passes all test cases if > query/search fragment/hash differences are considered. Cool. Per above, in one case the test case was identified as incorrect and has been updated. I'll leave it up to you to decide whether you wish to Galimatias to be non-spec compliant in the other case, or to make the case that the spec should be changed. > Please, let me know if there is any failure in the future or if you have > any feedback. > > Thank you again! I definitely will do. > Best, > > Santiago - Sam Ruby
Received on Wednesday, 29 October 2014 02:29:57 UTC