- From: Maciej Stachowiak <mjs@apple.com>
- Date: Sun, 25 Jul 2010 23:00:26 -0700
On Jul 25, 2010, at 5:57 AM, Adam Barth wrote:

> 2010/7/24 Maciej Stachowiak <mjs at apple.com>:
>> On Jul 24, 2010, at 9:55 AM, Adam Barth wrote:
>>> 2010/7/23 Ian Fette <ifette at google.com>:
>>>> http://code.google.com/apis/safebrowsing/developers_guide_v2.html#Canonicalization lists
>>>> some interesting cases we've come across on the anti-phishing team in
>>>> Google. To the extent you're concerned with / interested in
>>>> canonicalization, it may be worth taking a look at (not to suggest you
>>>> follow that in determining how to parse/canonicalize URLs, but rather to
>>>> make sure that you have some "correct" way of handling the listed URLs).
>>>
>>> Thanks. That's helpful.
>>>
>>>> BTW, are you covering canonicalization?
>>>
>>> Yes. The three main things I'm hoping to cover are parsing,
>>> canonicalization, and resolving relative URLs.
>>
>> Is there any place in the Web platform where "canonicalize" is exposed
>> by itself in a Web-facing way? I think resolve against a base and parse
>> into components are the only algorithms whose effects can be observed
>> directly. I think we only need to spec "canonicalize" if it turns out
>> to be a useful subroutine.
>
> As far as I know, you can only see f(x) =
> canonicalize(parse(resolve(x))) and also some breakdown components of
> f(x) in HTMLAnchorElement and window.location.hash (and friends).
>
> Conceptually, it's a bit easier to think about them as three separate
> functions. The main difference between parse and canonicalize is that
> parse segments the input and canonicalize takes the segments, mutates
> them, and assembles them into a new string.
>
> I haven't studied resolve in as much detail yet, so I'm less clear how
> that fits into the puzzle.

I would consider canonicalize() to be part of resolve(). Every time you retrieve a "cooked" URL (as opposed to original source text), you both resolve it against a possible base and canonicalize it as a single step. The two are not exposed separately.
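
To make the shape of that concrete, here is a rough Python sketch. It is not any browser's actual code: the names cooked_url and component are hypothetical, and the canonicalization shown (lowercasing the authority, defaulting an empty path to "/") is only a stand-in for whatever the real rules turn out to be.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def cooked_url(source_text: str, base: str) -> str:
    """Hypothetical helper: resolve against a base and canonicalize in one step."""
    # Resolve the original source text against the base URL.
    resolved = urljoin(base, source_text.strip())
    parts = urlsplit(resolved)  # urlsplit already lowercases the scheme

    # Illustrative canonicalization only (an assumption, not the real rule set).
    # Lowercasing the whole netloc is a simplification: it would also
    # lowercase any userinfo, which a real implementation should not do.
    netloc = parts.netloc.lower()
    path = parts.path or "/"
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))


def component(source_text: str, base: str, name: str) -> str:
    """Hypothetical helper: expose one component by parsing the cooked URL."""
    return getattr(urlsplit(cooked_url(source_text, base)), name)


if __name__ == "__main__":
    base = "http://Example.COM/a/b/c.html"
    print(cooked_url("../d.html#Top", base))
    print(component("../d.html#Top", base, "fragment"))
```

Callers only ever see the output of cooked_url, or a single component derived from it; there is no entry point that resolves without canonicalizing, or the reverse.
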
It's not clear to me that making this operation into three separate steps with a parse in the middle is helpful, or even representative of a good implementation strategy. I would think of parse() as something that happens after canonicalization in the cases where single components of the URL are exposed.

Regards,
Maciej
Received on Sunday, 25 July 2010 23:00:26 UTC