- From: Ian Fette <ifette@google.com>
- Date: Fri, 23 Jul 2010 21:15:12 -0700
http://code.google.com/apis/safebrowsing/developers_guide_v2.html#Canonicalization
lists some interesting cases we've come across on the anti-phishing team at
Google. To the extent you're concerned with / interested in canonicalization,
it may be worth taking a look at (not to suggest you follow that in
determining how to parse/canonicalize URLs, but rather to make sure that you
have some "correct" way of handling the listed URLs).

BTW, are you covering canonicalization?

-Ian

On Fri, Jul 23, 2010 at 9:02 PM, Boris Zbarsky <bzbarsky at mit.edu> wrote:

> On 7/23/10 11:59 PM, Silvia Pfeiffer wrote:
>
>> Is that URLs as values of attributes in HTML or is that URLs as pasted
>> into the address bar? I believe their processing differs...
>
> It certainly does in Firefox (the latter have a lot more fixup done to
> them, and there are also differences in terms of how character encodings are
> handled).
>
> I would be particularly interested in data on this last, across different
> browsers, operating systems, and locales... There seem to be servers out
> there expecting their URIs in UTF-8 and others expecting them in ISO-8859-1,
> and it's not clear to me how to make things work with them all.
>
> -Boris
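
To make the UTF-8 versus ISO-8859-1 problem in the quoted message concrete, here is a minimal sketch in Python. It is only an illustration, not how any browser actually processes URLs: the helper name `encode_path_segment` and the sample string "café" are made up for the example. It shows that the same text percent-encodes to different bytes under each charset, and that a server decoding with the wrong one gets garbage.

```python
from urllib.parse import quote, unquote


def encode_path_segment(segment: str, charset: str) -> str:
    """Percent-encode one path segment using the given character encoding.

    Hypothetical helper for illustration; safe="" means every reserved
    character is escaped too.
    """
    return quote(segment, safe="", encoding=charset)


# The same text yields two different URLs depending on the charset.
utf8_form = encode_path_segment("café", "utf-8")         # two bytes for "é"
latin1_form = encode_path_segment("café", "iso-8859-1")  # one byte for "é"

# A server that assumes the wrong charset decodes to something else entirely.
utf8_read_as_latin1 = unquote(utf8_form, encoding="iso-8859-1")
latin1_read_as_utf8 = unquote(latin1_form, encoding="utf-8", errors="replace")

print(utf8_form, latin1_form)
print(repr(utf8_read_as_latin1), repr(latin1_read_as_utf8))
```

Nothing in the URL itself says which charset was used, which is why it is hard to make things work with both kinds of server at once.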
Received on Friday, 23 July 2010 21:15:12 UTC