[whatwg] [URL] Starting work on a URL spec

From: Adam Barth <w3c@adambarth.com>
Date: Sun, 25 Jul 2010 05:57:50 -0700
Message-ID: <AANLkTi=g3jrC8bvoigrQBY-SQ-4GeoM+mZtasp2V-Rmf@mail.gmail.com>
2010/7/24 Maciej Stachowiak <mjs at apple.com>:
> On Jul 24, 2010, at 9:55 AM, Adam Barth wrote:
>> 2010/7/23 Ian Fette <ifette at google.com>:
>>> http://code.google.com/apis/safebrowsing/developers_guide_v2.html#Canonicalization lists
>>> some interesting cases we've come across on the anti-phishing team in
>>> Google. To the extent you're concerned with / interested in
>>> canonicalization, it may be worth taking a look at (not to suggest you
>>> follow that in determining how to parse/canonicalize URLs, but rather to
>>> make sure that you have some "correct" way of handling the listed URLs).
>>
>> Thanks.  That's helpful.
>>
>>> BTW, are you covering canonicalization?
>>
>> Yes.  The three main things I'm hoping to cover are parsing,
>> canonicalization, and resolving relative URLs.
>
> Is there any place in the Web platform where "canonicalize" is exposed by itself in a Web-facing way? I think resolve against a base and parse into components are the only algorithms whose effects can be observed directly. I think we only need to spec "canonicalize" if it turns out to be a useful subroutine.

As far as I know, you can only see f(x) =
canonicalize(parse(resolve(x))) and also some breakdown components of
f(x) in HTMLAnchorElement and window.location.hash (and friends).

Conceptually, it's a bit easier to think about them as three separate
functions.  The main difference between parse and canonicalize is that
parse segments the input and canonicalize takes the segments, mutates
them, and assembles them into a new string.
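
To make the split concrete, here is a minimal sketch of the three functions in Python, using the standard urllib.parse module as a stand-in. The function names mirror the ones above, but the canonicalization rules shown (lowercase the host, drop a default port, replace an empty path with "/") are assumptions chosen for illustration, not rules taken from any spec or browser.

```python
from urllib.parse import SplitResult, urljoin, urlsplit, urlunsplit

# Assumed default ports, for illustration only.
DEFAULT_PORTS = {"http": 80, "https": 443}


def resolve(url: str, base: str) -> str:
    """Resolve a possibly relative reference against a base URL."""
    return urljoin(base, url)


def parse(url: str) -> SplitResult:
    """Segment the input into scheme, netloc, path, query, and fragment.

    urlsplit does a little normalization itself (it lowercases the
    scheme), so the separation from canonicalize is not perfectly clean.
    """
    return urlsplit(url)


def canonicalize(parts: SplitResult) -> str:
    """Take the segments, mutate them, and assemble a new string.

    Limitations of this sketch: userinfo is discarded and IPv6 literal
    hosts are not handled.
    """
    host = parts.hostname or ""  # .hostname is already lowercased
    port = parts.port
    if port is not None and port != DEFAULT_PORTS.get(parts.scheme):
        host = f"{host}:{port}"
    path = parts.path or "/"
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


def f(url: str, base: str) -> str:
    """The composition that is observable from content."""
    return canonicalize(parse(resolve(url, base)))


if __name__ == "__main__":
    print(f("../b?x=1#frag", "http://Example.COM:80/a/c/d"))
```

Here only f is meant to model what a page can observe; the intermediate SplitResult corresponds loosely to the component breakdown that HTMLAnchorElement exposes.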

I haven't studied resolve in as much detail yet, so I'm less clear how
that fits into the puzzle.

> There's also the related question of what browsers should do with input typed into the URL field. Other than establishing that these rules may be different between the URL field and URLs present in content, I'm not sure this is amenable to spec. But perhaps a survey of what browsers do would be useful.

I wasn't planning to cover that because it's not critical to
interoperability, at least not in the same way understanding what to
do with the href attribute of the <a> tag is.  There are also other
considerations there because the URLs are displayed to users as
security indicators.

Adam
Received on Sunday, 25 July 2010 05:57:50 UTC
