Re: canonical form and scheme-specific processing rules for URI/IRI spec from Boris Zbarsky on 2011-06-29 (public-iri@w3.org from June 2011)

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Wed, 29 Jun 2011 11:00:26 -0400
To: public-iri@w3.org
Message-ID: <4E0B3E0A.40506@mit.edu>

On 6/29/11 8:20 AM, Julian Reschke wrote:
> How is "canonicalization" different from "normalization", as defined in
> <http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.6> and
> <http://tools.ietf.org/html/rfc3987#section-5>?

"normalization" talks about comparing URIs and when different forms are 
equivalent.

"canonicalization" is about picking one particular form from the set of 
equivalent forms and putting your URIs in that form.

So you can implement normalization by canonicalizing URIs and then doing 
string comparisons, for example.
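
A rough sketch of that approach in Python follows. It is not any browser's actual algorithm: the names canonicalize, equivalent and DEFAULT_PORTS are made up for illustration, and the set of rules applied (lowercase scheme and host, drop the scheme's default port, use "/" for an empty path, uppercase percent-escape hex digits) is only an assumed subset of the syntax-based and scheme-based steps in RFC 3986 section 6.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical table for the scheme-based step; a real one would cover more schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}

def _upper_escapes(text):
    # "%7e" and "%7E" are equivalent; pick the uppercase form.
    return re.sub(r"%[0-9a-fA-F]{2}", lambda m: m.group(0).upper(), text)

def canonicalize(uri):
    """Pick one representative string from a set of equivalent URIs.

    Simplifications in this sketch: userinfo is dropped, dot segments
    are not removed, and an empty query ("?") is not preserved.
    """
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""      # urlsplit lowercases the hostname
    if ":" in host:                  # IPv6 literal: restore the brackets
        host = "[" + host + "]"
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = "%s:%d" % (host, parts.port)
    # An empty path after an authority is equivalent to "/".
    path = _upper_escapes(parts.path) or ("/" if netloc else "")
    return urlunsplit((scheme, netloc, path,
                       _upper_escapes(parts.query),
                       _upper_escapes(parts.fragment)))

def equivalent(a, b):
    # Normalization implemented as canonicalize-then-compare.
    return canonicalize(a) == canonicalize(b)
```

With these rules, equivalent("http://example.com", "HTTP://EXAMPLE.com:80/") holds because both inputs canonicalize to the same string, while the canonical string itself is what you would want to hand to anything that exposes the URI as text.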

I suspect we need both.  We _definitely_ need normalization.

> Also, is there really a "single" type of canonicalization browsers need?

Ideally, yes.  Note that browsers do scheme-based normalization, hence 
the quotes around "single", I assume?

> Where is it used?
>
> - same-origin checks?
> - same-document checks?
> - ...more?

Those are normalization use cases.  A few other normalization use cases:

- Same-path checks for cookies.
- URL matching for CSS @document rules:
   http://dev.w3.org/csswg/css3-conditional/#at-document

Canonicalization use cases:

- Consistency in the string after "GET" for HTTP GET requests (because
   servers that should be doing normalization in practice often do not).
- Consistency in Referer headers.
- Consistency in Origin headers.
- Consistency in what the location object looks like (web pages
   commonly grab various properties from it and depend on them having
   particular forms).
- Consistency in what various DOM getters return (.href on anchors,
   .documentURI on documents, and so forth).

There are probably more; this is off the top of my head.

-Boris

Received on Wednesday, 29 June 2011 15:00:55 UTC