Re: canonical form and scheme-specific processing rules for URI/IRI spec

When different things have the same effect for a particular purpose they are said to be equivalent (for that purpose). Different purposes may require different equivalence relationships.

If, for a given equivalence relationship, you have a way of choosing one member of every set of equivalent elements, the chosen member is known as the normal or canonical form.

"Normalization" and "canonicalization" are used equivalently, and neither term is canonical.

One can derive an equivalence relationship from a canonicalization method: two elements are equivalent if they have the same canonical form.
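A minimal Python sketch of that derivation. The helper names `equivalent_under` and `equivalence_classes` are hypothetical, and ASCII case-folding (as used for URI scheme names) stands in as the canonicalization. Grouping by canonical form also shows the partition into equivalence classes that the relationship induces.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

T = TypeVar("T")
K = TypeVar("K", bound=Hashable)


def equivalent_under(canon: Callable[[T], K]) -> Callable[[T, T], bool]:
    """Derive an equivalence test from a canonicalization function."""
    def equivalent(a: T, b: T) -> bool:
        # Two elements are equivalent exactly when their canonical forms match.
        return canon(a) == canon(b)
    return equivalent


def equivalence_classes(items: Iterable[T], canon: Callable[[T], K]) -> Dict[K, List[T]]:
    """Partition items into classes keyed by their shared canonical form."""
    classes: Dict[K, List[T]] = defaultdict(list)
    for item in items:
        classes[canon(item)].append(item)
    return dict(classes)


# Toy canonicalization (assumption for illustration): ASCII case-folding.
fold = str.lower
same_scheme = equivalent_under(fold)
print(same_scheme("HTTP", "http"))  # True
print(equivalence_classes(["HTTP", "http", "Ftp", "ftp", "mailto"], fold))
```

Because equality of canonical forms is itself reflexive, symmetric, and transitive, any function used this way yields a genuine equivalence relationship.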

For IRIs there are several equivalence relationships, useful for different purposes. Defining a canonical form (choosing a canonical canonicalization) doesn't seem necessary, although it might be useful. But you would need a different canonicalization for every equivalence relationship.
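To illustrate that different equivalence relationships need different canonicalizations, here is a sketch with two hypothetical canonicalizers built on Python's `urllib.parse`: one that only lowercases the scheme and host, and one that additionally applies http/https scheme-based rules (dropping the default port, treating an empty path as "/"). Handling only http and https is an assumption of the sketch.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: only these schemes get scheme-based treatment in this sketch.
DEFAULT_PORTS = {"http": "80", "https": "443"}


def _lower_host(netloc: str) -> str:
    # Userinfo is case-sensitive, so only the host[:port] part is lowercased.
    userinfo, at, hostport = netloc.rpartition("@")
    return userinfo + at + hostport.lower()


def canon_case(uri: str) -> str:
    """Canonicalization for 'equal up to scheme and host case'."""
    parts = urlsplit(uri)  # urlsplit already lowercases the scheme
    return urlunsplit(parts._replace(netloc=_lower_host(parts.netloc)))


def canon_scheme_based(uri: str) -> str:
    """Canonicalization for http(s) scheme-based equivalence (sketch)."""
    parts = urlsplit(canon_case(uri))
    default = DEFAULT_PORTS.get(parts.scheme)
    if default is None:
        return urlunsplit(parts)  # other schemes: no extra rules here
    netloc = parts.netloc
    if netloc.endswith(":" + default):
        netloc = netloc[: -(len(default) + 1)]  # drop the default port
    return urlunsplit(parts._replace(netloc=netloc, path=parts.path or "/"))


a, b = "HTTP://Example.COM:80", "http://example.com/"
print(canon_case(a) == canon_case(b))                  # False
print(canon_scheme_based(a) == canon_scheme_based(b))  # True
```

The two URIs are equivalent under the second relationship but not under the first, so neither canonical form can serve both purposes.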


-----Original message-----
From: Boris Zbarsky <bzbarsky@MIT.EDU>
To: "public-iri@w3.org" <public-iri@w3.org>
Sent: Wed, Jun 29, 2011 15:11:21 GMT+00:00
Subject: Re: canonical form and scheme-specific processing rules for URI/IRI spec

On 6/29/11 8:20 AM, Julian Reschke wrote:
> How is "canonicalization" different from "normalization", as defined in
> <http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.6> and
> <http://tools.ietf.org/html/rfc3987#section-5>?

"normalization" talks about comparing URIs and when different forms are
equivalent.

"canonicalization" is about picking one particular form from the set of
equivalent forms and putting your URIs in that form.

So you can implement normalization by canonicalizing URIs and then doing
string comparisons, for example.

I suspect we need both.  We _definitely_ need normalization.

> Also, is there really a "single" type of canonicalization browsers need?

Ideally, yes.  Note that browsers do scheme-based normalization, hence
the quotes around "single", I assume?

> Where is it used?
>
> - same-origin checks?
> - same-document checks?
> - ...more?

Those are normalization use cases.  A few other normalization use cases:

- Same-path checks for cookies.
- URL matching for CSS @document rules
   (http://dev.w3.org/csswg/css3-conditional/#at-document).

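A simplified same-origin check compares (scheme, host, port) tuples after filling in the default port. This is a sketch, not the algorithm browsers implement: the helper names are hypothetical, only http and https are handled, and every other URL is treated as having an opaque origin that matches nothing.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Assumption: only http and https are handled in this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Optional[Tuple[str, str, int]]:
    """(scheme, host, port) with the default port filled in, or None."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS or not parts.hostname:
        return None  # treated here as an opaque origin
    port = parts.port  # may raise ValueError for an invalid port
    if port is None:
        port = DEFAULT_PORTS[parts.scheme]
    # .hostname is already lowercased by urllib.parse
    return (parts.scheme, parts.hostname, port)


def same_origin(a: str, b: str) -> bool:
    oa, ob = origin(a), origin(b)
    return oa is not None and oa == ob
```

Note that this is normalization in the sense above: the tuples are compared, but no URL is rewritten.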

Canonicalization use cases:

- Consistency in the string after "GET" for HTTP GET requests (because
   servers that should be doing normalization in practice often do not).
- Consistency in Referer headers
- Consistency in Origin headers
- Consistency in what the location object looks like (web pages
   commonly grab various properties from it and depend on them having
   particular forms).
- Consistency in what various DOM getters return (.href on anchors,
   .documentURI on documents, and so forth).
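As an illustration of the first item, a hypothetical helper that derives the string sent after "GET" from an http(s) URL: an empty path becomes "/", and the fragment is dropped. It covers only a small part of what a browser does; for instance, it does not distinguish an empty query ("?") from an absent one, and it does not percent-encode anything.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Hypothetical helper: the origin-form string sent after 'GET'."""
    parts = urlsplit(url)
    path = parts.path or "/"  # an empty path is sent as "/"
    # The fragment is never sent; this sketch also treats "?" with an
    # empty query the same as no query at all.
    return path + ("?" + parts.query if parts.query else "")


def request_line(url: str) -> str:
    return "GET " + request_target(url) + " HTTP/1.1"
```

So `http://example.com` and `http://example.com/#top` both produce `GET / HTTP/1.1`, which is the kind of consistency servers end up relying on.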

There are probably more; this is off the top of my head.

-Boris

Received on Wednesday, 6 July 2011 18:00:49 UTC