
Re: Helping out with canonicalization of URIs

From: Sam Ruby <rubys@intertwingly.net>
Date: Mon, 09 Aug 2004 08:54:33 -0400
Message-ID: <41177409.1080504@intertwingly.net>
To: uri@w3.org
CC: Atom WG <atom-syntax@imc.org>

Paul Hoffman / IMC wrote:
> Greetings again. In the discussion of PaceCanonicalIds, some questions 
> were brought up about what draft-fielding-uri-rfc2396bis really says 
> about canonicalization. Section 6 of that draft says a few different 
> things. At the URI BOF at the IETF meeting last week, I volunteered the 
> Atompub WG to be reviewers for that document. :-)
> So, all you canonicalization folks: please review the document, 
> particularly section 6, and send comments to uri@w3.org (archived at 
> <http://lists.w3.org/Archives/Public/uri/>). Just like on this list, if 
> you see something you consider wrong, suggest new text. Your comments 
> will be considered for the soon-to-happen IETF last call on the document.

Excerpt from section 3, "Syntax Components":

       foo://example.com:8042/over/there?name=ferret#nose
       \_/   \______________/\_________/ \_________/ \__/
        |           |            |            |        |
     scheme     authority       path        query   fragment
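The five-way split in the diagram can be reproduced with the component-splitting regular expression from Appendix B of RFC 2396, which RFC 3986 (the eventual publication of this draft) keeps unchanged. A minimal Python sketch; `split_uri` is a hypothetical name:

```python
import re

# Component-splitting regular expression from Appendix B of RFC 2396
# (unchanged in RFC 3986). It matches any string; groups 2, 4, 5, 7 and 9
# hold scheme, authority, path, query and fragment respectively.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri):
    """Hypothetical helper: return (scheme, authority, path, query, fragment).

    Missing scheme, authority, query or fragment come back as None;
    the path is always a string, possibly empty.
    """
    m = URI_RE.match(uri)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


print(split_uri("foo://example.com:8042/over/there?name=ferret#nose"))
# ('foo', 'example.com:8042', '/over/there', 'name=ferret', 'nose')
```

The expression only separates components; it does not validate them, which is why it is useful as a first step before any canonicalization.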

    authority   = [ userinfo "@" ] host [ ":" port ]

    userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )
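Because neither userinfo nor host may contain "@", the authority rule above can be taken apart with two string splits. A rough Python sketch; `split_authority` is a hypothetical name, and it assumes `port = *DIGIT` (so a port may be empty) and bracketed IP literals, as in RFC 3986:

```python
def split_authority(authority):
    """Hypothetical helper: return (userinfo, host, port) for an authority.

    userinfo and port are None when their delimiter is absent, and ""
    when the delimiter is present but the subcomponent itself is empty.
    """
    # Neither userinfo nor host may contain "@", so at most one "@" appears.
    userinfo, at, hostport = authority.rpartition("@")
    if not at:
        userinfo = None
    # The port follows the last ":", unless that ":" is inside an IP literal
    # such as "[::1]", in which case there is no port.
    host, colon, port = hostport.rpartition(":")
    if not colon or "]" in port:
        host, port = hostport, None
    return userinfo, host, port


for a in ["example.com:8042", "user:pw@example.com", "@example.com:", "[::1]"]:
    print(a, "->", split_authority(a))
```

Keeping None and "" distinct matters for canonicalization: "@example.com:" has an empty userinfo and an empty port, which is a different string from "example.com" even if it is meant to identify the same resource.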

Excerpt from section 6.3 "Canonical Form":

   # Always provide the URI scheme in lowercase characters.
   # Always provide the host, if any, in lowercase characters.
   # Only perform percent-encoding where it is essential.
   # Always use uppercase A-through-F characters when percent-encoding.
   # Prevent dot-segments appearing in non-relative URI paths.
   # For schemes that define a default authority, use an empty authority
     if the default is desired.
   # For schemes that define an empty path to be equivalent to a path of
     "/", use "/".

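The rules above translate fairly directly into code. A minimal Python sketch, not a complete implementation: it assumes only http and https define an empty path as equivalent to "/", it does not handle default authorities, and it reads "only where essential" as decoding percent-encoded unreserved characters (ALPHA / DIGIT / "-" / "." / "_" / "~"). The names `canonicalize`, `normalize_pct` and `remove_dot_segments` are hypothetical; the dot-segment loop follows the algorithm in section 5.2.4 of RFC 3986.

```python
import re

# Component splitter from Appendix B of RFC 2396 / RFC 3986.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")
PCT_RE = re.compile(r"%([0-9A-Fa-f]{2})")
UNRESERVED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")

# Assumption: the only schemes known here to treat an empty path as "/".
EMPTY_PATH_IS_SLASH = {"http", "https"}


def normalize_pct(text):
    """Decode %XX triplets for unreserved characters; uppercase all other hex."""
    def fix(m):
        ch = chr(int(m.group(1), 16))
        return ch if ch in UNRESERVED else "%" + m.group(1).upper()
    return PCT_RE.sub(fix, text)


def remove_dot_segments(path):
    """Remove "." and ".." segments (RFC 3986, section 5.2.4)."""
    out = []  # output segments, each keeping its leading "/" if it had one
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./") or path == "/.":
            path = "/" + path[3:]
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            end = path.find("/", 1)
            if end == -1:
                end = len(path)
            out.append(path[:end])
            path = path[end:]
    return "".join(out)


def canonicalize(uri):
    """Apply the section 6.3 rules listed above (sketch only)."""
    m = URI_RE.match(uri)
    scheme, authority, path, query, fragment = (m.group(i) for i in (2, 4, 5, 7, 9))
    result = ""
    if scheme is not None:
        scheme = scheme.lower()  # scheme in lowercase
        result += scheme + ":"
    if authority is not None:
        userinfo, at, hostport = authority.rpartition("@")
        # Lowercase the host (lower() leaves port digits alone), then restore
        # uppercase hex digits in any remaining percent-encodings.
        hostport = normalize_pct(hostport).lower()
        hostport = re.sub(r"%[0-9a-f]{2}", lambda t: t.group(0).upper(), hostport)
        result += "//" + normalize_pct(userinfo) + at + hostport
    path = normalize_pct(path)
    if scheme is not None:
        # Dot-segments are only removed from non-relative URIs.
        path = remove_dot_segments(path)
        if path == "" and authority is not None and scheme in EMPTY_PATH_IS_SLASH:
            path = "/"
    result += path
    if query is not None:
        result += "?" + normalize_pct(query)
    if fragment is not None:
        result += "#" + normalize_pct(fragment)
    return result


print(canonicalize("HTTP://Example.COM/a/./b/../%7euser/%c3%a9"))
# http://example.com/a/~user/%C3%A9
print(canonicalize("http://example.com"))
# http://example.com/
```

Note that the userinfo and port pass through unchanged apart from percent-encoding normalization, because none of the listed rules says anything about them.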
These rules completely cover the scheme and path, and partially cover
the authority.  Here are some URIs for which I can't determine, based
solely on the rules listed in rfc2396-bis, whether they are in
canonical form:


My initial inclination would be to declare all of these non-canonical,
but the last example is common enough in practice that it should
probably be an exception.
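To make the partial coverage of the authority concrete, here is a hypothetical Python helper that takes an authority apart (userinfo and host cannot contain "@"; IP literals are bracketed) and lists features the quoted section 6.3 rules are silent on. Which features count as unsettled (a present or empty userinfo, an empty or explicit port, a trailing dot in the host) is an assumption for illustration, not a list taken from the draft:

```python
def unsettled_authority_parts(authority):
    """Hypothetical helper: list authority features that the quoted
    section 6.3 rules do not say how to canonicalize."""
    notes = []
    # Neither userinfo nor host may contain "@", so at most one "@" appears.
    userinfo, at, hostport = authority.rpartition("@")
    if at:
        notes.append("empty userinfo" if userinfo == "" else "userinfo present")
    # The port follows the last ":", unless that ":" is inside an IP literal.
    host, colon, port = hostport.rpartition(":")
    if not colon or "]" in port:
        host, port = hostport, None
    if port == "":
        notes.append("empty port")
    elif port is not None:
        notes.append("explicit port " + port)
    if host.endswith("."):
        notes.append("trailing dot in host")
    return notes


for a in ["example.com", "@example.com:", "user@example.com:80", "example.com."]:
    print(a, "->", unsettled_authority_parts(a))
```

An empty result only means none of these particular features is present; it does not prove the authority is canonical.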

- Sam Ruby
Received on Monday, 9 August 2004 12:54:33 UTC
