Re: Helping out with canonicalization of URIs

Paul Hoffman / IMC wrote:
> 
> Greetings again. In the discussion of PaceCanonicalIds, some questions 
> were brought up about what draft-fielding-uri-rfc2396bis really says 
> about canonicalization. Section 6 of that draft says a few different 
> things. At the URI BOF at the IETF meeting last week, I volunteered the 
> Atompub WG to be reviewers for that document. :-)
> 
> So, all you canonicalization folks: please review the document, 
> particularly section 6, and send comments to uri@w3.org (archived at 
> <http://lists.w3.org/Archives/Public/uri/>). Just like on this list, if 
> you see something you consider wrong, suggest new text. Your comments 
> will be considered for the soon-to-happen IETF last call on the document.

Excerpts from section 3, "Syntax Components":

       foo://example.com:8042/over/there?name=ferret#nose
       \_/   \______________/\_________/ \_________/ \__/
        |           |            |            |        |
     scheme     authority       path        query   fragment

    authority   = [ userinfo "@" ] host [ ":" port ]

    userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )

Excerpt from section 6.3 "Canonical Form":

   # Always provide the URI scheme in lowercase characters.
   # Always provide the host, if any, in lowercase characters.
   # Only perform percent-encoding where it is essential.
   # Always use uppercase A-through-F characters when percent-encoding.
   # Prevent dot-segments appearing in non-relative URI paths.
   # For schemes that define a default authority, use an empty authority
     if the default is desired.
   # For schemes that define an empty path to be equivalent to a path of
     "/", use "/".
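
As a rough illustration, the purely mechanical rules above (lowercase
scheme, lowercase host, uppercase hex digits in percent-encodings, and
"/" for an empty path) could be sketched as follows. The function name
apply_mechanical_rules and the SLASH_SCHEMES set are hypothetical, and
the choice of http/https as schemes that treat an empty path as "/" is
an assumption. The splitting regex is the generic one given in
Appendix B of the draft. The rules on unnecessary percent-encoding, on
dot-segments, and on default authorities are deliberately left out.

```python
import re

# Generic URI-splitting regex from Appendix B of the draft.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")
PCT_RE = re.compile(r"%[0-9A-Fa-f]{2}")

# Assumption: schemes whose empty path is equivalent to "/".
SLASH_SCHEMES = {"http", "https"}


def apply_mechanical_rules(uri):
    """Apply only the case and empty-path rules; everything else is kept."""
    scheme, authority, path, query, fragment = URI_RE.match(uri).group(2, 4, 5, 7, 9)

    if scheme is not None:
        scheme = scheme.lower()

    if authority is not None:
        # authority = [ userinfo "@" ] host [ ":" port ]
        userinfo, at, hostport = authority.rpartition("@")
        host, colon, port = hostport.rpartition(":")
        if not colon or not port.isdigit():
            # No numeric port suffix: treat the whole remainder as the host.
            host, colon, port = hostport, "", ""
        authority = userinfo + at + host.lower() + colon + port
        if path == "" and scheme in SLASH_SCHEMES:
            path = "/"

    out = ""
    if scheme is not None:
        out += scheme + ":"
    if authority is not None:
        out += "//" + authority
    out += path
    if query is not None:
        out += "?" + query
    if fragment is not None:
        out += "#" + fragment

    # Uppercase the hex digits of every percent-encoded octet.
    return PCT_RE.sub(lambda m: m.group(0).upper(), out)
```

Note that this sketch leaves userinfo, port, query, and fragment
untouched, since none of the listed rules say anything about them.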

These rules completely cover the scheme and path, and partially cover 
the authority.  They say nothing about userinfo, port, query, or 
fragment.  Here are some URIs whose canonical status I can't determine 
based solely on the rules listed in rfc2396bis:

   http://:@example.com/
   http://example.com:80/
   http://example.com/gateway.cgi?
   http://www.w3.org/2000/01/rdf-schema#

My initial inclination would be to declare all of these 
non-canonical, but the form of the last example is common enough in 
practice that it probably should be an exception.
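
To make the four cases concrete, here is a minimal sketch that reports
which component of a URI is present but empty or default. The function
name open_questions, the label strings, and the DEFAULT_PORTS table
(http mapped to port 80) are assumptions for illustration only; nothing
here is proposed text for the draft. It uses the Appendix B regex
because that regex distinguishes an absent component (None) from an
empty one ("").

```python
import re

# Generic URI-splitting regex from Appendix B of the draft.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

# Assumption: default ports per scheme, for illustration only.
DEFAULT_PORTS = {"http": "80"}


def open_questions(uri):
    """List the components the canonical-form rules do not settle."""
    scheme, authority, path, query, fragment = URI_RE.match(uri).group(2, 4, 5, 7, 9)
    found = []

    if authority is not None:
        userinfo, at, hostport = authority.rpartition("@")
        # "@" is present but userinfo carries no user and no password.
        if at and userinfo.strip(":") == "":
            found.append("empty userinfo")
        host, colon, port = hostport.rpartition(":")
        if colon and port == DEFAULT_PORTS.get((scheme or "").lower()):
            found.append("default port")

    # "" means the delimiter is present with nothing after it.
    if query == "":
        found.append("empty query")
    if fragment == "":
        found.append("empty fragment")
    return found


for uri in (
    "http://:@example.com/",
    "http://example.com:80/",
    "http://example.com/gateway.cgi?",
    "http://www.w3.org/2000/01/rdf-schema#",
):
    print(uri, open_questions(uri))
```

Each of the four URIs trips exactly one check, which is why a rule for
each of userinfo, port, query, and fragment seems to be needed.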

- Sam Ruby

Received on Monday, 9 August 2004 12:54:33 UTC