Draft - URI Usability

This is a draft webmaster tip, still being written and reviewed by the Quality Assurance Team. It should not be considered an official W3C tip while it remains a draft.

File Extensions

URIs must normally not expose file extensions. Where files on the server carry extensions, the extensions must appear in the following order:

Natural Languages

  1. The main natural languages of the webpage must be specified in order of decreasing importance
    1. Example: A webpage with text in German (as used in Germany) and English (as used in the USA), with German used more often, must have the extension .de-de.en-us
  2. The extension must be as specific as possible
    1. Example: .en-us must be used instead of .en
    2. Reasoning:
      1. The client must be told the variant of the language used
      2. The server must be able to do content negotiation between variants of the primary language
  3. A Digression: Accept-Language must be as general as possible
    1. Example: If the client understands all variants of English, this must be specified as Accept-Language: en. Specifying Accept-Language: en-us is a mistake, as it means that the client understands only en-us.
  4. Position: This extension comes first so that clients can bypass language negotiation by naming the language explicitly
    1. Example: The client sends Accept-Language: de, en;q=0.9. (The client understands German with 100% quality and English with 90% quality.) Requesting webpage.en-us retrieves the en-us version even though a de-de version is available
  5. These rules also apply to non-markup documents that contain text, for example images containing text
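
The matching rule behind the Accept-Language advice can be sketched in Python. This is a minimal illustration that assumes the prefix-matching rule of RFC 2616 Section 14.4; the function names are hypothetical and do not belong to any server or library.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Split an Accept-Language value into (language-range, quality) pairs."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        lang_range, _, params = item.partition(";")
        params = params.strip()
        # A missing q parameter means quality 1.0.
        q = float(params[2:]) if params.startswith("q=") else 1.0
        prefs.append((lang_range.strip().lower(), q))
    return prefs


def range_matches(lang_range: str, tag: str) -> bool:
    """A range matches a tag if it equals the tag or is a prefix ending at '-'."""
    lang_range, tag = lang_range.lower(), tag.lower()
    return lang_range == "*" or tag == lang_range or tag.startswith(lang_range + "-")


def best_variant(header: str, available: list[str]):
    """Return the available language tag with the highest quality, or None."""
    prefs = parse_accept_language(header)
    best, best_q = None, 0.0
    for tag in available:
        # The quality of a tag is that of the longest range matching it.
        matching = [(0 if r == "*" else len(r), q)
                    for r, q in prefs if range_matches(r, tag)]
        if not matching:
            continue
        q = max(matching)[1]
        if q > best_q:
            best, best_q = tag, q
    return best


print(best_variant("de, en;q=0.9", ["de-de", "en-us"]))  # de-de
print(best_variant("en", ["en-gb"]))                     # en-gb
print(best_variant("en-us", ["en-gb"]))                  # None
```

The last call shows why a client that reads every variant of English should send en: the more specific en-us range does not match an en-gb document.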

MIME Type

  1. This may be used to send a particular MIME type to a client that sends an incorrect Accept header
  2. Example:
    1. An XHTML (application/xhtml+xml) webpage is also made available with the HTML MIME type (text/html) to allow it to be used by older user agents
    2. The webpage links to a machine translation system whose Accept header claims to take XHTML but which fails to process XHTML
    3. The link to this system may be given as webpage.en-us.html instead of webpage to force the webpage to be served as HTML
  3. Position: This extension is second because clients are more likely to specify a natural language than a MIME type

Character Encoding

  1. Example: A webpage is available as webpage.de-de.xhtml.utf8 and as webpage.de-de.xhtml.iso8859-1
  2. Position: This extension is third because it is the least likely (among Natural languages, MIME type and Character encoding) to be specified by the client

Server-side Technology

  1. Example: A webpage with Server Side Includes (SSI) is named as webpage.en-us.xhtml.utf8.ssi
  2. Position: This extension is last because the client is not involved in any server-side technology
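
The ordering described above (natural languages, MIME type, character encoding, server-side technology) can be sketched as a small helper. The function name and its parameters are hypothetical and serve only to illustrate the order.

```python
def build_filename(base, languages=(), mime_ext=None, charset_ext=None, server_ext=None):
    """Join extensions in the order: languages, MIME type, charset, server-side."""
    parts = [base]
    # Languages come first, in order of decreasing importance.
    parts.extend(lang.lower() for lang in languages)
    # Then MIME type, character encoding and server-side technology, if present.
    for ext in (mime_ext, charset_ext, server_ext):
        if ext:
            parts.append(ext)
    return ".".join(parts)


print(build_filename("webpage", ["de-de", "en-us"], "xhtml", "utf8", "ssi"))
# webpage.de-de.en-us.xhtml.utf8.ssi
print(build_filename("webpage", ["en-us"], "html"))
# webpage.en-us.html
```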

Apache

Apache allows the configuration to be modified at the server-wide level (httpd.conf) as well as at the directory level (.htaccess). A sample Apache configuration file:

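# Natural languages: default language and extension mappings
DefaultLanguage en-US
ForceLanguagePriority None
AddLanguage en-US .en-us
AddLanguage de-DE .de-de

# Character encodings, MIME type and server-side technology
AddDefaultCharset UTF-8
AddCharset UTF-8 .utf8
AddCharset ISO-8859-1 .iso8859-1
AddType application/xhtml+xml .xhtml
AddOutputFilter INCLUDES .ssi

# Negotiate among files whose extensions are omitted from the URI
Options +MultiViews
MultiviewsMatch Any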

Hiding Extensions

Assumption: If a directory is requested, its index webpage is served.

Non-Markup Content

  1. Index filenames must not be used
  2. Reasoning: Non-markup content will not require sub-webpages later

Markup

  1. An index filename in the directory for the webpage (say, webpage) must be used
  2. Reasoning: This is suitable for markup because sub-webpages may be needed later, which can be provided as sub-directories
  3. If this method is used, then the URI must be webpage/ and not webpage
    1. Reasoning: If the client requests webpage and the server wishes to serve webpage/index.en-us.xhtml.utf8.ssi, it redirects the client to webpage/, resulting in one extra request and one extra response
    2. Reasoning: The client must be informed that a relative reference such as otherpage.en-us.xhtml.utf8.ssi (in the server’s response) points to webpage/otherpage.en-us.xhtml.utf8.ssi and not to otherpage.en-us.xhtml.utf8.ssi
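
The effect of the trailing slash on relative references can be checked with Python's standard urllib.parse.urljoin. The host example.org is an assumption used only for illustration.

```python
from urllib.parse import urljoin

# A relative reference as it might appear in the served webpage.
LINK = "otherpage.en-us.xhtml.utf8.ssi"

# Base URI with the trailing slash: the reference resolves inside webpage/.
with_slash = urljoin("http://example.org/webpage/", LINK)

# Base URI without the trailing slash: the last segment is replaced.
without_slash = urljoin("http://example.org/webpage", LINK)

print(with_slash)     # http://example.org/webpage/otherpage.en-us.xhtml.utf8.ssi
print(without_slash)  # http://example.org/otherpage.en-us.xhtml.utf8.ssi
```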

Time

ISO 8601

  1. Widely used with computers and on the Internet
  2. Lexicographic sort automatically sorts by time

URI Structure for ISO 8601

Two cases arise (the examples use 20 March 2004):

  1. Webpages published frequently
    1. A hierarchy must be provided
    2. The format with hyphens and colons (2004-03-20) must be used
    3. Non-numeric characters must be replaced by slashes (2004/03/20) to provide the hierarchy
  2. Webpages published rarely
    1. Hierarchy must not be provided
    2. The format without hyphens and colons (20040320) must be used
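
Both URI forms can be derived from a date with Python's standard datetime module. This is a minimal sketch; the helper names are hypothetical.

```python
from datetime import date


def frequent_path(d: date) -> str:
    """Hierarchical form for frequently published webpages, e.g. 2004/03/20."""
    return d.isoformat().replace("-", "/")


def rare_name(d: date) -> str:
    """Flat form for rarely published webpages, e.g. 20040320."""
    return d.isoformat().replace("-", "")


d = date(2004, 3, 20)
print(frequent_path(d))  # 2004/03/20
print(rare_name(d))      # 20040320

# Zero-padded ISO 8601 dates sort chronologically under a plain string sort.
paths = [frequent_path(x) for x in (date(2004, 3, 20), date(2003, 12, 31), date(2004, 1, 5))]
print(sorted(paths))  # ['2003/12/31', '2004/01/05', '2004/03/20']
```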

Usernames

  1. Must be of the form given names, then a period (.), then family name
  2. All letters must be converted to lowercase
  3. Hyphens (-) must be retained; other symbols must be replaced by _
  4. Spaces in the family name must be replaced by _
  5. Spaces between given names must be replaced by .
  6. Examples:
    1. Noam Chomsky → noam.chomsky
    2. John McCarthy → john.mccarthy
    3. Howard Hathaway Aiken → howard.hathaway.aiken
    4. Alan M. Turing → alan.m.turing
    5. W. Bruce Croft → w.bruce.croft
    6. John von Neumann → john.von_neumann
    7. Hector Garcia-Molina → hector.garcia-molina
  7. Use ~ in URIs if and only if the URI is that of an individual
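
The username rules can be sketched as a small Python function. Two assumptions go beyond the text: the given names and the family name are passed separately (a full name alone does not say where the family name begins), and the period that ends an initial is dropped, as the examples suggest. The function names are hypothetical.

```python
import re


def _clean(part: str) -> str:
    """Lowercase one name part and normalise its symbols."""
    part = " ".join(part.split()).lower()
    # Assumption taken from the examples: "M." becomes "m", not "m_".
    part = part.rstrip(".")
    # Keep letters, digits and hyphens; everything else (including spaces) becomes "_".
    return re.sub(r"[^\w\-]", "_", part)


def make_username(given_names: str, family_name: str) -> str:
    """Build given.names.family_name according to the rules above."""
    given = [_clean(name) for name in given_names.split()]
    given = [name for name in given if name]
    return ".".join(given + [_clean(family_name)])


print(make_username("Alan M.", "Turing"))         # alan.m.turing
print(make_username("John", "von Neumann"))       # john.von_neumann
print(make_username("Hector", "Garcia-Molina"))   # hector.garcia-molina
```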

Further Reading

  1. Apache HTTP Server
  2. Choose URIs Wisely
  3. Common HTTP Implementation Problems
  4. Cool URIs don’t change
  5. ISO 8601 — Representation of Dates and Times
  6. RFC 2616 — Hypertext Transfer Protocol — HTTP/1.1
  7. RFC 2616 Section 12 — Content Negotiation

Created Date: 2004-03-20 by Rajasekaran Deepak
Last modified $Date: 2003/10/17 03:27:44 $ by $Author: ot $
