Draft - URI Usability
This is a draft webmaster tip, under work and review by the Quality Assurance Team, and shouldn’t be considered as an official tip from W3C while it remains a draft.
File Extensions
URIs must normally not have extensions. The files behind them do have extensions, and these file extensions must be in the following order:
Natural Languages
- The main natural languages of the webpage must be specified in order
of decreasing importance
- Example: A webpage with text in German (as used in Germany) and English (as used in the USA), with German used more often, must have the extension .de-de.en-us
- The extension must be as specific as possible
- Example: .en-us must be used instead of .en
- Reasoning:
- The client must be told the variant of the language used
- The server must be able to do content negotiation between variants
of the primary language
- A digression: the client's Accept-Language header must be as general as possible
- Example: If the client understands all variants of English, it must send Accept-Language: en. Sending Accept-Language: en-us is a mistake, as it means that the client understands only en-us
- Position: This extension is first to allow clients to bypass
content negotiation of language
- Example: The client sends Accept-Language: de, en;q=0.9 (it understands German with 100% quality and English with 90% quality). Requesting webpage.en-us retrieves the en-us version even though a de-de version is available
- This also applies to non-markup documents that contain text, for example images containing text
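The matching rule behind the Accept-Language digression can be sketched in Python. This is a simplified model of the prefix matching described in RFC 2616 section 14.4, not Apache's actual negotiation code; the function names are hypothetical, and the sketch skips header validation and Apache's LanguagePriority fallback.

```python
from __future__ import annotations


def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Split an Accept-Language header into (language-range, q) pairs."""
    ranges = []
    for part in header.split(","):
        lang, _, params = part.strip().partition(";")
        if not lang.strip():
            continue
        params = params.strip()
        q = float(params[2:]) if params.startswith("q=") else 1.0
        ranges.append((lang.strip().lower(), q))
    return ranges


def range_matches(lang_range: str, tag: str) -> bool:
    """RFC 2616 section 14.4: a range matches a tag equal to it, or a tag that
    starts with it followed by '-'. The range '*' matches every tag."""
    tag = tag.lower()
    return lang_range == "*" or tag == lang_range or tag.startswith(lang_range + "-")


def quality(tag: str, ranges: list[tuple[str, float]]) -> float:
    """q value of the longest range matching the tag; 0 if none matches."""
    best_len, best_q = -1, 0.0
    for lang_range, q in ranges:
        if range_matches(lang_range, tag) and len(lang_range) > best_len:
            best_len, best_q = len(lang_range), q
    return best_q


def choose_language(available: list[str], header: str) -> str | None:
    """Pick the available language variant the client rates highest."""
    ranges = parse_accept_language(header)
    best_tag, best_q = None, 0.0
    for tag in available:
        q = quality(tag, ranges)
        if q > best_q:
            best_tag, best_q = tag, q
    return best_tag


# The client from the example: German preferred, English acceptable.
print(choose_language(["de-de", "en-us"], "de, en;q=0.9"))  # de-de
# A general range matches every variant; a specific one does not.
print(quality("en-gb", parse_accept_language("en")))     # 1.0
print(quality("en-gb", parse_accept_language("en-us")))  # 0.0
```

The last two lines show why en is the better header for a client that reads any English: en-us gives an en-gb variant a quality of zero.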
MIME Type
- This may be used to send a particular MIME type to a client with a wrong Accept header
- Example:
- An XHTML (application/xhtml+xml) webpage is also made available with the HTML MIME type (text/html) to allow it to be used by older user agents
- The webpage links to a machine translation system that claims, in its Accept header, to take XHTML but cannot process XHTML
- The link to this system may be given as webpage.en-us.html instead of webpage, to force the webpage to be served as HTML
- Position: This extension is second because clients are more
likely to specify a natural language than a MIME type
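A toy model of how an explicit extension overrides a misleading Accept header. It assumes MultiViews-style selection (a request for webpage considers only files named webpage.*), looks at the MIME type dimension alone, and uses a hypothetical extension table; it does not reproduce how Apache's mod_negotiation scores variants in detail.

```python
from __future__ import annotations

# Hypothetical extension-to-type table: .xhtml as in the Apache sample
# configuration, .html as in Apache's usual default types.
MEDIA_TYPES = {"xhtml": "application/xhtml+xml", "html": "text/html"}


def parse_accept(header: str) -> dict[str, float]:
    """Map each media range in an Accept header to its q value."""
    prefs = {}
    for part in header.split(","):
        media, _, params = part.strip().partition(";")
        params = params.strip()
        prefs[media.strip().lower()] = float(params[2:]) if params.startswith("q=") else 1.0
    return prefs


def media_type(filename: str) -> str | None:
    """Media type implied by the first known extension in the filename."""
    for ext in filename.split(".")[1:]:
        if ext in MEDIA_TYPES:
            return MEDIA_TYPES[ext]
    return None


def media_quality(media: str, prefs: dict[str, float]) -> float:
    """Most specific match wins: exact type, then type/*, then */*."""
    for key in (media, media.split("/")[0] + "/*", "*/*"):
        if key in prefs:
            return prefs[key]
    return 0.0


def choose_variant(request: str, files: list[str], accept: str) -> str | None:
    """Toy MultiViews: only files named request or request.<more> compete."""
    prefs = parse_accept(accept)
    best, best_q = None, 0.0
    for name in files:
        if name != request and not name.startswith(request + "."):
            continue
        media = media_type(name)
        q = media_quality(media, prefs) if media else 0.0
        if q > best_q:
            best, best_q = name, q
    return best


files = ["webpage.en-us.xhtml", "webpage.en-us.html"]
# The translation system claims to prefer XHTML...
accept = "application/xhtml+xml, text/html;q=0.8"
print(choose_variant("webpage", files, accept))             # webpage.en-us.xhtml
# ...so the link given to it names the HTML variant explicitly.
print(choose_variant("webpage.en-us.html", files, accept))  # webpage.en-us.html
```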
Character Encoding
- Example: A webpage is available as webpage.de-de.xhtml.utf8 and as webpage.de-de.xhtml.iso8859-1
- Position: This extension is third because, of natural language, MIME type and character encoding, it is the least likely to be specified by the client
Server-side Technology
- Example: A webpage with Server Side Includes (SSI) is named webpage.en-us.xhtml.utf8.ssi
- Position: This extension is last because the client is not
involved in any server-side technology
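The ordering can be captured in a small, hypothetical helper. The second function assumes MultiViews-style prefix matching and lists the request names that select a given file: because the language comes first, webpage.en-us fixes the language while the MIME type and encoding stay negotiable, and the server-side extension never has to appear in a URI.

```python
from __future__ import annotations


def variant_filename(base: str, languages: tuple[str, ...] = (),
                     media_ext: str | None = None, charset: str | None = None,
                     server_tech: str | None = None) -> str:
    """Join extensions in the order this tip prescribes: natural languages
    (most important first), MIME type, character encoding, server-side technology."""
    parts = [base, *languages]
    parts += [ext for ext in (media_ext, charset, server_tech) if ext]
    return ".".join(parts)


def request_prefixes(filename: str) -> list[str]:
    """Request names that select this file by prefix (MultiViews-style).
    Each longer name pins one more leading extension; the rest stays negotiable."""
    parts = filename.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


name = variant_filename("webpage", ("en-us",), "xhtml", "utf8", "ssi")
print(name)                    # webpage.en-us.xhtml.utf8.ssi
print(request_prefixes(name))
print(variant_filename("webpage", ("de-de", "en-us")))  # webpage.de-de.en-us
```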
Apache
Apache allows the configuration to be modified at the server-wide level (httpd.conf) as well as at the directory level (.htaccess). A sample Apache configuration file:
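# Enable content negotiation (MultiViews) and Server Side Includes
Options +MultiViews +Includes
# Language of files without a language extension; no forced fallback language
DefaultLanguage en-US
ForceLanguagePriority None
# Natural-language extensions
AddLanguage en-US .en-us
AddLanguage de-DE .de-de
# Character-encoding extensions, plus a default charset
AddCharset UTF-8 .utf8
AddCharset ISO-8859-1 .iso8859-1
AddDefaultCharset UTF-8
# MIME type extension for XHTML; run .ssi files through mod_include
AddType application/xhtml+xml .xhtml
AddOutputFilter INCLUDES .ssi
MultiviewsMatch Any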
Hiding Extensions
Assumption: If a directory is requested, its index
webpage is served.
Non-Markup Content
- An index filename must not be used
- Reasoning: non-markup content will not require sub-webpages
later
Markup
- An index filename in the directory for the webpage (say, webpage) must be used
- Reasoning: This is suitable for markup because sub-webpages
may be needed later, which can be provided as sub-directories
- If this method is used, then the URI must be webpage/ and not webpage
- Reasoning: If the client requests webpage and the server wishes to serve webpage/index.en-us.xhtml.utf8.ssi, it redirects the client to webpage/, resulting in one extra request and one extra response
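- Reasoning: Relative links are resolved against the request URI up to its last slash, so the trailing slash is what informs the client that a relative link such as otherpage (in the server’s response) points to webpage/otherpage.en-us.xhtml.utf8.ssi and not to otherpage.en-us.xhtml.utf8.ssi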
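Python's standard urllib.parse.urljoin, which resolves relative references, shows why the trailing slash matters. Here example.org is a placeholder host and otherpage an illustrative link target.

```python
from urllib.parse import urljoin

# The same relative link, resolved against the URI without and with the slash.
page_without_slash = urljoin("http://example.org/webpage", "otherpage")
page_with_slash = urljoin("http://example.org/webpage/", "otherpage")

print(page_without_slash)  # http://example.org/otherpage
print(page_with_slash)     # http://example.org/webpage/otherpage
```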
Time
ISO 8601
- Widely used by computers and on the Internet
- Sorting timestamps lexicographically also sorts them chronologically, provided they all use the same format
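A quick check of the sorting claim, assuming every timestamp uses the same ISO 8601 format and field widths (and, for times, the same time zone). The month/day/year format is included only for contrast.

```python
from datetime import date

dates = [date(2004, 3, 20), date(2003, 12, 1), date(2004, 1, 5)]

iso = sorted(d.isoformat() for d in dates)          # ISO 8601: YYYY-MM-DD
us = sorted(d.strftime("%m/%d/%Y") for d in dates)  # month/day/year, for contrast

print(iso)  # ['2003-12-01', '2004-01-05', '2004-03-20'] (chronological)
print(us)   # ['01/05/2004', '03/20/2004', '12/01/2003'] (not chronological)
```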
URI Structure for ISO 8601
Two cases arise (the examples use 20 March 2004):
- Webpages published frequently
- A hierarchy must be provided
- The format with hyphens and colons (the ISO 8601 extended format, e.g. 2004-03-20) must be used
- Non-numeric characters must be replaced by slashes (2004/03/20) to provide the hierarchy
- Webpages published rarely
- A hierarchy must not be provided
- The format without hyphens and colons (the ISO 8601 basic format, e.g. 20040320) must be used
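A hypothetical helper, date_path, that turns a calendar date into the URI path for each of the two cases above. It handles dates only; times would be treated the same way, with their colons replaced by slashes (frequent publishing) or dropped (rare publishing).

```python
import re
from datetime import date


def date_path(d: date, published_frequently: bool) -> str:
    """URI path for a date under this tip's two cases (hypothetical helper)."""
    iso = d.isoformat()                 # extended format, e.g. "2004-03-20"
    if published_frequently:
        return re.sub(r"\D", "/", iso)  # non-numeric characters -> slashes
    return iso.replace("-", "")         # basic format, e.g. "20040320"


print(date_path(date(2004, 3, 20), True))   # 2004/03/20
print(date_path(date(2004, 3, 20), False))  # 20040320
```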
Usernames
- Must be of the form: given names, a period, then the family name
- All letters must be made lowercase
- Hyphens (-) must be retained; other symbols must be replaced by _
- Spaces in family name must be replaced by _
- Spaces between given names must be replaced by .
- Examples:
- Noam Chomsky → noam.chomsky
- John McCarthy → john.mccarthy
- Howard Hathaway Aiken → howard.hathaway.aiken
- Alan M. Turing → alan.m.turing
- W. Bruce Croft → w.bruce.croft
- John von Neumann → john.von_neumann
- Hector Garcia-Molina → hector.garcia-molina
- Use ~ in a URI if and only if the URI belongs to an individual
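The username rules can be sketched as a hypothetical username function. It takes the given names and the family name as separate arguments, because the rules cannot tell where a family name starts (as John von Neumann shows). Dropping the period after an initial (Alan M. Turing becomes alan.m.turing) is an assumption read off the examples, since the rules themselves do not mention it.

```python
def _clean(word: str) -> str:
    """Lowercase one name word; keep letters, digits and hyphens, and replace
    other symbols with '_'. A trailing period after an initial is dropped
    (assumption taken from the examples)."""
    word = word.lower()
    if word.endswith("."):
        word = word[:-1]
    return "".join(c if c.isalnum() or c == "-" else "_" for c in word)


def username(given_names: str, family_name: str) -> str:
    """Given names joined by '.', family-name words joined by '_'."""
    given = ".".join(_clean(w) for w in given_names.split())
    family = "_".join(_clean(w) for w in family_name.split())
    return f"{given}.{family}"


print(username("Alan M.", "Turing"))     # alan.m.turing
print(username("John", "von Neumann"))   # john.von_neumann
print(username("Hector", "Garcia-Molina"))  # hector.garcia-molina
```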
Further Reading
- Apache HTTP Server
- Choose URIs Wisely
- Common HTTP Implementation Problems
- Cool URIs don’t change
- ISO 8601 — Representation of Dates and Times
- RFC 2616 — Hypertext Transfer Protocol — HTTP/1.1
- RFC 2616 Section 12 — Content Negotiation