Draft - Make Readable URIs
This is a draft "webmaster tip", under work and review by the Quality Assurance Team,
and shouldn't be considered as an official tip from W3C while it remains a draft.
Choosing URIs
URIs must be:
- Easy to type
- Easy to pronounce
- Easy to remember
- Meaningful
File Extensions
URIs must normally not show file extensions. The files behind them do carry
extensions, which must appear in the following order:
Natural Languages
- The main natural languages of the webpage must be specified in order
of decreasing importance
- Example: A webpage with text in German (as used in Germany)
and English (as used in the USA), with German used more often, must
have the extension .de-de.en-us
- The extension must be as specific as possible
- Example: .en-us must be used instead of .en
- Reasoning:
- The client must be told the variant of the language used
- The server must be able to do content negotiation between variants
of the primary language
- A digression: Accept-Language must be as general as possible
- Example: If the client understands all variants of English,
it must be specified as Accept-Language: en. Specifying
Accept-Language: en-us is a mistake, as it means that the client
understands only en-us
- Position: This extension is first to allow clients to bypass
content negotiation of language
- Example: the client has Accept-Language: de, en;q=0.9. (The client
understands German, with 100% quality; and English, with 90% quality.)
Requesting webpage.en-us retrieves the en-us version even though a
de-de version is available
- This must also be followed for non-markup documents that contain text.
Example: images containing text
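The digression on Accept-Language can be illustrated with a small sketch. It assumes the prefix-matching rule of RFC 2616 Section 14.4 (a range matches a tag if it equals the tag or is a prefix of it followed by a hyphen). The function names are hypothetical, and real servers apply further tie-breaking rules that are left out here.

```python
from typing import List, Optional, Tuple


def range_matches(language_range: str, language_tag: str) -> bool:
    """Return True if an Accept-Language range matches a language tag."""
    rng = language_range.lower()
    tag = language_tag.lower()
    if rng == "*":
        return True
    # Exact match, or the range is a prefix of the tag followed by "-".
    return tag == rng or tag.startswith(rng + "-")


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Parse a header value such as 'de, en;q=0.9' into (range, quality) pairs."""
    ranges = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        params = params.strip()
        # A missing q parameter means full quality.
        quality = float(params[2:]) if params.startswith("q=") else 1.0
        ranges.append((name.strip(), quality))
    return ranges


def choose_variant(header: str, available: List[str]) -> Optional[str]:
    """Pick the available tag with the highest matching quality (simplified)."""
    best_tag, best_quality = None, 0.0
    for tag in available:
        for rng, quality in parse_accept_language(header):
            if range_matches(rng, tag) and quality > best_quality:
                best_tag, best_quality = tag, quality
    return best_tag
```

Under these assumptions, a client sending en is matched to an en-us file, while a client sending en-us is not matched to an en-gb file, which is why the general form is preferred on the client side.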
MIME Type
- This may be used to send a particular MIME type to a client with a
wrong Accept header
- Example:
- An XHTML (application/xhtml+xml) webpage is also made available
with the HTML MIME type (text/html) to allow it to be used by older
user agents
- The webpage links to a machine translation system whose Accept header
claims to take XHTML but which then fails to process XHTML
- The link to this system may be given as webpage.en-us.html instead of
webpage to force the webpage to be served as HTML
- Position: This extension is second because clients are more
likely to specify a natural language than a MIME type
Character Encoding
- Example: A webpage is available as webpage.de-de.xhtml.utf8 and as
webpage.de-de.xhtml.iso8859-1
- Position: This extension is third because it is the least
likely (among Natural languages, MIME type and Character encoding) to be
specified by the client
Server-side Technology
- Example: A webpage with Server Side Includes (SSI) is named
webpage.en-us.xhtml.utf8.ssi
- Position: This extension is last because the client is not
involved in any server-side technology
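The ordering described in this section (natural languages, then MIME type, then character encoding, then server-side technology) can be checked mechanically. The following is a minimal sketch; the lookup sets and function names are hypothetical, and any extension not found in a set is simply assumed to be a language tag.

```python
# Hypothetical lookup sets for this sketch; a real site would list its own values.
MIME_EXTENSIONS = {"xhtml", "html"}
CHARSET_EXTENSIONS = {"utf8", "iso8859-1"}
SERVER_SIDE_EXTENSIONS = {"ssi"}


def rank(extension):
    """Return the position of an extension's category in the recommended order."""
    if extension in MIME_EXTENSIONS:
        return 1
    if extension in CHARSET_EXTENSIONS:
        return 2
    if extension in SERVER_SIDE_EXTENSIONS:
        return 3
    # Assumption: anything else is a natural-language tag such as "en-us".
    return 0


def follows_extension_order(filename):
    """Check that the extensions of a filename appear in the recommended order."""
    _, *extensions = filename.lower().split(".")
    ranks = [rank(ext) for ext in extensions]
    # The order is respected when the category ranks never decrease.
    return ranks == sorted(ranks)
```

For instance, follows_extension_order("webpage.en-us.xhtml.utf8.ssi") holds, while a name such as webpage.utf8.xhtml.en-us does not pass the check.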
Apache
Apache allows the configuration to be modified at the server-wide level
(httpd.conf) as well as at the directory level (.htaccess). A sample
Apache configuration file:
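# Assumption: MultiViews and Includes are not already enabled elsewhere
Options +MultiViews +Includes
# Natural languages
DefaultLanguage en-US
LanguagePriority en-US de-DE
AddLanguage en-US .en-us
AddLanguage de-DE .de-de
# Character encodings (.utf8 added to match the filenames used in this tip)
AddDefaultCharset UTF-8
AddCharset UTF-8 .utf8
AddCharset ISO-8859-1 .iso8859-1
# MIME types
AddType application/xhtml+xml .xhtml
AddType text/html .html
# Server-side technology
AddOutputFilter INCLUDES .ssi
# Let files with filter or handler extensions (such as .ssi) be negotiated
MultiviewsMatch Any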
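To make the effect of these mappings concrete, the sketch below imitates, in a much simplified way, how each extension of a filename contributes one piece of metadata. It is not how Apache is implemented; the tables mirror the sample configuration, the function name is hypothetical, and the utf8 entry is an assumption added to match the filenames used in this tip.

```python
# Tables mirroring the sample configuration (the "utf8" entry is an assumption).
LANGUAGES = {"en-us": "en-US", "de-de": "de-DE"}
CHARSETS = {"utf8": "UTF-8", "iso8859-1": "ISO-8859-1"}
TYPES = {"xhtml": "application/xhtml+xml", "html": "text/html"}
OUTPUT_FILTERS = {"ssi": "INCLUDES"}


def describe_variant(filename):
    """Collect the metadata implied by each extension of a filename."""
    info = {"languages": [], "type": None, "charset": None, "filters": []}
    # Skip the base name; every later dot-separated part is an extension.
    for ext in filename.lower().split(".")[1:]:
        if ext in LANGUAGES:
            info["languages"].append(LANGUAGES[ext])
        elif ext in TYPES:
            info["type"] = TYPES[ext]
        elif ext in CHARSETS:
            info["charset"] = CHARSETS[ext]
        elif ext in OUTPUT_FILTERS:
            info["filters"].append(OUTPUT_FILTERS[ext])
    return info
```

Calling describe_variant("webpage.en-us.xhtml.utf8.ssi") yields one language, one MIME type, one character encoding and one output filter, which is the information the server needs to negotiate between variants of webpage.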
Hiding Extensions
Assumption: If a directory is requested, its index webpage is served.
Non-Markup Content
- index filenames must not be used
- Reasoning: non-markup content will not require sub-webpages
later
Markup
- An index filename in the directory for the webpage (say, webpage)
must be used
- Reasoning: This is suitable for markup because sub-webpages
may be needed later, which can be provided as sub-directories
- If this method is used, then the URI must be webpage/ and not webpage
- Reasoning: If the client requests webpage and the server wishes to
serve webpage/index.en-us.xhtml.utf8.ssi, it redirects the client to
webpage/, resulting in one extra request and one extra response
- Reasoning: The client must be informed that a relative link to
otherpage (in the server's response) points to
webpage/otherpage.en-us.xhtml.utf8.ssi and not to
otherpage.en-us.xhtml.utf8.ssi
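The effect of the trailing slash on relative links can be reproduced with the standard URL resolution rules. The sketch below uses Python's urllib.parse.urljoin; the host name is a hypothetical placeholder, and the relative reference otherpage stands for a link found in the served page.

```python
from urllib.parse import urljoin

# Hypothetical host name, used only for illustration.
without_slash = "http://www.example.org/webpage"
with_slash = "http://www.example.org/webpage/"

# The same relative reference, as it might appear in a link in the page.
relative_link = "otherpage"

# Without the trailing slash, "webpage" is treated as a file name and replaced.
resolved_without_slash = urljoin(without_slash, relative_link)

# With the trailing slash, "webpage/" is treated as a directory and kept.
resolved_with_slash = urljoin(with_slash, relative_link)

print(resolved_without_slash)
print(resolved_with_slash)
```

Only the base URI ending in a slash resolves the link inside the webpage/ directory, which is where the sub-webpage lives.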
Time
ISO 8601
- Widely used with computers and on the Internet
- Lexicographic sort automatically sorts by time
URI Structure for ISO 8601
Two cases arise (the examples show 20 March 2004):
- Webpages published frequently
- Provide a hierarchy
- Use the format with hyphens and colons (2004-03-20)
- Replace all non-numeric characters by slashes to provide the
hierarchy (2004/03/20)
- Webpages published rarely
- Do not provide a hierarchy
- Use the format without hyphens and colons (20040320)
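Both URI forms, and the sorting property mentioned earlier, can be sketched with Python's datetime module. The function names are hypothetical and only dates (not times) are handled.

```python
from datetime import date


def frequent_path(day):
    """Hierarchical form: the hyphenated ISO 8601 date with slashes instead."""
    return day.isoformat().replace("-", "/")


def rare_path(day):
    """Flat form: the ISO 8601 date without separators."""
    return day.strftime("%Y%m%d")


# Sample publication dates, deliberately out of order.
published = [date(2004, 3, 20), date(2003, 12, 1), date(2004, 1, 5)]

# Sorting the path strings as plain text also sorts them by time.
sorted_paths = sorted(frequent_path(day) for day in published)
```

Because the year, month and day are zero-padded and ordered from most to least significant, a plain text sort of these paths gives the same order as sorting the dates themselves.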
Further Reading
- Apache HTTP Server 2.0 Documentation
- Common HTTP Implementation Problems
- Cool URIs don't change
- ISO 8601: Representation of Dates and Times
- RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1
- RFC 2616 Section 12: Content Negotiation