Why hexify fragments? from Chris Lilley on 2005-03-23 (www-i18n-comments@w3.org from March 2005)

From: Chris Lilley <chris@w3.org>
Date: Wed, 23 Mar 2005 16:41:27 +0100
To: www-i18n-comments@w3.org
Message-ID: <1716596513.20050323164127@w3.org>

Hello www-i18n-comments,

In the specification

Character Model for the World Wide Web 1.0: Resource Identifiers
W3C Candidate Recommendation 22 November 2004
http://www.w3.org/TR/2004/CR-charmod-resid-20041122/

>> C060 [S] Specifications that define new syntax for URIs, such as a
>> new URI scheme or a new kind of fragment identifier, MUST specify
>> that characters outside the US-ASCII repertoire are encoded using
>> UTF-8 and %HH-escaping.

>> This is in accordance with Guidelines for new URL Schemes [RFC 2718],
>> Section 2.2.5.
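As an illustration of what C060 asks for, here is a minimal Python
sketch. The function name escape_non_ascii is hypothetical and does not
come from the specification. It leaves US-ASCII characters untouched
and %HH-escapes each UTF-8 byte of any other character. It is a sketch
of the escaping rule only, not a complete IRI-to-URI conversion.

```python
def escape_non_ascii(s: str) -> str:
    """Percent-encode characters outside US-ASCII using their UTF-8 bytes.

    Hypothetical helper for illustration; ASCII characters (including
    delimiters such as '#' and '/') are passed through unchanged.
    """
    out = []
    for ch in s:
        if ord(ch) < 0x80:
            out.append(ch)  # US-ASCII: keep as-is
        else:
            # Non-ASCII: each UTF-8 byte becomes %HH with uppercase hex
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


print(escape_non_ascii("Zürich#é"))  # prints "Z%C3%BCrich#%C3%A9"
```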

While working on implementing this requirement in a specification, it
was pointed out that requiring escaping for fragment identifiers, while
safe, is sort of pointless.

Using a notation where capital letters represent characters outside the
US-ASCII repertoire, consider this IRI:

http://example.org/Zfoo.bar#ABCD

What is hexified (converted to UTF-8 and %HH-escaped) and sent to the
server is

http://example.org/Zfoo.bar

while the fragment, ABCD, is never sent to the server; it is applied
only after the resource and its media type have been returned. Thus,
whether the protocol is 8-bit clean is irrelevant, and whether or not
the fragment was hexified cannot be detected by observing the
implementation.
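To make the point concrete, the following Python sketch strips the
fragment at the first '#' before escaping, as a client would before
dereferencing, and checks that a raw fragment and a pre-hexified one
put exactly the same string on the wire. The helper names are
hypothetical, and Ω stands in for Z and αβγδ for ABCD.

```python
def escape_non_ascii(s: str) -> str:
    """Hypothetical helper: %HH-escape the UTF-8 bytes of non-ASCII chars."""
    return "".join(
        ch if ord(ch) < 0x80 else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in s
    )


def wire_form(iri: str) -> str:
    """What reaches the server: the IRI without its fragment, escaped."""
    # The fragment begins at the first '#'; it stays on the client side.
    without_fragment, _sep, _fragment = iri.partition("#")
    return escape_non_ascii(without_fragment)


raw = "http://example.org/Ωfoo.bar#αβγδ"
hexified = "http://example.org/Ωfoo.bar#%CE%B1%CE%B2%CE%B3%CE%B4"

# Both spellings of the fragment produce an identical request.
assert wire_form(raw) == wire_form(hexified)
print(wire_form(raw))  # prints "http://example.org/%CE%A9foo.bar"
```

Whatever the client does with the fragment afterwards happens locally,
once the representation has arrived, which is why the two cases are
indistinguishable from the server's side.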

The guidelines make good sense for other parts of the IRI, such as the
path and the query, but they do not seem to be necessary, or to provide
any benefit, for fragments. Nor does the requirement seem to be
testable for fragments short of reading the source code.

-- 
 Chris Lilley                    mailto:chris@w3.org
 Chair, W3C SVG Working Group
 W3C Graphics Activity Lead

Received on Wednesday, 23 March 2005 15:41:27 UTC