Re: Why hexify fragments?

From: Chris Lilley <chris@w3.org>
Date: Wed, 23 Mar 2005 18:02:13 +0100
Message-ID: <1186417021.20050323180213@w3.org>
To: Bjoern Hoehrmann <derhoermi@gmx.net>
Cc: www-i18n-comments@w3.org

On Wednesday, March 23, 2005, 5:27:18 PM, Bjoern wrote:

BH> * Chris Lilley wrote:
>>>> C060 [S] Specifications that define new syntax for URIs, such as a
>>>> new URI scheme or a new kind of fragment identifier, MUST specify
>>>> that characters outside the US-ASCII repertoire are encoded using
>>>> UTF-8 and %HH-escaping.
>>
>>>> This is in accordance with Guidelines for new URL Schemes [RFC 2718],
>>>> Section 2.2.5.
>>
>>While working on implementing this requirement in a specification, it
>>was pointed out that requiring escaping for fragment identifiers, while
>>safe, is sort of pointless.

BH> If e.g. image/svg+xml does not define how to encode fragment identifiers
BH> then e.g. svg#Bj%F6rn might be legal and it would be highly unclear what
BH> this might match against.

Yes, you are right for the case where the IRI is converted to a URI and
stored in the XML. I was thinking of the case where the IRI is stored
directly in the XML and only hexified to cross the wire. But then I
suppose it's not "A new URI format" in that case ... or is it?

BH> Equally, if it does not say that it is based
BH> on UTF-8, implementations might not consider svg#Bj%C3%B6rn to match an
BH> element with id="Björn". Reversing the %xx escaping is atm required for
BH> image/svg+xml, yet most implementations fail to do this, so this is not
BH> pointless to state explicitly.

I agree for this case, with URIs. I was referring to the case of an IRI,
so #Björn directly in the instance.
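
To make the two cases concrete, here is a minimal sketch (not taken from any
specification) of how a processor might compare a fragment against an id once
the media type says that %HH escapes are UTF-8. The function name
fragment_matches_id is hypothetical; quote and unquote are from the Python
standard library.

```python
from urllib.parse import quote, unquote


def fragment_matches_id(fragment: str, element_id: str) -> bool:
    """Return True if a fragment names element_id, assuming UTF-8 %HH escapes.

    Hypothetical helper for illustration only.
    """
    try:
        # Reverse the %HH escaping and interpret the bytes as UTF-8.
        decoded = unquote(fragment, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        # The escaped bytes are not valid UTF-8 (e.g. a Latin-1 escape).
        return False
    return decoded == element_id


name = "Bj\u00f6rn"  # "Björn" with a precomposed o-umlaut

utf8_escaped = quote(name)                        # UTF-8 based escaping
latin1_escaped = quote(name, encoding="latin-1")  # Latin-1 based escaping

print(utf8_escaped, fragment_matches_id(utf8_escaped, name))
print(latin1_escaped, fragment_matches_id(latin1_escaped, name))
print(name, fragment_matches_id(name, name))  # IRI case, nothing to unescape
```

Under these assumptions the UTF-8 escaped form and the unescaped IRI form both
match, while the Latin-1 escaped form does not, which is the ambiguity
described above when the encoding is left unspecified.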

>>while the fragment, ABCD, is not sent to the server and is merely
>>applied once the resource and its Media type have been returned. Thus,
>>whether the protocol is 8-bit clean is irrelevant, and whether the
>>fragment was hexified or not is not detectable by observing the
>>implementation.

BH> This depends on information flow requirements. What is testable is
BH> whether the escaping is reversed, something like (in various encodings)

Good point.

BH>   <svg xmlns="http://www.w3.org/2000/svg" version="1.1"
BH>        xmlns:xlink="http://www.w3.org/1999/xlink">
BH>
BH>     <rect id="Björn" fill="green" width="100" height="100" />
BH>     <rect id="test" fill="red" width="100" height="100" />
BH>     <use xlink:href="#Bj%C3%B6rn" />
BH>
BH>   </svg>

BH> would do. This fails in ASV6, Batik 1.5-dev, Opera 8...

Yes, that's a good URI test. I will add it to the test suite.
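
As a rough illustration of what such a test checks, the sketch below resolves
the xlink:href of the use element in the document above by reversing the %HH
escaping as UTF-8 and then looking for a matching id. It is only a sketch under
those assumptions, not how any of the named implementations work; the function
name resolve_use_target is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"

# The test document quoted above; \u00f6 is the o-umlaut in "Björn".
DOC = """<svg xmlns="http://www.w3.org/2000/svg" version="1.1"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <rect id="Bj\u00f6rn" fill="green" width="100" height="100" />
  <rect id="test" fill="red" width="100" height="100" />
  <use xlink:href="#Bj%C3%B6rn" />
</svg>"""


def resolve_use_target(svg_text):
    """Return the element referenced by the first <use>, or None.

    Hypothetical helper: assumes a same-document reference whose fragment
    uses UTF-8 based %HH escaping.
    """
    root = ET.fromstring(svg_text)
    use = root.find(f"{{{SVG_NS}}}use")
    if use is None:
        return None
    href = use.get(f"{{{XLINK_NS}}}href", "")
    # Keep only the part after "#", then reverse the escaping as UTF-8.
    fragment = unquote(href.partition("#")[2], encoding="utf-8", errors="strict")
    for element in root.iter():
        if element.get("id") == fragment:
            return element
    return None


target = resolve_use_target(DOC)
print(None if target is None else target.get("fill"))
```

A processor that reverses the escaping finds the green rect; one that compares
the raw string "Bj%C3%B6rn" against the ids finds nothing.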

BH> For text/html see
BH> http://lists.w3.org/Archives/Public/www-html-editor/2002OctDec/0001




-- 
 Chris Lilley                    mailto:chris@w3.org
 Chair, W3C SVG Working Group
 W3C Graphics Activity Lead
Received on Wednesday, 23 March 2005 17:02:14 GMT
