RE: escape-uri and utf-8

The specification [1] states:

The effect of the function is to replace any special character in the string
by an escape sequence of the form %xx%yy..., where xxyy... is the
hexadecimal representation of the octets used to represent the character in
UTF-8.

Is this not clear enough?
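
For illustration only, here is a minimal Python sketch of the rule quoted
above. The function name escape_uri_sketch and the set of characters left
unescaped (letters, digits and the RFC 2396 "mark" characters) are
assumptions of this sketch, not text taken from the specification, and the
optional second argument of the real function is not modelled.

```python
# Characters this sketch leaves alone: ASCII letters, digits and the
# RFC 2396 "mark" characters. This set is an assumption for illustration.
UNESCAPED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "-_.!~*'()"
)


def escape_uri_sketch(text: str) -> str:
    """Replace each character outside UNESCAPED by %xx%yy..., where
    xx, yy, ... are the hex values of its UTF-8 octets."""
    pieces = []
    for ch in text:
        if ch in UNESCAPED:
            pieces.append(ch)
        else:
            # One %XX group per octet of the character's UTF-8 encoding.
            pieces.append("".join(f"%{octet:02X}" for octet in ch.encode("utf-8")))
    return "".join(pieces)
```

A character that needs two octets in UTF-8, such as U+00E9, therefore comes
out as two escape groups, and one that needs three, such as U+20AC, as three.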

[1] http://www.w3.org/TR/xquery-operators/#func-escape-uri

Michael Kay
Software AG

> -----Original Message-----
> From: Wesley W. Terpstra [mailto:wesley@terpstra.ca] 
> Sent: 17 August 2002 23:14
> To: public-qt-comments@w3.org
> Subject: xf:escape-uri and utf-8
> 
> How is uri-escape(str, bool) to deal with unicode strings in str?
> 
> It seems to me that it should output the string in utf-8 
> encoded as hex stream with %s.
> 
> This has several payoffs:
> 
> Transparent behaviour for uris:
> 	the examples in the current draft continue to work
> 	All of normal ascii gets encoded as expected
> 	Will behave the way most people expect
> 
> A well-defined map for all of unicode
> 	
> POST/GETs to CGIs can support internationalisation as long as 
> they realize their data is in utf-8
> 
> rfc822 mail headers for non-iso-8859 languages work simply:
> 	<xsl:text>Subject: =?utf-8?Q?</xsl:text>
> 	<xsl:value-of select="translate(uri-encode(subject, true), '%', '=')"/>
> 	<xsl:text>?=</xsl:text>
> 
> I am sure many other hacks are possible when one knows that 
> the output has charset utf-8. So, please clarify the 
> output charset used for encoding the unicode string prior to 
> hexifying it. This way all implementations will use a common 
> charset, which greatly increases the utility of this function.
> 
> -- 
> Wesley W. Terpstra <wesley@terpstra.ca>
> 
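
The mail-header trick in the quoted message can be sketched in Python as
follows. This is an illustration under assumptions, not part of either
message: q_encode_subject is a hypothetical name, and urllib.parse.quote
stands in for the escaping function. One extra step is needed that the
translate() call does not cover: a literal "_" is read as a space by a
Q-decoder, so the sketch escapes it explicitly.

```python
from urllib.parse import quote


def q_encode_subject(subject: str) -> str:
    """Build an RFC 2047 Q-encoded word from percent-escaped UTF-8."""
    # safe="" escapes everything except ASCII letters, digits and "_.-~".
    escaped = quote(subject, safe="", encoding="utf-8")
    # A bare "_" means "space" in Q-encoding, so escape it as well.
    escaped = escaped.replace("_", "%5F")
    # Every remaining "%" introduces an escape; Q-encoding uses "=" instead.
    return "=?utf-8?Q?" + escaped.replace("%", "=") + "?="
```

Because the octets are UTF-8, the charset label in the encoded word can be
fixed to utf-8 regardless of the language of the subject line.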

Received on Monday, 26 August 2002 10:20:02 UTC