- From: Graham Klyne <gk@ninebynine.org>
- Date: Wed, 18 May 2005 16:29:08 +0100
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: uri@w3.org
At 14:12 18/05/05 +0200, Julian Reschke wrote:
>Graham Klyne wrote:
>>...
>> if ( !Character.isLetterOrDigit(c) && uriChars.indexOf(c) < 0 )
>> { // %-encode non-URI and other special characters
>> String hv = ("0"+Integer.toHexString(c)) ;
>> int lv = hv.length() ;
>> mapfilename.replace( i, i, "%"+hv.substring(lv-2,lv-1) ) ;
>> i += 3 ;
>> }
>> }
>>...
>
>How is this supposed to work with non-ASCII characters, in particular with
>character points above 255?
Oh yes, I had it in mind that Java uses UTF-8 to represent strings
(it actually uses UTF-16). I've no idea why, and anyway that wasn't properly thought through.
Here's my next attempt, which applies the mapping to the filename after converting it to UTF-8.
[[
// Requires: import java.io.UnsupportedEncodingException ;
/**
 * Convert filename string to a URI:
 *
 * Map Unicode to UTF-8.
 * Map system-dependent path separator characters to '/'.
 * %-escape non-US-ASCII and non-URI character codes in the UTF-8.
 * %-escape characters which get special URI interpretation.
 * For Unix-like systems, the absolute filename begins with a '/'
 * and is preceded by "file://".
 * For other systems an extra '/' must be supplied.
 */
public static String uriFromFilename(
    String filename)
{
    byte[] oldfilename ;
    try
    {
        oldfilename = filename.getBytes("UTF-8") ;
    }
    catch ( UnsupportedEncodingException e )
    {   // Make an unchecked exception: this is a fatal condition
        // (every Java platform is required to support UTF-8)
        throw new AssertionError( e.toString() ) ;
    }
    String uriChars =       // See: http://www.ietf.org/rfc/rfc3986.txt
        // ":/?#[]@" +      // gen-delims
        ":/@" +             // selected gen-delims not %-encoded
        "!$&'()*+,;=" +     // sub-delims
        "-._~" ;            // unreserved
    StringBuffer mapfilename = new StringBuffer( oldfilename.length ) ;
    // "file.separator" is the name separator within a path ('/' or '\');
    // "path.separator" separates entries in a path list (':' or ';')
    char pathsep = System.getProperty("file.separator").charAt(0) ;
    for ( int i = 0 ; i < oldfilename.length ; i++ )
    {
        // Mask to 0..255: Java bytes are signed
        char c = (char) (oldfilename[i] & 0xFF) ;
        // Assumes 0<pathsep<=127
        if ( c == pathsep )
        {   // Replace filename path separator with '/'
            mapfilename.append( '/' ) ;
        }
        else
        if ( (c >= 128) || (!Character.isLetterOrDigit(c) &&
                            uriChars.indexOf(c) < 0) )
        {   // %-encode non-URI and other special characters
            // (RFC 3986 recommends uppercase hex digits)
            String hv = "0"+Integer.toHexString(c).toUpperCase() ;
            mapfilename.append( "%"+hv.substring(hv.length()-2) ) ;
        }
        else
        {   // No mapping required
            mapfilename.append( c ) ;
        }
    }
    // Guard against an empty filename before inspecting the first character
    if ( mapfilename.length() > 0 && mapfilename.charAt(0) == '/' )
    {
        return "file://"+mapfilename.toString() ;
    }
    else
    {
        return "file:///"+mapfilename.toString() ;
    }
}
]]
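To show why converting to UTF-8 first answers the question about code points above 255, here is a small self-contained sketch. It uses modern Java (`StandardCharsets`, which postdates this message), and the class `Utf8PercentDemo` and method `percentEncodeUtf8` are hypothetical names, not part of the code above. It %-encodes every UTF-8 byte, with uppercase hex digits as RFC 3986 recommends, and leaves out the path-separator and reserved-character handling.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class (not from the original message): %-encoding the
// UTF-8 bytes of a string handles characters above 255, including those
// that Java stores as a UTF-16 surrogate pair.
class Utf8PercentDemo {

    // Percent-encode every byte of the UTF-8 form of s, uppercase hex.
    static String percentEncodeUtf8(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            // Mask to 0..255, since Java bytes are signed
            sb.append(String.format("%%%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String eAcute = "\u00E9";        // U+00E9, below 256
        String euro = "\u20AC";          // U+20AC, above 255
        String gClef = "\uD834\uDD1E";   // U+1D11E, a surrogate pair in UTF-16

        System.out.println(gClef.length());                           // 2 (Java chars)
        System.out.println(gClef.codePointCount(0, gClef.length()));  // 1 (code point)
        System.out.println(percentEncodeUtf8(eAcute));                // %C3%A9
        System.out.println(percentEncodeUtf8(euro));                  // %E2%82%AC
        System.out.println(percentEncodeUtf8(gClef));                 // %F0%9D%84%9E
    }
}
```

A character above U+FFFF occupies two Java `char`s but is encoded as a single four-byte UTF-8 sequence, so working on the UTF-8 bytes rather than on `char` values avoids splitting surrogate pairs.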
#g
------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact
Received on Wednesday, 18 May 2005 15:37:14 UTC