- From: Graham Klyne <gk@ninebynine.org>
- Date: Wed, 18 May 2005 16:29:08 +0100
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: uri@w3.org
At 14:12 18/05/05 +0200, Julian Reschke wrote:
>Graham Klyne wrote:
>>...
>>        if ( !Character.isLetterOrDigit(c) && uriChars.indexOf(c) < 0 )
>>        {   // %-encode non-URI and other special characters
>>            String hv = ("0"+Integer.toHexString(c)) ;
>>            int    lv = hv.length() ;
>>            mapfilename.replace( i, i, "%"+hv.substring(lv-2,lv-1) ) ;
>>            i += 3 ;
>>        }
>>    }
>>...
>
>How is this supposed to work with non-ASCII characters, in particular with
>character points above 255?

Oh yes, I had it in my mind that Java uses UTF-8 to represent strings (it
actually uses UTF-16). I've no idea why, and anyway that wasn't properly
thought through.

Here's my next attempt, which does the mapping on filenames converted to
UTF-8.

[[
import java.io.UnsupportedEncodingException ;

/**
 * Convert filename string to a URI:
 *
 * Map Unicode to UTF-8.
 * Map the system-dependent file separator character to '/'.
 * %-escape UTF-8 bytes that are non-US-ASCII or not allowed in a URI.
 * %-escape characters which get special URI interpretation.
 * For Unix-like systems, the absolute filename begins with a '/'
 * and is preceded by "file://".
 * For other systems an extra '/' must be supplied.
 */
public static String uriFromFilename( String filename )
{
    byte[] oldfilename ;
    try
    {
        oldfilename = filename.getBytes("UTF-8") ;
    }
    catch ( UnsupportedEncodingException e )
    {
        // Make an unchecked exception: this is a fatal condition
        throw new AssertionError( e.toString() ) ;
    }
    String uriChars =      // See: http://www.ietf.org/rfc/rfc3986.txt
        // ":/?#[]@" +     // gen-delims
        ":/@" +            // selected gen-delims not %-encoded
        "!$&'()*+,;=" +    // sub-delims
        "-._~" ;           // unreserved
    StringBuffer mapfilename = new StringBuffer( oldfilename.length ) ;
    // "file.separator" separates names within a filename ('/' or '\');
    // "path.separator" (':' or ';') separates entries in a list of paths
    char pathsep = System.getProperty("file.separator").charAt(0) ;
    for ( int i = 0 ; i < oldfilename.length ; i++ )
    {
        int  b = oldfilename[i] & 0xFF ;   // Java bytes are signed: map to 0..255
        char c = (char) b ;
        // Assumes 0<pathsep<=127
        if ( c == pathsep )
        {   // Replace filename path separator with '/'
            mapfilename.append( '/' ) ;
        }
        else if ( (c >= 128) || (!Character.isLetterOrDigit(c) && uriChars.indexOf(c) < 0) )
        {   // %-encode non-URI and other special characters
            // (uppercase hex digits, as recommended by RFC 3986 section 2.1)
            String hv = ("0"+Integer.toHexString(b)).toUpperCase() ;
            mapfilename.append( "%"+hv.substring(hv.length()-2) ) ;
        }
        else
        {   // No mapping required
            mapfilename.append( c ) ;
        }
    }
    if ( mapfilename.length() > 0 && mapfilename.charAt(0) == '/' )
    {
        return "file://"+mapfilename.toString() ;
    }
    else
    {
        return "file:///"+mapfilename.toString() ;
    }
}
]]

#g
------------
Graham Klyne
For email: http://www.ninebynine.org/#Contact
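To check the UTF-8 route against Julian's question about code points above 255, here is a minimal standalone sketch of just the %-encoding step. It assumes Java 7 or later (for StandardCharsets), and the class and method names Utf8PercentSketch and percentEncodeUtf8 are hypothetical, not part of the code above. Every non-ASCII character becomes two to four UTF-8 bytes, each written as %XX, so characters above U+00FF need no special handling.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical standalone sketch of the UTF-8 %-encoding step only
// (no separator mapping, no "file://" prefix).
class Utf8PercentSketch {

    // Same allowed set as the message: selected gen-delims, sub-delims, unreserved
    private static final String URI_CHARS = ":/@" + "!$&'()*+,;=" + "-._~";

    // True for ASCII letters, digits and the allowed URI characters above
    static boolean isAllowed(int b) {
        return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z')
            || (b >= '0' && b <= '9') || URI_CHARS.indexOf(b) >= 0;
    }

    static String percentEncodeUtf8(String s) {
        StringBuilder out = new StringBuilder();
        for (byte raw : s.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF;               // Java bytes are signed; mask to 0..255
            if (isAllowed(b)) {
                out.append((char) b);
            } else {
                // Two uppercase hex digits per byte (RFC 3986 section 2.1)
                out.append('%').append(String.format("%02X", b));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(percentEncodeUtf8("caf\u00e9"));     // caf%C3%A9
        System.out.println(percentEncodeUtf8("\u20ac 100"));    // %E2%82%AC%20100
        System.out.println(percentEncodeUtf8("\uD83D\uDE00"));  // %F0%9F%98%80 (U+1F600)
    }
}
```

The 0xFF mask is what makes a "value at least 128" test meaningful: without it, bytes from multi-byte UTF-8 sequences are negative, and casting them to char yields values around U+FF80 to U+FFFF rather than 0x80 to 0xFF.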
Received on Wednesday, 18 May 2005 15:37:14 UTC