Re: Update on file->URI code

From: Graham Klyne <gk@ninebynine.org>
Date: Wed, 18 May 2005 16:29:08 +0100
Message-Id: <5.1.0.14.2.20050518145326.00bdf0b0@127.0.0.1>
To: Julian Reschke <julian.reschke@gmx.de>
Cc: uri@w3.org

At 14:12 18/05/05 +0200, Julian Reschke wrote:

>Graham Klyne wrote:
>>...
>>             if ( !Character.isLetterOrDigit(c) && uriChars.indexOf(c) < 0  )
>>                 { // %-encode non-URI and other special characters
>>                 String hv = ("0"+Integer.toHexString(c)) ;
>>                 int    lv = hv.length() ;
>>                 mapfilename.replace( i, i, "%"+hv.substring(lv-2,lv-1) ) ;
>>                 i += 3 ;
>>                 }
>>             }
>>...
>
>How is this supposed to work with non-ASCII characters, in particular with 
>character points above 255?

Oh yes, I had it in my mind that Java uses UTF-8 to represent strings
(it actually uses UTF-16 code units).  I've no idea why I thought that,
and anyway the code wasn't properly thought through.
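
To make the problem concrete, here is a minimal sketch (the class and method names are made up for illustration and are not part of the code under discussion). Any scheme that keeps only two hex digits of a Java char value silently drops the high byte of anything above 255, whereas escaping the UTF-8 bytes keeps all the information:

```java
import java.io.UnsupportedEncodingException;

// Illustrative only: class and method names are hypothetical.
class CharVersusUtf8
    {
    // Keep only the low two hex digits of the UTF-16 code unit
    static String lowByteEscape( char c )
        {
        String hv = ("0" + Integer.toHexString(c)).toUpperCase() ;
        return "%" + hv.substring( hv.length()-2 ) ;
        }

    // %-escape every byte of the UTF-8 encoding of the string
    static String utf8Escape( String s )
        {
        byte[] bytes ;
        try
            {
            bytes = s.getBytes("UTF-8") ;
            }
        catch ( UnsupportedEncodingException e )
            { // UTF-8 support is required of every Java platform
            throw new AssertionError( e.toString() ) ;
            }
        StringBuffer out = new StringBuffer() ;
        for ( int i = 0 ; i < bytes.length ; i++ )
            {
            // Mask to 0..255 because Java bytes are signed
            String hv = ("0" + Integer.toHexString( bytes[i] & 0xFF )).toUpperCase() ;
            out.append( "%" + hv.substring( hv.length()-2 ) ) ;
            }
        return out.toString() ;
        }

    public static void main( String[] args )
        {
        char euro = '\u20ac' ;   // EURO SIGN, U+20AC
        System.out.println( lowByteEscape(euro) ) ;                 // %AC
        System.out.println( utf8Escape( String.valueOf(euro) ) ) ;  // %E2%82%AC
        }
    }
```

The first result is indistinguishable from an escaped U+00AC, which is the loss Julian is pointing at.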

Here's my next attempt, which does the mapping on filenames converted to UTF-8.

[[
     // Needs: import java.io.UnsupportedEncodingException ;
     /**
      * Convert filename string to a URI:
      *
      * Map Unicode to UTF-8.
      * Map system-dependent path separator characters to '/'.
      * %-escape non-US-ASCII and non-URI character codes in the UTF-8.
      * %-escape characters which get special URI interpretation.
      * For Unix-like systems, the absolute filename begins with a '/'
      * and is preceded by "file://".
      * For other systems an extra '/' must be supplied.
      */
     public static String uriFromFilename(
         String filename)
         {
         byte[] oldfilename ;
         try
             {
             oldfilename = filename.getBytes("UTF-8") ;
             }
         catch ( UnsupportedEncodingException e )
             { // Make an unchecked exception:  this is a fatal condition
             throw new AssertionError( e.toString() ) ;
             }
         String uriChars =           // See: http://www.ietf.org/rfc/rfc3986.txt
                   // ":/?#[]@" +    // gen-delims
                   ":/@" +           // selected gen-delims not %-encoded
                   "!$&'()*+,;=" +   // sub-delims
                   "-._~" ;          // unreserved
         StringBuffer mapfilename = new StringBuffer( oldfilename.length ) ;
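         // "file.separator" is the directory separator ('/' or '\');
         // "path.separator" is the search-path list separator (':' or ';')
         char pathsep = System.getProperty("file.separator").charAt(0);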
         for ( int i = 0 ; i < oldfilename.length ; i++ )
             {
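             // Mask to 0..255:  Java bytes are signed, so UTF-8 bytes
             // of 0x80 and above would otherwise sign-extend
             char c = (char) (oldfilename[i] & 0xFF) ;
             // Assumes 0<pathsep<=127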
             if ( c == pathsep )
                 { // Replace filename path separator with '/'
                 mapfilename.append( '/' ) ;
                 }
             else
             if ( (c >=128) || (!Character.isLetterOrDigit(c) && uriChars.indexOf(c) < 0) )
                 { // %-encode non-URI and other special characters
                 // RFC 3986 section 2.1 recommends uppercase hex digits
                 String hv = ("0"+Integer.toHexString(c)).toUpperCase() ;
                 mapfilename.append( "%"+hv.substring(hv.length()-2) ) ;
                 }
             else
                 { // No mapping required
                 mapfilename.append( c ) ;
                 }
             }
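         // Guard against an empty filename before looking at the first character
         if (mapfilename.length() > 0 && mapfilename.charAt(0) == '/')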
             {
             return "file://"+mapfilename.toString() ;
             }
         else
             {
             return "file:///"+mapfilename.toString() ;
             }
         }

]]
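
As a rough check of the rules in the comment block above, here is a self-contained sketch with a few worked inputs. It is a variant for illustration, not the method itself: the class name and the explicit 'sep' parameter are assumptions, added so the examples give the same answer on any platform, and it emits uppercase hex digits as RFC 3986 recommends.

```java
import java.io.UnsupportedEncodingException;

// Illustrative sketch: class name and the 'sep' parameter are hypothetical.
class FileUriSketch
    {
    // Characters passed through unescaped (RFC 3986 names in comments)
    static final String URI_CHARS =
              ":/@" +           // selected gen-delims
              "!$&'()*+,;=" +   // sub-delims
              "-._~" ;          // unreserved

    static String uriFromFilename( String filename, char sep )
        {
        byte[] bytes ;
        try
            {
            bytes = filename.getBytes("UTF-8") ;
            }
        catch ( UnsupportedEncodingException e )
            {
            throw new AssertionError( e.toString() ) ;
            }
        StringBuffer out = new StringBuffer( bytes.length ) ;
        for ( int i = 0 ; i < bytes.length ; i++ )
            {
            char c = (char) (bytes[i] & 0xFF) ;   // unsigned byte value
            if ( c == sep )
                { // Directory separator becomes '/'
                out.append( '/' ) ;
                }
            else
            if ( (c >= 128) || (!Character.isLetterOrDigit(c) && URI_CHARS.indexOf(c) < 0) )
                { // %-encode non-ASCII bytes and non-URI characters
                String hv = ("0"+Integer.toHexString(c)).toUpperCase() ;
                out.append( "%"+hv.substring(hv.length()-2) ) ;
                }
            else
                {
                out.append( c ) ;
                }
            }
        boolean rooted = out.length() > 0 && out.charAt(0) == '/' ;
        return ( rooted ? "file://" : "file:///" ) + out ;
        }

    public static void main( String[] args )
        {
        System.out.println( uriFromFilename( "/home/gk/my file.txt", '/' ) ) ;
        // file:///home/gk/my%20file.txt
        System.out.println( uriFromFilename( "C:\\docs\\caf\u00e9.txt", '\\' ) ) ;
        // file:///C:/docs/caf%C3%A9.txt
        System.out.println( uriFromFilename( "/tmp/100%#1", '/' ) ) ;
        // file:///tmp/100%25%231
        }
    }
```

Passing the separator in, rather than reading a system property inside the method, is only a convenience for testing; the mapping itself is unchanged.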

#g


------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact
Received on Wednesday, 18 May 2005 15:37:14 UTC
