Re: encode-for-uri() and filenames?

On 22.11.2020 18:12, Martin Honnen wrote:
>>
>> Because the percent signs as they are used in the filename are 
>> incompatible with URI encoding, I expect them to be percent-encoded 
>> themselves, with the modified filename echoed to stderr (in the 
>> <p:identity> step) and used to save the test file (in the <p:store> 
>> step). What happens instead is that the percent encoded value is 
>> written, as expected, to stderr:
>>
>>     test-1a%257%25.xml
>>
>> but the file is saved to the local filesystem as if encode-for-uri() 
>> had not been applied, that is, as:
>>
>>     test-1a%7%.xml
> 
> I don't have an explanation for that, perhaps ask Achim by raising an 
> issue on Morgana on Sourceforge.
> 

The behavior you are observing is correct.

Quoting https://tools.ietf.org/html/rfc3986#section-2.4:

  Because the percent ("%") character serves as the indicator for
  percent-encoded octets, it must be percent-encoded as "%25" for that
  octet to be used as data within a URI.

This means that 'test-1a%7%.xml' is not a valid URI. The (relative) URI 
that corresponds to this file name is 'test-1a%257%25.xml'. When it is 
used to store the file, the percent encoding will be undone, resulting 
in a file name 'test-1a%7%.xml'.
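
The round trip can be reproduced outside XProc. Here is a minimal 
sketch in Python, using the standard library's urllib.parse rather 
than anything Morgana does internally; the variable names are mine:

```python
from urllib.parse import quote, unquote

# The file name as it should appear on disk.
file_name = "test-1a%7%.xml"

# Percent-encode it for use in a URI: each '%' becomes '%25'.
as_uri = quote(file_name)

# Decoding the URI form once gives back the original file name.
restored = unquote(as_uri)

print(as_uri)
print(restored)
```

The first value printed is the URI form that you see on stderr, the 
second is the name the file gets when it is stored.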

Instead of encode-for-uri(), you can also use p:urify() 
(https://spec.xproc.org/master/head/xproc/#f.urify), which encodes only 
the parts of the file name (or URI) that need to be encoded.

For example, p:urify('c:\Users\gerrit\test-1a%7%.xml') will result in 
'file:///c:/Users/gerrit/test-1a%257%25.xml'.

p:urify('c:\Users\gerrit\test-1a%257%25.xml') → 
'file:///c:/Users/gerrit/test-1a%25257%2525.xml' (the input isn’t a URI, 
so '%25' is regarded as a literal part of the file name that must be 
percent-encoded as '%2525' in a URI).

p:urify('file:///c:/Users/gerrit/test-1a%257%25.xml') → 
'file:///c:/Users/gerrit/test-1a%257%25.xml' (no additional encoding of 
the '%25's because “Implementations must not percent-encode or decode 
the same string more than once” as stated in the same Sect. 2.4 of RFC 
3986).
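
To make these three cases concrete, here is a rough Python sketch. It 
is not the p:urify() algorithm from the XProc spec: to_file_uri is a 
hypothetical helper of mine, and treating every value that starts with 
'file:' as an already encoded URI is a simplifying assumption.

```python
from urllib.parse import quote


def to_file_uri(value: str) -> str:
    """Hypothetical helper; only mimics the three cases discussed here."""
    if value.startswith("file:"):
        # Assumed to be a URI already: do not percent-encode a second time.
        return value
    # Otherwise treat the value as a Windows file name: normalize the
    # separators, then percent-encode everything except '/' and ':'.
    path = value.replace("\\", "/")
    return "file:///" + quote(path, safe="/:")


# A plain file name: each '%' is data and becomes '%25'.
first = to_file_uri(r"c:\Users\gerrit\test-1a%7%.xml")

# Still a file name, so the '%' in '%25' is data too and becomes '%2525'.
second = to_file_uri(r"c:\Users\gerrit\test-1a%257%25.xml")

# Already a URI: returned unchanged.
third = to_file_uri("file:///c:/Users/gerrit/test-1a%257%25.xml")

for result in (first, second, third):
    print(result)
```

The only decision that matters in this sketch is whether the input is 
taken to be a file name or a URI; the encoding itself is applied at 
most once.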

For the p:urify() invocation whose input is already a file: URI, Morgana 
reports 'file:///c:/Users/gerrit/test-1a%25257%2525.xml' as the result. 
I think this is incorrect. Otherwise, Morgana seems to implement 
p:urify() incredibly well.

Gerrit

Received on Sunday, 22 November 2020 17:56:45 UTC