- From: David Birnbaum <djbpitt@gmail.com>
- Date: Sun, 22 Nov 2020 13:39:11 -0500
- To: "Imsieke, Gerrit, le-tex" <gerrit.imsieke@le-tex.de>, Martin Honnen <Martin.Honnen@gmx.de>
- Cc: XProc Dev <xproc-dev@w3.org>
- Message-ID: <CAP4v81qL92ufc_cJ1Bk0FHP8aGp_3haQfd_W=4sm5jyXVKLhPg@mail.gmail.com>
Dear Gerrit, Martin, and XProc Dev,

Thank you for the quick responses, and for the reminder about p:urify(). The detail I had not understood is, as Gerrit writes:

    This means that 'test-1a%7%.xml' is not a valid URI. The (relative) URI that corresponds to this file name is 'test-1a%257%25.xml'. When it is used to store the file, the percent encoding will be undone, resulting in a file name 'test-1a%7%.xml'.

I knew that 'test-1a%7%.xml' was not a valid URI, which is why I tried to pass it through encode-for-uri(), and the output I expected to emerge from that was 'test-1a%257%25.xml'. Since the percent-encoded version is a valid filename (even if not an especially user-friendly one), I expected that it would be used as created, with the percent-encoding preserved in the filename.

I see, though, at https://spec.xproc.org/master/head/steps/#c.store that the value of @href on a <p:store> step is typed as xs:anyURI, and not as a string, which is obvious and natural now that I think about it. My misunderstanding lay in not expecting that the URI would be converted to a string by undoing the percent encoding when the filename was created. Upon reflection, though, I now think that's the behavior I should have expected, since applications that ingest URIs and have to map them to file system resources need to undo percent encoding as a matter of course.

This XProc inquiry was a follow-up on my earlier question on the exist-open mailing list, where the eXist-db Java admin client refused to upload files with names like "test-1a%7%.xml", with an error message to the effect that the filename could not be converted to a URI. I think that behavior is incorrect, since encode-for-uri() and p:urify() can convert the string value of that filename to a percent-escaped URI, so the conversion does not appear to be impossible.

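The round trip described above (encode the raw file name into a URI reference, then have the consumer undo the encoding when it maps the URI to a file) can be sketched outside XProc. This is only an illustration in Python using the standard urllib.parse module, not what Morgana or <p:store> does internally; the variable names are made up for the example.

```python
from urllib.parse import quote, unquote

# Literal file name containing raw percent signs (not a valid URI as-is).
filename = "test-1a%7%.xml"

# Percent-encode everything outside the unreserved set, similar in
# spirit to what encode-for-uri() does in XPath.
uri_ref = quote(filename, safe="")

# What a URI consumer typically does when it maps the URI back to a
# file system name: undo the percent encoding exactly once.
stored_name = unquote(uri_ref)

print(uri_ref)      # the value that would be passed as the href
print(stored_name)  # the name that ends up on disk
```

Under these assumptions the encoded form is what travels as the URI, and the decoded form, identical to the original file name, is what is written to the file system.
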
I asked on the eXist-db mailing list because I thought that eXist-db should URI-ify the filename and upload the file, and when it refused to do that, I then wondered whether the source of the problem was that my XProc script should have used the percent-escaped value of the filename when it created the file in the <p:store> step. I think I am now back where I began, though, that is, that eXist-db declines to construct a URI it can use from a filename that encode-for-uri() and p:urify() are able to convert to a URI. I don't think I should expect eXist-db to refuse to construct a URI from that filename, but that's a question for the eXist-db mailing list, and I'll move it over there.

Thank you all again for the quick and helpful responses.

Best,

David

On Sun, Nov 22, 2020 at 12:56 PM Imsieke, Gerrit, le-tex <gerrit.imsieke@le-tex.de> wrote:

> On 22.11.2020 18:12, Martin Honnen wrote:
> >>
> >> Because the percent signs as they are used in the filename are
> >> incompatible with URI encoding, I expect them to be percent-encoded
> >> themselves, with the modified filename echoed to stderr (in the
> >> <p:identity> step) and used to save the test file (in the <p:store>
> >> step). What happens instead is that the percent encoded value is
> >> written, as expected, to stderr:
> >>
> >> test-1a%257%25.xml
> >>
> >> but the file is saved to the local filesystem as if encode-for-uri()
> >> had not been applied, that is, as:
> >>
> >> test-1a%7%.xml
> >
> > I don't have an explanation for that, perhaps ask Achim by raising an
> > issue on Morgana on Sourceforge.
> >
>
> This is the correct behavior that you are observing.
>
> Quoting https://tools.ietf.org/html/rfc3986#section-2.4:
>
> Because the percent ("%") character serves as the indicator for
> percent-encoded octets, it must be percent-encoded as "%25" for that
> octet to be used as data within a URI.
>
> This means that 'test-1a%7%.xml' is not a valid URI. The (relative) URI
> that corresponds to this file name is 'test-1a%257%25.xml'. When it is
> used to store the file, the percent encoding will be undone, resulting
> in a file name 'test-1a%7%.xml'.
>
> Instead of encode-for-uri(), you can also use p:urify()
> (https://spec.xproc.org/master/head/xproc/#f.urify) that will only
> encode the parts of the file name (or URI) that need to be encoded.
>
> For example, p:urify('c:\Users\gerrit\test-1a%7%.xml') will result in
> 'file:///c:/Users/gerrit/test-1a%257%25.xml'
>
> p:urify('c:\Users\gerrit\test-1a%257%25.xml') →
> 'file:///c:/Users/gerrit/test-1a%25257%2525.xml' (the input isn’t a URI,
> therefore '%25' will be regarded as a literal part of the file name that
> must be percent-encoded as '%2525' in a URI).
>
> p:urify('file:///c:/Users/gerrit/test-1a%257%25.xml') →
> 'file:///c:/Users/gerrit/test-1a%257%25.xml' (no additional encoding of
> the '%25's because “Implementations must not percent-encode or decode
> the same string more than once” as stated in the same Sect. 2.4 of RFC
> 3986).
>
> Morgana reports 'file:///c:/Users/gerrit/test-1a%25257%2525.xml' as the
> result of the last invocation. I think this is incorrect. Otherwise,
> Morgana seems to implement p:urify() incredibly well.
>
> Gerrit

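The three p:urify() cases in the quoted message can be mimicked with a rough Python sketch. The function urify_sketch() below is hypothetical and deliberately simplified: it is not the p:urify() algorithm from the XProc specification, and it only distinguishes "already a file: URI" from "Windows-style file path", which is an assumption made for the sake of the example.

```python
from urllib.parse import quote

def urify_sketch(value: str) -> str:
    """Hypothetical, simplified stand-in for p:urify(); illustration only."""
    if value.startswith("file:"):
        # Input is already a URI: leave its percent-encoding untouched,
        # so nothing is encoded a second time.
        return value
    # Otherwise treat the input as a file system path: normalize the
    # separators, then percent-encode, keeping '/' and ':' as they are.
    path = value.replace("\\", "/")
    return "file:///" + quote(path, safe="/:")

# Raw percent signs in a path are treated as data and encoded.
a = urify_sketch(r"c:\Users\gerrit\test-1a%7%.xml")
# A path that literally contains '%25' is not a URI, so it is encoded again.
b = urify_sketch(r"c:\Users\gerrit\test-1a%257%25.xml")
# A file: URI passes through unchanged.
c = urify_sketch("file:///c:/Users/gerrit/test-1a%257%25.xml")

print(a)
print(b)
print(c)
```

The point of the sketch is the branch on the input kind: whether '%25' is read as an already-encoded percent sign or as literal file name characters depends on whether the input is taken to be a URI or a path.
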
Received on Sunday, 22 November 2020 18:39:36 UTC