- From: Norman Walsh <ndw@nwalsh.com>
- Date: Thu, 06 Oct 2011 15:44:11 -0400
- To: XProc Dev <xproc-dev@w3.org>
- Message-ID: <m2wrch52ic.fsf@nwalsh.com>
Hello world,

Consider the following pipeline:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:l="http://xproc.org/library">
  <p:output port="result"/>

  <p:http-request>
    <p:input port="source">
      <p:inline>
        <c:request method="get"
                   href="http://tests.xproc.org/tests/doc/html-utf8.data"/>
      </p:inline>
    </p:input>
  </p:http-request>
</p:declare-step>
```

It returns a base64-encoded document:

```xml
<c:body xmlns:c="http://www.w3.org/ns/xproc-step"
        content-type="application/octet-stream"
        encoding="base64">PCFET0NUWVBFIGh0bWw+CjxodG1sIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hodG1s
Ij4KPGhlYWQ+Cjx0aXRsZT5QYWdlIFRpdGxlPC90aXRsZT4KPC9oZWFkPgo8Ym9keT4KPHA+UGFn
ZSBjb250ZW50LjwvcD4KPC9ib2R5Pgo8L2h0bWw+Cg==
</c:body>
```

Suppose I amend the pipeline as follows:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:l="http://xproc.org/library">
  <p:output port="result"/>

  <p:http-request>
    <p:input port="source">
      <p:inline>
        <c:request method="get"
                   href="http://tests.xproc.org/tests/doc/html-utf8.data"/>
      </p:inline>
    </p:input>
  </p:http-request>

  <p:wrap wrapper="c:request" match="/"/>

  <p:add-attribute match="/c:request"
                   attribute-name="href"
                   attribute-value="http://validator.nu/?out=xml"/>

  <p:add-attribute match="/c:request"
                   attribute-name="method"
                   attribute-value="post"/>

  <p:http-request/>
</p:declare-step>
```

What should happen? I think the answer is that the body should be decoded (turned back from base64 into the original bytes) before it's sent to the server. That's not what XML Calabash (0.9.36) does, but I think that's a bug. Agreed?

Does it strike you as odd that there's no charset attribute on c:body?
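For illustration, here is roughly the document that the p:wrap and the two p:add-attribute steps in the amended pipeline would hand to the second p:http-request. It is a hand-derived sketch, not captured XML Calabash output: it assumes p:wrap with match="/" places the whole c:body inside a new c:request element (which is what the pipeline relies on), indentation is added for readability, and attribute order is insignificant. Under the reading proposed above, validator.nu would receive the decoded bytes of the c:body content, a small XHTML page beginning with `<!DOCTYPE html>`, rather than the base64 text itself.

```xml
<!-- Sketch (derived by hand, not real processor output): the input to the
     second p:http-request after p:wrap and the two p:add-attribute steps. -->
<c:request xmlns:c="http://www.w3.org/ns/xproc-step"
           href="http://validator.nu/?out=xml"
           method="post">
  <!-- Unchanged from the first request's result; encoding="base64" says the
       content must be decoded to get the actual request body. -->
  <c:body content-type="application/octet-stream"
          encoding="base64">PCFET0NUWVBFIGh0bWw+CjxodG1sIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hodG1s
Ij4KPGhlYWQ+Cjx0aXRsZT5QYWdlIFRpdGxlPC90aXRsZT4KPC9oZWFkPgo8Ym9keT4KPHA+UGFn
ZSBjb250ZW50LjwvcD4KPC9ib2R5Pgo8L2h0bWw+Cg==
</c:body>
</c:request>
```

Note that the content-type sent to the server would still be application/octet-stream, since nothing in the pipeline changes it.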
Now consider this pipeline:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:l="http://xproc.org/library">
  <p:output port="result"/>

  <p:http-request>
    <p:input port="source">
      <p:inline>
        <c:request method="get"
                   href="http://tests.xproc.org/tests/doc/html-utf8.data"/>
      </p:inline>
    </p:input>
  </p:http-request>

  <p:unescape-markup/>
</p:declare-step>
```

What do you think it produces? XML Calabash produces a copy of the input. It doesn't decode the data because we didn't tell the step the encoding:

```xml
<p:unescape-markup encoding="base64"/>
```

Now you think it's going to do the right thing, but it doesn't, because we didn't specify a charset. This is a damn shame, because I haven't a clue what the charset is. But let's muddle on.

```xml
<p:unescape-markup encoding="base64" charset="utf-8"/>
```

This fails too, because it tries to use an XML parser. I'm not sure whether that's a bug or not. I think I could try an HTML parser for application/octet-stream and still be conformant. Finally, this works:

```xml
<p:unescape-markup content-type="text/html" encoding="base64" charset="utf-8"/>
```

But it sure seems like it's making me work awfully hard, especially if you consider that I'd need a p:choose to decide whether or not to specify the encoding attribute, because encoding="" would not work.

I think...

1. c:body should be allowed to have a charset parameter.
2. If the charset parameter isn't known/specified, we default to... ISO Latin 1, or whatever the Internet tells us the default is for text/* documents that don't specify a charset.
3. If the input to p:unescape-markup is a c:body element, then we should use the content-type, encoding, and charset attributes from that element if they aren't specified on the step.

Thoughts?

Be seeing you,
norm

--
Norman Walsh
Lead Engineer
MarkLogic Corporation
Phone: +1 413 624 6676
www.marklogic.com
Received on Thursday, 6 October 2011 19:44:42 UTC