- From: Alex Milowski <alex@milowski.org>
- Date: Fri, 6 Jul 2007 07:25:49 -0700
- To: public-xml-processing-model-wg@w3.org
On 7/6/07, Norman Walsh <ndw@nwalsh.com> wrote:
> / Innovimax SARL <innovimax@gmail.com> was heard to say:
> | I think Murray is right in many sense
> |
> | 1) we need to be consistent on the use of "load-like" component
>
> I think our consistent story should be "use XML". There are, at the
> moment, only two places where it seems to me that we might want to
> make special rules for supporting HTML content: in http-request
> (because the pipeline author has essentially no control over what
> comes back) and in unescape-markup (because it's broken by design).
>
> If we can't draw a relatively small, clear boundary around the areas
> where we're going to provide special support, then I'll probably feel
> better if we don't do anything at all.
>
> | 2)...but we need to take care, that the user should be able to know
> | that the input was "changed" to be processed
>
> I don't really understand this concern. We're only making the changes
> that the user requests. If they don't request any, we won't make any.
>
> 1. User runs unescape markup without asking for special support.
>    a. The escaped markup is WF XML, life is good
>    b. The escaped markup is not WF XML, step fails
>
> 2. User runs unescape markup and *asks for* special support.
>    a. The escaped markup is WF XML, life is good
>    b. The escaped markup is not WF XML, step fixes it so it's WF XML
>
> Likewise,
>
> 3. User runs http-request without asking for special support.
>    a. What comes back is WF XML, life is good
>    b. What comes back is not WF XML, step fails
>
> 4. User runs http-request and *asks for* special support.
>    a. What comes back is WF XML, life is good
>    b. What comes back is not WF XML, step fixes it so it's WF XML

I had imagined that:

  * http-request doesn't have special support for non-XML media types.

  * http-request already supports returning non-XML media types.
    If the content type is not a text media type, you get
    base64-encoded content.

  * you then need to run unescape-markup on the c:body element
    returned by http-request.

Given that any "application/.." media type will be encoded as base64,
we probably need to add an "encoding" option to unescape-markup so
that we can decode base64 content. We'll also need a charset parameter
as well as support for the charset parameter on the content type.

So, if you had "application/stuff; charset=utf-8" returned and you
wanted to parse that as HTML, you could do:

   <p:unescape-markup>
      <p:option name="charset" value="utf-8"/>
      <p:option name="encoding" value="base64"/>
      <p:option name="content-type" value="text/html"/>
   </p:unescape-markup>

Similarly, you could pull the charset from the content-type returned
by the http-request, in that http-request will return something like:

   <c:http-request status="200">
      <c:body content-type="application/stuff; charset=utf-8">
      ...
      </c:body>
   </c:http-request>

and so you'd do:

   <p:unescape-markup>
      <p:option name="charset"
                select="substring-after(/c:body/@content-type,'charset=')"/>
      <p:option name="encoding" value="base64"/>
      <p:option name="content-type" value="text/html"/>
   </p:unescape-markup>

> I don't want to support anything special on load.
>
> My feeling is, if we support HTML on load, we'll have to support it on
> p:document. If we support HTML on load and document and unescape
> markup and http-request, why not support it on every step? If we
> support not WF XML on every step, how is this an XML pipeline
> language?

I was trying to avoid proliferation of content-type handling into all
of our steps. One possibility for load is to do what http-request
currently does and return an element with encoded/escaped content.
Then you can use whatever step to process that non-XML into XML.
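
For illustration only, here is a minimal Python sketch of the decoding
that the proposed "encoding" and "charset" options would imply for a
base64-encoded body. It is not XProc and not any implementation's
behaviour: the function names (charset_from_content_type,
unescape_body) are hypothetical, and falling back to utf-8 when no
charset is given is an assumption. The final strict parse mirrors the
"not WF XML, step fails" case quoted above; fixing up HTML is left out.

```python
import base64
import xml.etree.ElementTree as ET
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type value.

    The utf-8 fallback is an assumption made for this sketch.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


def unescape_body(body_text, content_type, encoding=None):
    """Hypothetical stand-in for the proposed options on unescape-markup.

    If encoding is "base64", undo it and decode the resulting bytes
    with the charset taken from the content type. Otherwise the body
    is assumed to be text already and is returned unchanged.
    """
    if encoding == "base64":
        raw = base64.b64decode(body_text)
        return raw.decode(charset_from_content_type(content_type))
    return body_text


# Simulate what might sit inside c:body for a non-text media type.
payload = base64.b64encode("<p>caf\u00e9</p>".encode("utf-8")).decode("ascii")
markup = unescape_body(
    payload, "application/stuff; charset=utf-8", encoding="base64"
)

# Strict parse: raises xml.etree.ElementTree.ParseError if the decoded
# markup is not well-formed XML (no special support requested).
root = ET.fromstring(markup)
```

Taking the charset from a parsed content type rather than a raw
substring match avoids picking up any parameters that follow
"charset=" in the header value.
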
If we support some options on p:unescape-markup as I've suggested (and
what is in the current draft), then the HTML case is covered in that
every HTML document is a two-step process:

  * p:load followed by a p:unescape-markup

  * p:http-request followed by a p:unescape-markup.

-- 
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of
the inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics
Received on Friday, 6 July 2007 14:26:02 UTC