Re: Allowing for naive/simple p:http-request from Alex Milowski on 2007-07-06 (public-xml-processing-model-wg@w3.org from July 2007)

From: Alex Milowski <alex@milowski.org>
Date: Fri, 6 Jul 2007 07:25:49 -0700
To: public-xml-processing-model-wg@w3.org
Message-ID: <28d56ece0707060725j17fbd3bob0390807d67abab9@mail.gmail.com>
On 7/6/07, Norman Walsh <ndw@nwalsh.com> wrote:
> / Innovimax SARL <innovimax@gmail.com> was heard to say:
> | I think Murray is right in many sense
> |
> | 1) we need to be consistent on the use of "load-like" component
>
> I think our consistent story should be "use XML". There are, at the
> moment, only two places where it seems to me that we might want to
> make special rules for supporting HTML content: in http-request
> (because the pipeline author has essentially no control over what
> comes back) and in unescape-markup (because it's broken by design).
>
> If we can't draw a relatively small, clear boundary around the areas
> where we're going to provide special support, then I'll probably feel
> better if we don't do anything at all.
>
> | 2)...but we need to take care, that the user should be able to know
> | that the input was "changed" to be processed
>
> I don't really understand this concern. We're only making the changes
> that the user requests. If they don't request any, we won't make any.
>
> 1. User runs unescape markup without asking for special support.
>    a. The escaped markup is WF XML, life is good
>    b. The escaped markup is not WF XML, step fails
>
> 2. User runs unescape markup and *asks for* special support.
>    a. The escaped markup is WF XML, life is good
>    b. The escaped markup is not WF XML, step fixes it so it's WF XML
>
> Likewise,
>
> 3. User runs http-request without asking for special support.
>    a. What comes back is WF XML, life is good
>    b. What comes back is not WF XML, step fails
>
> 4. User runs http-request and *asks for* special support.
>    a. What comes back is WF XML, life is good
>    b. What comes back is not WF XML, step fixes it so it's WF XML

I had imagined that:

   * http-request doesn't have special support for non-XML media types.
   * http-request already supports returning non-XML media types.  If the
     content type is not a text media type, you get base64 encoded content
   * you then need to run unescape-markup on the c:body element returned
     by http-request.

Given that any "application/.." media type will be encoded as base64, we
probably need to add an "encoding" option to unescape-markup so that
we can decode base64 content.  We'll also need a charset option, as
well as support for the charset parameter on the content type.
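
To make the intent concrete, here is a rough sketch in Python (not XProc)
of what the "encoding" and "charset" options would amount to.  The function
name decode_body is made up for illustration, and I'm assuming the c:body
text is plain base64, possibly line-wrapped:

```python
import base64


def decode_body(encoded_text, charset="utf-8"):
    """Undo the base64 encoding, then interpret the bytes using charset."""
    # b64decode skips newlines and other non-alphabet characters by default,
    # so line-wrapped element content is acceptable here.
    raw_bytes = base64.b64decode(encoded_text)
    return raw_bytes.decode(charset)
```

The two options are separate on purpose: "encoding" says how to get from
element text back to bytes, and "charset" says how to get from bytes to
characters.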

So, if you had "application/stuff; charset=utf-8" returned and you wanted to
parse that as HTML, you could do:

<p:unescape-markup>
   <p:option name="charset" value="utf-8"/>
   <p:option name="encoding" value="base64"/>
   <p:option name="content-type" value="text/html"/>
</p:unescape-markup>

Similarly, you could pull the charset from the content type that
http-request returns, since http-request will return something like:

<c:http-request status="200">
<c:body content-type="application/stuff; charset=utf-8">
...
</c:body>
</c:http-request>

and so you'd do:

<p:unescape-markup>
   <p:option name="charset"
             select="substring-after(/c:body/@content-type,'charset=')"/>
   <p:option name="encoding" value="base64"/>
   <p:option name="content-type" value="text/html"/>
</p:unescape-markup>
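
One caveat on the substring-after() approach: it only works when charset is
the last parameter of the content type.  A quick sketch in Python (not
XProc) of the difference; substring_after and charset_of are names I made
up, and the second one leans on the standard library's email header parser
as a stand-in for proper media type parsing:

```python
from email.message import Message


def substring_after(value, marker):
    """Mimic XPath substring-after(): text after the first marker, or ''."""
    head, sep, tail = value.partition(marker)
    return tail if sep else ""


def charset_of(content_type, default="utf-8"):
    """Parse the header properly and return the charset parameter."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value; None means no charset.
    return msg.get_content_charset() or default
```

With "application/stuff; charset=utf-8" both give "utf-8".  With a trailing
parameter, such as "text/html; charset=UTF-8; boundary=x", the
substring-after() version drags the extra parameter along.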

> I don't want to support anything special on load.
>
> My feeling is, if we support HTML on load, we'll have to support it on
> p:document. If we support HTML on load and document and unescape
> markup and http-request, why not support it on every step? If we
> support not WF XML on every step, how is this an XML pipeline
> language?

I was trying to avoid proliferation of content-type handling into all of our
steps.  One possibility for load is to do what http-request currently does
and return an element with encoded/escaped content.  Then you can use
whatever step to process that non-XML into XML.

If we support some options on p:unescape-markup as I've suggested (along
with what is in the current draft), then the HTML case is covered, in that
handling any HTML document becomes a two-step process:

   * p:load followed by a p:unescape-markup
   * p:http-request followed by a p:unescape-markup
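
As a sanity check on the second step, here is a rough sketch in Python (not
XProc) of what unescape-markup with encoding="base64" and
content-type="text/html" might do.  Everything named here (HtmlToXml,
unescape_markup, the VOID set) is hypothetical, and the tag repair is far
cruder than what a real HTML parser does; it only shows that "decode, then
fix up into WF XML" can live in one step:

```python
import base64
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Elements that never have an end tag in HTML (a partial list).
VOID = {"br", "hr", "img", "input", "meta", "link"}


class HtmlToXml(HTMLParser):
    """Build a well-formed element tree from loosely tagged HTML."""

    def __init__(self):
        super().__init__()
        self._builder = ET.TreeBuilder()
        self._open_tags = []

    def handle_starttag(self, tag, attrs):
        self._builder.start(tag, {k: v or "" for k, v in attrs})
        if tag in VOID:
            self._builder.end(tag)
        else:
            self._open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self._open_tags:
            return  # stray end tag: ignore it
        # Close anything left open inside the element being ended.
        while self._open_tags:
            closing = self._open_tags.pop()
            self._builder.end(closing)
            if closing == tag:
                break

    def handle_data(self, data):
        if self._open_tags:  # drop text outside the root element
            self._builder.data(data)

    def finish(self):
        while self._open_tags:
            self._builder.end(self._open_tags.pop())
        return self._builder.close()


def unescape_markup(body_text, charset="utf-8"):
    """Decode base64 element content, then repair it into an element tree."""
    html = base64.b64decode(body_text).decode(charset)
    parser = HtmlToXml()
    parser.feed(html)
    parser.close()
    return parser.finish()
```

The point is only that the fix-up stays inside the one step; p:load and
p:http-request just hand over the encoded content untouched.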

-- 
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics
Received on Friday, 6 July 2007 14:26:02 UTC