- From: Alex Milowski <alex@milowski.org>
- Date: Fri, 6 Jul 2007 07:25:49 -0700
- To: public-xml-processing-model-wg@w3.org
On 7/6/07, Norman Walsh <ndw@nwalsh.com> wrote:
> / Innovimax SARL <innovimax@gmail.com> was heard to say:
> | I think Murray is right in many sense
> |
> | 1) we need to be consistent on the use of "load-like" component
>
> I think our consistent story should be "use XML". There are, at the
> moment, only two places where it seems to me that we might want to
> make special rules for supporting HTML content: in http-request
> (because the pipeline author has essentially no control over what
> comes back) and in unescape-markup (because it's broken by design).
>
> If we can't draw a relatively small, clear boundary around the areas
> where we're going to provide special support, then I'll probably feel
> better if we don't do anything at all.
>
> | 2)...but we need to take care, that the user should be able to know
> | that the input was "changed" to be processed
>
> I don't really understand this concern. We're only making the changes
> that the user requests. If they don't request any, we won't make any.
>
> 1. User runs unescape markup without asking for special support.
> a. The escaped markup is WF XML, life is good
> b. The escaped markup is not WF XML, step fails
>
> 2. User runs unescape markup and *asks for* special support.
> a. The escaped markup is WF XML, life is good
> b. The escaped markup is not WF XML, step fixes it so it's WF XML
>
> Likewise,
>
> 3. User runs http-request without asking for special support.
> a. What comes back is WF XML, life is good
> b. What comes back is not WF XML, step fails
>
> 4. User runs http-request and *asks for* special support.
> a. What comes back is WF XML, life is good
> b. What comes back is not WF XML, step fixes it so it's WF XML
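Those four cases come down to one rule: parse as XML, and fall back to a repair only if the user asked for one. A rough Python sketch of that rule follows. The `repair` callback and the toy `close_br` fixer are hypothetical stand-ins for whatever "special support" ends up meaning, not anything in the draft.

```python
import xml.etree.ElementTree as ET


def unescape_markup(text, repair=None):
    """Parse text as XML, repairing it first only if the caller asked.

    `repair` is a hypothetical callback (str -> str) standing in for the
    "special support" option; None means the user did not ask for it.
    """
    try:
        return ET.fromstring(text)          # cases a: WF XML, life is good
    except ET.ParseError:
        if repair is None:
            raise                           # cases 1b/3b: step fails
        return ET.fromstring(repair(text))  # cases 2b/4b: fix, then parse


def close_br(text):
    """Toy repair for illustration only: self-close bare <br> tags."""
    return text.replace("<br>", "<br/>")
```

The point is that nothing is changed unless the caller passes `repair`, so the user always knows whether the input may have been altered.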
I had imagined that:
* http-request doesn't get any special parsing support for non-XML media types.
* http-request already supports returning non-XML media types. If the
content type is not a text media type, you get base64-encoded content.
* you then need to run unescape-markup on the c:body element returned
by http-request.
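As a rough illustration of that wrapping (in Python, not pipeline syntax), something like the following. The `wrap_body` name, the un-namespaced `body` element, the `encoding` attribute, and the "text/ prefix means text" rule are all assumptions made for the sketch; the draft may draw those lines differently.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_body(data: bytes, content_type: str) -> ET.Element:
    """Wrap a response entity in a body element, base64-encoding non-text."""
    # "body" stands in for c:body; the namespace is omitted in this sketch.
    body = ET.Element("body", {"content-type": content_type})
    media_type = content_type.split(";")[0].strip().lower()
    if media_type.startswith("text/"):
        # Assumption: text is decoded as UTF-8; charset handling is not shown.
        body.text = data.decode("utf-8")
    else:
        # Assumption: an "encoding" attribute records how the content is held.
        body.set("encoding", "base64")
        body.text = base64.b64encode(data).decode("ascii")
    return body
```

Either way the result is an ordinary XML element, so the rest of the pipeline never sees non-XML content directly.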
Given that any "application/.." media type will be encoded as base64, we
probably need to add an "encoding" option to unescape-markup so that
we can decode base64 content. We'll also need a "charset" option, as
well as support for the charset parameter on the content type.
So, if you had "application/stuff; charset=utf-8" returned and you wanted to
parse that as HTML, you could do:
<p:unescape-markup>
<p:option name="charset" value="utf-8"/>
<p:option name="encoding" value="base64"/>
<p:option name="content-type" value="text/html"/>
</p:unescape-markup>
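In other words, the three options would drive roughly the following. This is only a Python sketch of the intended semantics: the function signature, the `html-fragment` wrapper element, and the use of the standard library's HTMLParser as the "make it well-formed" step are stand-ins for whatever an implementation would really do.

```python
import base64
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

VOID = {"br", "hr", "img", "input", "link", "meta"}  # elements with no end tag


class _TreeBuilder(HTMLParser):
    """Build an ElementTree from tag soup (deliberately simplistic)."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("html-fragment")  # hypothetical wrapper
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        el = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID:
            self.stack.append(el)

    def handle_startendtag(self, tag, attrs):
        ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):  # text after a child element goes in that child's tail
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def unescape_markup(payload, charset="utf-8", encoding=None,
                    content_type="application/xml"):
    """Decode the payload, then parse it according to content_type."""
    if encoding == "base64":
        text = base64.b64decode(payload).decode(charset)
    else:
        text = payload
    if content_type == "text/html":
        builder = _TreeBuilder()
        builder.feed(text)
        builder.close()  # flush any buffered trailing text
        return builder.root
    return ET.fromstring(text)
```

Note the order: "encoding" is undone first, then "charset" turns bytes into characters, and only then does "content-type" pick the parser.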
Similarly, you could pull the charset from the content type returned by
http-request, since http-request will return something like:
<c:http-request status="200">
<c:body content-type="application/stuff; charset=utf-8">
...
</c:body>
</c:http-request>
and so you'd do:
<p:unescape-markup>
<p:option name="charset"
select="substring-after(/c:body/@content-type,'charset=')"/>
<p:option name="encoding" value="base64"/>
<p:option name="content-type" value="text/html"/>
</p:unescape-markup>
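One caveat with that select expression: substring-after returns everything after "charset=", so it only gives a clean value when charset is the last parameter and is unquoted. A rough Python comparison follows, using the standard library's email.message.Message for the parameter parsing. That choice, the helper names, and the utf-8 default are assumptions for the sketch, nothing the draft prescribes.

```python
from email.message import Message


def substring_after(s, marker):
    """Mimic XPath 1.0 substring-after(): text after the first match, or ''."""
    if marker == "":
        return s
    head, sep, tail = s.partition(marker)
    return tail if sep else ""


def charset_of(content_type, default="utf-8"):
    """Extract the charset parameter, handling quoting and extra parameters.

    The utf-8 default is an assumption made for this sketch.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


simple = "application/stuff; charset=utf-8"
tricky = 'application/stuff; charset="UTF-8"; foo=bar'

print(substring_after(simple, "charset="))  # clean value
print(substring_after(tricky, "charset="))  # quotes and trailing parameter leak in
print(charset_of(tricky))                   # normalized charset only
```

So the select works for the simple case, but a step that understands the charset parameter on the content type itself would be more robust.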
> I don't want to support anything special on load.
>
> My feeling is, if we support HTML on load, we'll have to support it on
> p:document. If we support HTML on load and document and unescape
> markup and http-request, why not support it on every step? If we
> support not WF XML on every step, how is this an XML pipeline
> language?
I was trying to avoid proliferation of content-type handling into all of our
steps. One possibility for load is to do what http-request currently does
and return an element with encoded/escaped content. Then you can use
whatever step you like to process that non-XML into XML.
If we support some options on p:unescape-markup as I've suggested (and
what is in the current draft), then the HTML case is covered, in that
handling any HTML document becomes a two-step process:
* p:load followed by a p:unescape-markup
* p:http-request followed by a p:unescape-markup.
--
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."
Bertrand Russell in a footnote of Principles of Mathematics
Received on Friday, 6 July 2007 14:26:02 UTC