Re: p:http-request from Alex Milowski on 2007-04-30 (public-xml-processing-model-wg@w3.org from April 2007)

From: Alex Milowski <alex@milowski.org>
Date: Mon, 30 Apr 2007 08:24:24 -0700
To: public-xml-processing-model-wg@w3.org
Message-ID: <28d56ece0704300824m209d4f4p57c06fb3f267e2ce@mail.gmail.com>

On 4/30/07, Norman Walsh <ndw@nwalsh.com> wrote:
>
> The encoding of the response from an http-request is described as:
>
>   Any content returned that has an XML mime type is returned as a
>   child of the 'body' element. If the response is a text mime type and
>   can be encoded in unicode, the content is encoded as the text
>   children of the body element. If the content is none of these, the
>   response is encoded as base64 data textually represented as the text
>   content of the body element.
>
> How can I tell if the content "can be encoded in Unicode"?

I've actually been thinking about how this should work just today... :)

The problem here is that of detecting charsets.  For example, let's just
assume the simplest case of the "application/xml" media type.

When the response is sent, a good service will attach a charset
parameter.  The actual "Content-Type" header will have something like:

   Content-Type: application/xml; charset=UTF-8

Decoding the entity body properly just requires looking at the
charset parameter and assuming UTF-8 as a safe default (which is
recommended these days).
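
As a rough illustration of that lookup, here is a minimal Python sketch.
The function name charset_of is made up for this example, and the UTF-8
fallback is just the default suggested above, not anything the draft
mandates.

```python
from email.message import Message

def charset_of(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type header value.

    Falls back to `default` (an assumption: UTF-8) when the header
    carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the parameter value it finds.
    return msg.get_content_charset(failobj=default)

print(charset_of("application/xml; charset=UTF-8"))
print(charset_of("application/xml"))
```

Reusing the standard library's header parser avoids hand-splitting on
";" and "=", which gets quoted parameter values wrong.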

If the entity body is encoded using any standard Unicode encoding,
you'll be able to generate a stream of unicode characters from the
bytes that form the entity body.  Otherwise, it should be treated
as a binary object.
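
A sketch of that decision in Python, under some assumptions: the name
decode_entity_body and the ("text" | "base64", value) return shape are
invented for illustration, and "can be decoded" is taken to mean "the
decoder for the declared charset accepts the bytes", which is broader
than only the standard Unicode encodings.

```python
import base64

def decode_entity_body(data, charset="utf-8"):
    """Decode an entity body to characters, or fall back to base64.

    Returns ("text", str) when the bytes decode cleanly with the
    declared charset, otherwise ("base64", str) so the bytes can be
    carried as the text content of the body element.
    """
    try:
        return "text", data.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # LookupError covers a charset name the decoder does not know.
        return "base64", base64.b64encode(data).decode("ascii")

print(decode_entity_body("héllo".encode("utf-8")))
print(decode_entity_body(b"\xff\xfe\x00"))
```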

Now, given that you have a sequence of unicode characters, you
can parse any media type that is "text/xml" or "application/xml",
or that ends with "+xml", into an XML document.  I had assumed, but it
probably isn't clear, that you'd parse those media types to produce
children of the 'c:body' element.
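
To make that reading concrete, here is a small Python sketch.  The
helper names are hypothetical, and the element is built as a plain
"body" with the c: namespace left out, since this is only meant to show
the media type test and the two ways of filling the element.

```python
import xml.etree.ElementTree as ET

def is_xml_media_type(media_type):
    """True for text/xml, application/xml, or any type ending in +xml."""
    base = media_type.split(";", 1)[0].strip().lower()
    return base in ("text/xml", "application/xml") or base.endswith("+xml")

def build_body(media_type, text):
    """Wrap already-decoded response text in a 'body' element.

    XML media types are parsed and attached as a child element;
    anything else becomes the text content of the element.
    """
    body = ET.Element("body")  # namespace omitted in this sketch
    if is_xml_media_type(media_type):
        body.append(ET.fromstring(text))
    else:
        body.text = text
    return body

print(ET.tostring(build_body("application/xml", "<doc><a/></doc>")))
print(ET.tostring(build_body("text/plain", "hello")))
```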

Instead, we could just present the unicode character sequence
as the text value of the c:body and then pipeline authors have
the option of using the p:unescape-markup step to parse the result.

All of this needs to be clarified in the document.  That's on my list (near
the top now) to do.  So, this is a good time to tell me what you think.

-- 
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics

Received on Monday, 30 April 2007 15:24:29 UTC