- From: Alex Milowski <alex@milowski.org>
- Date: Mon, 30 Apr 2007 08:24:24 -0700
- To: public-xml-processing-model-wg@w3.org
- Message-ID: <28d56ece0704300824m209d4f4p57c06fb3f267e2ce@mail.gmail.com>
On 4/30/07, Norman Walsh <ndw@nwalsh.com> wrote:
>
> The encoding of the response from an http-request is described as:
>
>   Any content returned that has an XML mime type is returned as a
>   child of the 'body' element. If the response is a text mime type and
>   can be encoded in unicode, the content is encoded as the text
>   children of the body element. If the content is none of these, the
>   response is encoded as base64 data textually represented as the text
>   content of the body element.
>
> How can I tell if the content "can be encoded in Unicode"?

I've actually been thinking about how this should work just today... :)

The problem here is that of detecting charsets. For example, let's just
assume the simplest case of the "application/xml" media type. When the
response is sent, a good service will attach a charset parameter. The
actual "Content-Type" header will have something like:

   Content-Type: application/xml; charset=UTF-8

Decoding the entity body properly just requires looking at the charset
parameter, falling back to UTF-8 as a safe default when it is absent
(which is recommended these days). If the entity body is encoded using
any standard Unicode encoding, you'll be able to generate a stream of
Unicode characters from the bytes that form the entity body. Otherwise,
it should be treated as a binary object.

Now, given that you have a sequence of Unicode characters, you can parse
any media type that is "text/xml" or "application/xml", or that ends
with "+xml", into an XML document.

I had assumed, but it probably isn't clear, that you'd parse those media
types to produce children of the 'c:body' element. Instead, we could
just present the Unicode character sequence as the text value of the
c:body, and then pipeline authors have the option of using the
p:unescape-markup step to parse the result.

All of this needs to be clarified in the document. That's on my list
(near the top now) to do. So, this is a good time to tell me what you
think.
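
To make the decision procedure above concrete, here is a minimal Python
sketch. It is only one reading of the rules, not proposed spec text: the
names classify_body, is_xml_media_type and UNICODE_CODECS are
hypothetical, the list of "standard Unicode encodings" is an assumption,
and the XML case returns the decoded characters unparsed (as in the
c:body text value alternative) rather than building a document.

```python
import base64
import codecs
from email.message import Message

# Assumption: what counts as a "standard Unicode encoding" for this sketch.
UNICODE_CODECS = {
    "utf-8", "utf-16", "utf-16-le", "utf-16-be",
    "utf-32", "utf-32-le", "utf-32-be",
}


def is_xml_media_type(media_type: str) -> bool:
    """True for text/xml, application/xml, or any type ending in +xml."""
    return media_type in ("text/xml", "application/xml") or media_type.endswith("+xml")


def classify_body(content_type: str, payload: bytes) -> tuple[str, str]:
    """Return (kind, text), where kind is "xml", "text", or "base64"."""
    # Reuse the stdlib header parser to split media type and parameters.
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()
    # Fall back to UTF-8 when no charset parameter is present.
    charset = msg.get_content_charset() or "utf-8"

    if is_xml_media_type(media_type) or media_type.startswith("text/"):
        try:
            # Normalize aliases such as "UTF8" or "utf-16le" to codec names.
            if codecs.lookup(charset).name in UNICODE_CODECS:
                text = payload.decode(charset)
                kind = "xml" if is_xml_media_type(media_type) else "text"
                return kind, text
        except (LookupError, UnicodeDecodeError):
            # Unknown charset or undecodable bytes: treat as binary below.
            pass

    # Everything else is carried as base64 text.
    return "base64", base64.b64encode(payload).decode("ascii")
```

With this reading, a response whose declared charset is not a Unicode
encoding, or whose bytes fail to decode, ends up as base64 even when the
media type is textual. Whether that is the behaviour we want is exactly
the part that needs clarifying.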
--
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics
Received on Monday, 30 April 2007 15:24:29 UTC