RE: Charsets, encodings, http-request, unescape-markup, and convenience, oh my! from vojtech.toman@emc.com on 2011-10-10 (public-xml-processing-model-comments@w3.org from October 2011)

From: <vojtech.toman@emc.com>
Date: Mon, 10 Oct 2011 07:35:49 -0400
To: <public-xml-processing-model-comments@w3.org>
Message-ID: <3799D0FD120AD940B731A37E36DAF3FE33DAE9F804@MX20A.corp.emc.com>

>   In my humble opinion, I think those problems wouldn't happen if HTML
> content was parsed as a document node directly by the http-request
> step.  The step can access the HTTP response context (including the
> charset if any) and parse the HTML content directly into a document
> node, e.g. following the same rules as in escape-markup.  Or did I
> miss something?

I guess we could give HTML special treatment in p:http-request (similar to application/xml) and make the step behave like p:unescape-markup for HTML responses. But my personal feeling is that the less magic happens in p:http-request, the better. I think p:http-request should really only give you the 'raw' data that came with the response. If you want to treat the response data as HTML, you can apply p:unescape-markup to it. But if you want to treat the (HTML) response data as a sequence of bytes, you should still be able to do that.
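
A minimal sketch of that two-step approach, assuming an XProc 1.0 processor whose p:unescape-markup accepts content-type="text/html" (support for content types other than application/xml is implementation-defined). The URL is a placeholder, and the XHTML namespace is just one reasonable choice for the parsed elements:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                version="1.0">
  <p:output port="result"/>

  <!-- Step 1: fetch the resource. With the default detailed="false",
       a text/html response comes back as a c:body element whose
       content is the response text, not parsed markup. -->
  <p:http-request>
    <p:input port="source">
      <p:inline>
        <!-- Placeholder URL, for illustration only -->
        <c:request method="GET" href="http://example.com/page.html"/>
      </p:inline>
    </p:input>
  </p:http-request>

  <!-- Step 2 (optional): parse the text inside c:body as HTML.
       Assumes the processor supports text/html here. -->
  <p:unescape-markup content-type="text/html"
                     namespace="http://www.w3.org/1999/xhtml"/>
</p:declare-step>
```

Dropping the second step leaves the c:body content as p:http-request delivered it, so the decision to parse stays with the pipeline author rather than with the step.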

Regards,
Vojtech

--
Vojtech Toman
Consultant Software Engineer
EMC | Information Intelligence Group
vojtech.toman@emc.com
http://developer.emc.com/xmltech

Received on Monday, 10 October 2011 11:36:28 UTC