- From: Norman Walsh <ndw@nwalsh.com>
- Date: Wed, 23 May 2007 10:12:50 -0400
- To: public-xml-processing-model-wg@w3.org
- Message-ID: <873b1n7hnh.fsf@nwalsh.com>
/ Alex Milowski <alex@milowski.org> was heard to say:
| On 5/23/07, Norman Walsh <ndw@nwalsh.com> wrote:
| |> Indeed. The more places we need it, the more I feel we should keep it
| |> dead simple. I'm now feeling more strongly in favor of just having a
| |> boolean option to do the cleanup. If we need more control in V2, we can
| |> add new options.
|
| I want to be clear that I'm advocating cleanup in the case of HTML documents
| or chunks of HTML documents. Anything that is an XML media type should
| be considered XML and parsed as such. There must be no "cleanup" of
| XML.

I understood that.

| We could specialize an option such as:
|
| * "parse-as-html" with a value of "yes" and "no"
|
| since HTML is often malformed and we need to convert it to XHTML and there
| are many HTML->XHTML parsers that do cleanup as well (tidy & tagsoup), we
| could roll that functionality into one option.

What constitutes "html" in unescaped markup? Is this HTML:

  <p:unescape-markup>
    <p:input port="source">
      <p:inline>
        <foo>
          <book><title>Book title</book>
        </foo>
      </p:inline>
    </p:input>
  </p:unescape-markup>

I think the "cleanup" option has to operate on whatever it gets without
attempting to determine if the thing it got was or was not HTML.

| If you don't support HTML parsing, you get a dynamic error.
|
| If you don't support malformed HTML handling, you get a dynamic error.

If supporting handling malformed HTML is optional then I want the entire
step to be optional. I don't want non-interoperable required steps.

| In some cases (e.g. p:load and p:http-request), you may get a media type
| from the resource that tells you that it is HTML. As such, we could say that
| if you support HTML parsing, you should do that. Otherwise, you get the
| same result for non-XML media types. As such, we could get away
| with no "parse-as-html" option on those steps.

For file: URIs, I don't think p:load can be relied upon to give you a
media type.
Be seeing you,
  norm

--
Norman Walsh <ndw@nwalsh.com> | If you settle for what they're giving
http://nwalsh.com/            | you, you deserve what you get.
Received on Wednesday, 23 May 2007 14:13:14 UTC