- From: Norman Walsh <ndw@nwalsh.com>
- Date: Wed, 23 May 2007 10:12:50 -0400
- To: public-xml-processing-model-wg@w3.org
- Message-ID: <873b1n7hnh.fsf@nwalsh.com>
/ Alex Milowski <alex@milowski.org> was heard to say:
| On 5/23/07, Norman Walsh <ndw@nwalsh.com> wrote:
|
|> Indeed. The more places we need it, the more I feel we should keep it
|> dead simple. I'm now feeling more strongly in favor of just having a
|> boolean option to do the cleanup. If we need more control in V2, we can
|> add new options.
|
| I want to be clear that I'm advocating cleanup in the case of HTML documents
| or chunks of HTML documents. Anything that is an XML media type should
| be considered XML and parsed as such. There must be no "cleanup" of
| XML.
I understood that.
| We could specialize an option such as:
|
| * "parse-as-html" with a value of "yes" and "no"
|
| since HTML is often malformed and we need to convert it to XHTML and there
| are many HTML->XHTML parsers that do cleanup as well (tidy & tagsoup), we
| could roll that functionality into one option.
What constitutes "html" in unescaped markup? Is this HTML:
  <p:unescape-markup>
    <p:input port="source">
      <p:inline>
        <foo>
          <book><title>Book title</book>
        </foo>
      </p:inline>
    </p:input>
  </p:unescape-markup>
I think the "cleanup" option has to operate on whatever it gets
without attempting to determine if the thing it got was or was not
HTML.
| If you don't support HTML parsing, you get a dynamic error.
|
| If you don't support malformed HTML handling, you get a dynamic error.
If support for handling malformed HTML is optional, then I want the
entire step to be optional. I don't want non-interoperable required
steps.
| In some cases (e.g. p:load and p:http-request), you may get a media type
| from the resource that tells you that it is HTML. As such, we could say that
| if you support HTML parsing, you should do that. Otherwise, you get the
| same result for non-XML media types. As such, we could get away
| with no "parse-as-html" option on those steps.
For file: URIs, I don't think p:load can be relied upon to give you a
media type.
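
To make the media-type case concrete, here is a rough sketch of the dispatch
being discussed. It is illustrative only and not part of any spec text: the
function name choose_parser, the "unknown" result, and the short list of
media types are assumptions made up for this example. It treats XML media
types as XML (never cleaned up), uses an HTML parser only when the
implementation supports one, and reports "unknown" when no media type is
available, which is the situation a file: URI may leave you in.

```python
from typing import Optional

# Assumption: a deliberately short list; a real implementation would
# follow the relevant media type registrations.
XML_TYPES = {"text/xml", "application/xml"}
HTML_TYPES = {"text/html"}


def choose_parser(media_type: Optional[str], supports_html: bool) -> str:
    """Return "xml", "html", "other", or "unknown" (hypothetical labels)."""
    if media_type is None:
        # No media type reported (e.g. possibly a file: URI); the caller
        # has to decide, perhaps from an explicit option.
        return "unknown"
    # Drop parameters such as "; charset=utf-8" and normalize case.
    base = media_type.split(";", 1)[0].strip().lower()
    if base in XML_TYPES or base.endswith("+xml"):
        # Anything with an XML media type is parsed as XML, no cleanup.
        return "xml"
    if base in HTML_TYPES and supports_html:
        return "html"
    # HTML without HTML support is handled like any other non-XML type.
    return "other"
```

The "unknown" branch is where an explicit option on the step would still be
needed, since the resource itself gives no hint.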
Be seeing you,
norm
--
Norman Walsh <ndw@nwalsh.com> | If you settle for what they're giving
http://nwalsh.com/ | you, you deserve what you get.
Received on Wednesday, 23 May 2007 14:13:14 UTC