- From: Alex Milowski <alex@milowski.org>
- Date: Tue, 3 Jul 2007 12:24:42 -0700
- To: public-xml-processing-model-wg@w3.org
On 7/3/07, Norman Walsh <ndw@nwalsh.com> wrote:
> / Alex Milowski <alex@milowski.org> was heard to say:
> | At the end of our e-mail discussion in May I suggested we have a separate
> | step for parsing HTML. I still think this is a good idea. Anyone else?
>
> So this is the equivalent of "tidy" not the equivalent of "tagsoup",
> right?
>
> I guess I'm ok with this, but I wonder if we'll need a
> vocabulary-agnostic cleanup step too. Maybe not.
>
> I guess the next step is to propose a specific step with a description
> and the options you think it needs.

In proofing the steps, I found that we already have this for p:unescape-markup:

   "If the 'content-type' option is specified, an implementation can use
    a different parser to produce XML content. Such a behavior is
    implementation defined. For example, for the mime type 'text/html',
    an implementation might provide an HTML to XHTML parser (e.g. Tidy)."

That means if you want to parse HTML into XHTML, you just set the
'content-type' option to 'text/html' on a p:unescape-markup step and hope
for the best.

With this as the status quo, we could either:

   1. Remove the 'content-type' option and create a new step type, or
   2. Specify some kind of text/html processing for p:unescape-markup.

I don't think we want to do both.

--
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics
Received on Tuesday, 3 July 2007 19:24:54 UTC