- From: Norman Walsh <Norman.Walsh@Sun.COM>
- Date: Thu, 01 Feb 2007 13:22:59 -0800
- To: public-xml-processing-model-wg@w3.org
- Message-ID: <873b5ptvws.fsf@nwalsh.com>
/ Alex Milowski <alex@milowski.org> was heard to say:
|> Another answer, I think, is that components can produce some sort of
|> quoting element (I forget what name Alex proposed) like
|>
|>   <p:quoted-content type="text/html">
|>   ...
|>   </p:quoted-content>
|>
|> If we adopted this, I think I'd want some sort of user option to
|> enable it.
|
| As some kind of pipeline option?

No, as a component option:

  <p:step type="p:xslt" quote-non-xml-resources="yes">
  ...
  </p:step>

|> The last answer I can think of is that we could try to tidy/tagsoup.
|>
|> I suppose, if we can't agree on the simplest answer, I'm inclined to
|> say we do the quoted content thing and have a standard component that
|> takes a quoted content thing and attempts (through an implementation
|> defined mechanism) to turn it into well-formed XML.
|
| The big snag comes in when we consider the HTTP request. There you
| need a way to deal with making requests that aren't a simple XML
| post and deal with responses that aren't XML or may not have any
| content.

Only if we consider the non-XML cases critical for V1.

| While I've considered using a "quoted content" element, I haven't really
| spent the implementation time to go there. What I have done is
| look at the mime-type or component parameters and run the appropriate
| "make this HTML goo XML" component (e.g. TagSoup). I have also

But as I said before, there are no standards we can point to for the
"make this HTML goo XML" algorithm and I don't want the results to be
implementation dependent.

| allow you to just ask for the HTTP response codes and the return
| the entity body as quoted data.
|
| I think the cases we need to consider are specific to components we're
| contemplating:
|
| * What happens when an XSLT transformation specifies an output
|   mode of 'html' or 'text'?
|
| * Can you use the parse component to handle HTML content?
|
| * What does the Load or "Http Request" component do when the mime-type
|   (or assumed mime type) is not an XML type?

Those are three good examples, but I don't want to solve this on a
component-by-component basis. We should be consistent.

| It would be unfortunate for an implementor not to have an option to
| extend the behavior of our core components to allow handling of HTML or
| other media types.

The standard components have to be interoperable, so either we define a
precise mechanism for handling these cases or the implementor must
write custom components to do the extended behavior (IMHO).

| That is, more specifically, if an implementor was required to create a
| different component to do "parse HTML into XML when you see HTML"
| then authors would be forced to switch to use the non-standard
| component in all cases (assuming they wanted to be assured that
| the pipeline would succeed).
|
| On the other hand, if the "Load" or "Http Request" component was
| allowed to handle HTML in some implementations then we'd have an
| interoperability problem.
|
| In the end, I'm torn. I'm going to have the "handle HTML" components
| in my implementation somehow because I need that feature. I'd love
| to have an "optional" feature that falls back to XML parsing. That way
| there would be interoperability amongst the implementations who
| choose to have that option.
|
| ...keep in mind that HTML isn't an edge case as there is a lot of it
| hanging around that needs to be processed.

By XProc, by all implementations, in V1?

                                        Be seeing you,
                                          norm

-- 
Norman Walsh
XML Standards Architect
Sun Microsystems, Inc.
Received on Thursday, 1 February 2007 21:55:59 UTC