- From: Alex Milowski <alex@milowski.org>
- Date: Mon, 14 May 2007 07:16:17 -0700
- To: public-xml-processing-model-wg <public-xml-processing-model-wg@w3.org>
- Message-ID: <28d56ece0705140716y49dca463xfd4f85d107801d85@mail.gmail.com>
On 5/14/07, Alessandro Vernet <avernet@orbeon.com> wrote:
>
> On 5/8/07, Alex Milowski <alex@milowski.org> wrote:
> > In theory, the same is true for RSS. So, for example, you could write
> > an XProc pipeline that checks whether all the description elements
> > are correctly escaped XHTML by using unescape-markup and try/catch.
>
> In theory. But like they say, in practice theory often doesn't hold. I
> am trying to think of cases where I am parsing escaped XHTML embedded
> in XML. In most cases, I can't assume the XHTML is well-formed, and
> have to use something like JTidy/TagSoup. So I agree with Norm: I
> think it would be convenient to have the option
> "force-markup-to-be-well-formed" right there.

Ah... I missed that last bit.

Maybe we should have a "content-type" option that would allow you to specify something like "text/html". What happens for HTML would have to be implementation-defined, because there is no definition of what "make it well-formed" means.

I think if you are parsing content with an XML media type, it should at least be well-formed in accordance with the XML 1.0/1.1 specifications. If you specify a non-XML media type, then anything appropriate for that media type can happen. This gives implementors the option of using unregistered media types like "application/x-random-junk" or "application/vnd-tidy-html".

--
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the inflexions, i.e. to the degree of analysis effected by the language considered."

Bertrand Russell in a footnote of Principles of Mathematics
Received on Monday, 14 May 2007 14:16:51 UTC