Re: unescaping markup from Alex Milowski on 2007-05-08 (public-xml-processing-model-wg@w3.org from May 2007)

From: Alex Milowski <alex@milowski.org>
Date: Tue, 8 May 2007 07:49:07 -0700
To: public-xml-processing-model-wg@w3.org
Message-ID: <28d56ece0705080749yf8f945bma8848f20ce2ea4b8@mail.gmail.com>

On 5/7/07, Norman Walsh <ndw@nwalsh.com> wrote:
>
> Not that I want to sound obsessed or anything, but given that the
> motivation for escaping markup in formats like RSS is that it can't
> be relied upon to be well-formed, what's the point of unescaping
> it in XProc? It'll immediately cause the pipeline to crash.
>
> Should we have a "force-markup-to-be-well-formed" option or something?

I've used unescaping of RSS descriptions to turn arbitrary RSS
feeds into XHTML representations.  For example, if you want to
run XSLT over the markup in an RSS feed, you need to pre-process
it with an unescape-markup step.

Now, to ensure that the pipeline doesn't fail, you wrap the
unescape-markup step in a try/catch and provide a fallback for
the descriptions you can't process.
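As a rough illustration only (this is not XProc, and not how any particular XProc processor implements unescape-markup), the Python sketch below mimics the idea with the standard library's xml.etree.ElementTree. Each description's escaped text is parsed as markup, and a parse failure is caught so that item falls back to keeping its original escaped text. The function names and the choice to wrap each fragment in an XHTML div are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

def unescape_description(desc):
    """Hypothetical helper: parse the escaped text of `desc` as markup.

    Raises ET.ParseError if the text is not well-formed; in that case
    `desc` is left unchanged."""
    text = desc.text or ""
    # Assumption: wrap the fragment in an XHTML <div> so that text with
    # several top-level elements, or leading text, parses as one document.
    wrapper = ET.fromstring(f'<div xmlns="{XHTML_NS}">{text}</div>')
    desc.text = None
    desc.append(wrapper)

def unescape_feed(rss_xml):
    """Unescape every <description>; return the tree and the failures."""
    root = ET.fromstring(rss_xml)
    failed = []
    # Materialize the iterator first because the loop modifies the tree.
    for desc in list(root.iter("description")):
        try:
            unescape_description(desc)
        except ET.ParseError:
            # Fallback (the "catch" branch): keep the escaped text as-is
            # and record the item so it can be handled some other way.
            failed.append(desc)
    return root, failed
```

Descriptions that end up in `failed` still hold their escaped text, so a later step (an XSLT transform, say) can treat them as plain text instead of halting the whole run.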

There are other protocols out there where markup is escaped for
other reasons and the protocol requires the escaped content to be
well-formed.  If it isn't, that's a bad message.

In theory, the same is true for RSS.  For example, you could write
an XProc pipeline that uses unescape-markup and try/catch to check
whether all the description elements contain correctly escaped XHTML.
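A comparable Python sketch of that check (again an illustration, not XProc) tries to parse each description's escaped text and reports the ones that fail. The function name is hypothetical. Note that it only tests well-formedness; it does not validate the result against the XHTML schema or check that the element names are XHTML elements.

```python
import xml.etree.ElementTree as ET

def malformed_descriptions(rss_xml):
    """Return the positions of <description> elements whose escaped text
    does not parse as well-formed markup (an empty list means all are OK)."""
    root = ET.fromstring(rss_xml)
    bad = []
    for i, desc in enumerate(root.iter("description")):
        try:
            # Assumption: wrap in an XHTML <div> so multi-element fragments parse.
            ET.fromstring('<div xmlns="http://www.w3.org/1999/xhtml">'
                          + (desc.text or "") + "</div>")
        except ET.ParseError:
            bad.append(i)
    return bad
```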

-- 
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics

Received on Tuesday, 8 May 2007 14:49:33 UTC