Re: unescaping markup

On 5/16/07, Norman Walsh <ndw@nwalsh.com> wrote:
> As I understand it, we'd say something like this:
>
> The process of unescaping markup depends on the content-type requested.
> Processors are required to recognize application/xml, application/*+xml,
> and text/html.
>
> For application/xml and application/*+xml, the only operation performed
> is unescaping. If the result is not well-formed, the step must fail.
>
> For text/html, the content is first unescaped and then examined for
> well-formedness. For the purpose of well-formedness checking, the
> elements named "IMG", "BR", "HR", (etc.) are treated as empty.
>
> If the resulting document is not well-formed, the processor applies
> an implementation-dependent process to ensure that the result is
> well-formed.
>
> For all other content types, it is a dynamic error (XXX) if the
> processor does not support the content type. If the content type is
> supported, then it is unescaped and converted to well-formed XML using
> an implementation-dependent algorithm.
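
As a rough illustration of the dispatch quoted above, here is a minimal Python sketch. It is not how any XProc processor implements this, and it makes several assumptions. The input is the string value of the element holding the escaped markup: the XML parser has already turned `&lt;p&gt;` into `<p>`, so "unescaping" amounts to reparsing that string. `xml.etree.ElementTree` serves as the well-formedness check. The empty-element list holds only the three names in the proposal, and its "(etc.)" is left unfilled. `DynamicError` is a hypothetical stand-in for the unnamed error (XXX). The implementation-dependent repair step is deliberately not implemented.

```python
import re
import xml.etree.ElementTree as ET

# Matches application/xml and any application/*+xml type (e.g. application/atom+xml).
XML_TYPE = re.compile(r"^application/(?:[\w.\-]+\+)?xml$")

# Only the three elements named in the proposal; its "(etc.)" is left unfilled.
HTML_EMPTY = ("img", "br", "hr")
# Naive rewrite of <br>, <img src="a.png"> etc. into self-closing tags.
# Does not handle ">" inside attribute values.
EMPTY_TAG = re.compile(r"<(%s)\b([^>]*?)\s*/?>" % "|".join(HTML_EMPTY), re.IGNORECASE)


class DynamicError(Exception):
    """Hypothetical stand-in for the proposal's unnamed dynamic error (XXX)."""


def unescape_markup(text: str, content_type: str) -> ET.Element:
    """Reparse `text`, the string value of an element holding escaped markup."""
    if XML_TYPE.match(content_type):
        # XML types: unescaping only, no repair; a parse error fails the step.
        try:
            return ET.fromstring(text)
        except ET.ParseError as err:
            raise DynamicError(f"result is not well-formed: {err}") from err
    if content_type == "text/html":
        # Treat IMG, BR, HR as empty before checking well-formedness.
        candidate = EMPTY_TAG.sub(r"<\1\2/>", text)
        try:
            return ET.fromstring(candidate)
        except ET.ParseError as err:
            # The proposal leaves repair implementation-dependent; not sketched here.
            raise NotImplementedError("implementation-dependent repair") from err
    # This sketch supports no other content types, so all of them are errors.
    raise DynamicError(f"unsupported content type: {content_type}")
```

In this sketch, a fragment such as `<p>a<br>b</p>` sent as text/html comes back as a single `p` element with no wrapper added. A fragment without a single root element, or one that uses an HTML entity such as `&nbsp;`, falls through to the unimplemented repair step.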

Consider a use case I see frequently: I have an HTML fragment and I
want to transform it into an XHTML fragment (which is also an XML
document). In this case, text/html is not appropriate, because I don't
want an html/body wrapper added around my fragment to make it valid
XHTML. I just want the HTML fragment converted into XML.

My question is: which content type should I use in this case? Under
the description above, the answer seems to be that it depends on the
implementation, and that it may not be possible at all with some
implementations. That wouldn't be very satisfying.

> Despite what I wrote above, I'm still sympathetic to a simple flag
> myself. We're designing for extensibility that we don't need. Of course,
> that's the nature of extensibility, isn't it?

Indeed!

Alex
-- 
Orbeon Forms - Web 2.0 Forms for the Enterprise
http://www.orbeon.com/

Received on Wednesday, 16 May 2007 15:50:33 UTC