Re: unescaping markup

On 5/16/07, Alessandro Vernet <avernet@orbeon.com> wrote:
>
> On 5/16/07, Norman Walsh <ndw@nwalsh.com> wrote:
> > As I understand it, we'd say something like this:
> >
> > The process of unescaping markup depends on the content-type requested.
> > Processors are required to recognize application/xml, application/*+xml,
> > and text/html.
> >
> > For application/xml and application/*+xml, the only operation performed
> > is unescaping. If the result is not well-formed, the step must fail.
> >
> > For text/html, the content is first unescaped and then examined for
> > well-formedness. For the purpose of well-formedness checking, the
> > elements named "IMG", "BR", "HR", (etc.) are treated as empty.
> >
> > If the resulting document is not well-formed, the processor applies
> > an implementation-dependent process to assure that the result is well
> > formed.
> >
> > For all other content types, it is a dynamic error (XXX) if the
> > processor does not support the content type. If the content type is
> > supported, then it is unescaped and converted to well-formed XML using
> > an implementation-dependent algorithm.
>
> Let's consider a use case I see frequently: parsing an HTML fragment,
> and I'd like to transform that fragment into an XHTML fragment (which
> is also an XML document). In this case, text/html is not appropriate
> as I wouldn't want to have an html/body added around my fragment to
> make it valid XHTML. I just want the HTML fragment to be transformed
> into XML.

The RFC for the text/html media type does not say that an html/body
wrapper has to be added:

   http://www.ietf.org/rfc/rfc2854.txt

The HTML standard certainly describes an HTML document as starting
with the 'html' element.

We're not asserting HTML validity, just recognizing a certain set of
known elements.

-- 
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics

Received on Wednesday, 16 May 2007 17:34:37 UTC