Re: p:unescape-markup (p:parse)

On 4/26/07, Norman Walsh <ndw@nwalsh.com> wrote:
>
> Given
>
>   For example, with the 'namespace' option set to the XHTML namespace,
>   the following input:
>
>   <description>
>   &lt;p>This is a chunk.&lt;/p>
>   &lt;p>This is a another chunk.&lt;/p>
>   </description>


> Why does unescaping markup produce this:
>
>   <description>
>   <p xmlns="http://www.w3.org/1999/xhtml">This is a chunk.</p>
>   <p xmlns="http://www.w3.org/1999/xhtml">This is a another chunk.</p>
>   </description>
>
> and not this:
>
>   <description xmlns="http://www.w3.org/1999/xhtml">
>   <p>This is a chunk.</p>
>   <p>This is a another chunk.</p>
>   </description>



The escaped markup is:

  <p>This is a chunk.</p>
  <p>This is a another chunk.</p>

An XML document needs a single root element, so to parse this correctly
you must wrap it in something:

  <root>
    <p>This is a chunk.</p>
    <p>This is a another chunk.</p>
  </root>

and then throw away 'root'.

Given that the namespace option is specified, the wrapper should look like:

  <root xmlns="http://www.w3.org/1999/xhtml">
    <p>This is a chunk.</p>
    <p>This is a another chunk.</p>
  </root>

That generates two elements named 'p' in the XHTML namespace
with no associated prefix in the infoset.  The parent element that had
the default namespace declaration is gone, so any self-respecting
serialization engine does a bit of namespace "fix up" and realizes
that you can just add a default namespace declaration on the 'p'
elements.
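
The wrap, parse, and discard steps above can be sketched in Python with
the standard xml.etree.ElementTree module. This is only an illustration
of the idea, not how any XProc processor implements the step: the
unescape_markup function and the 'root' wrapper name are made up for
the example, and the default_namespace argument to tostring assumes
Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

def unescape_markup(text, namespace):
    """Parse a text fragment as XML and return its top-level elements.

    Hypothetical helper for illustration only.
    """
    # The fragment may hold several top-level elements, so wrap it in a
    # throwaway element that carries the default namespace declaration.
    wrapper = ET.fromstring(f'<root xmlns="{namespace}">{text}</root>')
    # Throw away the wrapper; keep only its element children.
    return list(wrapper)

fragment = "<p>This is a chunk.</p>\n<p>This is a another chunk.</p>"
children = unescape_markup(fragment, XHTML_NS)

serialized = []
for child in children:
    child.tail = None  # drop whitespace that followed the element
    # The wrapper and its declaration are gone, so the serializer has
    # to declare the namespace again on each 'p' element.
    serialized.append(
        ET.tostring(child, encoding="unicode", default_namespace=XHTML_NS)
    )

for item in serialized:
    print(item)
```

Each 'p' comes back in the XHTML namespace, and since the wrapper no
longer exists at serialization time, the declaration has nowhere to go
but the 'p' elements themselves.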

Certainly a human would have put the namespace declaration on the
'description' element if they could, but in this example the
'description' element is not in the XHTML namespace.  If the
'description' element is in no namespace, then you *can't* put the
default namespace declaration on it, because that would move
'description' itself into the XHTML namespace.

-- 
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics

Received on Sunday, 29 April 2007 20:51:46 UTC