Re: Namespace Fixup Proposal from Henry S. Thompson on 2007-09-06 (public-xml-processing-model-wg@w3.org from September 2007)

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Thu, 06 Sep 2007 15:20:42 +0100
To: Norman Walsh <ndw@nwalsh.com>
Cc: public-xml-processing-model-wg@w3.org
Message-ID: <f5b4pi7j2o5.fsf@hildegard.inf.ed.ac.uk>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Norman Walsh writes:

> HST wrote:

> | Yes, but it still won't necessarily serialise without work, and it's
> | possible that serialising will introduce failure to round-trip.
> | Suppose the matrix has an ns-attribute for the default namespace, but
> | the included bit consists entirely of no-namespace elts.  The
> | serialised result will be borked.  To detect this, you have to look at
> | every node in the inserted tree.
>
> I suppose the default namespace *is* a special case. But I don't think
> that's a problem.
>
> Here's a document:
>
>   <rootelem xmlns="rootns">
>     <div xmlns="xhtml">
>       <target/>
>     </div>
>   </rootelem>
>
> Suppose I want to replace target with some subtree. When do I ever have
> to look at the subtree's descendants?
>
> To insert
>
>   <x:otherroot xmlns:x="xxxns">
>     <nons/>
>   </x:otherroot>
>
> I simply make sure that if there's a default namespace where 'target'
> appears, I undeclare it. Everything else "just works". No?

Yes.
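
A minimal sketch of that rule, for illustration only. The helper names
(declarations_for_inserted_root, start_tag) are hypothetical and not taken
from any draft or implementation; the sketch assumes that namespace
declarations are modelled as a prefix-to-URI mapping with "" standing for
the default namespace, and that the inserted root's own name is prefixed,
as in the example.

```python
def declarations_for_inserted_root(own_decls, context_default_ns):
    """Return the namespace declarations the inserted root should carry.

    own_decls: prefix -> URI mapping already on the inserted root
               ("" is the default-namespace prefix).
    context_default_ns: default namespace in scope at the insertion
               point, or "" if there is none.
    """
    decls = dict(own_decls)
    # If the context has a default namespace and the inserted root does
    # not declare its own, undeclare it so that unprefixed descendants
    # stay in no namespace.
    if context_default_ns and "" not in decls:
        decls[""] = ""
    return decls


def start_tag(qname, decls):
    """Render a start tag with the given namespace declarations."""
    attrs = ""
    for prefix, uri in decls.items():
        name = f"xmlns:{prefix}" if prefix else "xmlns"
        attrs += f' {name}="{uri}"'
    return f"<{qname}{attrs}>"


# Inserting x:otherroot where the in-scope default namespace is "xhtml".
decls = declarations_for_inserted_root({"x": "xxxns"}, "xhtml")
print(start_tag("x:otherroot", decls))
# <x:otherroot xmlns:x="xxxns" xmlns="">
```

Only the root of the inserted subtree is touched here; no descendant is
inspected.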

Now consider this case:

<p:rename match="my:foo" new-name="foo" xmlns:my="http://www.example.com/ns"/>

when the input is

 <foo xmlns="http://www.example.com/ns">
  <baz>...</baz>
  <baz>...</baz>
  <baz>...</baz>
  <baz>...</baz>
  <baz>...</baz>
  <baz>...</baz>
  <baz>...</baz>
  <baz>...</baz>
 </foo>

Fixup in this case will have to not only rename the element to {}foo,
but also remove the xmlns [namespace attribute] from the <foo> elt and
push it down on to _all_ the <baz> elements.
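
To make the effect concrete, here is a small sketch, not a proposed
algorithm. It assumes a toy tree model (the Elem class and serialize
function are hypothetical) in which each element stores only its
[namespace name] and [local name], and a serialiser that uses nothing but
default-namespace declarations. The rename changes one name property; the
declarations on every <baz> then fall out of serialization.

```python
from dataclasses import dataclass, field


@dataclass
class Elem:
    ns: str      # [namespace name]; "" means no namespace
    local: str   # [local name]
    children: list = field(default_factory=list)


def serialize(elem, default_in_scope=""):
    """Serialise, declaring xmlns wherever the in-scope default namespace
    differs from the element's own namespace."""
    decl = ""
    if elem.ns != default_in_scope:
        decl = f' xmlns="{elem.ns}"'
    if not elem.children:
        return f"<{elem.local}{decl}/>"
    inner = "".join(serialize(child, elem.ns) for child in elem.children)
    return f"<{elem.local}{decl}>{inner}</{elem.local}>"


NS = "http://www.example.com/ns"
foo = Elem(NS, "foo", [Elem(NS, "baz") for _ in range(3)])

before = serialize(foo)
# One declaration, on <foo>; the <baz> children inherit it.

foo.ns = ""  # the rename: only the name properties of <foo> change
after = serialize(foo)
# <foo> now has no declaration, and every <baz> carries its own xmlns.

print(before)
print(after)
```

Nothing in the rename itself touched the <baz> elements; the cost of
redeclaring the namespace on each of them is paid once, in the serialiser.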

Namespace fixup is _full_ of these silly fiddly messy corner cases,
and I think we will not be thanked by implementors if we make them do
it at every step.  I particularly _don't_ want to get into the
business of trying to specify in detail what checks and fixes each
step which _might_ mess things up must do.  I think putting the
requirement on serialization, at the margins, is going to be much
simpler to state, understand and implement.

> |> Allowing un-fixed-up markup to flow between steps lets it get deeply
> |> buried in documents through operations that wouldn't normally cause
> |> fixup to be necessary.
> |
> | I don't understand.
>
> My point is just that there are a few steps that allow namespaces to
> get out of whack. If we don't mandate that namespaces are fixed up *on
> those steps*, then *every* step can produce documents that have broken
> namespaces. That just seems awful.

As you point out, the crucial bits will _never_ get screwed up.  That
is, the [local name]s and [namespace name]s of elements and attributes
themselves.  That means, right there, that we've covered the 99%
case.  Getting the [namespace attribute], [in-scope namespaces] and
[prefix] properties right is on the one hand _much_ harder, and on the
other _much_ less important, until and unless you get to serialization.

> |> On a separate, but related, topic, I'm confused about how the SAX
> |> argument plays out. Why is it hard to do this fixup with SAX? When do
> |> you ever have to buffer more than one start element event?
> |
> | SAX filters just pass along what you give them.  If we require NS fixup
> | between steps, everyone using a SAX substrate will have to put an NS
> | fixup filter between _every_ pair of steps, won't they?
>
> I don't think so. It just means that, *in steps where namespaces can
> get broken*, *the step* will have to make sure that it doesn't output
> broken elements. But it'll never have to buffer more than one start
> tag to do that, I think.

See above example.

ht
- -- 
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFG4Ay6kjnJixAXWBoRAqGyAJ0TjHuL4Y4VHarc2sA902dZ8nDGHwCfcrhg
ytP/brZLEph5LztLlPWziNg=
=sdm4
-----END PGP SIGNATURE-----
Received on Thursday, 6 September 2007 14:21:16 UTC