- From: Per Bothner <per@bothner.com>
- Date: Tue, 21 Oct 2003 11:35:45 -0700
- To: www-ql@w3.org
I've been thinking about how to efficiently implement namespace "nodes".
(This is for serialization and get-in-scope-namespace only - I already
have efficient code for namespace resolution when parsing XML, and for
matching QNames.)
Does the following sound like it would work ok? (The last point is the
critical non-obvious one.)
* Each element has a "namespace mapping", which maps prefixes to URIs,
and which can be implemented as a hash-table or a vector (property-list).
(Since the namespace mapping is primarily used for serialization, it
makes more sense to use a space-efficient vector.)
* Once created, a namespace mapping is immutable, and so can be shared
between element nodes.
* When parsing an XML document, if an element has no namespace
attributes we re-use the namespace mapping of its parent. If it has
namespace attributes, we create a new namespace mapping which is the
combination of the parent's namespace mapping with the new namespace
attributes.
* When serializing an element, we print all the namespaces in the
element's namespace mapping, except for ones that are redundant because
they have already been serialized in an enclosing element.
* When an element is constructed, its namespace mapping includes all the
"active namespaces nodes" (in the sense of the specification) plus any
of the namespaces in the prologue or predefined that are referenced in
the current element *or* (if this is a direct element constructor) in
any enclosed direct element constructors. (This rule is meant to
minimize the number of distinct namespace mappings we have to create.
The implementation may need to be a little bit clever here.)
* When an element is (conceptually) copied (re-parented), we use its
existing namespace mapping. We do *not* create a new namespace mapping
to incorporate any namespace in the parent.
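A minimal sketch of the sharing rules above, in Python rather than the language of any real implementation. The names Mapping, extend, Element and parse_element are hypothetical, and the "parser" takes already-tokenized (name, declarations, children) triples instead of real XML. It only illustrates that an element without namespace attributes re-uses its parent's mapping object, while one with namespace attributes gets a new combined mapping:

```python
from typing import Dict, Tuple

# Hypothetical representation: an immutable vector of (prefix, uri) pairs.
Mapping = Tuple[Tuple[str, str], ...]


def extend(parent: Mapping, declared: Dict[str, str]) -> Mapping:
    """Combine the parent's mapping with newly declared namespaces."""
    if not declared:
        return parent  # no namespace attributes: share the parent's mapping
    # New declarations come first and shadow same-prefix entries of the parent.
    kept = tuple((p, u) for (p, u) in parent if p not in declared)
    return tuple(declared.items()) + kept


class Element:
    def __init__(self, name: str, namespaces: Mapping, children=()):
        self.name = name
        self.namespaces = namespaces  # never mutated after construction
        self.children = list(children)


def parse_element(name, declared, parent_mapping=(), child_specs=()):
    """Build an element from a (name, declarations, children) description."""
    mapping = extend(parent_mapping, declared)
    children = [parse_element(n, d, mapping, c) for (n, d, c) in child_specs]
    return Element(name, mapping, children)


root = parse_element(
    "a", {"ns1": "NS1"}, (),
    [("b", {}, []), ("c", {"ns2": "NS2"}, [])],
)
```

Here root.children[0] (the <b/> element) holds the very same mapping object as root, while root.children[1] gets a fresh two-entry mapping.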
The last point may cause some slightly surprising behavior. Consider:
let $a := <a xmlns:ns1="NS1"><b/></a>
let $b := $a/b
let $c := <c xmlns:ns2="NS2">{$b}</c>
let $d := <d xmlns:ns3="NS3">{$c}</d>
Serializing gives us:
$a -> <a xmlns:ns1="NS1"><b/></a>
$b -> <b xmlns:ns1="NS1"/>
$c -> <c xmlns:ns2="NS2"><b xmlns:ns1="NS1"/></c>
$c/b -> <b xmlns:ns1="NS1"/>
$d -> <d xmlns:ns3="NS3"><c xmlns:ns2="NS2"><b xmlns:ns1="NS1"/></c></d>
$d/c -> <c xmlns:ns2="NS2"><b xmlns:ns1="NS1"/></c>
$d/c/b -> <b xmlns:ns1="NS1"/>
I.e. $b "inherits" ns1 from <a> and keeps it even when "removed" from <a>,
but it does not "inherit" ns2 from <c> in the same way. This may
be counter-intuitive.
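The serialization behavior described above can be modelled with a short Python sketch. It is only an illustration under stated assumptions: Element, copy and serialize are hypothetical names, only prefixed namespaces are handled (no default namespace, attributes or text), and a copied element simply keeps the mapping it already had. The serializer skips any declaration that an enclosing element has already printed:

```python
class Element:
    def __init__(self, name, namespaces=(), children=()):
        self.name = name
        self.namespaces = tuple(namespaces)  # immutable (prefix, uri) pairs
        self.children = list(children)


def copy(node):
    """Copy a subtree for re-parenting; every copy keeps its old mapping."""
    return Element(node.name, node.namespaces, [copy(c) for c in node.children])


def serialize(node, in_scope=None):
    """Print only the declarations not already emitted by an ancestor."""
    in_scope = in_scope or {}
    new = [(p, u) for (p, u) in node.namespaces if in_scope.get(p) != u]
    attrs = "".join(' xmlns:%s="%s"' % (p, u) for (p, u) in new)
    if not node.children:
        return "<%s%s/>" % (node.name, attrs)
    scope = dict(in_scope)
    scope.update(new)  # what the children see as already declared
    body = "".join(serialize(c, scope) for c in node.children)
    return "<%s%s>%s</%s>" % (node.name, attrs, body, node.name)


ns1 = (("ns1", "NS1"),)
b = Element("b", ns1)        # <b/> constructed inside <a>, so ns1 is active
a = Element("a", ns1, [b])
c = Element("c", (("ns2", "NS2"),), [copy(b)])
d = Element("d", (("ns3", "NS3"),), [copy(c)])

print(serialize(a))  # <a xmlns:ns1="NS1"><b/></a>
print(serialize(b))  # <b xmlns:ns1="NS1"/>
print(serialize(c))  # <c xmlns:ns2="NS2"><b xmlns:ns1="NS1"/></c>
```

Because copy never merges the new parent's mapping into the copied node, the copy of <b> inside <c> still carries only ns1, which matches the outputs listed for $c and $d.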
get-in-scope-namespaces($a) -> "ns1"
get-in-scope-namespaces($b) -> "ns1"
get-in-scope-namespaces($c) -> "ns2", "ns1"
get-in-scope-namespaces($c/b) -> "ns1"
get-in-scope-namespaces($d) -> "ns3", "ns2", "ns1"
get-in-scope-namespaces($d/c) -> "ns2", "ns1"
get-in-scope-namespaces($d/c/b) -> "ns1"
$b is $c/b -> false
deep-equals($b, $c/b) -> true
I think this produces correct and reasonable output for a modest
implementation price, but perhaps I'm missing something.
--
--Per Bothner
per@bothner.com http://per.bothner.com/
Received on Tuesday, 21 October 2003 14:41:21 UTC