override
problemsOur goal is to construct a tree of include and overide connections with no duplicate leaves. However we will speak as if we're dealing with a graph, because the fact that we're building a tree is something that eventually will need to be proved, not just assumed/asserted. The nodes of the graph (which we'll call G) are schema documents, identified by their base URIs, which we take to be the URIs of the schema documents as retrieved after all redirection has occurred.
Notation:
.sd
is the
schema document of node n, n.uri
is its
base URI and n.markers
is its set of markers (see
below). Node(u)
is a new node of G
composed of a newly-retrieved schema document whose base URI is u
and an empty marker set.e, f, ... will be variables over edges.
Edge(n,m,l)
is a new edge
of G from
node n to node m with label l. e.from
, e.to
and e.label
access the
start node, end node and label of an edge, respectively. n.out
and
n.in
are the sets of edges from/to node n, respectively.
include
onlyThis algorithm reconstructs XSV's handling of
re-entrant and circular includes
.
= Node(root)
to G = {r}
and a set of
nodes already processed P = {}
.uri
=
n.uri
and m.markers
=
n.markers
, then
.to =
n, remove e from Ginclude
EII or import
EII with a
schemaLocation
attribute (call that EII i) in n.sd.[children]
=
Node(
i/@schemaLocation)
to Ginclude
, add all the members of n.markers
to m.markers
include
and its inclusion of m.sd
is
a chameleon inclusion, then add
tns(
n.sd)
to m.markers
Edge(
n,
m,"include" or
"import" as appropriate)
to GThere is slightly more mechanism here than is needed (edge labels are not used, 6.b.ii and 6.b.iii are mutually exclusive and the marker set of a node is always either empty or consists of a single URI). This is to prepare the way for handling override as well.
On the other hand, the treatment of import
is
oversimplified, and redefine
is not mentioned at all.
I think it's reasonably obvious that the result of the above is a tree rooted at r and that the leaves are unique. Representation checking and schema construction can then proceed bottom-up with respect to that tree. That doesn't mean that a given schema document will only be processed once: if is chameleon-included with multiple including namespaces, or is both chameleon and non-chameleon included, it will occur at multiple leaves (with different values for its markers) and be processed multiple times, but necessarily so, and with different results in each case.
(In XSV (vintage Schema 1.0), the actual algorithm dispenses with
Q, and simply recurses at 6.b.iv. This means that once it gets to
the end of the include [children]
in step 6, it can check and process
the rest of the [children]
and add the corresponding components
to the schema component for the appropriate namespace, and no spurious
duplicates will ever by encountered.)