Re: Circular Imports from Henry S. Thompson on 2007-11-15 (public-xml-processing-model-wg@w3.org from November 2007)

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Thu, 15 Nov 2007 14:33:46 +0000
To: "Alex Milowski" <alex@milowski.org>
Cc: "XProc WG" <public-xml-processing-model-wg@w3.org>
Message-ID: <f5blk8zzigl.fsf@hildegard.inf.ed.ac.uk>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Alex Milowski writes:

> [about circular imports]

We discussed this at the f2f, but that discussion doesn't seem to have
made it into the minutes.

I'll try to reconstruct the salient points.

1) Not only circularity, but re-entrancy, is relevant.  That is, not
   just A imports B imports . . . A, but also A imports B and C, B and
   C both import D.

2) Although multiple passes are required, the number is finite,
   i.e. each pass is for a distinct kind of processing, and each pass
   is linear in the total number of documents involved.  Details
   below.

> I propose we add the following to handle this:
>
>    * In section 5.10 p:import, we need to define the resolved
> location based on the 'href' attribute.  The 'href' attribute must
> be resolved against the base URI of the p:import element.  The
> resulting URI is the "resolved location" of the import.

Hmm, not quite -- I think the "resolved location" is whatever your URI
library comes back with as the location, so that redirects and conneg
are allowed for.

>    * When loading a pipeline, a processor must keep track of the
> resolved location of imports (e.g. a simple stack of URI values) and
> detect circular imports by literal comparison of URI values as
> strings.  If two URI values differ by encoding, the processor must
> treat them as different imports.

A stack isn't enough, that won't catch the re-entrant case.

>    * When a circular import is detected, the processor must provide the
>      same step declarations from the original, earlier import.
>
> What this means is that in the case of a circular import, a
> processor must determine the step declarations provided by a
> pipeline or library before completely loading the importing
> construct.  The good news is a it suffices to provide references to
> the correct step declaration (e.g. one from the earlier import) and
> then, once all parts are loaded successfully, the complete step can
> be determined.

So I think we agreed that this has to be expanded and changed a bit,
along the following lines:

 1) Do a 'minimal load' of every pipeline and pipeline library
    document, by processing all p:imports, recording their resolved
    locations, and declining to load anything twice;

 2) Determine the type names *defined for export* by each top-level
    scope == top-level pipeline or pipeline library == entry in the
    resolved location table and every, as follows (see [1]):

    a) The pipeline name itself, if the scope is a pipeline;
    b) The step types and pipelines defined therein, if the scope is a
       library.

 3) Associate with every pipeline and library the resolved location of
    all its imports, and from any of those which are libraries, the
    resolved locations of all _their_ top-level imports, recursively.
    That is, the transitive closure of p:import, _not_ descending into
    p:pipeline.

 4) Determine the type names visible within each library, as follows:

    a) The step types and pipelines defined locally within the library;
    c) The types _defined for export_ by all the resolved locations
       identified for the library in (3) above.

 5) Determine the type names visible (in addition to the builtins) in
    each pipeline, as follows (see [1]):

    a) The pipeline name itself;
    b) The step types defined locally within the pipeline;
    c) The types _defined for export_ by all the resolved locations
       identified for the pipeline in (3) above;
    d) The type names visible within its enclosing library, if it's in
       one.

This does more work than strictly necessary, and is liable to
optimisation in the absence of circularity, but it's simpler to state
w/o optimisation.

On balance, I think this is too detailed to fit well in the
spec. itself.  I think what we need in the spec., pbly in section 3.2,
is [1] and something like this:

  Circular and re-entrant chains of p:import are not errors, and
  implementations MUST take the necessary steps, based on testing
  string equality of the actual URIs involved (that is, after
  absolutization and redirection, if any), to avoid infinite loops or
  spurious duplicate-definition errors.  Appendix XXX provides
  high-level description of one algorithm which does this.

and then put the above 5-step story in (non-normative) Appendix XXX.

ht

[1] http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2007Oct/0049.html
- -- 
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFHPFjKkjnJixAXWBoRAojlAJ9RVzBVQTS/ZYT7aCTjQrgYCgJepQCfeGKP
9UOi3A2O395hTGylZIBewGc=
=Bv5G
-----END PGP SIGNATURE-----
Received on Thursday, 15 November 2007 14:33:58 UTC