Re: Multiple XML schema files for a common target namespace (PROV-ISSUE-608)

On Tue, Feb 5, 2013 at 7:29 PM, Stephan Zednik wrote:
> This does not follow the pattern Stian suggested of updating Document so
> that bundles are required at the bottom of the document.
> Stian, does this make sense?  Do you still prefer the other pattern you
> suggested in the earlier email?

Well to me it does not really matter if xs:any can appear anywhere in
<document> or just at the bottom of the <document> - but I think your
current solution means that you are allowed to put anything anywhere
in <document>, but in <bundle> you can only put the extensions after
<prov:value> but before the documentElements, which is a bit odd.

It might be 'cleaner' to only allow extension stuff at the bottom, but
that could make it tricky for the bundle as it (now) specializes the
prov:Entity type and therefore the additional elements of Bundle come
below the <xs:any> from entity.

> Also, I think that we put the abstract element after the choice in document
> Elements because it caused problems with schema validation, but I can double
> check on that and see if it can be included in the choice.

I know, those things can get tricky.. it's another problem with XSD
and its particle separation.

I tried an example of making an extension:


Here in <custom.xsd> I was *NOT* able to use
substitutionGroup="prov:abstractElement", because I get:

  Can't include the substitutionGroup as it causes:
  and WC[##other:""] (or elements from their
  group) violate "Unique Particle Attribution".

Basically this means that the only way to use the
substitutionGroup="prov:abstractElement" is to stay within the PROV
namespace.  This might not be obvious to someone looking at our
schema. So I'm having doubts now.
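
To illustrate the clash, here is a minimal sketch, assuming a content
model where the abstract element and an ##other wildcard sit in the same
choice. This is not the actual PROV schema; the http://example.org/custom
namespace and the element name "thing" are made up for the example.

  prov.xsd (simplified):

  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
             xmlns:prov="http://www.w3.org/ns/prov#"
             targetNamespace="http://www.w3.org/ns/prov#"
             elementFormDefault="qualified">
    <xs:element name="abstractElement" abstract="true"/>
    <xs:element name="document">
      <xs:complexType>
        <xs:choice minOccurs="0" maxOccurs="unbounded">
          <!-- matches any member of the substitution group -->
          <xs:element ref="prov:abstractElement"/>
          <!-- matches any element outside the PROV namespace -->
          <xs:any namespace="##other" processContents="strict"/>
        </xs:choice>
      </xs:complexType>
    </xs:element>
  </xs:schema>

  custom.xsd (hypothetical extension):

  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
             xmlns:prov="http://www.w3.org/ns/prov#"
             targetNamespace="http://example.org/custom"
             elementFormDefault="qualified">
    <xs:import namespace="http://www.w3.org/ns/prov#"
               schemaLocation="prov.xsd"/>
    <!-- in a non-PROV namespace AND in the substitution group -->
    <xs:element name="thing" substitutionGroup="prov:abstractElement"/>
  </xs:schema>

Under XSD 1.0 rules a <custom:thing> child could be attributed either to
the substitution group of prov:abstractElement or to the ##other
wildcard, which is the ambiguity the validator complains about.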

However, the general extension mechanism through xsd:any does work well,
and can also validate my non-prov elements (see <custom-example.xml>), even
when I inserted those elements inside <prov:document>.

In <with-extensions.xml> I tried reusing some schemas off the shelf:
XHTML, MathML and DC Terms.  This works fine thanks to xs:any as well.
I was even able to do nested inclusion reusing prov: elements, i.e.:

       <prov:entity prov:ref="formula"></prov:entity>
            <prov:agent prov:ref="fred"/>
<!-- ... -->

(Those internal prov: elements should probably in most cases NOT be
considered part of the <prov:document> !)

Now you can argue whether this would make sense or not, but that is
the downside of xsd:any - anything (in non-prov namespaces, in this
case) is allowed, not just content that should make sense by
declaration of substitution groups. The more xsd:any you use, the less
you have a schema and the more you just have lots of fragmented types.

However I was unable to reuse namespaces like FOAF, because it does
not have an XSD schema. So sadly this is not allowed:

  <prov:person prov:id="johndoe">
        <foaf:name>John Doe</foaf:name>
  </prov:person>

I think this is too strict, and I suggest changing the xsd:any of
<prov:entity> and friends to processContents="lax" - the content would
then only be validated against a schema if one is known.
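
As a rough sketch of what that change could look like (the real
prov:Entity type has more children, which are elided here, so treat the
surrounding structure as an assumption):

  <xs:complexType name="Entity">
    <xs:sequence>
      <!-- PROV-defined children elided -->
      <!-- lax: validate if a declaration is found, otherwise accept -->
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

With "lax", an element such as <foaf:name> is accepted when no
declaration for it is available, while elements from namespaces that do
have a loaded schema are still checked against it.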

We could rename prov:abstractElement to prov:internal or something to
make it less 'tempting' for external use.

We could in theory get rid of the whole documentElements and use only xs:any:

  <xs:element name="document" type="prov:Document" />
  <xs:complexType name="Document">
   <xs:choice maxOccurs="unbounded">
    <xs:any namespace="##targetNamespace" processContents="strict" />
    <xs:any namespace="##other" processContents="lax" />
   </xs:choice>
  </xs:complexType>

And then no substitution groups are needed in our PROV extensions; any
declared <xs:element> would be allowed. For consistency I've set
processContents="lax" even for content of <prov:document>, but we might
want to instead say that it should be strict, to encourage
PROV extensions (rather than just providing attributes) to at least
declare a schema.

This would mean you could also insert <prov:value> inside
<prov:document> and so we would have to ensure that only "proper"
elements are declared as named <xs:element>.  I tried changing them to
xs:group's and group refs which works fine.
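
A minimal sketch of that group approach, assuming
elementFormDefault="qualified" and using prov:value as the example (the
xs:anySimpleType type and the trimmed-down Entity type are assumptions
for illustration, not copied from the schema):

  <!-- before: a global declaration, which a strict ##targetNamespace
       wildcard would accept as a child of <prov:document> -->
  <xs:element name="value" type="xs:anySimpleType"/>

  <!-- after: the element is local to a named group, so there is no
       global declaration for the wildcard to pick up -->
  <xs:group name="value">
    <xs:sequence>
      <xs:element name="value" type="xs:anySimpleType"/>
    </xs:sequence>
  </xs:group>

  <xs:complexType name="Entity">
    <xs:sequence>
      <xs:group ref="prov:value" minOccurs="0"/>
      <!-- other children elided -->
    </xs:sequence>
  </xs:complexType>

The "before" and "after" declarations are alternatives, not meant to
appear in the same schema.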

The above is quite tricky to get to work inside a <prov:bundle>
because all its prov elements are optional, and we get a clash between
those and the optional xs:any in the prov namespace.
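
A simplified sketch of that clash, not the real Bundle type, with
prov:label standing in for any of the optional prov children:

  <xs:complexType name="Bundle">
    <xs:sequence>
      <!-- optional PROV child -->
      <xs:element ref="prov:label" minOccurs="0" maxOccurs="unbounded"/>
      <!-- optional wildcard that also matches prov:label -->
      <xs:any namespace="##targetNamespace" processContents="strict"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

Here a <prov:label> child could be attributed to either particle, so an
XSD 1.0 validator reports a "Unique Particle Attribution" violation.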

This is a bit odd anyway because <prov:bundle> plays a dual role: it is
both a way to say that an entity is a bundle and a flat list of the
bundle's content, so we can't know if something listed is part of the
bundle or an attribute of the bundle - especially for

Saying something is a bundle could also be done as:


(I am a bit confused now, as the PROV-XML document says this is how
it should be done)

.. but I know the XML schema has similar 'helpers' for types like
prov:Person and prov:Revision so let's assume we keep the
<prov:bundle> entity.

I then would propose changing the bundle to be:

  <prov:bundle>
    <prov:label>A bundle</prov:label>
    <dcterms:description>Still not part of the bundle</dcterms:description>
    <prov:provenanceDescriptions>
      <!-- the bundle content -->
      <prov:activity />
      <!-- .. -->
    </prov:provenanceDescriptions>
  </prov:bundle>

(We can argue about the name prov:provenanceDescriptions - I went for
something close to PROV-DM)

So this works fine:

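  <xs:complexType name="Bundle">
   <xs:complexContent>
    <xs:extension base="prov:Entity">
     <xs:sequence>
      <xs:element name="provenanceDescriptions" minOccurs="0">
       <xs:complexType>
        <xs:choice minOccurs="0" maxOccurs="unbounded">
         <xs:any namespace="##targetNamespace" processContents="strict" />
         <xs:any namespace="##other" processContents="lax" />
        </xs:choice></xs:complexType></xs:element>
     </xs:sequence></xs:extension>
   </xs:complexContent></xs:complexType>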

Now the xsd:any from prov:Entity does not cause any problems, except
that the extension elements it allows have to be stated BEFORE
<prov:provenanceDescriptions>. To change this we would have to do a
copy/paste from prov:Entity instead and move the xsd:any down.

So it's possible, and not that unclean, to get rid of the substitution
groups, but it would allow non-PROV garbage (i.e. schema elements which
were not intended as PROV extensions, like my MathML example above)
within <prov:document> and <prov:bundle>.

I don't know what the group's thoughts are on which extensions we should
allow for those, but at least it would be consistent with what PROV-N
allows - and then perhaps any PROV-N document could be translatable to
PROV-XML even without knowing the extensions.

If you wish I can commit my version of the schemas which does the
above (but slightly tidied up), either to the tip or a new branch.

Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester

Received on Tuesday, 12 February 2013 14:30:39 UTC