Re: Multiple XML schema files for a common target namespace (PROV-ISSUE-608)

On Tue, Feb 5, 2013 at 7:29 PM, Stephan Zednik <zednis@rpi.edu> wrote:
> This does not follow the pattern Stian suggested of updating Document so
> that bundles are required at the bottom of the document.
>
> Stian, does this make sense?  Do you still prefer the other pattern you
> suggested in the earlier email?

Well to me it does not really matter if xs:any can appear anywhere in
<document> or just at the bottom of the <document> - but I think your
current solution means that you are allowed to put anything anywhere
in <document>, but in <bundle> you can only put the extensions after
<prov:value> but before the documentElements, which is a bit odd.

It might be 'cleaner' to only allow extension stuff at the bottom, but
that could make it tricky for the bundle as it (now) specializes the
prov:Entity type and therefore the additional elements of Bundle come
below the <xs:any> from entity.




> Also, I think that we put the abstract element after the choice in document
> Elements because it caused problems with schema validation, but I can double
> check on that and see if it can be included in the choice.

I know, those things can get tricky.. it's another problem with XSD
and its particle separation.


I tried some example of making an extension:

  <https://dvcs.w3.org/hg/prov/file/0bb02b43e80b/xml/examples>

Here in <custom.xsd> I was *NOT* able to use
substitutionGroup="prov:abstractElement", because I get:

  Can't include the substitutionGroup as it causes:
  "http://www.w3.org/ns/prov#":abstractElement and
  WC[##other:"http://www.w3.org/ns/prov#"] (or elements from their
  substitution group) violate "Unique Particle Attribution".


Basically this means that the only way to use the
substitutionGroup="prov:abstractElement" is to stay within the PROV
namespace.  This might not be obvious to someone looking at our
schema. So I'm having doubts now.
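
To illustrate, the kind of declaration that runs into this looks roughly
like the sketch below. It is not the actual content of <custom.xsd>: the
ex: namespace, the element name and the prov.xsd file name are made up
for illustration.

  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
             xmlns:prov="http://www.w3.org/ns/prov#"
             targetNamespace="http://example.com/custom#"
             elementFormDefault="qualified">

    <!-- Assumption: the PROV schema is available under this file name -->
    <xs:import namespace="http://www.w3.org/ns/prov#"
               schemaLocation="prov.xsd" />

    <!-- Hypothetical extension element. It is outside the PROV namespace,
         so inside <prov:document> it matches both the abstractElement
         substitution group and the ##other wildcard. -->
    <xs:element name="customThing"
                substitutionGroup="prov:abstractElement" />
  </xs:schema>

A validator cannot tell which of the two particles such an element
belongs to, which is what the "Unique Particle Attribution" rule forbids.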


However, the general extension mechanism through xsd:any does work well,
and also validates my non-prov elements in <custom-example.xml>, even
when I inserted those elements inside <prov:document>.


In <with-extensions.xml> I tried reusing some schemas off the shelf:
XHTML, MathML and DC Terms.  This works fine thanks to xs:any as well.
I was even able to do nested inclusion reusing prov: elements, ie:

<prov:document>
  <mathml:annotation-xml>
    <prov:wasAttributedTo>
      <prov:entity prov:ref="formula"></prov:entity>
      <prov:agent prov:ref="fred"/>
      <dcterms:description>blalalla</dcterms:description>
      <!-- ... -->

(Those internal prov: elements should probably in most cases NOT be
considered part of the <prov:document> !)

Now you can argue whether this would make sense or not, but that is
the downside of xsd:any - anything (in non-prov namespaces, in this
case) is allowed, not just content that should make sense by
declaration of substitution groups. The more xsd:any you use, the less
you have a schema and the more you just have lots of fragmented types.



However I was unable to reuse namespaces like FOAF, because it does
not have an XSD schema. So sadly this is not allowed:

  <prov:person prov:id="johndoe">
        <foaf:name>John Doe</foaf:name>
  </prov:person>

I think this is too strict, and I suggest changing the xsd:any of
<prov:entity> and friends to processContents="lax" - this would only
validate against a schema if one is known.
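
As a sketch of what that change could look like (the occurrence
constraints here are an assumption, not copied from the current schema):

  <!-- Assumed shape of the wildcard inside prov:Entity and friends -->
  <xs:any namespace="##other" processContents="lax"
          minOccurs="0" maxOccurs="unbounded" />

With lax processing, a validator checks an element only if it can find a
declaration for it. So the <foaf:name> above would be accepted without
validation, while elements from namespaces with a known schema would
still be validated.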


We could rename prov:abstractElement to prov:internal or something to
make it less 'tempting' for external use.




We could in theory get rid of the whole documentElements and use only xs:any:


  <xs:element name="document" type="prov:Document" />
  <xs:complexType name="Document">
    <xs:choice maxOccurs="unbounded">
      <xs:any namespace="##targetNamespace" processContents="strict" />
      <xs:any namespace="##other" processContents="lax" />
    </xs:choice>
  </xs:complexType>

And then no substitution groups are needed in our PROV extensions; any
declared <xs:element> would be allowed. For consistency I've set
processContents="lax" even for the content of <prov:document>, but we
might instead want to say that it should be strict, to encourage
PROV extensions (rather than just providing attributes) to at least
declare a schema.


This would mean you could also insert <prov:value> inside
<prov:document> and so we would have to ensure that only "proper"
elements are declared as named <xs:element>.  I tried changing them to
xs:group's and group refs which works fine.
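
Roughly, the change from a global element to a group looks like this. It
is a sketch only: the group name and the xs:anySimpleType type are
placeholders, not what is in the schema.

  <!-- Before: global declaration, so a strict ##targetNamespace
       wildcard accepts <prov:value> directly inside <prov:document> -->
  <xs:element name="value" type="xs:anySimpleType" />

  <!-- After: local declaration wrapped in a named group
       (hypothetical group name) -->
  <xs:group name="valueElement">
    <xs:sequence>
      <xs:element name="value" type="xs:anySimpleType" />
    </xs:sequence>
  </xs:group>

  <!-- Used inside a complex type in place of an element ref -->
  <xs:group ref="prov:valueElement" minOccurs="0" />

Since the element is no longer declared at the top level, the strict
wildcard has no global declaration to match it against. Note that the
local element stays in the PROV namespace only if the schema uses
elementFormDefault="qualified".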



The above is quite tricky to get to work inside a <prov:bundle>
because all its prov elements are optional, and we get a clash between
those and the optional xs:any in the prov namespace.

This is a bit odd anyway, because <prov:bundle> plays a dual role: it
says that an entity is a bundle, but it also lists the bundle's content
flatly, so we can't know if something listed is part of the bundle or
an attribute of the bundle - especially for extensions.

Saying something is a bundle could also be done as:

<prov:entity>
  <prov:type>prov:Bundle</prov:type>
</prov:entity>

(I am a bit confused now, as the PROV-XML document says this is how
it should be done)


.. but I know the XML schema has similar 'helpers' for types like
prov:Person and prov:Revision so let's assume we keep the
<prov:bundle> entity.

I then would propose changing the bundle to be:

<prov:bundle>
  <prov:label>A bundle</prov:label>
  <dcterms:description>Still not part of the bundle</dcterms:description>
  <prov:provenanceDescriptions>
      <!-- the bundle content -->
      <prov:activity />
      <!-- .. -->
  </prov:provenanceDescriptions>
</prov:bundle>

(We can argue about the name prov:provenanceDescriptions - I went for
something close to PROV-DM)


So this works fine:

  <xs:complexType name="Bundle">
    <xs:complexContent>
      <xs:extension base="prov:Entity">
        <xs:sequence>
          <xs:element name="provenanceDescriptions" minOccurs="0">
            <xs:complexType>
              <xs:choice minOccurs="0" maxOccurs="unbounded">
                <xs:any namespace="##targetNamespace" processContents="strict" />
                <xs:any namespace="##other" processContents="lax" />
              </xs:choice>
            </xs:complexType>
          </xs:element>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>


Now the xsd:any from prov:Entity does not cause any problems, except
that they have to be stated BEFORE <prov:provenanceDescriptions>. To
change this we would have to do a copy/paste from prov:Entity instead
and move the xsd:any down.



So it's possible, and not that unclean, to get rid of the substitution
groups, but it would allow non-PROV garbage (ie. schema elements which
were not intended as PROV extensions, like my MathML example above)
within <prov:document> and <prov:bundle>.

I don't know what the group's thoughts are on which extensions we should
allow for those, but at least it would be consistent with what PROV-N
allows - and then perhaps any PROV-N document could be translated to
PROV-XML even without knowing the extensions.


If you wish I can commit my version of the schemas which does the
above (but slightly tidied up), either to the tip or a new branch.


-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester

Received on Tuesday, 12 February 2013 14:30:39 UTC