- From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
- Date: Tue, 12 Feb 2013 14:29:51 +0000
- To: Stephan Zednik <zednis@rpi.edu>
- Cc: W3C provenance WG <public-prov-wg@w3.org>
On Tue, Feb 5, 2013 at 7:29 PM, Stephan Zednik <zednis@rpi.edu> wrote:
> This does not follow the pattern Stian suggested of updating Document so
> that bundles are required at the bottom of the document.
>
> Stian, does this make sense? Do you still prefer the other pattern you
> suggested in the earlier email?
Well to me it does not really matter if xs:any can appear anywhere in
<document> or just at the bottom of the <document> - but I think your
current solution means that you are allowed to put anything anywhere
in <document>, but in <bundle> you can only put the extensions after
<prov:value> but before the documentElements, which is a bit odd.
It might be 'cleaner' to only allow extension stuff at the bottom, but
that could make it tricky for the bundle as it (now) specializes the
prov:Entity type and therefore the additional elements of Bundle come
below the <xs:any> from entity.
> Also, I think that we put the abstract element after the choice in document
> Elements because it caused problems with schema validation, but I can double
> check on that and see if it can be included in the choice.
I know, those things can get tricky.. it's another problem with XSD
and its particle separation.
I tried an example of making an extension:
<https://dvcs.w3.org/hg/prov/file/0bb02b43e80b/xml/examples>
Here in <custom.xsd> I was *NOT* able to use
substitutionGroup="prov:abstractElement", because I get:
Can't include the substitutionGroup as it causes:
"http://www.w3.org/ns/prov#":abstractElement and
WC[##other:"http://www.w3.org/ns/prov#"] (or elements from their
substitution group) violate "Unique Particle Attribution".
Basically this means that the only way to use the
substitutionGroup="prov:abstractElement" is to stay within the PROV
namespace. This might not be obvious to someone looking at our
schema. So I'm having doubts now.
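For reference, a minimal sketch of the kind of declaration that triggers
this error. The namespace "http://example.com/custom#", the element name
"customStatement" and the schemaLocation "prov.xsd" are made up for
illustration and are not taken from <custom.xsd>:

```xml
<!-- Hypothetical extension schema in a non-PROV namespace. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:prov="http://www.w3.org/ns/prov#"
           targetNamespace="http://example.com/custom#"
           elementFormDefault="qualified">

  <!-- schemaLocation is an assumed file name -->
  <xs:import namespace="http://www.w3.org/ns/prov#"
             schemaLocation="prov.xsd"/>

  <!-- As a member of the substitution group this element may stand in
       for prov:abstractElement, but it is also matched by the ##other
       wildcard in the same content model, so the validator cannot
       attribute it to a single particle. -->
  <xs:element name="customStatement"
              substitutionGroup="prov:abstractElement"/>
</xs:schema>
```

The same declaration inside the PROV target namespace would not be matched
by the ##other wildcard, which is why the clash only shows up for external
namespaces.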
However, the general extension mechanism through xsd:any does work well,
and can also validate my non-PROV elements in <custom-example.xml>, even
when I inserted those elements inside <prov:document>.
In <with-extensions.xml> I tried reusing some schemas off the shelf:
XHTML, MathML and DC Terms. This works fine thanks to xs:any as well.
I was even able to do nested inclusion reusing prov: elements, i.e.:
<prov:document>
<mathml:annotation-xml>
<prov:wasAttributedTo>
<prov:entity prov:ref="formula"></prov:entity>
<prov:agent prov:ref="fred"/>
<dcterms:description>blalalla</dcterms:description>
<!-- ... -->
(Those internal prov: elements should probably in most cases NOT be
considered part of the <prov:document> !)
Now you can argue whether this would make sense or not, but that is
the downside of xsd:any - anything (in non-prov namespaces, in this
case) is allowed, not just content that should make sense by
declaration of substitution groups. The more xsd:any you use, the less
you have a schema and the more you just have lots of fragmented types.
However I was unable to reuse namespaces like FOAF, because it does
not have an XSD schema. So sadly this is not allowed:
<prov:person prov:id="johndoe">
  <foaf:name>John Doe</foaf:name>
</prov:person>
I think this is too strict, and I suggest changing the xsd:any of
<prov:entity> and friends to processContents="lax" - this would only
validate against a schema if it's known.
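As a rough sketch, the wildcard would then look something like this. The
occurrence bounds and the ##other namespace constraint are my assumption
of what the current schema uses; only processContents changes:

```xml
<!-- Sketch only: extension wildcard in the prov:Entity content model.
     minOccurs/maxOccurs and namespace are assumed, not copied from
     the schema. -->
<xs:any namespace="##other" processContents="lax"
        minOccurs="0" maxOccurs="unbounded"/>
```

With "lax", an element such as <foaf:name> that has no declaration
available is accepted as-is, while elements whose schema is known are
still validated against it.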
We could rename prov:abstractElement to prov:internal or something to
make it less 'tempting' for external use.
We could in theory get rid of the whole documentElements and use only xs:any:
<xs:element name="document" type="prov:Document" />
<xs:complexType name="Document">
  <xs:choice maxOccurs="unbounded">
    <xs:any namespace="##targetNamespace" processContents="strict" />
    <xs:any namespace="##other" processContents="lax" />
  </xs:choice>
</xs:complexType>
And then no substitution groups are needed in our PROV extensions; any
declared <xs:element> would be allowed. For consistency I've set
processContents="lax" even for content of <prov:document>, but we might
want to instead say that it should be strict, to encourage
PROV extensions (rather than just providing attributes) to at least
declare a schema.
This would mean you could also insert <prov:value> inside
<prov:document> and so we would have to ensure that only "proper"
elements are declared as named <xs:element>. I tried changing the others
to xs:group declarations and group refs, which works fine.
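One way this might look, as a hypothetical sketch (the group name
"valueGroup" and the type of prov:value are made up here and are not
taken from my working copy):

```xml
<!-- prov:value is declared locally inside a named group rather than
     as a global element, so the strict ##targetNamespace wildcard in
     <prov:document> has no global declaration to match it against. -->
<xs:group name="valueGroup">
  <xs:sequence>
    <xs:element name="value" type="xs:anySimpleType" minOccurs="0"/>
  </xs:sequence>
</xs:group>

<xs:complexType name="Entity">
  <xs:sequence>
    <xs:group ref="prov:valueGroup"/>
    <!-- remaining Entity content omitted -->
  </xs:sequence>
</xs:complexType>
```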
The above is quite tricky to get to work inside a <prov:bundle>
because all its prov elements are optional, and we get a clash between
those and the optional xs:any in the prov namespace.
This is a bit odd anyway because <prov:bundle> plays a dual role: it is
a way to say that an entity is a bundle, but it also just lists the
bundle's content flatly, and so we can't know if something listed is
part of the bundle or an attribute of the bundle - especially for
extensions.
Saying something is a bundle could also be done as:
<prov:entity>
  <prov:type>prov:Bundle</prov:type>
</prov:entity>
(I am a bit confused now, as the PROV-XML document says this is how
it should be done)
.. but I know the XML schema has similar 'helpers' for types like
prov:Person and prov:Revision so let's assume we keep the
<prov:bundle> entity.
I then would propose changing the bundle to be:
<prov:bundle>
  <prov:label>A bundle</prov:label>
  <dcterms:description>Still not part of the bundle</dcterms:description>
  <prov:provenanceDescriptions>
    <!-- the bundle content -->
    <prov:activity />
    <!-- .. -->
  </prov:provenanceDescriptions>
</prov:bundle>
(We can argue about the name prov:provenanceDescriptions - I went for
something close to PROV-DM)
So this works fine:
<xs:complexType name="Bundle">
  <xs:complexContent>
    <xs:extension base="prov:Entity">
      <xs:sequence>
        <xs:element name="provenanceDescriptions" minOccurs="0">
          <xs:complexType>
            <xs:choice minOccurs="0" maxOccurs="unbounded">
              <xs:any namespace="##targetNamespace" processContents="strict" />
              <xs:any namespace="##other" processContents="lax" />
            </xs:choice>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
Now the xsd:any from prov:Entity does not cause any problems, except
that the extension elements have to be stated BEFORE
<prov:provenanceDescriptions>. To change this we would have to do a
copy/paste from prov:Entity instead and move the xsd:any down.
So it's possible, and not that unclean, to get rid of the substitution
groups, but it would allow non-PROV garbage (i.e. schema elements which
were not intended as PROV extensions, like my MathML example above)
within <prov:document> and <prov:bundle>.
I don't know what the group's thoughts are on which extensions we should
allow for those, but at least it would be consistent with what PROV-N
allows - and then perhaps any PROV-N document could be translatable to
PROV-XML even without knowing the extensions.
If you wish I can commit my version of the schemas, which do the
above (but slightly tidied up), either to the tip or a new branch.
--
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
Received on Tuesday, 12 February 2013 14:30:39 UTC