Re: Multiple XML schema files for a common target namespace (PROV-ISSUE-608)

Hi Stephan,

Response interleaved.

On 12/02/13 20:57, Stephan Zednik wrote:
> A summary of the possible changes based on this discussion.  I am in favor of all three listed changes.
>
> 1) rename prov:abstractElement to prov:internalElement (or similar) to make it clear we do not expect non-PROV extensions to use this element.

It's good.
> 2) add processContents="lax" on all xs:any elements.
What was the problem with the current definition? What does this allow
us to do?

> 3) change the definition of prov:Bundle to the following (bundleElements name is not final)
>
>    <xs:complexType name="Bundle">
>      <xs:complexContent>
>        <xs:extension base="prov:Entity">
>          <xs:sequence>
>            <xs:element name="bundleElements" minOccurs="0">
>              <xs:complexType>
>                <xs:sequence maxOccurs="unbounded">
>                  <xs:group ref="prov:documentElements"/>
>                  <xs:any namespace="##other" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
>                </xs:sequence>
>              </xs:complexType>
>            </xs:element>
>          </xs:sequence>
>        </xs:extension>
>      </xs:complexContent>
>    </xs:complexType>

To me, this does not correspond to PROV-DM.
I regard the bundle construct as distinct from the entity construct.
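For illustration only, a Bundle type that is not derived from prov:Entity might look roughly like the following. This is a sketch, not a proposal text: it assumes prov:id is declared as a global attribute and reuses the prov:documentElements group from the fragment quoted above.

```xml
<!-- Hypothetical sketch: Bundle as its own construct, not an extension of
     prov:Entity. Assumes prov:id is a global attribute and that the
     prov:documentElements group exists as in the quoted fragment. -->
<xs:complexType name="Bundle">
  <xs:sequence maxOccurs="unbounded">
    <xs:group ref="prov:documentElements"/>
  </xs:sequence>
  <xs:attribute ref="prov:id"/>
</xs:complexType>
```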


Luc

> With the updated Bundle complexType, the PROV-XML serialization for a bundle would look like this:
>
> <prov:document
> 	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> 	xmlns:xsd="http://www.w3.org/2001/XMLSchema"
> 	xmlns:ex="http://example.com/ns/ex#"
> 	xmlns:prov="http://www.w3.org/ns/prov#">
>
> 	<prov:person prov:id="bob"/>
>
> 	<ex:label>outside-bundle-label</ex:label>
>
> 	<prov:activity prov:id="a1"/>
>
> 	<prov:bundle prov:id="bundle1">
>
> 		<prov:label>bundle1</prov:label>
> 		<ex:label>label-on-bundle-entity</ex:label>
>
> 		<prov:bundleElements>
> 			
> 			<ex:label>in-bundle-label</ex:label>
> 			
> 			<prov:entity prov:id="ex:report1">
> 				<prov:type xsi:type="xsd:QName">report</prov:type>
> 				<ex:version>1</ex:version>
> 			</prov:entity>
>
> 			<ex:version>1.0.0</ex:version>
>
> 			<prov:wasGeneratedBy>
> 				<prov:entity prov:ref="ex:report1"/>
> 				<prov:activity prov:ref="a1"/>
> 				<prov:time>2012-05-24T10:00:01</prov:time>
> 			</prov:wasGeneratedBy>
> 			
> 			<ex:content>foo</ex:content>
> 			
> 		</prov:bundleElements>
>
> 	</prov:bundle>
>
> </prov:document>
>
> I used elements from the namespace "ex" to show how non-PROV elements can be used within a bundle and as PROV attributes on the bundle entity.
>
> --Stephan
>
> On Feb 12, 2013, at 12:49 PM, Stephan Zednik <zednis@rpi.edu> wrote:
>
>> Comments in-line, last two comments are the most important.
>>
>>
>> On Feb 12, 2013, at 7:29 AM, Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk> wrote:
>>
>>> On Tue, Feb 5, 2013 at 7:29 PM, Stephan Zednik <zednis@rpi.edu> wrote:
>>>> This does not follow the pattern Stian suggested of updating Document so
>>>> that bundles are required at the bottom of the document.
>>>>
>>>> Stian, does this make sense?  Do you still prefer the other pattern you
>>>> suggested in the earlier email?
>>> Well, to me it does not really matter if xs:any can appear anywhere in
>>> <document> or just at the bottom of the <document>, but I think your
>>> current solution means that you are allowed to put anything anywhere
>>> in <document>, but in <bundle> you can only put the extensions after
>>> <prov:value> but before the documentElements, which is a bit odd.
>>>
>>> It might be 'cleaner' to only allow extension stuff at the bottom, but
>>> that could make it tricky for the bundle as it (now) specializes the
>>> prov:Entity type and therefore the additional elements of Bundle come
>>> below the <xs:any> from entity.
>>>
>> Yes, originally this worked because we had multiple xs:any in the prov:Bundle (inherited from both prov:Entity and prov:documentElements), but that violated the "Unique Particle Attribution" rule, which caused xjc to fail to generate Java classes from the schema.
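To make the rule concrete, here is a minimal, simplified fragment of the kind of content model that trips it (hypothetical, with xs bound to the XML Schema namespace; not the actual PROV schema). With two adjacent optional wildcards over the same namespaces, an incoming non-PROV element could be attributed to either particle, which XSD 1.0 forbids.

```xml
<!-- Hypothetical fragment illustrating a UPA violation: a non-PROV element
     appearing here matches both wildcards, so the validator cannot decide
     which particle it belongs to. -->
<xs:sequence>
  <!-- wildcard inherited from prov:Entity -->
  <xs:any namespace="##other" minOccurs="0" maxOccurs="unbounded"/>
  <!-- wildcard contributed by prov:documentElements -->
  <xs:any namespace="##other" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
```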
>>
>> We changed the schema to work well with xjc but in doing so introduced the odd restriction you have noted.  I am still playing around with it to try to come up with a solution.
>>
>>>
>>>
>>>> Also, I think that we put the abstract element after the choice in
>>>> documentElements because it caused problems with schema validation, but I can double
>>>> check on that and see if it can be included in the choice.
>>> I know, those things can get tricky; it's another problem with XSD
>>> and its particle attribution rules.
>>>
>>>
>>> I tried making an example extension:
>>>
>>> <https://dvcs.w3.org/hg/prov/file/0bb02b43e80b/xml/examples>
>>>
>>> Here in <custom.xsd> I was *NOT* able to use
>>> substitutionGroup="prov:abstractElement", because I get:
>>>
>>> 		Can't include the substitutionGroup as it causes:
>>> 		"http://www.w3.org/ns/prov#":abstractElement and
>>> 		WC[##other:"http://www.w3.org/ns/prov#"] (or elements from their
>>> 		substitution group) violate "Unique Particle Attribution".
>>>
>>>
>>> Basically this means that the only way to use the
>>> substitutionGroup="prov:abstractElement" is to stay within the PROV
>>> namespace.  This might not be obvious to someone looking at our
>>> schema. So I'm having doubts now.
>> We can try to make this clearer in the Note. The abstractElement is only intended to be used with substitution groups in the PROV namespace.
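A hedged sketch of such a PROV-namespace module follows; the file name prov-core.xsd and the element name exampleRelation are hypothetical. Because the new element is in the PROV namespace, it cannot also match the ##other wildcard, so the conflict reported in the validator message above does not arise. When type is omitted, an element declared with substitutionGroup takes the type of its head element.

```xml
<!-- Hypothetical extension module sharing the PROV target namespace. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:prov="http://www.w3.org/ns/prov#"
           targetNamespace="http://www.w3.org/ns/prov#"
           elementFormDefault="qualified">
  <!-- Same namespace, so xs:include (not xs:import); assumed file that
       declares prov:abstractElement -->
  <xs:include schemaLocation="prov-core.xsd"/>
  <!-- Hypothetical element; with no type given it inherits the type of
       prov:abstractElement -->
  <xs:element name="exampleRelation" substitutionGroup="prov:abstractElement"/>
</xs:schema>
```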
>>
>>>
>>> However, the general extension mechanism through xsd:any does work well,
>>> and can also validate my non-PROV elements (<custom-example.xml>), even
>>> when I inserted those elements inside <prov:document>.
>>>
>>>
>>> In <with-extensions.xml> I tried reusing some off-the-shelf schemas:
>>> XHTML, MathML and DC Terms. This works fine thanks to xs:any as well.
>>> I was even able to do nested inclusion reusing prov: elements, i.e.:
>>>
>>> <prov:document>
>>>   <mathml:annotation-xml>
>>>     <prov:wasAttributedTo>
>>>       <prov:entity prov:ref="formula"></prov:entity>
>>>       <prov:agent prov:ref="fred"/>
>>>       <dcterms:description>blalalla</dcterms:description>
>>> <!-- ... -->
>>>
>>> (Those internal prov: elements should in most cases probably NOT be
>>> considered part of the <prov:document>!)
>>>
>>> Now you can argue whether this would make sense or not, but that is
>>> the downside of xsd:any: anything (in non-PROV namespaces, in this
>>> case) is allowed, not just content that has been declared meaningful
>>> through substitution groups. The more xsd:any you use, the less you
>>> have a schema and the more you just have lots of fragmented types.
>>>
>> I think we are very limited in what we can say about how non-PROV extensions integrate with PROV.
>>
>>>
>>> However, I was unable to reuse namespaces like FOAF, because FOAF does
>>> not have an XSD schema. So sadly this is not allowed:
>>>
>>> <prov:person prov:id="johndoe">
>>>        <foaf:name>John Doe</foaf:name>
>>> </prov:person>
>>>
>>> I think this is too strict, and I suggest changing the xsd:any of
>>> <prov:entity> and friends to processContents="lax"; this would only
>>> validate against a schema if one is known.
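A sketch of what the proposed change could look like in the prov:Entity type; the surrounding particles are elided and the prov:id attribute reference is an assumption. With processContents="lax", an element such as foaf:name is validated if a declaration for it is available and otherwise accepted unchecked; the default, strict, requires a declaration and therefore rejects it.

```xml
<!-- Sketch only: other particles of prov:Entity are elided, and the
     prov:id attribute reference is assumed. -->
<xs:complexType name="Entity">
  <xs:sequence>
    <!-- ... PROV attribute elements such as prov:label and prov:type ... -->
    <!-- lax: validate foaf:name and friends if a declaration is available,
         otherwise accept them without checking -->
    <xs:any namespace="##other" processContents="lax"
            minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attribute ref="prov:id"/>
</xs:complexType>
```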
>>
>>> We could rename prov:abstractElement to prov:internal or something to
>>> make it less 'tempting' for external use.
>>>
>> I am ok with this.
>>
>>>
>>>
>>> We could in theory get rid of the whole documentElements and use only xs:any:
>>>
>>>
>>> <xs:element name="document" type="prov:Document" />
>>> <xs:complexType name="Document">
>>> 		<xs:choice maxOccurs="unbounded">
>>> 			<xs:any namespace="##targetNamespace" processContents="strict" />
>>> 			<xs:any namespace="##other" processContents="lax" />
>>> 		</xs:choice>
>>> </xs:complexType>
>>>
>>> Then no substitution groups would be needed in our PROV extensions; any
>>> globally declared <xs:element> would be allowed.
>> If I understand this correctly, this would allow PROV attribute elements to be used on the document.
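For example, assuming prov:label is declared as a global element, the following instance would be accepted by the wildcard-only Document type sketched above, even though a label attached directly to the document is presumably not intended.

```xml
<!-- Accepted by the strict ##targetNamespace wildcard because prov:label
     is (assumed to be) a global element in the PROV namespace. -->
<prov:document xmlns:prov="http://www.w3.org/ns/prov#">
  <prov:label>label on the document itself</prov:label>
  <prov:entity prov:id="e1"/>
</prov:document>
```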
>>
>>> For consistency I've set
>>> processContents="lax" even for content of <prov:document>, but we might
>>> want to instead say that it should be strict, to encourage
>>> PROV extensions (rather than just providing attributes) to at least
>>> declare a schema.
>> I agree that PROV extensions should declare a schema.
>>
>>>
>>> This would mean you could also insert <prov:value> inside
>>> <prov:document>, and so we would have to ensure that only "proper"
>>> elements are declared as named <xs:element>s. I tried changing them to
>>> xs:groups and group refs, which works fine.
>>>
>>>
>>>
>>> The above is quite tricky to get to work inside a <prov:bundle>
>>> because all its prov elements are optional, and we get a clash between
>>> those and the optional xs:any in the prov namespace.
>>>
>>> This is a bit odd anyway, because <prov:bundle> plays a dual role: it
>>> both declares an entity that is a bundle and lists the bundle's
>>> content flatly, so we can't tell whether something listed is part of
>>> the bundle or an attribute of the bundle, especially for
>>> extensions.
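A small illustration of the ambiguity, using a hypothetical ex namespace: in a flat <prov:bundle>, nothing in the markup says whether ex:version describes the bundle entity or belongs to the bundle's content.

```xml
<prov:bundle prov:id="bundle1"
             xmlns:prov="http://www.w3.org/ns/prov#"
             xmlns:ex="http://example.com/ns/ex#">
  <prov:label>bundle1</prov:label>
  <!-- Attribute of the bundle entity, or a statement inside the bundle? -->
  <ex:version>1.0.0</ex:version>
  <prov:entity prov:id="ex:report1"/>
</prov:bundle>
```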
>>>
>>> Saying something is a bundle could also be done as:
>>>
>>> <prov:entity>
>>> <prov:type>prov:Bundle</prov:type>
>>> </prov:entity>
>>>
>>> (I am a bit confused now, as the PROV-XML document says this is how
>>> it should be done.)
>> We made a change to the types some time ago which is reflected in the editors' draft.
>>
>> https://dvcs.w3.org/hg/prov/raw-file/default/xml/prov-xml.html
>>
>> Since bundles are specializations of entities, prov:Bundle extends prov:Entity.
>>
>>>
>>> .. but I know the XML schema has similar 'helpers' for types like
>>> prov:Person and prov:Revision so let's assume we keep the
>>> <prov:bundle> entity.
>>>
>>> I then would propose changing the bundle to be:
>>>
>>> <prov:bundle>
>>> <prov:label>A bundle</prov:label>
>>> <dcterms:description>Still not part of the bundle</dcterms:description>
>>> <prov:provenanceDescriptions>
>>>      <!-- the bundle content -->
>>>      <prov:activity />
>>>      <!-- .. -->
>>> </prov:provenanceDescriptions>
>>> </prov:bundle>
>>>
>> I like this.
>>
>>> (We can argue about the name prov:provenanceDescriptions - I went for
>>> something close to PROV-DM)
>>>
>>>
>>> So this works fine:
>>>
>>> <xs:complexType name="Bundle">
>>> 	<xs:complexContent>
>>> 		<xs:extension base="prov:Entity">
>>> 			<xs:sequence>
>>> 				<xs:element name="provenanceDescriptions" minOccurs="0">
>>> 					<xs:complexType>
>>> 						<xs:choice minOccurs="0" maxOccurs="unbounded">
>>> 							<xs:any namespace="##targetNamespace" processContents="strict" />
>>> 							<xs:any namespace="##other" processContents="lax" />
>>> 						</xs:choice>
>>> 					</xs:complexType>
>>> 				</xs:element>
>>> 			</xs:sequence>
>>> 		</xs:extension>
>>> 	</xs:complexContent>
>>> </xs:complexType>
>>>
>>>
>>> Now the xsd:any from prov:Entity does not cause any problems, except
>>> that extension elements have to be stated BEFORE <prov:provenanceDescriptions>.
>>> To change this we would have to copy the content model of prov:Entity
>>> instead of extending it, and move the xsd:any down.
>> I am OK with this.
>>
>> What does the group think?
>>
>>>
>>>
>>> So it's possible, and not that unclean, to get rid of the substitution
>>> groups, but it would allow non-PROV garbage (i.e. schema elements which
>>> were not intended as PROV extensions, like my MathML example above)
>>> within <prov:document> and <prov:bundle>.
>>>
>>> I don't know what the group's thoughts are on which extensions we should
>>> allow there, but at least it would be consistent with what PROV-N allows,
>>> and then perhaps any PROV-N document could be translated to
>>> PROV-XML even without knowing the extensions.
>>>
>> I am ok with the substitution groups as they are.
>>
>> If you can present a desirable use case that is disallowed by the current modeling with substitution groups and supported by an alternate modeling, then I will consider it. I don't want to make a late change without an example use case to consider.
>>
>> --Stephan
>>
>>> If you wish I can commit my version of the schemas which does the
>>> above (but slightly tidied up), either to the tip or a new branch.
>>>
>>>
>>> -- 
>>> Stian Soiland-Reyes, myGrid team
>>> School of Computer Science
>>> The University of Manchester
>>>
>>
>>
>

-- 
Professor Luc Moreau
Electronics and Computer Science   tel:   +44 23 8059 4487
University of Southampton          fax:   +44 23 8059 2865
Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
United Kingdom                     http://www.ecs.soton.ac.uk/~lavm

Received on Tuesday, 12 February 2013 21:10:00 UTC