Re: Multiple XML schema files for a common target namespace (PROV-ISSUE-608) from Luc Moreau on 2013-02-12 (public-prov-wg@w3.org from February 2013)

From: Luc Moreau <l.moreau@ecs.soton.ac.uk>
Date: Tue, 12 Feb 2013 22:14:55 +0000
To: Stephan Zednik <zednis@rpi.edu>
CC: public-prov-wg@w3.org
Message-ID: <EMEW3|888d5531cbf243a35049d0958439b2abp1BMF508l.moreau|ecs.soton.ac.uk|511ABEDF>
Hi Stephan,

Thanks for the explanation on lax. Yes, this seems reasonable.

In your newly proposed schema, the bundleElements element corresponds to the
bundle construct in prov-n.  The difference is that bundleElements is allowed
inside an entity, whereas the prov-n bundle construct is only allowed at the
top level of a document.

One strong requirement of part of the WG membership was to avoid nesting
of bundles.
With this, you have introduced nesting of bundles: an entity containing a
bundleElements can occur inside another bundleElements.
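
To make the nesting concrete, here is a rough sketch of what the proposed
schema would seem to accept. It assumes that prov:documentElements admits
prov:bundle inside bundleElements, as it does at the document level in your
example; the ids ex:outer and ex:inner are made up for illustration.

```xml
<prov:document
    xmlns:prov="http://www.w3.org/ns/prov#"
    xmlns:ex="http://example.com/ns/ex#">
  <prov:bundle prov:id="ex:outer">
    <prov:bundleElements>
      <!-- a bundle entity appearing inside another bundle's contents -->
      <prov:bundle prov:id="ex:inner">
        <prov:bundleElements>
          <prov:entity prov:id="ex:report1"/>
        </prov:bundleElements>
      </prov:bundle>
    </prov:bundleElements>
  </prov:bundle>
</prov:document>
```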

I think it's a significant departure from PROV-DM.

Also, personally, I find it useful to be able to return bundles, as a 
response to a provenance query.
With the proposed schema change, they would now be nested inside an 
entity.  Why this extra level of
nesting?

So given the above, I am not supportive of the change.

Luc

On 12/02/13 21:54, Stephan Zednik wrote:
>
> On Feb 12, 2013, at 2:09 PM, Luc Moreau <l.moreau@ecs.soton.ac.uk> wrote:
>
>> Hi Stephan,
>>
>> Response interleaved.
>>
>> On 12/02/13 20:57, Stephan Zednik wrote:
>>> A summary of the possible changes based on this discussion.  I am in 
>>> favor of all three listed changes.
>>>
>>> 1) rename prov:abstractElement to prov:internalElement (or similar) 
>>> to make it clear we do not expect non-PROV extensions to use this 
>>> element.
>>
>> It's good.
>>> 2) add processContents="lax" on all xs:any elements.
>> What was the problem with the current definition, what does this 
>> allow us to do?
>
> With the default (strict), if a non-PROV namespace does not have a
> corresponding schema then the document will fail to validate.
>
> processContents: Optional. Specifies how the XML processor should
> handle validation against the elements specified by this any element.
> Can be set to one of the following:
>
>   * strict - the XML processor must obtain the schema for the required
>     namespaces and validate the elements (this is the default)
>   * lax - same as strict, but if the schema cannot be obtained, no
>     errors will occur
>   * skip - the XML processor does not attempt to validate any elements
>     from the specified namespaces
>
>
>
> This loosens our validation requirements for non-PROV elements.
>
> Stian's use case example was to use some FOAF elements but validation 
> failed because he had not specified a FOAF schema.
>
>>
>>> 3) change the definition of prov:Bundle to the following 
>>> (bundleElements name is not final)
>>>
>>>   <xs:complexType name="Bundle">
>>>     <xs:complexContent>
>>>       <xs:extension base="prov:Entity">
>>>         <xs:sequence>
>>>           <xs:element name="bundleElements" minOccurs="0">
>>>             <xs:complexType>
>>>               <xs:sequence maxOccurs="unbounded">
>>>                 <xs:group ref="prov:documentElements"/>
>>>                 <xs:any namespace="##other" processContents="lax"
>>>                         minOccurs="0" maxOccurs="unbounded"/>
>>>               </xs:sequence>
>>>             </xs:complexType>
>>>           </xs:element>
>>>         </xs:sequence>
>>>       </xs:extension>
>>>     </xs:complexContent>
>>>   </xs:complexType>
>>
>> To me, this does not correspond to prov-dm.
>> I regard the bundle construct as distinct from the entity construct.
>
> Well, a Bundle is an Entity so the Bundle complexType extending the 
> Entity complexType is good.
>
> How then to have what the PROV-DM calls the 'bundle constructor'?
>
> I think of the prov:bundleElements as the bundle constructor and I 
> believe that it corresponds to PROV-DM.
>
> An alternative option would be to make a new element 
> prov:bundleConstructor and put it in the documentElements sequence. 
> This may be more like PROV-N, but is less like XML.
>
> The PROV-DM does not specify a serialization or syntax so an XML-native 
> approach should be ok.  I think having the bundle constructor as an 
> XML element of a Bundle makes sense in XML.
>
> --Stephan
>
>>
>>
>> Luc
>>
>>> With the updated Bundle complexType the PROV-XML serialization for a 
>>> bundle would look like this
>>>
>>> <prov:document
>>>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>>>     xmlns:ex="http://example.com/ns/ex#"
>>>     xmlns:prov="http://www.w3.org/ns/prov#">
>>>
>>>   <prov:person prov:id="bob"/>
>>>
>>>   <ex:label>outside-bundle-label</ex:label>
>>>
>>>   <prov:activity prov:id="a1"/>
>>>
>>>   <prov:bundle prov:id="bundle1">
>>>
>>>     <prov:label>bundle1</prov:label>
>>>     <ex:label>label-on-bundle-entity</ex:label>
>>>
>>>     <prov:bundleElements>
>>>
>>>       <ex:label>in-bundle-label</ex:label>
>>>
>>>       <prov:entity prov:id="ex:report1">
>>>         <prov:type xsi:type="xsd:QName">report</prov:type>
>>>         <ex:version>1</ex:version>
>>>       </prov:entity>
>>>
>>>       <ex:version>1.0.0</ex:version>
>>>
>>>       <prov:wasGeneratedBy>
>>>         <prov:entity prov:ref="ex:report1"/>
>>>         <prov:activity prov:ref="a1"/>
>>>         <prov:time>2012-05-24T10:00:01</prov:time>
>>>       </prov:wasGeneratedBy>
>>>
>>>       <ex:content>foo</ex:content>
>>>
>>>     </prov:bundleElements>
>>>
>>>   </prov:bundle>
>>>
>>> </prov:document>
>>>
>>> I used elements from the namespace "ex" to show how non-PROV 
>>> elements can be used within a bundle and as PROV attributes on the 
>>> bundle entity.
>>>
>>> --Stephan
>>>
>>> On Feb 12, 2013, at 12:49 PM, Stephan Zednik <zednis@rpi.edu> wrote:
>>>
>>>> Comments in-line, last two comments are the most important.
>>>>
>>>>
>>>> On Feb 12, 2013, at 7:29 AM, Stian Soiland-Reyes
>>>> <soiland-reyes@cs.manchester.ac.uk> wrote:
>>>>
>>>>> On Tue, Feb 5, 2013 at 7:29 PM, Stephan Zednik <zednis@rpi.edu> wrote:
>>>>>> This does not follow the pattern Stian suggested of updating 
>>>>>> Document so
>>>>>> that bundles are required at the bottom of the document.
>>>>>>
>>>>>> Stian, does this make sense?  Do you still prefer the other 
>>>>>> pattern you
>>>>>> suggested in the earlier email?
>>>>> Well to me it does not really matter if xs:any can appear anywhere in
>>>>> <document> or just at the bottom of the <document> - but I think your
>>>>> current solution means that you are allowed to put anything anywhere
>>>>> in <document>, but in <bundle> you can only put the extensions after
>>>>> <prov:value> but before the documentElements, which is a bit odd.
>>>>>
>>>>> It might be 'cleaner' to only allow extension stuff at the bottom, but
>>>>> that could make it tricky for the bundle as it (now) specializes the
>>>>> prov:Entity type and therefore the additional elements of Bundle come
>>>>> below the <xs:any> from entity.
>>>>>
>>>> Yes, originally this worked because we had multiple xs:any in the 
>>>> prov:Bundle (inherited from both prov:Entity and 
>>>> prov:documentElements) but we violated the "unique particle 
>>>> attribution" rule which caused xjc to fail to generate java classes 
>>>> from the schema.
>>>>
>>>> We changed the schema to work well with xjc but in doing so 
>>>> introduced the odd restriction you have noted.  I am still playing 
>>>> around with it to try to come up with a solution.
>>>>
>>>>>
>>>>>
>>>>>> Also, I think that we put the abstract element after the choice in
>>>>>> documentElements because it caused problems with schema validation,
>>>>>> but I can double check on that and see if it can be included in the
>>>>>> choice.
>>>>> I know, those things can get tricky.. it's another problem with XSD
>>>>> and its particle separation.
>>>>>
>>>>>
>>>>> I tried some examples of making an extension:
>>>>>
>>>>> <https://dvcs.w3.org/hg/prov/file/0bb02b43e80b/xml/examples>
>>>>>
>>>>> Here in <custom.xsd> I was *NOT* able to use
>>>>> substitutionGroup="prov:abstractElement", because I get:
>>>>>
>>>>> Can't include the substitutionGroup as it causes:
>>>>> "http://www.w3.org/ns/prov#":abstractElement
>>>>> and WC[##other:"http://www.w3.org/ns/prov#"] (or elements from their
>>>>> substitution
>>>>> group) violate "Unique Particle Attribution".
>>>>>
>>>>>
>>>>> Basically this means that the only way to use the
>>>>> substitutionGroup="prov:abstractElement" is to stay within the PROV
>>>>> namespace.  This might not be obvious to someone looking at our
>>>>> schema. So I'm having doubts now.
>>>> We can try to make this more clear in the Note.  The 
>>>> abstractElement is only intended to be used with 
>>>> substitutionGroups that are in the PROV namespace.
>>>>
>>>>>
>>>>> However, the general extension mechanism through xsd:any does work well,
>>>>> and can also validate my non-PROV elements - <custom-example.xml>, even
>>>>> when I inserted those elements inside <prov:document>.
>>>>>
>>>>>
>>>>> In <with-extensions.xml> I tried reusing some schemas off the shelf,
>>>>> XHTML, MathML and DC Terms.  This works fine thanks to xs:any as well.
>>>>> I was even able to do nested inclusion reusing prov: elements, ie:
>>>>>
>>>>> <prov:document>
>>>>>  <mathml:annotation-xml>
>>>>>    <prov:wasAttributedTo>
>>>>>      <prov:entity prov:ref="formula"></prov:entity>
>>>>>      <prov:agent prov:ref="fred"/>
>>>>>      <dcterms:description>blalalla</dcterms:description>
>>>>> <!-- ... -->
>>>>>
>>>>> (Those internal prov: elements should probably in most cases NOT be
>>>>> considered part of the <prov:document> !)
>>>>>
>>>>> Now you can argue whether this would make sense or not, but that is
>>>>> the downside of xsd:any - anything (in non-prov namespaces, in this
>>>>> case) is allowed, not just content that should make sense by
>>>>> declaration of substitution groups. The more xsd:any - the less you
>>>>> have a schema and more you just have lots of fragmented types.
>>>>>
>>>> I think we are very limited in what we can say about how non-PROV 
>>>> extensions integrate with PROV.
>>>>
>>>>>
>>>>> However I was unable to reuse namespaces like FOAF, because it does
>>>>> not have an XSD schema. So sadly this is not allowed:
>>>>>
>>>>> <prov:person prov:id="johndoe">
>>>>>       <foaf:name>John Doe</foaf:name>
>>>>> </prov:person>
>>>>>
>>>>> I think this is too strict, and I suggest changing the xsd:any of
>>>>> <prov:entity> and friends to processContents="lax" - this would only
>>>>> validate against a schema if it's known.
>>>>
>>>>> We could rename prov:abstractElement to prov:internal or something to
>>>>> make it less 'tempting' for external use.
>>>>>
>>>> I am ok with this.
>>>>
>>>>>
>>>>>
>>>>> We could in theory get rid of the whole documentElements and use 
>>>>> only xs:any:
>>>>>
>>>>>
>>>>> <xs:element name="document" type="prov:Document" />
>>>>> <xs:complexType name="Document">
>>>>> <xs:choice maxOccurs="unbounded">
>>>>> <xs:any namespace="##targetNamespace" processContents="strict" />
>>>>> <xs:any namespace="##other" processContents="lax" />
>>>>> </xs:choice>
>>>>> </xs:complexType>
>>>>>
>>>>> And then no substitution groups are needed in our PROV extensions; any
>>>>> declared <xs:element> would be allowed.
>>>> If I understand this correctly, this would allow PROV attribute 
>>>> elements to be used on the document.
>>>>
>>>>> For consistency I've set
>>>>> processContents="lax" even for content of <prov:document> but we might
>>>>> want to instead say that it should be strict, to encourage
>>>>> PROV-extensions (rather than just providing attributes) to at least
>>>>> declare a schema.
>>>> I agree that PROV extensions should declare a schema.
>>>>
>>>>>
>>>>> This would mean you could also insert <prov:value> inside
>>>>> <prov:document> and so we would have to ensure that only "proper"
>>>>> elements are declared as named <xs:element>.  I tried changing them to
>>>>> xs:group's and group refs which works fine.
>>>>>
>>>>>
>>>>>
>>>>> The above is quite tricky to get to work inside a <prov:bundle>
>>>>> because all its prov elements are optional, and we get a clash between
>>>>> those and the optional xs:any in the prov namespace.
>>>>>
>>>>> This is a bit odd anyway because <prov:bundle> plays a dual role: it is
>>>>> both a way to declare an entity which is a bundle and a flat list of the
>>>>> bundle's content, so we can't know if something listed is part of the
>>>>> bundle or an attribute of the bundle - especially for extensions.
>>>>>
>>>>> Saying something is a bundle could also be done as:
>>>>>
>>>>> <prov:entity>
>>>>> <prov:type>prov:Bundle</prov:type>
>>>>> </prov:entity>
>>>>>
>>>>> (I am a bit confused now, as the PROV-XML document says this is how
>>>>> it should be done)
>>>> We made a change to the types some time ago which is reflected in 
>>>> the editors' draft.
>>>>
>>>> https://dvcs.w3.org/hg/prov/raw-file/default/xml/prov-xml.html
>>>>
>>>> Since Bundles are specializations of Entity, prov:Bundle extends 
>>>> prov:Entity.
>>>>
>>>>>
>>>>> .. but I know the XML schema has similar 'helpers' for types like
>>>>> prov:Person and prov:Revision so let's assume we keep the
>>>>> <prov:bundle> entity.
>>>>>
>>>>> I then would propose changing the bundle to be:
>>>>>
>>>>> <prov:bundle>
>>>>> <prov:label>A bundle</prov:label>
>>>>> <dcterms:description>Still not part of the bundle</dcterms:description>
>>>>> <prov:provenanceDescriptions>
>>>>>     <!-- the bundle content -->
>>>>>     <prov:activity />
>>>>>     <!-- .. -->
>>>>> </prov:provenanceDescriptions>
>>>>> </prov:bundle>
>>>>>
>>>> I like this.
>>>>
>>>>> (We can argue about the name prov:provenanceDescriptions - I went for
>>>>> something close to PROV-DM)
>>>>>
>>>>>
>>>>> So this works fine:
>>>>>
>>>>> <xs:complexType name="Bundle">
>>>>>   <xs:complexContent>
>>>>>     <xs:extension base="prov:Entity">
>>>>>       <xs:sequence>
>>>>>         <xs:element name="provenanceDescriptions" minOccurs="0">
>>>>>           <xs:complexType>
>>>>>             <xs:choice minOccurs="0" maxOccurs="unbounded">
>>>>>               <xs:any namespace="##targetNamespace" processContents="strict" />
>>>>>               <xs:any namespace="##other" processContents="lax" />
>>>>>             </xs:choice>
>>>>>           </xs:complexType>
>>>>>         </xs:element>
>>>>>       </xs:sequence>
>>>>>     </xs:extension>
>>>>>   </xs:complexContent>
>>>>> </xs:complexType>
>>>>>
>>>>>
>>>>> Now the xsd:any from prov:Entity does not cause any problems, except
>>>>> that they have to be stated BEFORE <prov:provenanceDescriptions>. To
>>>>> change this we would have to do a copy/paste from prov:Entity instead
>>>>> and move the xsd:any down.
>>>> I am OK with this.
>>>>
>>>> What does the group think?
>>>>
>>>>>
>>>>>
>>>>> So it's possible, and not that unclean, to get rid of the substitution
>>>>> groups, but it would allow non-PROV garbage (ie. schema elements which
>>>>> were not intended as PROV extensions, like my MathML example above)
>>>>> within <prov:document> and <prov:bundle>.
>>>>>
>>>>> I don't know what the group's thoughts are on the extensions we should
>>>>> allow for those, but at least it would be consistent with what PROV-N
>>>>> allows - and then perhaps any PROV-N document could be translatable to
>>>>> PROV-XML even without knowing the extensions.
>>>>>
>>>> I am ok with the substitution groups as they are.
>>>>
>>>> If you can present a desirable use case that is disallowed by the 
>>>> current modeling with substitution groups and supported by an 
>>>> alternate modeling then I will consider it.  I don't want to make a 
>>>> late change without an example use case to consider.
>>>>
>>>> --Stephan
>>>>
>>>>> If you wish I can commit my version of the schemas which does the
>>>>> above (but slightly tidied up), either to the tip or a new branch.
>>>>>
>>>>>
>>>>> -- 
>>>>> Stian Soiland-Reyes, myGrid team
>>>>> School of Computer Science
>>>>> The University of Manchester
>>>>>
>>>>
>>>>
>>>
>>
>> -- 
>> Professor Luc Moreau
>> Electronics and Computer Science   tel:   +44 23 8059 4487
>> University of Southampton          fax:   +44 23 8059 2865
>> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
>> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
>>
>>
>>
>

-- 
Professor Luc Moreau
Electronics and Computer Science   tel:   +44 23 8059 4487
University of Southampton          fax:   +44 23 8059 2865
Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
Received on Tuesday, 12 February 2013 22:15:40 UTC