Substitution group handling from Antoine Mensch on 2009-09-28 (public-exi-comments@w3.org from September 2009)

From: Antoine Mensch <antoine.mensch@odonata.fr>
Date: Mon, 28 Sep 2009 09:55:22 +0200
To: public-exi-comments@w3.org
Message-ID: <4AC06BEA.6030508@odonata.fr>
The following definition (section 8.5.4.1.6) of the list of valid 
members of an element declaration substitution group seems underspecified:

    Let S be the set of element declarations that directly or indirectly
    reaches the element declaration PTi through the chain of
    {substitution group affiliation} property of the elements, plus PTi
    itself if it was not in the set.


The actual contents of S cannot be determined by only looking at the XML 
Schema in which PTi is declared and the additional XML schemas it 
imports. Rather, the complete set of XML Schemas in scope must be 
considered to build S, as members of S can be contributed by each XML 
Schema that imports the XML Schema in which PTi is declared.

It is therefore important to determine the set of XML Schemas in scope 
for a given EXI encoder/decoder, as shown in the example below:

Let
- "a" be an element declaration in XML Schema A,
- "b" an element declaration in XML Schema B which has "a" as 
{substitution group affiliation} property,
- "c" an element declaration in XML Schema C which has "a" as 
{substitution group affiliation} property.

Let P1, P2 and P3 be three EXI processors which respectively have {A, B, 
C}, {A, B} and {A, C} as known XML Schemas.

While in theory P1 and P2 could exchange schema-informed documents using 
both A and B, P1 and P3 could exchange documents using both A and C, and 
P2 and P3 could exchange documents using A, this will not be possible 
unless a precise and shared definition of the set S for element 
declaration "a" can be determined for each exchanged document. Indeed, a 
naive static implementation would generate the incompatible sets 
S1={"a", "b", "c"}, S2={"a", "b"} and S3={"a", "c"} for P1, P2 and P3 
respectively.
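
To make the divergence concrete, here is a minimal Python sketch of the naive static construction of S for the example above. The SCHEMAS table and the substitution_set function are hypothetical names introduced only for illustration; element names are assumed to be unique across schemas, so namespaces are ignored.

```python
# Hypothetical model of the example: for each schema, map every element
# declaration to the head named by its {substitution group affiliation}
# property (None when the element has no affiliation).
SCHEMAS = {
    "A": {"a": None},
    "B": {"b": "a"},
    "C": {"c": "a"},
}

def substitution_set(head, known_schemas):
    """Build S for `head` using only the schemas this processor knows."""
    affiliation = {}
    for schema in known_schemas:
        affiliation.update(SCHEMAS[schema])
    members = {head}
    changed = True
    # Follow the affiliation chain until no new member is found, so that
    # indirect members are included as well as direct ones.
    while changed:
        changed = False
        for name, target in affiliation.items():
            if target in members and name not in members:
                members.add(name)
                changed = True
    return members

S1 = substitution_set("a", ["A", "B", "C"])  # processor P1
S2 = substitution_set("a", ["A", "B"])       # processor P2
S3 = substitution_set("a", ["A", "C"])       # processor P3
```

The three results differ in size, so any event codes derived from S would not line up between two of these processors, even for a document that only uses schemas both of them know.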

Is it the intention of the WG that this issue be addressed using the 
SchemaId option? The current version of the spec leaves the use of this 
option completely open in such cases, and that could lead to 
interoperability issues. If it is nevertheless the case, it could at 
least be useful to clarify in section 8.5.4.1.6 that S depends on the 
SchemaId option.

The WG could perhaps consider an alternative approach where members of 
an element declaration substitution group are encoded as SE(*) the first 
time their namespace appears in the document, and using the scheme 
outlined in section 8.5.4.1.6 afterwards. This would allow both the 
encoder and decoder to build the same set of in-scope namespaces for the 
document, thus guaranteeing interoperability if both processors share 
schemas for those namespaces. On the other hand, this would require the 
dynamic construction of the set S for all elements that are potential 
heads of substitution groups, thus deviating from the static approach 
used so far for schema-informed grammars.
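
A rough Python sketch of this alternative, as I read it, could look as follows. The DECLARATIONS table, the "urn:B" and "urn:C" namespace names and the encode_members function are all hypothetical, and the event labels are plain strings rather than real EXI event codes.

```python
# Hypothetical declarations: element name -> (target namespace, head of
# its substitution group). The namespace URIs are made up for the example.
DECLARATIONS = {
    "b": ("urn:B", "a"),
    "c": ("urn:C", "a"),
}

def encode_members(occurrences):
    """Label each occurrence of a substitution group member.

    A member is labelled SE(*) the first time its namespace is seen in
    the document and SE(name) afterwards. A decoder performing the same
    steps on the decoded events ends up with the same namespace set.
    """
    seen_namespaces = set()
    events = []
    for name in occurrences:
        namespace, _head = DECLARATIONS[name]
        if namespace in seen_namespaces:
            events.append(f"SE({name})")
        else:
            events.append("SE(*)")
            seen_namespaces.add(namespace)
    return events, seen_namespaces

events, in_scope = encode_members(["b", "b", "c", "b"])
```

The set of in-scope namespaces grows as the document is processed, which is the dynamic behaviour mentioned above: S for "a" can only be fixed once the relevant namespaces have appeared.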

Also regarding section 8.5.4.1.6, a minor optimization could probably be 
obtained by excluding element declarations whose {abstract} property is 
true from the set S, as such elements should never occur in valid documents.
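
As a small illustration of this pruning, assuming S has already been computed and that "a" is declared abstract (an assumption made only for this example), the filter is a single set comprehension in Python. The prune_abstract name is hypothetical.

```python
def prune_abstract(members, abstract_names):
    """Drop the declarations whose {abstract} property is true from S."""
    return {name for name in members if name not in abstract_names}

S = {"a", "b", "c"}
# Assumption for the example: the head "a" is abstract.
pruned = prune_abstract(S, {"a"})
```

Fewer members in S means fewer productions competing for event codes at that position in the grammar.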

Best regards,

Antoine Mensch
Received on Monday, 28 September 2009 08:01:33 UTC