RE: Substitution group handling from Taki Kamiya on 2009-10-14 (public-exi-comments@w3.org from October 2009)

From: Taki Kamiya <tkamiya@us.fujitsu.com>
Date: Wed, 14 Oct 2009 13:59:46 -0700
To: <antoine.mensch@odonata.fr>, <public-exi-comments@w3.org>
Message-ID: <0EB98B413C1D429DB949E1BD711F3E41@homunculus>

Hi Antoine,

Thanks for the comment and your careful attention to the details of the spec.

The EXI schema-informed grammar system is described in a way that is
solely concerned with the abstract schema model which is agnostic about
the physical schema composition (i.e. imports, includes and redefines)
that is in the separate realm of the XML Schema specification.

The schema information in effect for an individual EXI stream is either
communicated out-of-band or through the schemaID option. This is described
in section "5.4 EXI Options". However, your suggestion to make the correlation
explicit is well taken, and we will add a sentence in "8.5 Schema-informed
Grammars" to that effect with reference to that description.
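
As a rough illustration of that dependency (a sketch only, not spec text),
the snippet below recomputes the set S for element "a" from the example in
your message. The SCHEMAS table and the substitution_set helper are
hypothetical names introduced here, and the table assumes "c" is declared
in Schema C, as the processor sets in your example imply.

```python
# Hypothetical model: each schema maps an element name to the head named by
# its {substitution group affiliation} property (None if it has none).
SCHEMAS = {
    "A": {"a": None},
    "B": {"b": "a"},
    "C": {"c": "a"},  # assumption: "c" lives in Schema C
}


def substitution_set(head, known_schemas):
    """Return S: the head plus every element that reaches it, directly or
    indirectly, through the affiliation chain, given the schemas in effect."""
    affiliation = {}
    for schema in known_schemas:
        affiliation.update(SCHEMAS[schema])

    members = {head}
    changed = True
    while changed:  # repeat until no new indirect member is found
        changed = False
        for name, target in affiliation.items():
            if target in members and name not in members:
                members.add(name)
                changed = True
    return sorted(members)


s1 = substitution_set("a", ["A", "B", "C"])  # processor P1
s2 = substitution_set("a", ["A", "B"])       # processor P2
s3 = substitution_set("a", ["A", "C"])       # processor P3
print(s1, s2, s3)
```

The three results differ, so two processors only agree on S when they agree
on the schema information in effect for the stream.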

EXI does not try to leverage every feature of XML Schema exhaustively to
wring every potential efficiency out of schemas. Instead, those schema
features that EXI capitalizes on have been selected to achieve the best
use of the schema. This is based on empirical judgement on the effect and
broadness of the feature application while being keenly aware of the need to
balance between the benefit of extra compactness and the accrued complexity
that may adversely affect the code footprint and the processing efficiency.

In the case of the abstract elements you brought to our attention,
excluding them is expected to yield at most a slight improvement in general,
given the log_2(n) formula used in the Unsigned Integer representation.
We hope this helps to explain why EXI does not take advantage of this XML
Schema feature.
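
To make the arithmetic concrete, here is a small sketch that assumes a code
is written in ceil(log_2(n)) bits when n alternatives are available, and
compares the width before and after one alternative (for example an abstract
element) is removed. The names code_width and bits_saved are hypothetical,
and n stands for all alternatives at that point in the grammar, not only
the substitution group members.

```python
def code_width(n):
    """Bits needed to distinguish n alternatives: ceil(log2(n)), 0 when n == 1.
    Computed with integer arithmetic to avoid floating-point rounding."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1).bit_length()


def bits_saved(n):
    """Bits saved per code when one of n alternatives is removed (n >= 2)."""
    return code_width(n) - code_width(n - 1)


# Removing one alternative only helps when n - 1 is a power of two.
savings = {n: bits_saved(n) for n in range(2, 18)}
for n, saved in savings.items():
    print(n, code_width(n), saved)
```

Under this assumption the saving is never more than one bit per code, and
it is zero for most values of n.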

Thanks!

-taki


-----Original Message-----
From: Antoine Mensch
Sent: Monday, September 28, 2009 12:55 AM
To: public-exi-comments@w3.org
Subject: Substitution group handling

> The following definition (section 8.5.4.1.6) of the list of valid
> members of an element declaration substitution group seems underspecified:
>
>     Let S be the set of element declarations that directly or indirectly
>     reaches the element declaration PTi through the chain of
>     {substitution group affiliation} property of the elements, plus PTi
>     itself if it was not in the set.
>
>
> The actual contents of S cannot be determined by only looking at the XML
> Schema in which PTi is declared and the additional XML schemas it
> imports. Rather, the complete set of XML Schemas in scope must be
> considered to build S, as members of S can be contributed by each XML
> Schema that imports the XML Schema in which PTi is declared.
>
> It is therefore important to determine the set of XML Schemas in scope
> for a given EXI encoder/decoder, as shown in the example below:
>
> Let
> - "a" be an element declaration in XML Schema A,
> - "b" an element declaration in XML Schema B which has "a" as
> {substitution group affiliation} property,
> - "c" an element declaration in XML Schema C which has "a" as
> {substitution group affiliation} property.
>
> Let P1, P2 and P3 be three EXI processors which respectively have {A, B,
> C}, {A, B} and {A, C} as known XML Schemas.
>
> While in theory P1 and P2 could exchange schema-informed documents using
> both A and B, P1 and P3 could exchange documents using both A and C, and
> P2 and P3 could exchange documents using A, this will not be possible
> unless a precise and shared definition of the set S for element
> declaration "a" can be determined for each exchanged document. Indeed, a
> naive static implementation would generate incompatible sets S1={"a",
> "b", "c"}, S2={"a", "b"} and S3={"a", "c"} for
> P1, P2 and P3.
>
> Is it the intention of the WG that this issue be addressed using the
> SchemaId option? The current version of the spec leaves the use of this
> option completely open in such cases, and that could lead to
> interoperability issues. If it is nevertheless the case, it could at
> least be useful to clarify in section 8.5.4.1.6 that S depends on the
> SchemaId option.
>
> The WG could perhaps consider an alternative approach where members of
> an element declaration substitution group are encoded as SE(*) the first
> time their namespace appears in the document, and using the scheme
> outlined in section 8.5.4.1.6 afterwards. This would allow both the
> encoder and decoder to build the same set of in-scope namespaces for the
> document, thus guaranteeing interoperability if both processors share
> schemas for those namespaces. On the other hand, this would require the
> dynamic construction of the set S for all elements that are potential
> heads of substitution groups, thus deviating from the static approach
> used so far for schema-informed grammars.
>
> Still about section 8.5.4.1.6, a minor optimization could probably be
> obtained by excluding element declarations whose {abstract} property is
> true from the set S, as such elements should never occur in valid documents.
>
> Best regards,
>
> Antoine Mensch
>
Received on Wednesday, 14 October 2009 21:00:26 UTC