Request for comments on the issue of Complex Type Grammar

Hi,

The EXI WG recently found that, for certain complex types, the grammars
obtained by fully expanding them with the process described in section
8.5.4.4.1 are slightly different from those generated by actual EXI 1.0
implementations.

The discrepancy was first discovered for the ur-type grammar. However, the
issue was found to potentially affect all complex types that have an
attribute wildcard.

The WG analyzed the finding and agreed that the implementations are actually
generating the grammars that the WG had originally expected the spec to
produce. Therefore, the WG regards this as an erratum in the specification
and would like to fix the error.

Technical details regarding the issue, and the way the EXI Working Group
intends to fix the bug, are described below.

Please let us know as soon as possible if you have any comments or
questions.

Sincerely,
Daniel Peintner, Takuki Kamiya
for the EXI Working Group


-------------------------------------------------------------------------------

Section 8.5.4.1.3.3 "Complex Ur-Type Grammar" prescribes a grammar for 
ur-type as follows.

Type_ur-type,0 :
  AT(*)  Type_ur-type,0
  SE(*)  Type_ur-type,1
  EE
  CH  Type_ur-type,1

Type_ur-type,1 :
  SE(*)  Type_ur-type,1
  EE
  CH  Type_ur-type,1

The last paragraph of the section also says that the content index used for 
the above grammar is 1.

Shown below is the grammar you can get by applying the rule described in 
8.5.4.4.1 "Adding Productions when Strict is False" to the above grammar, 
assuming the default preservation option (i.e. no PI, no CM, etc.).

Type_ur-type,0 :
  AT(*)  Type_ur-type,0
  SE(*)  Type_ur-type,1
  EE
  CH  Type_ur-type,1
  AT(xsi:type)  Type_ur-type,0
  AT(xsi:nil)  Type_ur-type,0
  AT(*)  Type_ur-type,0
  AT(*) [untyped value]  Type_ur-type,0
  SE(*)  Type_ur-type,1copy
  CH [untyped value]  Type_ur-type,1copy

Type_ur-type,1 :
  SE(*)  Type_ur-type,1
  EE
  CH  Type_ur-type,1
  AT(*)  Type_ur-type,1
  AT(*) [untyped value]  Type_ur-type,1
  SE(*)  Type_ur-type,1copy
  CH [untyped value]  Type_ur-type,1copy

Type_ur-type,1copy :
  SE(*)  Type_ur-type,1copy
  EE
  CH  Type_ur-type,1copy
  SE(*)  Type_ur-type,1copy
  CH [untyped value]  Type_ur-type,1copy
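
As an illustration only, the expansion above can be sketched in Python. This
is a simplified reading of the rule in 8.5.4.4.1, written so that it
reproduces the listing above for the default preservation options. The names
BASE, CONTENT_INDEX and augment are hypothetical and do not come from the
spec, and the retargeting of productions inside the copied state is an
assumption made to match the listing.

```python
# Base ur-type grammar from section 8.5.4.1.3.3 as (event, next_state) pairs.
# A next_state of None means the element ends (EE).
BASE = {
    0: [("AT(*)", 0), ("SE(*)", 1), ("EE", None), ("CH", 1)],
    1: [("SE(*)", 1), ("EE", None), ("CH", 1)],
}
CONTENT_INDEX = 1  # content index stated in the last paragraph of the section


def augment(base, content):
    """Add the non-strict productions (no PI, no CM, etc.) to a type grammar."""
    copy_name = f"{content}copy"
    out = {state: list(prods) for state, prods in base.items()}
    # Copy of the content state; productions that pointed at the content
    # state are assumed to point at the copy instead.
    out[copy_name] = [
        (event, copy_name if nxt == content else nxt)
        for event, nxt in base[content]
    ]
    for state in base:
        if state == 0:
            out[state] += [("AT(xsi:type)", 0), ("AT(xsi:nil)", 0)]
        if state <= content:
            # States up to the content index get attribute productions.
            out[state] += [("AT(*)", state), ("AT(*) [untyped value]", state)]
            out[state] += [("SE(*)", copy_name),
                           ("CH [untyped value]", copy_name)]
        else:
            out[state] += [("SE(*)", state), ("CH [untyped value]", state)]
    out[copy_name] += [("SE(*)", copy_name), ("CH [untyped value]", copy_name)]
    return out


AUGMENTED = augment(BASE, CONTENT_INDEX)
for state, productions in AUGMENTED.items():
    print(f"Type_ur-type,{state} :")
    for event, nxt in productions:
        target = "" if nxt is None else f"  Type_ur-type,{nxt}"
        print(f"  {event}{target}")
```

Because the content index is 1, state 1 falls under the "up to the content
index" branch and receives the AT(*) productions as well.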

Say you have an XML document of the form:

<A>abc</A>

where the element A is typed xsd:anyType in the schema.

According to the above grammar, upon the characters "abc", the state moves 
from Type_ur-type,0 to Type_ur-type,1 if the 4th production in Type_ur-type,0 
is used.

Because "abc" is character data in an element, no attributes of that element 
can occur from that point on. However, the grammar Type_ur-type,1, which is 
the current state, can still accept attributes. This is not the effect that 
section 8.5.4.1.3.3 was intended to have. The specification needs to be 
repaired to resolve this discrepancy between the current description and its 
intended effect.
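
The walk-through above can be checked with a small Python sketch. It encodes
the expanded grammar from the listing as a table and follows the 4th
production of Type_ur-type,0. The names GRAMMAR, step and accepts_attributes
are hypothetical helpers for illustration, not anything defined by the spec.

```python
# Expanded ur-type grammar, transcribed from the listing in this mail.
# Each production is (event, next_state); None means the element ends.
GRAMMAR = {
    "0": [("AT(*)", "0"), ("SE(*)", "1"), ("EE", None), ("CH", "1"),
          ("AT(xsi:type)", "0"), ("AT(xsi:nil)", "0"), ("AT(*)", "0"),
          ("AT(*) [untyped value]", "0"), ("SE(*)", "1copy"),
          ("CH [untyped value]", "1copy")],
    "1": [("SE(*)", "1"), ("EE", None), ("CH", "1"),
          ("AT(*)", "1"), ("AT(*) [untyped value]", "1"),
          ("SE(*)", "1copy"), ("CH [untyped value]", "1copy")],
    "1copy": [("SE(*)", "1copy"), ("EE", None), ("CH", "1copy"),
              ("SE(*)", "1copy"), ("CH [untyped value]", "1copy")],
}


def step(state, event):
    """Follow the first production in `state` that matches `event`."""
    for candidate, next_state in GRAMMAR[state]:
        if candidate == event:
            return next_state
    raise ValueError(f"no production for {event} in Type_ur-type,{state}")


def accepts_attributes(state):
    """True if any production in `state` is an attribute (AT) event."""
    return any(event.startswith("AT") for event, _ in GRAMMAR[state])


# <A>abc</A>: the characters "abc" match CH, the 4th production of state 0.
state_after_abc = step("0", "CH")
print(state_after_abc, accepts_attributes(state_after_abc))  # 1 True
print("1copy", accepts_attributes("1copy"))                  # 1copy False
```

State 1 still lists AT(*) productions after character data has been seen,
while the copied state does not, which is the discrepancy described above.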

Given that Section "8.5.4.1.3 Type Grammars" of the spec is already clear 
about how grammars are built, the WG is inclined to clarify it as follows:

* Remove Section "8.5.4.1.3.3 Complex Ur-Type Grammar" and all references to it

* Change the fourth paragraph in Section "8.5.4.1.3 Type Grammars" from

"Sections 8.5.4.1.3.1 Simple Type Grammars and 8.5.4.1.3.2 Complex Type Grammars 
describe the processes for creating Type_i and TypeEmpty_i from XML Schema simple 
type definitionsXS1 and complex type definitionsXS1 defined in schemas as well as 
built-in primitive typesXS2, built-in derived typesXS2 and simple ur-typeXS2 defined by 
XML Schema specification [XML Schema Datatypes]. Section 8.5.4.1.3.3 Complex Ur-Type 
Grammar defines the grammar used for processing instances of element contents of 
type xsd:anyTypeXS1."

to

"Sections 8.5.4.1.3.1 Simple Type Grammars and 8.5.4.1.3.2 Complex Type Grammars 
describe the processes for creating Type_i and TypeEmpty_i from XML Schema simple 
type definitionsXS1 and complex type definitionsXS1 defined in schemas as well as 
built-in primitive typesXS2, built-in derived typesXS2, simple ur-typeXS2 and complex 
ur-typeXS2 defined by XML Schema specification [XML Schema Datatypes]."

In addition, Section "8.5.4.1.3.2 Complex Type Grammars" needs the following change 
to correctly handle cases such as the example in the problem statement above.

Currently, if an {attribute wildcard} is specified, an extra attribute use grammar 
G_n-1 of the following form is added.

  G_n−1,0 :
    EE

Change the form of the above grammar to:

  G_n−1,0 :
    EE

  G_n−1,1 :
    EE

Just before the first note in the section, add the following rule.

If there is neither an attribute use nor an {attribute wildcard}, G_0 of the following form 
is used as an attribute use grammar.

  G_0,0 :
    EE

-------------------------------------------------------------------------------
END

Received on Monday, 4 February 2013 22:07:19 UTC