Re: Question on "Adding Productions when Strict is False"

Hi Daniel,

Many thanks for your response. I'm not sure whether this mailing list is
the right place to discuss this, and I wouldn't raise it if I weren't
convinced there is an issue here.
I understand that each implementation has its own strategy for handling
the grammar representation. My issue is that EXIficient seems to pick
the wrong event code in a certain case because of that "AT(invalid)"
production. I know I must be wrong, because OpenEXI is able to decode
the EXI stream produced by EXIficient; I just don't understand why,
although I have gone through the spec many times.
Please consider the following very simple use case. Given the XML:
<?xml version="1.0" encoding="UTF-8"?>
<test xmlns="http://ns-test"/>

and the schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://ns-test" elementFormDefault="qualified">
<xs:element name="test">
  <xs:complexType/>
</xs:element>
</xs:schema>

During schema-informed encoding with Preserve.prefixes enabled (all other
options keep their default values), EXIficient encodes the NS event with
event code 1.4:

....
encoder.encodeAttributeList(exiAttributes); // SAXEncoder, line 127
int ec2 = currentRule.get2ndLevelEventCode(EventType.NAMESPACE_DECLARATION,
        fidelityOptions); // AbstractEXIBodyEncoder, line 360
encode2ndLevelEventCode(ec2); // AbstractEXIBodyEncoder, line 363
....
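The excerpt above ends by handing `ec2` to `encode2ndLevelEventCode`. In the default bit-packed alignment, EXI writes each part of an event code as an n-bit unsigned integer, with n = ceil(log2 m) and m the number of distinct values that part can take in the current grammar state. The sketch below shows only that arithmetic; the class `EventCodeBits` is hypothetical and not part of EXIficient, and it assumes the caller already knows m for each level. For test-0 the first level holds EE (0) plus the value 1 that leads to the second level, so m = 2 there.

```java
public class EventCodeBits {

    /** Bits for one event-code part with m distinct values: ceil(log2(m)), so 0 when m == 1. */
    static int partWidth(int m) {
        if (m < 1) {
            throw new IllegalArgumentException("m must be at least 1");
        }
        return 32 - Integer.numberOfLeadingZeros(m - 1);
    }

    /**
     * Writes each part of an event code as a fixed-width unsigned integer,
     * most significant bit first, and returns the bits as a string.
     * distinctPerLevel[i] is the (assumed known) number of distinct values at level i.
     */
    static String encode(int[] parts, int[] distinctPerLevel) {
        StringBuilder bits = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            int value = parts[i];
            int m = distinctPerLevel[i];
            if (value < 0 || value >= m) {
                throw new IllegalArgumentException("part " + i + " out of range: " + value);
            }
            for (int b = partWidth(m) - 1; b >= 0; b--) {
                bits.append((value >>> b) & 1);
            }
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        // test-0: first level = {EE, value leading to level 2}, so m = 2 there.
        System.out.println(encode(new int[] {1, 3}, new int[] {2, 6})); // prints 1011 (six second-level values)
        System.out.println(encode(new int[] {1, 4}, new int[] {2, 7})); // prints 1100 (seven second-level values)
    }
}
```

With six second-level values, as in the grammar listed below, code 1.3 is written as `1011`; with seven, code 1.4 is written as `1100`. Both need a 3-bit second part, so the two readings differ in the value written, not in its length.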

Looking at the spec, the grammar and event codes should look like this:

test-0:
    EE                             0
    AT(xsi:type) test-0            1.0
    AT(xsi:nil) test-0             1.1
    AT (*) test-0                  1.2
    NS test-0                      1.3
    SE (*) content2                1.4
    CH [untyped value] content2    1.5

The NS event should be encoded with 1.3 according to the spec.

Where am I going wrong?

Thanks in advance!

Best regards,
Rumen


On Thu, Nov 15, 2012 at 9:14 AM, Peintner, Daniel (ext)
<daniel.peintner.ext@siemens.com> wrote:
> Hi Rumen,
>
>> Element-i,0 : AT (*) [untyped] Element-i,0 n.m+2)
>>
>> It is inserted before the "NS Element-i,0 next n.m" production and after
>> the "AT (*) Element-i,0 next n.m" production.
>>
>> According to my understanding of the specification, this production
>> has a three-part event code [...]
>
> Your understanding is correct.
>
>> However, when looking at the EXIficient implementation:
>> in the SchemaInformedFirstStartTag class, methods getNumberOf2ndLevelEvents()
>> and get2ndLevelEventCode(), the code includes one more production with
>> an event code of 2 parts:
>>
>> In the source code, this extra production is referred to in the comments
>> as "AT(invalid)".
>
> As with most specifications, there are different strategies for actually implementing a given behaviour. This is also the case for the source code you cited.
>
> EXIficient creates an event on the second level that links to the available events on the third level. This second-level event code part is never meant to be encoded without a subsequent third-level event code part.
>
> However, everyone is free to choose another strategy as long as the result matches the specification.
>
> Hope this helps,
>
> -- Daniel
>
> -----Original Message-----
> From: Rumen Kyusakov [mailto:kjussakov@gmail.com]
> Sent: Wednesday, October 24, 2012 11:27 AM
> To: public-exi@w3.org
> Subject: Question on "Adding Productions when Strict is False"
>
> Dear all,
>
> I have a question regarding the encoding of event codes in schema-informed mode when strict is FALSE.
> According to my understanding of
> http://www.w3.org/TR/2011/REC-exi-20110310/#addingProductions, the second-level
> productions (the productions with two-part event codes) for the first grammar
> rule of schema-derived grammars are:
>
> Element-i,0 : EE                          n.m       // only if not already available with a shorter event code
>               AT(xsi:type) Element-i,0    next n.m
>               AT(xsi:nil) Element-i,0     next n.m
>               AT (*) Element-i,0          next n.m
>               NS Element-i,0              next n.m  // if NS are preserved
>               SC Fragment                 next n.m  // if SC are preserved
>               SE (*) Element-i,c2         next n.m
>               CH [untyped] Element-i,c2   next n.m
>               ER Element-i,c2             next n.m  // if ER are preserved
>
> However, when looking at the EXIficient implementation:
> in the SchemaInformedFirstStartTag class, methods getNumberOf2ndLevelEvents() and get2ndLevelEventCode(), the code includes one more production with an event code of 2 parts:
>
> Element-i,0 : AT (*) [untyped] Element-i,0 n.m+2)
>
> It is inserted before the "NS Element-i,0 next n.m" production and after
> the "AT (*) Element-i,0 next n.m" production.
> In the source code, this extra production is referred to in the comments as "AT(invalid)".
>
> According to my understanding of the specification, this production has a three-part event code, defined by the following fragment from the spec:
>
> For each non-terminal Element_i,j, such that 0 ≤ j ≤ content, with zero or more productions of the following form:
>
> Element_i,j :
>         AT (qname_0) [schema-typed value] NonTerminal_0
>         AT (qname_1) [schema-typed value] NonTerminal_1
>             ⋮
>         AT (qname_(x-1)) [schema-typed value] NonTerminal_(x-1)
>
> where x represents the number of attributes declared in the schema for this context, add the following productions:
>
> Element_i,j :
>         AT (*) Element_i,j                                  n.m
>         AT (qname_0) [untyped value] NonTerminal_0          n.(m+1).0
>         AT (qname_1) [untyped value] NonTerminal_1          n.(m+1).1
>             ⋮                                               ⋮
>         AT (qname_(x-1)) [untyped value] NonTerminal_(x-1)  n.(m+1).(x-1)
>         AT (*) [untyped value] Element_i,j                  n.(m+1).(x)
>
> where n.m represents the next available event code with length 2.
>
> The last production, "AT (*) [untyped value] Element_i,j n.(m+1).(x)", has a three-part event code, not a two-part one.
>
> Some tests with OpenEXI showed that the same extra production "AT (*) [untyped]" with a two-part event code is used there as well.
>
> Can someone give me pointers on why we have this extra production?
>
> Best Regards,
> Rumen
>
>
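Reading Daniel's reply against the spec fragment quoted above: every production of the form n.(m+1).x shares the second-level value m+1, so that value is consumed as a prefix even though nothing is ever encoded with those two parts alone. That would account for the "AT(invalid)" entry. The sketch below makes the counting concrete. It is a hypothetical helper (`FirstRuleCodes.assign` is not an EXIficient or OpenEXI API) that numbers the productions added to Element-i,0 in the order given in the original message, and it assumes comments and processing instructions are not preserved, since their third-level codes are not discussed in this thread.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FirstRuleCodes {

    /**
     * Hypothetical helper: numbers the productions added to Element-i,0 when
     * strict is false, in the order quoted in this thread, and returns a map
     * from production to its dotted event code. CM and PI are not modelled.
     */
    static Map<String, String> assign(List<String> level1Events, List<String> declaredAttrs,
                                      boolean preservePrefixes, boolean selfContained,
                                      boolean preserveDtd) {
        Map<String, String> codes = new LinkedHashMap<>();
        for (int i = 0; i < level1Events.size(); i++) {
            codes.put(level1Events.get(i), String.valueOf(i));
        }
        int n = level1Events.size(); // first-level value that leads to level 2
        int m = 0;                   // next free second-level value

        if (!level1Events.contains("EE")) {
            codes.put("EE", n + "." + m++);
        }
        codes.put("AT(xsi:type)", n + "." + m++);
        codes.put("AT(xsi:nil)", n + "." + m++);
        codes.put("AT(*)", n + "." + m++);

        // Second-level value m becomes a prefix: each untyped attribute
        // production gets the three-part code n.m.x; n.m alone is never written.
        for (int x = 0; x < declaredAttrs.size(); x++) {
            codes.put("AT(" + declaredAttrs.get(x) + ") [untyped]", n + "." + m + "." + x);
        }
        codes.put("AT(*) [untyped]", n + "." + m + "." + declaredAttrs.size());
        m++;

        if (preservePrefixes) codes.put("NS", n + "." + m++);
        if (selfContained) codes.put("SC", n + "." + m++);
        codes.put("SE(*)", n + "." + m++);
        codes.put("CH [untyped]", n + "." + m++);
        if (preserveDtd) codes.put("ER", n + "." + m++); // ER only when DTD info is preserved
        return codes;
    }

    public static void main(String[] args) {
        // The <test> element from this thread: EE is the only first-level
        // production, no attributes are declared, prefixes are preserved.
        assign(List.of("EE"), List.of(), true, false, false)
                .forEach((event, code) -> System.out.println(code + "\t" + event));
    }
}
```

Under these assumptions, `main` lists AT(*) [untyped] at 1.3.0 and NS at 1.4, which matches the code EXIficient emits in the example above. The sketch models only this one rule and is not a full grammar builder.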

Received on Thursday, 15 November 2012 15:15:12 UTC