is XML Schema context free or context sensitive or something entirely different from Gregor on 2005-05-09 (www-xml-schema-comments@w3.org from April to June 2005)

From: Gregor <iamgregor@gmail.com>
Date: 09 May 2005 11:50:49 -0600
To: www-xml-schema-comments@w3.org
Message-ID: <642f07b6050428084553ba2875@mail.gmail.com>
I do not understand the nature of the XML Schema language.

"...The approach followed here follows the best practices currently
used in the programming languages community, although somewhat adapted
for XML. The hallmark of this approach is the use of context free
grammars to provide syntactic checking and the use of inference rules
to provide the semantics associated with each piece of syntax. This
means there is, essentially, one inference rule per context free
grammar production. This set of inference rules is not intended to be
in any way minimal, but it is helpful from both a pedagogical and
implementation standpoint - for each syntactic construct it is
straightforward to identify its underlying semantics."
http://www.w3.org/TR/2001/WD-xmlschema-formal-20010320/

I think I understood the part about it being a context free grammar.
To my understanding, this is shown by the production rules in the
recommendation as in:

<element>...
Content: (annotation?, ((simpleType | complexType)?, (unique | key | keyref)*))
</element>

Ignoring the attributes, one could write:

element ::= (annotation?, ((simpleType | complexType)?, (unique | key
| keyref)*))
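
As a minimal sketch (my own illustration, not taken from the recommendation), the right-hand side above can be read as a regular expression over the names of the child elements. The Python below assumes each child name is followed by a single space when encoded; `matches_element_content` is a hypothetical helper name.

```python
import re

# Content model of <element>, attributes ignored, as written above:
# (annotation?, ((simpleType | complexType)?, (unique | key | keyref)*))
# Each child name is encoded as "name " so that "key" and "keyref" stay distinct.
ELEMENT_CONTENT = re.compile(
    r"(annotation )?((simpleType|complexType) )?((unique|key|keyref) )*"
)

def matches_element_content(child_names):
    """Return True if the ordered child element names fit the content model."""
    encoded = "".join(name + " " for name in child_names)
    return ELEMENT_CONTENT.fullmatch(encoded) is not None

print(matches_element_content(["annotation", "complexType", "key"]))
print(matches_element_content(["complexType", "simpleType"]))
```

This only checks the order and number of children. It says nothing about attributes or about where the element itself occurs.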

>This means there is, essentially, one inference rule per context free
grammar production.

This is the part I no longer understand, because in Chapter 3.3 of the
recommendation I can find many rules restricting the above production,
e.g.:
3.3.4, clause 3.1: If {nillable} is false, then there must be no attribute
information item among the element information item's [attributes]
whose [namespace name] is identical to
http://www.w3.org/2001/XMLSchema-instance and whose [local name] is
nil.

One rule refers to the context of the element:
3.3.3, clause 2: If the item's parent is not <schema>, then all of the
following must be true:

Rules of this kind refer not to the right-hand side of the production
rule (for some attribute/element configuration) but to the context in
which the left-hand side appears.

In this case: 
schema ::= ((include | import | redefine | annotation)*,
(((simpleType | complexType | group | attributeGroup) | element |
attribute | notation), annotation*)*)

So if the element was derived from here, a different rule applies to it
than if it comes from this production:
sequence ::= (annotation?, (element | group | choice | sequence | any)*) 
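
To make the difference concrete, here is a rough Python sketch of a check that needs to know the parent before it can decide. It is only an illustration: the sub-check used (exactly one of `ref` or `name` is present) is my reading of what follows "all of the following must be true" in 3.3.3 and should be treated as an assumption, and `check_local_elements` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

def check_local_elements(schema_text):
    """Return messages for <element> items that break the parent-dependent rule."""
    root = ET.fromstring(schema_text)
    problems = []
    # ElementTree keeps no parent pointers, so visit (parent, child) pairs.
    for parent in root.iter():
        for child in parent:
            if child.tag != XS + "element":
                continue
            if parent.tag == XS + "schema":
                continue  # the rule only applies when the parent is not <schema>
            has_ref = "ref" in child.attrib
            has_name = "name" in child.attrib
            if has_ref == has_name:  # assumed sub-check: exactly one of ref/name
                parent_name = parent.tag.split("}")[-1]
                problems.append(
                    "<element> under <%s> needs exactly one of ref or name"
                    % parent_name
                )
    return problems
```

The function cannot be written as a check on the children of `element` alone; it has to look at the production from which the element was reached. It covers only this one rule and ignores every other constraint on element declarations.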

So I assume the whole language is no longer context free and therefore
no longer of type 2 in the Chomsky hierarchy (context-free grammars).
Please comment!!

The problem that I have with not knowing how to classify the XML
Schema language is the following.

I am currently testing parsers and their compliance with certain parts
of the recommendation of the XML Schema language.

I would assume (feel free to correct me) that if XML Schema were truly
context free, a parser that could correctly validate against one
production rule (say: sequence ::= (annotation?, (element | group |
choice | sequence | any)*) ) could do so no matter where in an XML
Schema schema the production rule occurs.
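
A minimal sketch of what I mean by "no matter where", assuming the production is checked purely on child names: the Python below finds every <sequence> at any depth and applies the same test to each. `invalid_sequences` is a hypothetical name, and the encoding of names with a trailing space is my own choice.

```python
import re
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# sequence ::= (annotation?, (element | group | choice | sequence | any)*)
SEQUENCE_CONTENT = re.compile(
    r"(annotation )?((element|group|choice|sequence|any) )*"
)

def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.split("}")[-1]

def invalid_sequences(schema_text):
    """Return the child-name lists of every <sequence> that breaks the production."""
    root = ET.fromstring(schema_text)
    bad = []
    # iter() visits matching elements at every depth, so position is irrelevant.
    for seq in root.iter(XS + "sequence"):
        names = [local_name(child.tag) for child in seq]
        encoded = "".join(name + " " for name in names)
        if SEQUENCE_CONTENT.fullmatch(encoded) is None:
            bad.append(names)
    return bad
```

If the language were context free in this sense, testing this one function on a few sequences would cover all occurrences, because the check never consults the ancestors of the <sequence>.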

If, on the other hand, there is a finite set of rules (dealing with
context or content) attached to the production rules, could one
classify those rules so as to prove, using a finite number of test
cases, that a parser can validate XML instances against one (or all)
production rules in any context?
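
One way to picture such a finite set, sketched in Python under the assumption that "context" simply means the production in which a symbol appears on the right-hand side: list, for a given symbol, every production that mentions it. Only the three productions quoted in this message are included, so the real list for the full recommendation would be longer; `contexts_of` is a hypothetical name.

```python
import re

# The three productions quoted in this message (attributes ignored).
PRODUCTIONS = {
    "element": "(annotation?, ((simpleType | complexType)?, "
               "(unique | key | keyref)*))",
    "schema": "((include | import | redefine | annotation)*, "
              "(((simpleType | complexType | group | attributeGroup) | element | "
              "attribute | notation), annotation*)*)",
    "sequence": "(annotation?, (element | group | choice | sequence | any)*)",
}

def contexts_of(symbol):
    """Return the left-hand sides whose right-hand side mentions the symbol."""
    return sorted(
        lhs
        for lhs, rhs in PRODUCTIONS.items()
        if symbol in re.findall(r"[A-Za-z]+", rhs)
    )

print(contexts_of("element"))
```

Each (symbol, context) pair would then be one candidate test case, and the number of pairs is finite because the number of productions is finite.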

Another question in the same direction:
How does the concept of a context free language deal with the problem
that element can be a terminal (e.g. within a sequence) and a
non-terminal (e.g. when containing a complexType)?

Thank you for your help,

Gregor
Received on Monday, 9 May 2005 17:51:16 UTC