Sequence Type Checking from Jeni Tennison on 2002-05-07 (www-xpath-comments@w3.org from April to June 2002)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Tue, 7 May 2002 09:33:42 +0100
To: public-qt-comments@w3.org
CC: www-xpath-comments@w3.org
Message-ID: <381294962046.20020507093342@jenitennison.com>

Hi,

Can you clarify what "AtomicType" means in the context of sequence
type checking in Section 2.1.3.2 (SequenceType) of the XPath 2.0 WD?
I think that you mean this to include all atomic simple types, since
higher up, in the introduction part of Section 2.1.3 (Types), you say:

  "The set of named types includes all the built-in types *and all
  user-defined simple or complex types* for which the type declaration
  contains a name." (my emphasis)

That being the case, isn't the current set of productions for
SequenceType ambiguous? What if I defined the following type in a
schema with no target namespace:

<xs:simpleType name="item">
  <xs:restriction base="xs:token">
    <xs:pattern value="item[0-9]{3}" />
  </xs:restriction>
</xs:simpleType>

This is a user-defined atomic type whose name is 'item' (with no
target namespace and thus no prefix). Since atomic types are simply
named, saying:

  item+

could mean "one or more of the item type from the schema" or it could
mean "one or more items of any type". Similarly, I could have types
called "element", "attribute", "node" and so on for the other ItemType
keywords.
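
To make the clash concrete, here is a minimal Python sketch. It is only
an illustration: the keyword set and the readings() helper are
hypothetical names, not anything defined in the WD, and the keyword set
covers only the four ItemType keywords mentioned above.

```python
# Hypothetical illustration: neither name below comes from the WD.
ITEM_TYPE_KEYWORDS = {"item", "node", "element", "attribute"}


def readings(sequence_type, schema_type_names):
    """List every way the name in a simple SequenceType could be read."""
    # Drop a trailing occurrence indicator such as "+" to get the bare name.
    name = sequence_type.rstrip("*+?").strip()
    found = []
    if name in ITEM_TYPE_KEYWORDS:
        found.append("keyword")
    if name in schema_type_names:
        found.append("schema type")
    return found


# The no-namespace schema above contributes one named type: "item".
print(readings("item+", {"item"}))  # ['keyword', 'schema type']
```

Two entries in the result mean that the same string has two legal
readings, which is exactly the ambiguity described above.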

Or perhaps you are restricting XML Schema to that subset of XML
Schemas in which all the components have a target namespace? If so, I
don't *think* you've mentioned that anywhere, and it's a pretty big
restriction. Or perhaps you mean for types that are named the same as
the keywords to have to be prefixed with a ":" as elsewhere? In which
case you should incorporate that into the BNF.

If neither of those is the case, one method of clarifying this would
be to make those ItemTypes that are actually node types look like
(node) KindTests (production [31]), so you'd have node(),
processing-instruction() and so on. This could be extended to include
element() and attribute(), perhaps adopting the same syntax as
processing instruction tests to provide the name of the node:

  element('foo')

would mean the same as:

  element foo


Another thing here is how you match elements and attributes if you use
an ElemOrAttrType. The text says:

  2. Another form of ElemOrAttrType is simply a QName, which is
     interpreted as the required name of the element or attribute. The
     QName must be an element or attribute name that is found in the
     in-scope schema definitions.

But earlier on, in-scope schema definitions is defined as:

  In-scope schema definitions. This is a set of (QName, type
  definition) pairs. It defines the set of types that are available
  for reference within the expression. It includes the built-in schema
  types and all globally-declared types in imported schemas.

Perhaps you're using "type definition" in a different way from XML
Schema, but in XML Schema, "type definitions" aren't element
declarations. I think that you might need to add something to the
static context -- "in-scope element declarations" and "in-scope
attribute declarations" -- in which to search, although I notice that
these are explicitly left out of the data model...

I think that you mean that doing "element foo" will only match foo
elements that are declared at the top level of the schema (i.e.
{element declarations} on schema component Schema). Is that correct? I
think it might also be helpful to be able to distinguish between
"elements of this name, wherever they're declared" and "elements of
this name declared at the top-level of the schema". Similarly, indeed
particularly, for attributes. You'll commonly have the following
within a schema (particularly those generated from DTDs):

<xs:element name="foo">
  <xs:complexType>
    <xs:attribute name="id" type="xs:ID" />
  </xs:complexType>
</xs:element>

<xs:element name="bar">
  <xs:complexType>
    <xs:attribute name="id" type="xs:ID" />
  </xs:complexType>
</xs:element>

And currently there's no way that I can see of referring to "id
attributes wherever they're declared" or even "id attributes as in the
element declaration for foo or the element declaration for bar".
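
As a rough check of this claim, the following Python sketch lists the
declarations of a given kind that sit at the top level of the schema
and those that appear anywhere in it. It assumes the two declarations
above are wrapped in an xs:schema element; declared_names() is a
hypothetical helper, and the standard xml.etree.ElementTree module is
used purely as an XML reader, not as a schema processor.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# The two declarations above, wrapped in xs:schema (an assumption).
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="foo">
    <xs:complexType>
      <xs:attribute name="id" type="xs:ID" />
    </xs:complexType>
  </xs:element>
  <xs:element name="bar">
    <xs:complexType>
      <xs:attribute name="id" type="xs:ID" />
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def declared_names(schema_text, kind):
    """Return (top-level names, names declared anywhere) for xs:<kind>."""
    root = ET.fromstring(schema_text)
    tag = "{%s}%s" % (XS, kind)
    top_level = [decl.get("name") for decl in root.findall(tag)]
    anywhere = [decl.get("name") for decl in root.iter(tag)]
    return top_level, anywhere


print(declared_names(SCHEMA, "attribute"))  # ([], ['id', 'id'])
```

The empty first list shows that a lookup restricted to top-level
declarations never sees either id attribute.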

Furthermore, if you do mean to have "element foo" only match top-level
element declarations, then I don't understand why you're allowed to
specify a type when matching those kinds of elements, but aren't
allowed to do so when matching local element declarations. A top-level
element declaration can only have one type, just like a local element
declaration. Perhaps you mean it to be that when the type is specified
you match all elements, wherever their declaration?

A final thing here is that the SchemaGlobalContext should include
attribute groups and model groups, so that you can distinguish between
foo elements with the following declarations:

<xs:element name="foo" type="type1" />

<xs:complexType name="bar">
  <xs:sequence>
    <xs:element name="foo" type="type2" />
  </xs:sequence>
</xs:complexType>

<xs:group name="bar">
  <xs:sequence>
    <xs:element name="foo" type="type3" />
  </xs:sequence>
</xs:group>

If you did allow for this kind of schema, you'd also have to add
"in-scope model group definitions" and "in-scope attribute group
definitions" to the static context.
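
The three contexts can be told apart mechanically. The Python sketch
below walks the schema and records, for each declaration of foo, the
named component that encloses it. It assumes the declarations above are
wrapped in an xs:schema element; element_contexts() and its context
labels are hypothetical, and nothing here validates the schema.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# The three declarations above, wrapped in xs:schema (an assumption).
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="foo" type="type1" />
  <xs:complexType name="bar">
    <xs:sequence>
      <xs:element name="foo" type="type2" />
    </xs:sequence>
  </xs:complexType>
  <xs:group name="bar">
    <xs:sequence>
      <xs:element name="foo" type="type3" />
    </xs:sequence>
  </xs:group>
</xs:schema>"""


def element_contexts(schema_text, name):
    """Return (enclosing component, type) for each declaration of `name`."""
    results = []

    def walk(node, context):
        for child in node:
            local = child.tag.replace(XS, "")
            if local == "element" and child.get("name") == name:
                results.append((context, child.get("type")))
            # A named complexType or group is the context for its content.
            if local in ("complexType", "group") and child.get("name"):
                walk(child, "%s %s" % (local, child.get("name")))
            else:
                walk(child, context)

    walk(ET.fromstring(schema_text), "top level")
    return results


for context, type_name in element_contexts(SCHEMA, "foo"):
    print(context, "->", type_name)
```

Each foo comes back with a different context and a different type, so
a context that can name only complex types cannot separate the second
declaration from the third.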

---

Personally, I'd like to see the syntax used here unified with the
syntax used in XSLT in match patterns. It strikes me that you're doing
a similar kind of thing as match patterns here: putting together a
test that identifies the kind of things that are allowed in a
sequence. Perhaps an alternative, therefore, would be to use type(),
say, to indicate a type and have things like:

  type('xs:date')        refers to the built-in Schema type date
  @*?                    refers to an optional attribute
  *                      refers to any element
  office:letter          refers to an element with a specific name
  *[type('po:address')]+ refers to one or more elements of the given
                         type
  node()*                refers to a sequence of zero or more nodes of
                         any type
  item()*                refers to a sequence of zero or more nodes or
                         atomic values

I am not advocating a full pattern syntax here -- I understand that
you want to be able to *identify* the node/type from these
SequenceType indicators, not only match them, so that you can do
static analysis. The kind of thing I'm thinking about is something
like (forgive the BNF):

SequenceType   ::=  (ItemTypes OccurrenceIndicator?) | EmptyType
EmptyType      ::=  "(" ")"
ItemTypes      ::=  ItemType
                    | "(" ItemType ("|" ItemType)+ ")"
ItemType       ::=  NodeType
                    | AtomicType
                    | "item" "(" ")"
NodeType       ::=  NamedNodeType
                    | "node" "(" ")"
                    | "document" "(" ")"  // or perhaps "/"
                    | "text" "(" ")"
                    | "processing-instruction" "(" ")"
                    | "comment" "(" ")"
NamedNodeType  ::=  ElemOrAttr SchemaType?
                    | SchemaContext ElemOrAttr
ElemOrAttr     ::=  "@"? ("*" | QName)
SchemaType     ::=  "[" "type" "(" QNameLiteral ")" "]"
QNameLiteral   ::=  ("'" QName "'") | ('"' QName '"')
SchemaContext  ::=  SchemaGlobalContext ("/" SchemaContextStep)* "/"
SchemaGlobalContext ::= "schema" "(" ")" "/" (TypeOrGroup | QName)
TypeOrGroup         ::= ("type" | "group") "(" QNameLiteral ")"
SchemaContextStep   ::= QName
AtomicType          ::= "type" "(" (UnknownType | QNameLiteral)? ")"
UnknownType         ::= ("'" "'") | ('"' '"')
OccurrenceIndicator ::= ("*" | "+" | "?")
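
Since none of these productions is recursive, the whole grammar can be
transcribed into one regular expression. The Python sketch below does
that as a rough recognizer. Its assumptions: QName is approximated by a
simple pattern, whitespace is allowed only around punctuation, and the
occurrence indicator is treated as optional so that strings without one
are accepted. is_sequence_type() is a hypothetical name.

```python
import re

# Each constant mirrors one production; QName is only approximated.
NCNAME = r"[A-Za-z_][\w.-]*"
QNAME = rf"{NCNAME}(?::{NCNAME})?"
QNAME_LITERAL = rf"(?:'{QNAME}'|\"{QNAME}\")"
UNKNOWN_TYPE = r"(?:''|\"\")"
ATOMIC_TYPE = rf"type\((?:{UNKNOWN_TYPE}|{QNAME_LITERAL})?\)"
ELEM_OR_ATTR = rf"@?(?:\*|{QNAME})"
SCHEMA_TYPE = rf"\[type\({QNAME_LITERAL}\)\]"
TYPE_OR_GROUP = rf"(?:type|group)\({QNAME_LITERAL}\)"
SCHEMA_GLOBAL_CONTEXT = rf"schema\(\)/(?:{TYPE_OR_GROUP}|{QNAME})"
SCHEMA_CONTEXT = rf"{SCHEMA_GLOBAL_CONTEXT}(?:/{QNAME})*/"
NAMED_NODE_TYPE = (
    rf"(?:{SCHEMA_CONTEXT}{ELEM_OR_ATTR}|{ELEM_OR_ATTR}(?:{SCHEMA_TYPE})?)"
)
KIND_TEST = r"(?:node|document|text|processing-instruction|comment)\(\)"
NODE_TYPE = rf"(?:{KIND_TEST}|{NAMED_NODE_TYPE})"
ITEM_TYPE = rf"(?:item\(\)|{ATOMIC_TYPE}|{NODE_TYPE})"
ITEM_TYPES = rf"(?:{ITEM_TYPE}|\({ITEM_TYPE}(?:\|{ITEM_TYPE})+\))"
# Assumption: the occurrence indicator is optional.
SEQUENCE_TYPE = re.compile(rf"(?:{ITEM_TYPES}[*+?]?|\(\))")


def is_sequence_type(text):
    """True if text matches the sketched SequenceType grammar."""
    # Drop whitespace around punctuation only, so "element foo" is not
    # silently glued into a single name.
    compact = re.sub(r"\s*([()\[\]|/@*+?])\s*", r"\1", text.strip())
    return SEQUENCE_TYPE.fullmatch(compact) is not None


print(is_sequence_type("schema()/type('bar')/baz/foo +"))  # True
print(is_sequence_type("element foo"))  # False
```

Under this grammar the earlier clash goes away: "item+" can only be one
or more elements named item, "item()+" is one or more items of any
kind, and "type('item')+" is one or more values of the schema type.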

So for example rather than writing:

  element foo in type bar/baz +

You'd write:

  schema()/type('bar')/baz/foo +

Rather than writing "empty", you'd write "()". Rather than writing
"unknown", you'd write "type('')". Rather than writing "atomic value",
you'd write "type()".

You could say something like "any number of id attributes declared
within foo or bar element declarations" with:

  (schema()/foo/@id | schema()/bar/@id)*

With a few adjustments, this kind of syntax would enable users to make
the distinction between "elements declared anywhere" and "elements
declared at the top level". The former could be matched with:

  foo

whereas the latter with:

  schema()/foo

Obviously the above would still need some work (in the above, "type()"
means different things in different situations), but I'd hope that a
unified syntax with XSLT match patterns will make the sequence typing
more flexible in the long run, easier to learn for people with
experience with XSLT, and eventually enable XSLT templates to match
things other than nodes.

Cheers,

Jeni
---
Jeni Tennison
http://www.jenitennison.com/
Received on Tuesday, 7 May 2002 06:01:47 UTC