Data Model Last Call 2 Comments

13 comments

Comment Status Class Summary
DM-LC2-0001 open technical Mapping PSVI Additions to Types
DM-LC2-0002 open technical Accessors (Element), description of dm:string-value
DM-LC2-0003 open technical Accessors (Element), description of dm:typed-value
DM-LC2-0004 open technical Construction from a PSVI (Element), definition of children property
DM-LC2-0005 open technical Accessors (Attribute), definition of dm:string-value()
DM-LC2-0006 open technical Construction from an Infoset (Attribute), definition of type property
DM-LC2-0007 open technical Construction from a PSVI (Attribute)
DM-LC2-0008 open technical Accessors (Namespace), definition of dm:typed-value()
DM-LC2-0009 open technical Accessors (Text), definition of dm:type()
DM-LC2-0010 open technical Atomic Values, implementation defined types
DM-LC2-0011 open technical 1.1. The term type
DM-LC2-0012 open technical 1.3. Items and singleton sequences
DM-LC2-0013 open technical 1.4. The implications of [validity] ≠ valid

DM-LC2-0001: Mapping PSVI Additions to Types

open technical
http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Oct/0039.html
Jeni Tennison Mapping PSVI Additions to Types
There are three kinds of simple types in XML Schema: atomic types,
list types, and union types. Under the definition given in this
section:

  - if an element is declared as having an atomic type, then its
    dm:type() is that type

  - if an element is declared as having a list type, then its
    dm:type() is that type

  - if an element is declared as having a union type, then its
    dm:type() is the member type that its value complies with

So if I have a schema that includes:

<xs:element name="holidays" type="my:dates" />
<xs:simpleType name="dates">
  <xs:list itemType="xs:date" />
</xs:simpleType>

then the element:

  <my:holidays>2003-12-24 2003-12-25</my:holidays>

would be of the type 'dates', and would match the SequenceTypes:

  element(my:holidays, my:dates)
  element(my:holidays)

On the other hand, if I have an element declaration like:

<xs:element name="holiday" type="my:date" />
<xs:simpleType name="date">
  <xs:union memberTypes="xs:gMonthDay xs:date" />
</xs:simpleType>

then the element:

  <my:holiday>--12-25</my:holiday>

would be of the type xs:gMonthDay while the element:

  <my:holiday>2003-12-25</my:holiday>

would be of the type xs:date.

I think it's absolutely imperative, in this case, that both
<my:holiday> elements above match the SequenceTypes:

  element(my:holiday, my:date)
  element(my:holiday)

but I am not able to figure out, from the Formal Semantics, whether
this is the case -- whether xs:date and xs:gMonthDay are considered to
be subtypes of the date type.

Before re-reading this section, my understanding was that the
dm:type() of an element was its [type definition], as specified in the
PSVI, but that the type of its dm:typed-value() was the appropriate
member type. So, for example:

  <my:holiday>--12-25</my:holiday>

results in a data model instance that looks like:

  element my:holiday of type my:date {
    "--12-25" of type xs:gMonthDay
  }

whereas:

  <my:holiday>2003-12-25</my:holiday>

results in a data model instance that looks like:

  element my:holiday of type my:date {
    "2003-12-25" of type xs:date
  }

In other words, you should use the [member type definition] property
to identify the type of the typed value of the element/attribute, but
not the type of the element/attribute itself. I think doing anything
else will lead to sticky problems when you have restrictions of union
types, and will lead to general confusion given that elements of list
and union types would be treated inconsistently.
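
The reading proposed in this comment can be sketched in Python. This is only an illustration under stated assumptions, not the data model's definition: the regular expressions approximate the lexical forms of xs:gMonthDay and xs:date (no time zones, no range checks), and the names MEMBER_TYPES, typed_value and build_element are hypothetical.

```python
import re

# Simplified lexical patterns (assumption): real XML Schema validation also
# handles time zones and range checks, which this sketch ignores.
MEMBER_TYPES = [
    ("xs:gMonthDay", re.compile(r"--\d{2}-\d{2}")),
    ("xs:date", re.compile(r"-?\d{4,}-\d{2}-\d{2}")),
]

def typed_value(text):
    """Return (lexical value, member type) for the first member type that matches."""
    for name, pattern in MEMBER_TYPES:
        if pattern.fullmatch(text):
            return (text, name)
    raise ValueError(f"{text!r} matches no member type of my:date")

def build_element(text):
    """The element keeps its declared type; only the typed value carries the member type."""
    return {"name": "my:holiday", "type": "my:date", "typed_value": typed_value(text)}

for content in ("--12-25", "2003-12-25"):
    print(build_element(content))
```

Under this reading both elements report my:date as their type, so a test such as element(my:holiday, my:date) needs no subtype relationship between the union and its member types.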

DM-LC2-0002: Accessors (Element), description of dm:string-value

open technical
http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Oct/0039.html
Jeni Tennison Accessors (Element), description of dm:string-value
I'm concerned that the definition of the string value given here leads
to some really inconsistent results because in some cases you get the
concatenation of the text nodes within the element (the original value
of the element within the XML document) and in other cases you get the
result of converting the typed value of the element to a string, but
it's hard to predict which you're going to get and when.

The confusion is at least partly due, I think, to the data model using
the type of the *element*, rather than the type of the typed value of
the element, to determine the string value of the element.

For example, if you have an element:

  <foo xsi:type="xs:NMTOKEN">   1    </foo>

then the string value of the element is the xs:NMTOKEN "1" (the same
as the typed value of the element).

But if you have an element:

  <foo xsi:type="xs:integer">    1   </foo>

then the string value of the element is the xs:string "   1   " (the
concatenation of the text node descendants of the element).

Also, if you have an element that contains a *list* of xs:QName
values, then the string value will be the concatenation of the text
node descendants of the element, rather than being normalised in the
way described for a single xs:QName.

My understanding was that there were two types of implementations that
we might be catering for here: implementations that retained the
original string value of an element, and implementations that kept
only the typed value of an element, and reconstructed the string value
on demand. What we have now is a situation where the latter approach
is no longer viable (at least for elements of types other than
xs:string, xs:anyURI, xs:QName, xs:dateTime, xs:date or xs:time).

If we don't care about implementations that just keep the typed value
of an element, it would be much much simpler if we just said "The
string value of an element is the result of concatenating its
text-node descendants." We could leave it up to those host languages
that have to deal with element construction to determine how the
content of the text-node descendants is determined -- probably they
will do so based on the typed value of the element.

If, on the other hand, we want to say that when the element is of a
simple type or a complex type with simple content, the string value is
derived from the typed value of the node, then we should do that
consistently. We should use the result of casting the typed value to a
xs:string, and enforce the use of single spaces between items if the
typed value contains more than one item. Enforcing this will be
vehemently opposed by XSLT users, who will wish to retain things like
trailing zeros in numbers so that identity transformations don't lead
to invalid documents.
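
The two alternatives discussed in this comment can be compared with a small Python sketch. It is an illustration only: Python's int and float stand in for xs:integer and xs:decimal, and the function names are hypothetical rather than part of any specification.

```python
import xml.etree.ElementTree as ET

def string_value_from_text(elem):
    """Alternative 1: concatenate all text-node descendants in document order."""
    return "".join(elem.itertext())

def string_value_from_typed(items):
    """Alternative 2: cast each item of the typed value to a string,
    with a single space between items."""
    return " ".join(str(item) for item in items)

foo = ET.fromstring("<foo>    1   </foo>")
raw = string_value_from_text(foo)
print(repr(raw))                                   # original lexical form, whitespace kept
print(repr(string_value_from_typed([int(raw)])))   # rebuilt from the typed value

# Rebuilding from the typed value also drops trailing zeros (float is an
# assumption standing in for xs:decimal here).
print(repr(string_value_from_typed([float("1.50")])))
```

The first alternative round-trips the document text exactly; the second is predictable across types but loses the original lexical form.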

DM-LC2-0003: Accessors (Element), description of dm:typed-value

open technical
http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Oct/0039.html
Jeni Tennison Accessors (Element), description of dm:typed-value
2nd bullet says:

  If the element has a simple type or a complex type with simple
  content, returns a sequence of zero or more atomic values derived
  from the string-value of the node and its type in a way that is
  consistent with XML Schema validation.

So the typed value of an element is based on its string value. But the
string value is defined in terms of its typed value (at least for
certain types of elements). We need to have a non-circular definition.
Saying that the string value is the result of concatenating the
text-node descendants of the element would give us a non-circular
definition.

DM-LC2-0004: Construction from a PSVI (Element), definition of children property

open technical
http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Oct/0039.html
Jeni Tennison Construction from a PSVI (Element), definition of children property
I think we should mention the case where an element has been assigned
a default value. In this case, the [children] of the element will not
contain any element or character information items, but the [schema
normalized value] will hold the default value for the element.

Under the current rules, depending on the implementation, this means
the element might or might not contain a text node with the default
value. We might want to legislate on whether this text node is created
or not; note that this interacts with how the typed value and string
value of the element are defined, and is particularly problematic with
xs:QName values.

DM-LC2-0005: Accessors (Attribute), definition of dm:string-value()

open technical
http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Oct/0039.html
Jeni Tennison Accessors (Attribute), definition of dm:string-value()
Many of the same comments apply here as did to the description of
dm:string-value() for element nodes.

In addition, the current definition here doesn't tell me what should
happen if the attribute type is not one of xdt:untypedAtomic,
xs:string, xs:anyURI, xs:QName, xs:dateTime, xs:date or xs:time. What
if it's xs:integer? What if it's a list type?

In the definition of how attribute nodes are generated from Infosets
or PSVIs, there's a string-value property; why doesn't
dm:string-value() just return the value of the string-value property?
Or is the string-value property the "intrinsic value" of the
attribute?

DM-LC2-0006: Construction from an Infoset (Attribute), definition of type property

open technical
http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Oct/0039.html
Jeni Tennison Construction from an Infoset (Attribute), definition of type property
I'm a bit worried that assigning attributes the type xs:NMTOKEN will
lead to usability problems because the NMTOKEN type is often used, in
DTDs, for numeric attributes but you can't, for example, sum a
sequence of attributes with xs:NMTOKEN values without explicitly
casting them.
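
As a loose analogy in Python (not XPath semantics), values that are lexically numeric but typed as tokens behave like strings: they cannot be summed until each one is cast explicitly. The attribute values and the sum_tokens helper below are hypothetical.

```python
# Attribute values as they would arrive if typed xs:NMTOKEN:
# token strings, not numbers (hypothetical sample data).
widths = ["10", "20", "12"]

def sum_tokens(values):
    """Sum token values by casting each one explicitly to an integer."""
    return sum(int(v) for v in values)

try:
    sum(widths)  # no implicit string-to-number conversion takes place
except TypeError as err:
    print("without a cast:", err)

print("with a cast:", sum_tokens(widths))
```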

So I'd be in favour of only using the [attribute type] to assign a
type to attribute nodes generated from an Infoset if the [attribute
type] is one of ID, IDREF or IDREFS.

If we go the other way, and want to use as much type information as
possible from the DTD, then it seems to me that attribute information
items with the [attribute type] ENUMERATION should be assigned the
type xs:NMTOKEN.

DM-LC2-0007: Construction from a PSVI (Attribute)

open technical
http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Oct/0039.html
Jeni Tennison Construction from a PSVI (Attribute)
Final para says:

  Note: attributes from the XML Schema instance namespace,
  "http://www.w3.org/2001/XMLSchema-instance", (xsi:schemaLocation,
  xsi:type, etc.) appear as ordinary attributes in the data model.
  They will be validated appropriately by schema processors and will
  simply appear as attributes of type xs:anySimpleType if they haven't
  been schema validated.

XML Schema basically ignores attributes in the XML Schema instance
namespace when it comes to validation: their validity doesn't get
assessed in the normal way, so you don't get the additional PSVI
properties on those attribute information items. Which means that they
should all end up with the type xdt:untypedAtomic (not
xs:anySimpleType as in the above paragraph).

So we might want to describe special treatment for these attributes,
so that xsi:type is of type xs:QName, for example. Given that we can't
know about the individual validity of the attributes, I think we can
only do this if the [validity] property of its parent element
information item is present and has the value "valid" -- the element
couldn't be valid, I think, if it had an xsi:* attribute on it that
didn't have an appropriate value.
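
The special treatment suggested here could look like the following Python sketch. It is one possible reading, not agreed behaviour: the function name and the lookup table are hypothetical, and only xsi:type (named in the comment) and xsi:nil (declared as xs:boolean by XML Schema) are listed.

```python
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Types assigned to xsi:* attributes when the parent element is known to be
# valid. Only two attributes are listed; the table is illustrative.
SPECIAL_TYPES = {"type": "xs:QName", "nil": "xs:boolean"}

def xsi_attribute_type(namespace, local_name, parent_validity):
    """Return the type name for an xsi:* attribute under the proposed rule."""
    if namespace != XSI_NS:
        raise ValueError("not an attribute in the XML Schema instance namespace")
    if parent_validity == "valid":
        return SPECIAL_TYPES.get(local_name, "xdt:untypedAtomic")
    # Parent invalid, of unknown validity, or not validated at all:
    # nothing can be assumed about the attribute's value.
    return "xdt:untypedAtomic"

print(xsi_attribute_type(XSI_NS, "type", "valid"))
print(xsi_attribute_type(XSI_NS, "type", "notKnown"))
```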

DM-LC2-0008: Accessors (Namespace), definition of dm:typed-value()

open technical
http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Oct/0039.html
Jeni Tennison Accessors (Namespace), definition of dm:typed-value()
Given that namespace nodes must have URI values, can't we give a more
specific type for the typed value, such as xs:anyURI or xs:string?

DM-LC2-0009: Accessors (Text), definition of dm:type()

open technical
http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Oct/0039.html
Jeni Tennison Accessors (Text), definition of dm:type()
Definition of dm:type() says:

  Returns xdt:untypedAtomic.

Why do text nodes have a type when PIs, comments, namespace nodes and
document nodes don't? What is the purpose of text nodes all having the
same type? When is the type of a text node actually used? I think that
dm:type() for text nodes should return ().

DM-LC2-0010: Atomic Values, implementation defined types

open technical
http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Oct/0039.html
Jeni Tennison Atomic Values, implementation defined types
Atomic Values, 4th para says:

  Implementors may extend the set of types available. The value space
  of those types, as well as the behavior of those types when used in
  expressions, is implementation defined.

I don't think it's particularly clear what implementations can and
can't do. The value space of atomic values is defined in the previous
paragraph as being the union of the value spaces of the atomic types,
which are defined as being the primitive types and their subtypes (by
restriction).

So that would imply that implementers can only introduce types that
are restrictions of the primitive atomic types or, I suppose, types
that cross the boundaries of the value spaces of the primitive atomic
types.

What I'm asking is, can an implementation extend the set of primitive
atomic types to include, say, a ext:three-valued-boolean type with the
possible values true, false, and unknown? If so, such a type would not
be an "atomic type" under the current definition, so if it's possible,
that definition needs to change.

DM-LC2-0011: 1.1. The term type

open technical
http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html
Schema WG 1.1. The term type
DM appears to use the term type for several related but different
concepts; we believe it would be desirable if you were to clarify the
meaning of the term, or at least if you called the reader's attention
to its overloading.

The Data Model specification appeals to the Formal Semantics
specification, which says types are XML Schema types. However, XML
Schema tries to avoid the term "type", instead using "type
definition".

Among the uses of “type” we have noticed are:

   1. T1. a name (for example, as used by the dm:type accessor).

   2. T2. a set of values (this sense is used by XML Schema's internal
      work on a formalization, which includes a "Type Lattice").

   3. T3. an XML Schema Type Definition component (simple or complex).
      Defines a set of values and certain properties, such as [name],
      [baseType], etc.

   4. T4. an OO class. Defines a set of values, inheritance info, and
      operators.

Specifically, we suggest that the dm:type accessor be renamed to
dm:type-name and that “type” be explicitly defined. If “type” is just
a synonym for “type definition”, say so in the definition of “type”.

DM-LC2-0012: 1.3. Items and singleton sequences

open technical
http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html
Schema WG 1.3. Items and singleton sequences
Section 6 Sequences reads in part:

    An important characteristic of the data model is that there is no
    distinction between an item (a node or an atomic value) and a
    singleton sequence containing that item.

One consequence of this characteristic is that the types xs:integer
and a list of xs:integer with length constrained to 1 have exactly the
same value space in the Data Model. That is, each value in the value
space is a sequence of a single xs:integer. This is different from the
XML Schema value spaces for the two types. Might this cause a problem
for functions or other uses of the Data Model?
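
The consequence described here can be made concrete with a short Python sketch. Python lists stand in for data model sequences, and the helper names as_sequence and dm_equal are hypothetical.

```python
def as_sequence(value):
    """Treat a bare item as the singleton sequence that contains it."""
    return list(value) if isinstance(value, (list, tuple)) else [value]

def dm_equal(a, b):
    """Compare two values the way the quoted passage implies: an item and
    the singleton sequence containing it are not distinguished."""
    return as_sequence(a) == as_sequence(b)

print(5 == [5])          # Python, like XML Schema, keeps the two apart
print(dm_equal(5, [5]))  # the data model view collapses them
```

Once the distinction is collapsed in this way, a function receiving the value cannot tell whether it came from an xs:integer or from a length-1 list of xs:integer.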

We believe further discussion is needed here.

DM-LC2-0013: 1.4. The implications of [validity] ≠ valid

open technical
http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html
Schema WG 1.4. The implications of [validity] ≠ valid
Section 3.6 para 2 reads in part: “The only information that can be
inferred from an invalid or not known validity value is that the
information item is well-formed.”

This is not true in the general case: the values of the properties
[validity] and [validation attempted] interact, so that some
inferences beyond well-formedness can be made. (If [validity] is
'notKnown', for example, we can infer without examining the PSVI that
[validation attempted] is not 'full'. If for some node N [validity] is
'invalid', we can infer that declarations are available for at least
some element or attribute information items in the subtree rooted in
N.) The data model doesn't have to be interested in those inferences,
but it is simply incorrect to say that they don't exist.
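
The two inferences given in parentheses above can be written as a small Python lookup. The sketch encodes only what this comment states and makes no claim about other inferences; the function name is hypothetical.

```python
def inferences(validity):
    """Return the inferences this comment draws from a [validity] value alone."""
    if validity == "notKnown":
        return ["[validation attempted] is not 'full'"]
    if validity == "invalid":
        return ["declarations are available for at least some element or "
                "attribute information items in the subtree"]
    if validity == "valid":
        return []  # the comment makes no claim about this case
    raise ValueError(f"unknown [validity] value: {validity!r}")

for value in ("notKnown", "invalid"):
    print(value, "->", inferences(value))
```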

On the whole, we believe that the data model misses an
opportunity by failing to exploit the information contained in the
[validity] and [validation attempted] properties more fully.