[Bug 3048] RQ-152 Should XML Schema be aligned with XML 1.1? (xml1.1) from bugzilla@wiggum.w3.org on 2006-03-25 (www-xml-schema-comments@w3.org from January to March 2006)

From: <bugzilla@wiggum.w3.org>
Date: Sat, 25 Mar 2006 18:07:24 +0000
To: www-xml-schema-comments@w3.org
CC:
Message-Id: <E1FNDAa-0007DA-Ta@wiggum.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=3048

           Summary: RQ-152 Should XML Schema be aligned with XML 1.1?
                    (xml1.1)
           Product: XML Schema
           Version: 1.1 only
          Platform: Other
        OS/Version: All
            Status: NEW
          Keywords: unclassified
          Severity: normal
          Priority: P2
         Component: Structures: XSD Part 1
        AssignedTo: ht@w3.org
        ReportedBy: cmsmcq@w3.org
         QAContact: www-xml-schema-comments@w3.org


This issue was originally reported by Henry Thompson and Noah Mendelsohn.

Should XML Schema 1.1 be aligned with XML 1.1?  Some salient points on
which XML 1.1 differs from XML 1.0 include:

. Documents may now be labeled '<?xml version="1.1"?>'.  Such a
  designation *MAY* be used, but is discouraged, if the document could
  also have been serialized as '<?xml version="1.0"?>'; the new
  designation is required, of course, when new features described below
  are used.

. The set of name characters for element and attribute names has been
  expanded, and indeed is now open-ended: XML 1.1 allows such names to
  include not just current Unicode characters, but others that may be
  assigned by the Unicode consortium in the future.  As I understand it,
  the distinction between the evolving flavors is not signaled in the
  XML declaration.  Version="1.1" allows any possible future characters,
  but only if the Unicode consortium has assigned them.

. The definition of "Char"
  (http://www.w3.org/TR/2004/REC-xml11-20040204/#NT-Char) has been
  changed to allow previously disallowed control characters in the range
  #x1 through #x1f.

. Some new line end characters have been introduced.  These are
  handled quite early in XML processing, and may not cause schema much
  trouble because they won't be visible at the Infoset level where
  Schema works.
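
As a rough illustration of the Char difference listed above, here is a
minimal Python sketch of the two Char productions as published in the
XML 1.0 and XML 1.1 Recommendations.  The function names are
hypothetical, chosen only for this example; it is a sketch, not a
conformance test.

```python
def is_xml10_char(cp):
    """True if code point cp matches the Char production of XML 1.0."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_char(cp):
    """True if code point cp matches the Char production of XML 1.1."""
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def chars_requiring_xml11(text):
    """Return the characters in text that are legal only under XML 1.1."""
    return [ch for ch in text
            if is_xml11_char(ord(ch)) and not is_xml10_char(ord(ch))]
```

A string for which chars_requiring_xml11 returns a non-empty list is
one that a type tied to the XML 1.0 Char production would reject.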

Several points on which alignment may be needed have been identified;
in the words of Henry Thompson (first item) and Noah Mendelsohn
(others):

1 XML 1.1 adds
  (http://www.w3.org/TR/2004/REC-xml11-20040204/#sec-line-ends) #x85 and
  #x2028 to the characters involved in line break normalization.  XML
  Schema may need to change our whitespace handling in 1.1 to take
  account of this.

2 We use Infosets for instances and schemas.  There is a question as
  to how one knows whether the new names and content might appear in
  such an Infoset.  It may be implied that the switch is to be found in
  the [version] property of the document information item (
  http://www.w3.org/TR/2004/REC-xml-infoset-20040204/#infoitem.document).
  Concerns regarding the Infoset include:

      - While the version property is indeed in the Infoset rec, and the
        2nd edition talks about needing a processor that can handle
        whatever serialized document you might have, I don't think it
        specifically ties the legal values of properties such as the
        [local name] of an element or legal [character codes] to this
        [version] property.  Synthetic Infosets, for example, need to
        be covered IMO.  For example, the newly published Infoset Rec
        says
(http://www.w3.org/TR/2004/REC-xml-infoset-20040204/#infoitem.character)
        "[character code] The ISO 10646 character code (in the range 0
        to #x10FFFF, though not every value in this range is a legal
        XML character code) of the character.", which seems a bit
        vague on what it means to be an XML character.

      - We in schemas define both schema "documents" and instances to
        be validated as element information items, with no reference
        to a required or containing document information item.  I
        think we need to consider whether the [version] property of
        the doc info item would meet our need to determine what
        version of XML we've got with respect to instances and
        (purported) schema documents.

3 Our *xsd:string* type explicitly refers to the *Char* production
  of XML 1.0 2nd edition.  Thus, it will not validate strings
  containing the control characters of XML 1.1.  We could perhaps
  introduce a new type that would validate the new content, but
  there are complications, including:

      - xsd:string is the base for types like xsd:token, so we
        might have to create parallel versions of some of those

      - If you wanted to write a schema document that had an
        enumeration or fixed value constraint containing the new
        characters, then that schema document would have to be
        expressed as an XML 1.1 Infoset (see comment above regarding
        possible ambiguity about which Infosets are 1.1)

      - Our pattern language
        (http://www.w3.org/TR/xmlschema-2/#rf-pattern) is designed to
        constrain strings, but as I read the spec it defines
        (http://www.w3.org/TR/xmlschema-2/#dt-normalc) "A normal
        character is any XML character that is not a metacharacter."
        With the publication of XML 1.1 we see in hindsight that this
        is insufficiently precise.

4 Since the range of legal element names has changed, we face
  questions regarding our ability to validate element and attribute
  content using the new names.

      - If your schema is written as a schema document, then
        presumably you can only enter the names if the document is an
        XML 1.1 Infoset (similar to concern raised for enumerations on
        strings)

      - Since the range is implicitly extensible as Unicode changes,
        it would seem that even a label of XML 1.1 on an infoset for a
        schema document does not ensure that it has the expressive
        power to name all the XML element and attribute names that one
        might wish to validate.  Some processor might be checking the
        schema document with knowledge of, say, Unicode 4.0, but the
        schema document might have been written with knowledge of a
        Unicode 5.0 that "assigned" new characters.


      - We have types such as *xsd:Name*
        (http://www.w3.org/TR/xmlschema-2/#Name) about which our
        Recommendation says "[Definition:] Name represents XML
        Names. The *value space* of Name is the set of all strings
        which *match* the Name production of [XML 1.0 (Second
        Edition)]. The *lexical space* of Name is the set of all
        strings which *match* the Name production of [XML 1.0 (Second
        Edition)]. The *base type* of Name is token."  Note that
        *xsd:token* is derived from *xsd:string*, which is discussed
        above.

      - We have an *xsd:QName* type, the definition of which
        (http://www.w3.org/TR/xmlschema-2/#QName) is "[Definition:]
        QName represents XML qualified names. The *value space* of
        QName is the set of tuples {namespace name, local part}, where
        namespace name is an anyURI and local part is an NCName. The
        *lexical space* of QName is the set of strings that *match*
        the QName production of [Namespaces in XML]."  That link to
        [Namespaces in XML] (http://www.w3.org/TR/xmlschema-2/#XMLNS)
        is explicitly to: "World Wide Web Consortium. Namespaces in
        XML. Available at:
        http://www.w3.org/TR/1999/REC-xml-names-19990114/", which is
        to the 1999 Namespaces in XML recommendation.

      - We use that QName type in the schema for schemas for the
        names of elements and attributes to be validated, as well as
        for references within schemas.

      - Our component descriptions tend to have "{name}" properties
        that constrain their content by that same 1999 version of
        Namespaces.  See for example the element declaration schema
        component
        (http://www.w3.org/TR/xmlschema-1/#Element_Declaration_details).
        In general, there is a necessary tie between what we can put
        in these component properties, what we can express in a
        serialized schema document, what we can express in the
        corresponding schema document infoset, what's allowed by the
        *xsd:QName* type, and the names of elements and attributes we
        can validate.

5 Our type system is used by others such as query, both in the data
  model and as the type system for functions and operators.  As we
  wrestle with the definitions of types like xsd:string and
  xsd:Name, I presume that some intensive liaison with them will be
  needed.  It's not implausible that, if we introduce an
  xsd:stringv11 type, duplicate functions would be needed for
  every F&O function that accepts or returns a string.  Likewise
  for *xsd:QName*, etc.  Other groups such as XMLP and RDF also use
  our type system and might be affected by changes or by lack of
  synergy with XML 1.0 or XML 1.1.

6 We talk about the representation of XML schema documents for
  retrieval on the web
  (http://www.w3.org/TR/xmlschema-1/#schema-repr).  The pertinent part
  of the description of the web resource to be retrieved says
  (http://www.w3.org/TR/xmlschema-1/#c-vxd): "It resolves to (a
  fragment of) a resource which is an XML document (of type
  "application/xml" or "text/xml" with an XML declaration for
  preference, but this is not required), which in turn corresponds to
  a "<schema>" element information item in a well-formed
  information set, which in turn corresponds to a valid schema."  It
  seems we now need to be clearer as to if and when such documents may
  have "<?xml version="1.1"?>", what the rules are for
  cross-importing and including across versions, etc.  All of these
  must be related to whatever we decide above regarding rules for our
  components, types, enumeration constraints, etc.
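
To make point 1 above concrete, here is a minimal Python sketch of
end-of-line handling under the two versions.  It assumes the rules
given in the "End-of-Line Handling" sections of the XML 1.0 and XML
1.1 Recommendations; the function name and the xml_version parameter
are hypothetical, introduced only for this illustration.

```python
import re

# XML 1.0: #xD #xA and any lone #xD become #xA.
_XML10_LINE_END = re.compile("\r\n|\r")

# XML 1.1 additionally treats #xD #x85, #x85 and #x2028 as line ends.
# Two-character sequences are listed before the lone #xD so that they
# are matched first.
_XML11_LINE_END = re.compile("\r\n|\r\u0085|\u0085|\u2028|\r")


def normalize_line_ends(text, xml_version="1.0"):
    """Replace each line-end sequence in text with a single #xA."""
    pattern = _XML11_LINE_END if xml_version == "1.1" else _XML10_LINE_END
    return pattern.sub("\n", text)
```

Under this sketch an unescaped #x85 or #x2028 survives into the
Infoset of a 1.0 document but is replaced by #xA in a 1.1 document,
so the whiteSpace facet sees different input depending on the version.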

See note to comments list on 6 February 2004
(http://lists.w3.org/Archives/Public/www-xml-schema-comments/2004JanMar/0019.html)
from Henry Thompson.

See note to comments list on 19 February 2004
(http://lists.w3.org/Archives/Public/www-xml-schema-comments/2004JanMar/0027.html)
from Noah Mendelsohn.

See also proposed erratum for XML Schema 1.0 sent on 8 June 2004
(http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2004Jun/0068.html)
by C. M. Sperberg-McQueen (and ensuing discussion).

The underlying issue applies to both Structures and Datatypes.  
This Bugzilla entry is for Structures; see bug 1838 for the
entry for Datatypes.
Received on Saturday, 25 March 2006 18:08:23 UTC