Issues xmlsch-01..12 Re: XML Schema WG comments on RDF documents

Michael,

Thank you for these comments.  I note particularly the care that has 
clearly been taken in reviewing the RDFCore documents.

I have indicated below links to our issue tracking document for the issues
that you have raised as substantive.  Those that are editorial I have left
to the editors, who will decide whether they fall within their editorial
discretion.  In some cases I have recorded a comment against a different
document from the one that inspired it.  This reflects my understanding of
which document 'owns' the specification of a particular concept.

The document editors may respond further if they need any clarification of 
the comment.

Brian

At 14:33 10/03/2003 -0700, C. M. Sperberg-McQueen wrote:

>Colleagues:
>
>With apologies for the delay, I transmit to you herewith the comments
>of the XML Schema Working Group on the various RDF documents published
>in Last Call recently.  We congratulate you on the progress of your
>work and hope our comments are useful to you.  An HTML version of our
>comments may be found at
>
>http://www.w3.org/XML/Group/2003/03/xml-schema-rdf-notes.html

I'm told this link is to a page which is member-visible but not public.

[...]

>1.1. Design question, complexity (substantive)
>
>    The introduction of pairs consisting of a lexical form and a type (or,
>    strictly speaking, a lexical form and a type label) seems at first
>    glance to complicate the RDF model somewhat. We have had the
>    impression that in other parts of RDF, typing is handled by adding
>    further arcs and nodes. If the type of a resource is identified by
>    having an arc labeled rdf:type from it to (the URI of) its (RDF) type,
>    and if the type of an arc is similarly identified by an arc, then
>    surely a reason ought to be given for shifting to a different method
>    for typing literal strings. It seems like a dramatic shift in the
>    infrastructure of RDF, from "everything is a node, an arc, or a
>    literal value" to "everything is a node, an arc, or a typed literal
>    value". Perhaps not quite so dramatic, after all. But the question of
>    design consistency remains: why not "everything is a typed node, a
>    typed arc, or a typed literal"?

Recorded against concepts as

   http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-01


>1.2. Whitespace handling (schema-related)
>
>    Some members of the XML Schema WG have expressed concern that XML
>    Schema's rules for whitespace handling may interfere with expected
>    behavior in other contexts. This may be the appropriate place to bring
>    this question up.
>    In brief, XML Schema's simple types each define a whitespace facet,
>    which governs the kind of whitespace pre-processing done by an XML
>    Schema processor before the lexical form is checked for type validity.
>    Since the point of whitespace normalization is to simplify subsequent
>    processing, the lexical spaces of XML Schema's simple types are (like
>    those in many programming languages) defined without reference to the
>    preceding whitespace normalization. Integers, for example, are
>    represented by sequences of decimal digits; sequences containing
>    blanks are not legal lexical forms for integers. Indeed, strictly
>    speaking it is only after the whitespace pre-processing is done that
>    the XML Schema processor can be said to be working with a lexical form
>    at all.
>    For example, the integer type has a value of collapse for the
>    whitespace facet, which means leading and trailing whitespace is
>    stripped, and internal whitespace sequences are reduced to a single
>    blank (x20) character. In an XML document in which the element
>    exterms:age is defined as having type xs:integer, the following
>    instances of exterms:age will all be type-valid:
>
>      <exterms:age>27</exterms:age>
>      <exterms:age>
>        27
>      </exterms:age>
>      <exterms:age>   27  </exterms:age>
>      <exterms:age>   2<!--* ha, ha, fooled your full-text indexer!
>      *-->7  </exterms:age>
>
>    The input information set, in each case, contains a character
>    information item for "2" followed by a character information item for
>    "7", with character information items for whitespace characters, and a
>    comment information item, present in some of the examples. In all
>    cases, the lexical form proper is the character sequence "27" (i.e.
>    the sequence of characters after white space handling, and ignoring
>    comments, processing instructions, entity boundaries, and other
>    distractions). This is a legal lexical form for an integer, so all the
>    examples are type valid.
>    Some members of the XML Schema WG have worried that it may not be
>    obvious that the whitespace processing is not part of the process of
>    checking lexical forms for type validity, but part of the process of
>    extracting the lexical forms from the XML information set presented to
>    the processor. If an RDF document contains
>
>      <exterms:age>   27  </exterms:age>
>
>    and a processor hands the contents of the element to a generic
>    type-checker for XML Schema's simple types, saying in effect "this
>    purports to be the lexical form of an integer; is that OK?", that type
>    checker will be required (if it conforms to the XML Schema spec's
>    definition of the simple types) to say "no, the character sequence
>    `   27  ' is not a legal lexical form for an integer."
>    It's not clear whether RDF, being type-system neutral, can directly
>    address this concern (e.g. by specifying that an RDF processor should
>    do the appropriate whitespace pre-processing, or by warning users that
>    they should not include vagrant whitespace in typed literals), or
>    whether it suffices for developers of RDF software with built-in
>    support for XML Schema's simple types to deal with it, e.g. by
>    performing it themselves before handing the resulting lexical form to
>    a type checker.
>    As noted, some members of our WG feel that you need to be alerted to
>    this as a possible source of confusion and unexpected results. Other
>    members of the WG feel that it verges on disrespect to assume that you
>    need instruction on this point. We compromised by agreeing to point
>    out the issue to you, and to leave you to draw your own conclusions.

Recorded against concepts as

   http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-02
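
For illustration, a minimal Python sketch of the distinction drawn in the
comment above: whitespace handling belongs to extracting the lexical form,
not to checking it.  It assumes only the whiteSpace="collapse" rule and the
xsd:integer lexical space ([+-]?[0-9]+) from XML Schema Part 2; the function
names are hypothetical and do not come from any RDF or XML Schema toolkit.

```python
import re

# Lexical space of xsd:integer (XML Schema Part 2): an optional sign
# followed by one or more decimal digits, with no whitespace.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def collapse(text: str) -> str:
    """Apply whiteSpace="collapse": map tab, LF and CR to spaces,
    reduce runs of spaces to one, and strip leading/trailing spaces."""
    spaced = re.sub(r"[\t\n\r]", " ", text)
    return re.sub(r" +", " ", spaced).strip(" ")


def is_integer_lexical_form(candidate: str) -> bool:
    """Check the character sequence as given, like a generic type checker
    that does no whitespace processing of its own."""
    return INTEGER_LEXICAL.fullmatch(candidate) is not None


def integer_from_element_content(content: str) -> int:
    """Extract the lexical form first (collapse), then check and map it."""
    lexical_form = collapse(content)
    if not is_integer_lexical_form(lexical_form):
        raise ValueError(f"not an xsd:integer lexical form: {lexical_form!r}")
    return int(lexical_form)


if __name__ == "__main__":
    raw = "   27  "
    print(is_integer_lexical_form(raw))        # False
    print(integer_from_element_content(raw))   # 27
```

Handing the raw element content straight to the checker fails; collapsing it
first yields the lexical form "27", which passes.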


>2. Notes on RDF Concepts and Abstract Syntax
>
>2.1. Mapping from lexical forms to values (schema-related, terminological)
>
>    In [21]http://www.w3.org/TR/rdf-concepts/#section-Datatypes:
>
>      [21] http://www.w3.org/TR/rdf-concepts/#section-Datatypes
>
>      A datatype mapping is a set of pairs whose first element belongs to
>      the lexical space of the datatype, and the second element belongs
>      to the value space of the datatype:
>
>    We agree that it is useful to define a term to denote such mappings;
>    in the interests of inter-specification consistency, we wonder whether
>    you would be willing to consider using the term lexical mapping, which
>    we are introducing in our forthcoming draft of XML Schema 1.1. The
>    term datatype mapping seems unlikely to be usable in the XML Schema
>    specification, where it would suggest to some readers a mapping from
>    one datatype to another, rather than as here a mapping from lexical
>    space to value space. (XML Schema 1.0 got by without a term for this
>    concept.)

Recorded as

   http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-03


>2.2. Values without lexical forms (schema-related, important)
>
>    In [22]http://www.w3.org/TR/rdf-concepts/#section-Datatypes:
>
>      [22] http://www.w3.org/TR/rdf-concepts/#section-Datatypes
>
>      * Each member of the value space may be paired with any number
>        (including zero) of members of the lexical space (lexical
>        representations for that value).
>
>    The provision for values without corresponding lexical forms
>    contradicts an assumption to which the XML Schema spec appeals from
>    time to time. The lexical space of any simple datatype in XML Schema
>    is the domain of the type's lexical mapping; the value space is its
>    range. There are no meaningless lexical forms in the lexical space of
>    the type, nor are there ineffable values in the value space. By
>    eliminating values from the value space (e.g. by setting minimal and
>    maximal values), the type definer may indirectly also eliminate
>    lexical forms from the lexical space; conversely, by eliminating some
>    items from the lexical space (e.g. by setting a pattern), the type
>    definer may eliminate items from the value space.
>    Are there crucial aspects of RDF which will break if the list item
>    quoted above is changed to read "paired with one or more members of
>    the lexical space"?

Recorded as

   http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-04
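
A minimal Python sketch of the XML Schema assumption described in the
comment: the lexical space is the domain of the lexical mapping and the value
space is its range, so no value is left without a lexical form.  The
xsd:boolean mapping is taken from XML Schema Part 2; the three-value datatype
in the usage lines is hypothetical, invented only to show an 'ineffable'
value of the kind the current RDF Concepts wording permits.

```python
from typing import Hashable, Mapping, Set

# Lexical mapping of xsd:boolean (XML Schema Part 2): four lexical forms,
# two values; "true" and "1" are both lexical forms of the value true.
BOOLEAN_LEXICAL_MAPPING: Mapping[str, bool] = {
    "true": True,
    "1": True,
    "false": False,
    "0": False,
}


def lexical_space(mapping: Mapping[str, Hashable]) -> Set[str]:
    """The lexical space is the domain of the lexical mapping."""
    return set(mapping)


def ineffable_values(mapping: Mapping[str, Hashable],
                     value_space: Set[Hashable]) -> Set[Hashable]:
    """Values with zero lexical forms: the value space minus the range of
    the mapping.  XML Schema assumes this set is always empty."""
    return set(value_space) - set(mapping.values())


if __name__ == "__main__":
    print(ineffable_values(BOOLEAN_LEXICAL_MAPPING, {True, False}))  # set()
    # Hypothetical datatype whose value 2 has no lexical form.
    print(ineffable_values({"0": 0, "1": 1}, {0, 1, 2}))  # {2}
```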


>2.3. Lexical forms, strings, and character sequences (schema-related,
>editorial)
>
>    In [23]http://www.w3.org/TR/rdf-concepts/#section-Datatypes:
>
>      [23] http://www.w3.org/TR/rdf-concepts/#section-Datatypes
>
>      With one exception, the datatypes used in RDF have a lexical space
>      consisting of a set of strings.
>
>    Since "string" is used as the local name for a particular simple type
>    in the XML Schema namespace, we believe it will be less confusing for
>    users, in the long run, if the lexical representations of
>    simple-datatype values are described not as "strings" but as
>    "character sequences".
>    This comment also applies to other uses of the term string to denote
>    the members of a lexical space.

Recorded as

   http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-05


>2.4. Strings for natural-language data (substantive)
>
>    In [24]http://www.w3.org/TR/rdf-concepts/#section-Datatypes:
>
>      [24] http://www.w3.org/TR/rdf-concepts/#section-Datatypes
>
>      * A plain literal is a string combined with an optional language
>        identifier. This should be used for plain text in a natural
>        language. As recommended in the RDF formal semantics
>        [RDF-SEMANTICS], these plain literals are self-denoting.
>
>    We do not believe that simple strings are likely to be adequate for
>    the representation of arbitrary natural-language text. Even in
>    English, natural-language utterances (such as this document) may need
>    some degree of inline markup for clarity and adequate presentation; in
>    natural-language utterances requiring bidirectional display or ruby,
>    the best authorities (including the W3C I18n Working Group) recommend
>    the use of markup within the natural-language utterance. We thus
>    suggest that you may wish to moderate this recommendation that
>    natural-language material be represented by literals.
>    This is not an area in which we claim particular technical expertise;
>    we merely call it to your attention in the hopes that doing so may be
>    useful to you.

Recorded as

   http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-06


>2.5. Typos and minor editorial notes
>
>    In [25]http://www.w3.org/TR/rdf-concepts/#section-Literal-Value, for
>    "the datatype mapping is applied to the pair form by the lexical form
>    and the language identifier" read "the datatype mapping is applied to
>    the pair formed by the lexical form and the language identifier".
>    In the same section, for "Such a case, while in error, is not
>    syntacticly ill-formed " read "Such a case, while in error, is not
>    syntactically ill-formed" (et passim).
>    In section [26]http://www.w3.org/TR/rdf-concepts/#section-XMLLiteral,
>    for "root element tag" read "root element".
>    In the same section, for "XML element content" read "XML data" (the
>    term element content is used in some markup-related specs as a
>    complement of mixed content to denote the content of elements which
>    can contain other elements but cannot contain parsed character data).
>
>      [25] http://www.w3.org/TR/rdf-concepts/#section-Literal-Value
>      [26] http://www.w3.org/TR/rdf-concepts/#section-XMLLiteral

Left, for now, to editors' discretion.


>3. Notes on RDF Semantics
>
>3.1. The "meaning" of literals (editorial)
>
>      The meaning of a literal is principally determined by its character
>      string: it either refers to the value mapped from the string by the
>      associated datatype, or if no datatype is provided then it refers
>      to the literal itself, which is either a unicode character string
>      or a pair of a string with a language tag.
>
>    Some members of the XML Schema WG are made nervous by the appeal to
>    the notion of "meaning" here. [N.B. our task force read this section
>    out of context, and were not aware of any foregoing elucidation. So
>    this comment may be out of place.] There is also some concern about
>    the apparent conflation here of the notions of meaning and reference.
>    We wonder whether this discussion would be weakened by replacing
>    references to meaning and reference by references to denotation; we
>    are inclined to think it would be an improvement, but recognize that
>    the RDF Core WG's views may differ.

Left, for now, to editors' discretion.


>3.2. Types as lexical mappings (schema-related)
>
>      A datatype is an entity characterized by a set of character strings
>      called lexical forms and a mapping from that set to a set of
>      values.
>
>    We have a couple of reservations concerning this characterization.
>      * Elsewhere (e.g. in Concepts and Abstract Syntax, section 3.3,
>        [27]http://www.w3.org/TR/rdf-concepts/#section-Datatypes), the RDF
>        specs say that there may be values in a value space which are not
>        in the range of the lexical mapping; we have suggested that if
>        possible those statements should be changed, but if they are
>        retained, then a datatype cannot be characterized solely by the
>        lexical space and the lexical mapping, because such ineffable
>        values appear in neither of these.
>      * The statement describes (with the exception of the problem just
>        noted) simple datatypes, but not the class of complex datatypes
>        which can be defined by XML Schema, nor all the types (or
>        type-like constructs) definable in various other schema languages
>        for XML.
>
>      [27] http://www.w3.org/TR/rdf-concepts/#section-Datatypes

Recorded as

   http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-07


>3.3. Miscellaneous editorial notes
>
>    In [28]http://www.w3.org/TR/rdf-mt/#dtype_interp, for "which we will
>    refer to as XSD and use the Qname prefix xsd:" read "which we will
>    refer to as XSD and denote using the Qname prefix xsd" (or something
>    similar).
>    In [29]http://www.w3.org/TR/rdf-mt/#dtype_interp:
>
>      [28] http://www.w3.org/TR/rdf-mt/#dtype_interp
>      [29] http://www.w3.org/TR/rdf-mt/#dtype_interp
>
>      For example, XML Schema requires that the value spaces of
>      xsd:string and xsd:decimal to be disjoint ...
>
>    This sentence is not exactly wrong, but it seems slightly unusual to
>    use the verb require here, instead of define or something similar. We
>    suggest recasting this as "For example, XML Schema defines the value
>    spaces of xsd:string and xsd:decimal as disjoint ..." (Note, for the
>    record, that the value spaces of all the primitive simple datatypes of
>    XML Schema 1.0 are pairwise disjoint.)
>    In ,
>
>      any literal of the form "sss"@ttt^^ddd, where ddd is not
>      rdf:XMLLiteral, treated as identical to the same literal without
>      the language tag, "sss"@ddd
>
>    is "sss"@ddd a typo for "sss"^^ddd?
>    In [30]http://www.w3.org/TR/rdf-mt/#dtype_entail, for "it is valid to
>    add any number of leading zeros to any numeral and still be a correct
>    lexical form for xsd:integer", perhaps read "it is possible to add any
>    number of leading zeros to any lexical form for xsd:integer without it
>    ceasing to be a correct lexical form for xsd:integer"
>
>      [30] http://www.w3.org/TR/rdf-mt/#dtype_entail

Left to editors' discretion.
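
A minimal Python sketch of the two typed-literal points in this section,
assuming the reading suggested in the comment ("sss"@ddd is a typo for
"sss"^^ddd): a language tag on a typed literal is dropped unless the datatype
is rdf:XMLLiteral, and leading zeros do not change the value an xsd:integer
literal denotes.  The Literal class and function names are hypothetical; the
namespace URIs are the standard RDF and XML Schema ones.

```python
import re
from dataclasses import dataclass
from typing import Optional

RDF_XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


@dataclass(frozen=True)
class Literal:
    lexical_form: str
    language: Optional[str] = None  # language tag, e.g. "en"
    datatype: Optional[str] = None  # datatype URI; None for plain literals


def normalize(lit: Literal) -> Literal:
    """Treat "sss"@ttt^^ddd as "sss"^^ddd unless ddd is rdf:XMLLiteral."""
    if lit.datatype is not None and lit.datatype != RDF_XML_LITERAL:
        return Literal(lit.lexical_form, None, lit.datatype)
    return lit


def integer_value(lit: Literal) -> int:
    """Value denoted by an xsd:integer literal; leading zeros are allowed."""
    if lit.datatype != XSD_INTEGER or not INTEGER_LEXICAL.fullmatch(lit.lexical_form):
        raise ValueError(f"not a well-formed xsd:integer literal: {lit!r}")
    return int(lit.lexical_form)


if __name__ == "__main__":
    tagged = Literal("027", "en", XSD_INTEGER)
    print(normalize(tagged) == Literal("027", None, XSD_INTEGER))  # True
    print(integer_value(tagged) == integer_value(Literal("27", datatype=XSD_INTEGER)))  # True
```

The regular expression check is there because Python's int() also accepts
surrounding whitespace and underscores, neither of which is part of the
xsd:integer lexical space.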


>4. Notes on RDF/XML Syntax Specification (Revised)
>
>    RDF/XML Syntax, [31]http://www.w3.org/TR/rdf-syntax-grammar/
>
>      [31] http://www.w3.org/TR/rdf-syntax-grammar/
>
>4.1. Manifest typing in the instance (policy)
>
>      RDF allows Typed Literals to be given as the object node of arcs.
>      These consist of a literal string (with optional language) and a
>      datatype RDF URI Reference. This is handled ... with an additional
>      rdf:datatype="datatypeURI" attribute on the property element.
>
>    We believe there are probably good reasons for using an rdf:datatype
>    attribute, instead of re-using the existing xsi:type attribute which
>    has (when the type is defined in a schema defined by XML Schema 1.0)
>    the same semantics. In particular, rdf:datatype does not assume or
>    assert the existence of the type named as a type in a schema defined
>    by XML Schema, so it would be problematic to use xsi:type.
>    We do fear, however, that users are likely to find this
>    near-duplication of the meaning and function of xsi:type confusing. It
>    is not clear to us what, if anything, can or should be done to
>    minimize this danger.


Recorded as

   http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-08


>4.2. QNames (Editorial, but important)
>
>    We were unable, on a first reading, to determine whether the default
>    namespace declaration, and thus unprefixed names, were or were not
>    allowed in order to encode 'RDF URI References'. Indeed the
>    introductory prose about QNames (2nd para of
>    [32]http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-intro)
>    does not seem to connect up with the relevant (?) production in
>    [33]http://www.w3.org/TR/rdf-syntax-grammar/#section-Infoset-Grammar,
>    which we take to be
>    [34]http://www.w3.org/TR/rdf-syntax-grammar/#URI-reference.
>    This can and should be cleared up.
>
>      [32] http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-intro
>      [33] http://www.w3.org/TR/rdf-syntax-grammar/#section-Infoset-Grammar
>      [34] http://www.w3.org/TR/rdf-syntax-grammar/#URI-reference

Recorded as

   http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-09
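
For the QNames question, a small Python sketch using the standard library's
namespace-aware parser.  At the level of XML Namespaces, an unprefixed
element in the scope of a default namespace declaration and a prefixed
element bound to the same namespace have the same expanded name; the sketch
then forms a URI by concatenating namespace name and local name, which is how
RDF/XML builds property URIs.  Whether RDF/XML should accept the unprefixed
form is the question raised in the comment, and the sketch does not settle
it.  The http://example.org/terms/ namespace is a placeholder.

```python
import xml.etree.ElementTree as ET

EX_NS = "http://example.org/terms/"  # placeholder vocabulary namespace

PREFIXED = f'<ex:age xmlns:ex="{EX_NS}">27</ex:age>'
UNPREFIXED = f'<age xmlns="{EX_NS}">27</age>'


def uri_from_tag(tag: str) -> str:
    """Concatenate namespace name and local name from an ElementTree tag.

    ElementTree reports namespaced names as "{namespace}local".
    """
    if not tag.startswith("{"):
        raise ValueError(f"element {tag!r} has no namespace name")
    namespace, local = tag[1:].split("}", 1)
    return namespace + local


if __name__ == "__main__":
    for text in (PREFIXED, UNPREFIXED):
        # Both iterations print http://example.org/terms/age
        print(uri_from_tag(ET.fromstring(text).tag))
```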


>4.3. Miscellaneous editorial notes
>
>    In
>    [35]http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-empty-property-elements,
>    the sentence
>
>      [35] http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-empty-property-elements
>
>      When an arc in an RDF Graph points to an object node which has no
>      further arcs, which appears in RDF/XML as an empty node element
>      sequence such as the pair <rdf:Description rdf:about="...">
>      </rdf:Description>, this form can be shortened.
>
>    seems less clear than it might be. Different readers prove to have
>    different views on what is meant by "the pair <rdf:Description
>    rdf:about="..."> </rdf:Description>"; perhaps it can be replaced by
>    something like "the empty element <rdf:Description rdf:about="..."/>"
>    without loss of precision? Perhaps the sentence could read
>
>      When an arc in an RDF Graph points to an object node which has no
>      further arcs, which appears in RDF/XML as an empty node element
>      such as <rdf:Description rdf:about="..."/>, this form can be
>      shortened.

Left to editors' discretion.
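
A small Python sketch of why the wording matters.  Per XML 1.0, an
empty-element tag and a start-tag immediately followed by its end-tag are
equivalent, but the form quoted from the draft has a space between the tags,
so that element has whitespace character content.  The rdf:about value is a
placeholder, and how an RDF/XML parser treats such whitespace is governed by
the RDF/XML grammar, not by this sketch.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = "http://example.org/thing"  # placeholder subject URI


def description(body: Optional[str]) -> str:
    """Serialize an rdf:Description; body None means an empty-element tag."""
    start = f'<rdf:Description xmlns:rdf="{RDF_NS}" rdf:about="{ABOUT}"'
    if body is None:
        return start + "/>"
    return f"{start}>{body}</rdf:Description>"


def content(xml_text: str) -> Tuple[int, Optional[str]]:
    """Number of child elements and the character data, as parsed."""
    element = ET.fromstring(xml_text)
    return len(element), element.text


if __name__ == "__main__":
    print(content(description(None)))  # (0, None): empty-element tag
    print(content(description("")))    # (0, None): tag pair, nothing between
    print(content(description(" ")))   # (0, ' '): tag pair with a space
```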


>4.4. Normative specification of XML grammar (policy, substantive)
>
>    We note with admiration the excellent tutorial introduction to the
>    striped syntax in Section 2
>    [36]http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax. We are
>    less happy with the nature of the syntax, and with the approach taken
>    to its normative statement
>    [37]http://www.w3.org/TR/rdf-syntax-grammar/#section-Infoset-Grammar.
>    As regards the syntax itself, we would much prefer to have seen a move
>    to a single canonical syntax with much less variability. With respect,
>    the current design suggests that the value of XML has been
>    misunderstood. The range of alternative forms of expression provided
>    for in the current design make it very difficult to use the broad
>    range of generic XML tools (e.g. syntax-directed editors, XSLT) which
>    could give so much benefit to RDF users. (More on this below.) At the
>    very least we would encourage you to specify a single canonical form,
>    probably strictly striped, which could be defined by an XML Schema or
>    DTD. We would be happy to work with you to develop a schema for such a
>    subset.
>    As regards the approach taken to defining the syntax, in our view,
>    layering of specs has very high value, and so defining an XML document
>    type by way of what is very nearly a character-level BNF is at best a
>    missed opportunity and at worst a serious mistake. It obscures the
>    important aspects of the document type behind a welter of irrelevant
>    detail about e.g. whitespace and start-tag/end-tag matching. It makes
>    it very difficult for the reader to actually understand what is and
>    isn't actually allowed -- what an RDF/XML document actually looks
>    like.
>    Not only does this confuse levels and thus readers, it also runs the
>    risk of inadvertently defining an XML subset. It also appears, on a
>    strict reading, to rule out XML documents not derived from the parsing
>    of character streams as possible RDF/XML (so that it would be
>    illegitimate to regard a data structure created using a DOM interface,
>    for example, as RDF/XML).
>    The use of event-triggered data-model construction actions to specify
>    the relationship between XML representation and corresponding data
>    objects is innovative and compelling, but surely it would be
>    straightforward to associate these events with a pre-order traversal
>    of an infoset independently constrained by a DTD, XML Schema schema or
>    other appropriate definition of the canonical document type. If
>    continued support for alternative forms is considered essential, then
>    a two-step approach where the semantics of any non-canonical form is
>    defined in terms of a canonical form to which it corresponds would
>    still be far simpler than the current approach.
>
>      [36] http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax
>      [37] http://www.w3.org/TR/rdf-syntax-grammar/#section-Infoset-Grammar

Recorded as

   http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-10
and
   http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-11
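
A minimal Python sketch of the alternative the comment proposes: generating
construction events from a pre-order traversal of an already-built tree
rather than from a character-level grammar.  The tree here is built through
ElementTree's API without parsing any text, which is the DOM-style case the
comment says a strict reading would exclude.  Event names and the example
vocabulary are hypothetical, whitespace-only text is skipped as a simplifying
assumption, and this is not the event model defined in the RDF/XML grammar.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/terms/"  # hypothetical vocabulary namespace

Event = Tuple[str, str]


def events(element: ET.Element) -> Iterator[Event]:
    """Yield start/text/end events in pre-order (document) order.

    Attributes are left out for brevity; whitespace-only text is skipped.
    """
    yield ("start", element.tag)
    if element.text and element.text.strip():
        yield ("text", element.text)
    for child in element:
        yield from events(child)
        # Text following a child (its tail) belongs to the parent's content.
        if child.tail and child.tail.strip():
            yield ("text", child.tail)
    yield ("end", element.tag)


def build_tree() -> ET.Element:
    """Build a small RDF/XML-like tree through the API, without parsing text."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": "http://example.org/thing"})
    age = ET.SubElement(desc, f"{{{EX_NS}}}age")
    age.text = "27"
    return root


if __name__ == "__main__":
    for event in events(build_tree()):
        print(event)
```

Because the traversal needs only a tree, the same events come from a parsed
document or a programmatically built one; deciding which trees count as the
canonical document type would then fall to a DTD or schema, as the comment
suggests.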


>4.5. On the relation between RDF and off-the-shelf XML tools (policy,
>substantive)
>
>    With some diffidence, we conclude by raising what may be a sensitive
>    issue.
>    It does not seem to us that the XML serialization of RDF shows RDF to
>    advantage. At the level of the underlying graph model, RDF information
>    has a simple and regular structure, which appears in the XML
>    serialization to be anything but simple and so irregular as to bring
>    the words "capricious" and "arbitrary" to the lips of unprejudiced
>    observers. Tastes in markup style differ, but we believe that the root
>    of the problem is the high degree of variability with which the same
>    underlying graph structures may be serialized, according to the rules
>    given in this document.
>    Owing in part to the variability itself, and in part to the specific
>    forms taken by that variability, it is not feasible to write an XML
>    Schema schema, or (if the comments in Appendix A.1 are accurate) a
>    Relax NG schema, or an XML 1.0 DTD, which defines the set of correct
>    serializations of correct RDF graphs. It is not convenient to run XSLT
>    processes over arbitrary RDF serializations, nor to query or process
>    arbitrary RDF data using XQuery. Arbitrary RDF data is similarly
>    inconvenient for other standard XML tools to process.
>    There is, as a result, something of a cleft between the RDF community
>    and the set of RDF tools on the one hand, and the community of users
>    and tools employing what some have called colloquial XML. The parallel
>    development of query languages, schema languages, object models, APIs,
>    editors, display tools, and so on does offer relatively harmless ways
>    for a large number of people to employ their time, but it does not
>    seem to us to serve the larger Web community well.
>    The cleft between RDF and colloquial XML does not seem to us to be
>    required by the RDF data model. A graph in which nodes have certain
>    properties and arcs have certain properties is not, in itself, a
>    peculiarly difficult structure to render in XML or to process with
>    off-the-shelf XML tools. An XML vocabulary in which nodes may appear
>    as elements, or as attributes, or as attribute values, or as the
>    PCDATA content of elements, and in which property names may appear as
>    three of the same four constructs, on the other hand, seems a rather
>    less straightforward XML representation of the underlying graph
>    structure than most XML vocabularies for graphs have chosen.
>    The result is that not just arbitrary RDF data, but data encoded using
>    vocabularies defined in RDF terms (for which current W3C work provides
>    a number of examples), will be hard to process using off-the-shelf
>    tools. We believe this difficulty represents a lost opportunity, and
>    we believe the opportunity could readily be seized if the XML
>    serialization were modified to capture more of the regularity of the
>    RDF data model.
>    We are ready to work together with the Working Groups in the Semantic
>    Web Activity and with other interested parties to formulate an XML
>    serialization which captures the information in the RDF model and
>    which is more readily amenable to processing with off-the-shelf XML
>    tools.

Recorded as

   http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-12

Received on Wednesday, 12 March 2003 05:17:54 UTC