- From: Brian McBride <bwm@hplb.hpl.hp.com>
- Date: Wed, 12 Mar 2003 10:17:48 +0000
- To: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>, www-rdf-comments@w3.org
- Cc: W3C XML Schema IG <w3c-xml-schema-ig@w3.org>
Michael,
Thank you for these comments. I note particularly the care that has
clearly been taken in reviewing the RDFCore documents.
I have indicated below links to our issue tracking document for the issues
that you have raised as substantive. Those that are editorial, I have left
to the editors to decide whether they consider them within their editorial
discretion. I have in some cases recorded a comment against a
different document to the one that inspired it. This reflects my
understanding of the document that 'owns' the specification of a particular
concept.
The document editors may respond further if they need any clarification of
the comment.
Brian
At 14:33 10/03/2003 -0700, C. M. Sperberg-McQueen wrote:
>Colleagues:
>
>With apologies for the delay, I transmit to you herewith the comments
>of the XML Schema Working Group on the various RDF documents published
>in Last Call recently. We congratulate you on the progress of your
>work and hope our comments are useful to you. An HTML version of our
>comments may be found at
>
>http://www.w3.org/XML/Group/2003/03/xml-schema-rdf-notes.html
I'm told this link is to a page which is member-visible but not public.
[...]
>1.1. Design question, complexity (substantive)
>
> The introduction of pairs consisting of a lexical form and a type (or,
> strictly speaking, a lexical form and a type label) seems at first
> glance to complicate the RDF model somewhat. We have had the
> impression that in other parts of RDF, typing is handled by adding
> further arcs and nodes. If the type of a resource is identified by
> having an arc labeled rdf:type from it to (the URI of) its (RDF) type,
> and if the type of an arc is similarly identified by an arc, then
> surely a reason ought to be given for shifting to a different method
> for typing literal strings. It seems like a dramatic shift in the
> infrastructure of RDF, from "everything is a node, an arc, or a
> literal value" to "everything is a node, an arc, or a typed literal
> value". Perhaps not quite so dramatic, after all. But the question of
> design consistency remains: why not "everything is a typed node, a
> typed arc, or a typed literal"?
Recorded against concepts as
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-01
>1.2. Whitespace handling (schema-related)
>
> Some members of the XML Schema WG have expressed concern that XML
> Schema's rules for whitespace handling may interfere with expected
> behavior in other contexts. This may be the appropriate place to bring
> this question up.
> In brief, XML Schema's simple types each define a whitespace facet,
> which governs the kind of whitespace pre-processing done by an XML
> Schema processor before the lexical form is checked for type validity.
> Since the point of whitespace normalization is to simplify subsequent
> processing, the lexical spaces of XML Schema's simple types are (like
> those in many programming languages) defined without reference to the
> preceding whitespace normalization. Integers, for example, are
> represented by sequences of decimal digits; sequences containing
> blanks are not legal lexical forms for integers. Indeed, strictly
> speaking it is only after the whitespace pre-processing is done that
> the XML Schema processor can be said to be working with a lexical form
> at all.
> For example, the integer type has a value of collapse for the
> whitespace facet, which means leading and trailing whitespace is
> stripped, and internal whitespace sequences are reduced to a single
> blank (x20) character. In an XML document in which the element
> exterms:age is defined as having type xs:integer, the following
> instances of exterms:age will all be type-valid:
>
> <exterms:age>27</exterms:age>
> <exterms:age>
> 27
> </exterms:age>
> <exterms:age> 27 </exterms:age>
> <exterms:age> 2<!--* ha, ha, fooled your full-text indexer!
> *-->7 </exterms:age>
>
> The input information set, in each case, contains a character
> information item for "2" followed by a character information item for
> "7", with character information items for whitespace characters, and a
> comment information item, present in some of the examples. In all
> cases, the lexical form proper is the character sequence "27" (i.e.
> the sequence of characters after white space handling, and ignoring
> comments, processing instructions, entity boundaries, and other
> distractions). This is a legal lexical form for an integer, so all the
> examples are type valid.
> Some members of the XML Schema WG have worried that it may not be
> obvious that the whitespace processing is not part of the process of
> checking lexical forms for type validity, but part of the process of
> extracting the lexical forms from the XML information set presented to
> the processor. If an RDF document contains
>
> <exterms:age> 27 </exterms:age>
>
> and a processor hands the contents of the element to a generic
> type-checker for XML Schema's simple types, saying in effect "this
> purports to be the lexical form of an integer; is that OK?", that type
> checker will be required (if it conforms to the XML Schema spec's
> definition of the simple types) to say "no, the character sequence
> ` 27 ' is not a legal lexical form for an integer."
> It's not clear whether RDF, being type-system neutral, can directly
> address this concern (e.g. by specifying that an RDF processor should
> do the appropriate whitespace pre-processing, or by warning users that
> they should not include vagrant whitespace in typed literals), or
> whether it suffices for developers of RDF software with built-in
> support for XML Schema's simple types to deal with it, e.g. by
> performing it themselves before handing the resulting lexical form to
> a type checker.
> As noted, some members of our WG feel that you need to be alerted to
> this as a possible source of confusion and unexpected results. Other
> members of the WG feel that it verges on disrespect to assume that you
> need instruction on this point. We compromised by agreeing to point
> out the issue to you, and to leave you to draw your own conclusions.
Recorded against concepts as
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-02
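For readers following this thread, the distinction drawn in 1.2 can be sketched in a few lines of Python. This is only an illustration, not part of either Working Group's text: the function names collapse and is_integer_lexical_form are hypothetical, and the regular expression is a simplified stand-in for the xsd:integer lexical space (optional sign, then decimal digits).

```python
import re

# Simplified lexical space of xsd:integer: optional sign, then decimal digits.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def collapse(text: str) -> str:
    """Apply whiteSpace=collapse: reduce runs of XML whitespace to a single
    space (x20), then strip leading and trailing spaces."""
    return re.sub(r"[ \t\n\r]+", " ", text).strip(" ")


def is_integer_lexical_form(candidate: str) -> bool:
    """Check a character sequence against the lexical space, with NO
    whitespace pre-processing, as a generic type checker would."""
    return _INTEGER.fullmatch(candidate) is not None


raw = " 27 "  # element content as it appears in the document
print(is_integer_lexical_form(raw))            # False
print(is_integer_lexical_form(collapse(raw)))  # True
```

The point of the sketch is only that the collapse step happens before, and separately from, the lexical check; software that skips it will reject content that an XML Schema processor would accept.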
>2. Notes on RDF Concepts and Abstract Syntax
>
>2.1. Mapping from lexical forms to values (schema-related, terminological)
>
> In [21]http://www.w3.org/TR/rdf-concepts/#section-Datatypes:
>
> [21] http://www.w3.org/TR/rdf-concepts/#section-Datatypes
>
> A datatype mapping is a set of pairs whose first element belongs to
> the lexical space of the datatype, and the second element belongs
> to the value space of the datatype:
>
> We agree that it is useful to define a term to denote such mappings;
> in the interests of inter-specification consistency, we wonder whether
> you would be willing to consider using the term lexical mapping, which
> we are introducing in our forthcoming draft of XML Schema 1.1. The
> term datatype mapping seems unlikely to be usable in the XML Schema
> specification, where it would suggest to some readers a mapping from
> one datatype to another, rather than as here a mapping from lexical
> space to value space. (XML Schema 1.0 got by without a term for this
> concept.)
Recorded as
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-03
>2.2. Values without lexical forms (schema-related, important)
>
> In [22]http://www.w3.org/TR/rdf-concepts/#section-Datatypes:
>
> [22] http://www.w3.org/TR/rdf-concepts/#section-Datatypes
>
> * Each member of the value space may be paired with any number
> (including zero) of members of the lexical space (lexical
> representations for that value).
>
> The provision for values without corresponding lexical forms
> contradicts an assumption to which the XML Schema spec appeals from
> time to time. The lexical space of any simple datatype in XML Schema
> is the domain of the type's lexical mapping; the value space is its
> range. There are no meaningless lexical forms in the lexical space of
> the type, nor are there ineffable values in the value space. By
> eliminating values from the value space (e.g. by setting minimal and
> maximal values), the type definer may indirectly also eliminate
> lexical forms from the lexical space; conversely, by eliminating some
> items from the lexical space (e.g. by setting a pattern), the type
> definer may eliminate items from the value space.
> Are there crucial aspects of RDF which will break if the list item
> quoted above is changed to read "paired with one or more members of
> the lexical space"?
Recorded as
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-04
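A small Python sketch may make the disagreement in 2.2 concrete. It assumes a toy datatype whose lexical mapping is written out as a finite set of pairs; the pairs, the declared value space, and the helper name ineffable_values are all hypothetical and chosen only for illustration.

```python
# Toy lexical mapping as a set of (lexical form, value) pairs.
lexical_mapping = {("0", 0), ("00", 0), ("1", 1), ("01", 1)}

# A value space declared independently of the mapping, which the quoted
# RDF Concepts wording permits. The value 2 has no lexical form here.
declared_value_space = {0, 1, 2}


def ineffable_values(mapping, value_space):
    """Return the values that no lexical form maps to."""
    mapping_range = {value for _, value in mapping}
    return value_space - mapping_range


lexical_space = {lexical for lexical, _ in lexical_mapping}  # domain of the mapping
print(sorted(lexical_space))                                          # ['0', '00', '01', '1']
print(ineffable_values(lexical_mapping, declared_value_space))        # {2}
```

Under the XML Schema reading described above, the value space is by definition the range of the mapping, so the set returned by ineffable_values would always be empty.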
>2.3. Lexical forms, strings, and character sequences (schema-related,
>editorial)
>
> In [23]http://www.w3.org/TR/rdf-concepts/#section-Datatypes:
>
> [23] http://www.w3.org/TR/rdf-concepts/#section-Datatypes
>
> With one exception, the datatypes used in RDF have a lexical space
> consisting of a set of strings.
>
> Since "string" is used as the local name for a particular simple type
> in the XML Schema namespace, we believe it will be less confusing for
> users, in the long run, if the lexical representations of
> simple-datatype values are described not as "strings" but as
> "character sequences".
> This comment also applies to other uses of the term string to denote
> the members of a lexical space.
Recorded as
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-05
>2.4. Strings for natural-language data (substantive)
>
> In [24]http://www.w3.org/TR/rdf-concepts/#section-Datatypes:
>
> [24] http://www.w3.org/TR/rdf-concepts/#section-Datatypes
>
> * A plain literal is a string combined with an optional language
> identifier. This should be used for plain text in a natural
> language. As recommended in the RDF formal semantics
> [RDF-SEMANTICS], these plain literals are self-denoting.
>
> We do not believe that simple strings are likely to be adequate for
> the representation of arbitrary natural-language text. Even in
> English, natural-language utterances (such as this document) may need
> some degree of inline markup for clarity and adequate presentation; in
> natural-language utterances requiring bidirectional display or ruby,
> the best authorities (including the W3C I18n Working Group) recommend
> the use of markup within the natural-language utterance. We thus
> suggest that you may wish to moderate this recommendation that
> natural-language material be represented by literals.
> This is not an area in which we claim particular technical expertise;
> we merely call it to your attention in the hopes that doing so may be
> useful to you.
Recorded as
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-06
>2.5. Typos and minor editorial notes
>
> In [25]http://www.w3.org/TR/rdf-concepts/#section-Literal-Value, for
> "the datatype mapping is applied to the pair form by the lexical form
> and the language identifier" read "the datatype mapping is applied to
> the pair formed by the lexical form and the language identifier".
> In the same section, for "Such a case, while in error, is not
> syntacticly ill-formed " read "Such a case, while in error, is not
> syntactically ill-formed" (et passim).
> In section [26]http://www.w3.org/TR/rdf-concepts/#section-XMLLiteral,
> for "root element tag" read "root element".
> In the same section, for "XML element content" read "XML data" (the
> term element content is used in some markup-related specs as a
> complement of mixed content to denote the content of elements which
> can contain other elements but cannot contain parsed character data).
>
> [25] http://www.w3.org/TR/rdf-concepts/#section-Literal-Value
> [26] http://www.w3.org/TR/rdf-concepts/#section-XMLLiteral
Left, for now, to editors' discretion.
>3. Notes on RDF Semantics
>
>3.1. The "meaning" of literals (editorial)
>
> The meaning of a literal is principally determined by its character
> string: it either refers to the value mapped from the string by the
> associated datatype, or if no datatype is provided then it refers
> to the literal itself, which is either a unicode character string
> or a pair of a string with a language tag.
>
> Some members of the XML Schema WG are made nervous by the appeal to
> the notion of "meaning" here. [N.B. our task force read this section
> out of context, and were not aware of any foregoing elucidation. So
> this comment may be out of place.] There is also some concern about
> the apparent conflation here of the notions of meaning and reference.
> We wonder whether this discussion would be weakened by replacing
> references to meaning and reference by references to denotation; we
> are inclined to think it would be an improvement, but recognize that
> the RDF Core WG's views may differ.
Left, for now, to editors' discretion.
>3.2. Types as lexical mappings (schema-related)
>
> A datatype is an entity characterized by a set of character strings
> called lexical forms and a mapping from that set to a set of
> values.
>
> We have a couple of reservations concerning this characterization.
> * Elsewhere (e.g. in Concepts and Abstract Syntax, section 3.3,
> [27]http://www.w3.org/TR/rdf-concepts/#section-Datatypes), the RDF
> specs say that there may be values in a value space which are not
> in the range of the lexical mapping; we have suggested that if
> possible those statements should be changed, but if they are
> retained, then a datatype cannot be characterized solely by the
> lexical space and the lexical mapping, because such ineffable
> values appear in neither of these.
> * The statement describes (with the exception of the problem just
> noted) simple datatypes, but not the class of complex datatypes
> which can be defined by XML Schema, nor all the types (or
> type-like constructs) definable in various other schema languages
> for XML.
>
> [27] http://www.w3.org/TR/rdf-concepts/#section-Datatypes
Recorded as
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-07
>3.3. Miscellaneous editorial notes
>
> In [28]http://www.w3.org/TR/rdf-mt/#dtype_interp, for "which we will
> refer to as XSD and use the Qname prefix xsd:" read "which we will
> refer to as XSD and denote using the Qname prefix xsd" (or something
> similar).
> In [29]http://www.w3.org/TR/rdf-mt/#dtype_interp:
>
> [28] http://www.w3.org/TR/rdf-mt/#dtype_interp
> [29] http://www.w3.org/TR/rdf-mt/#dtype_interp
>
> For example, XML Schema requires that the value spaces of
> xsd:string and xsd:decimal to be disjoint ...
>
> This sentence is not exactly wrong, but it seems slightly unusual to
> use the verb require here, instead of define or something similar. We
> suggest recasting this as "For example, XML Schema defines the value
> spaces of xsd:string and xsd:decimal as disjoint ..." (Note, for the
> record, that the value spaces of all the primitive simple datatypes of
> XML Schema 1.0 are pairwise disjoint.)
> In ,
>
> any literal of the form "sss"@ttt^^ddd, where ddd is not
> rdf:XMLLiteral, treated as identical to the same literal without
> the language tag, "sss"@ddd
>
> is "sss"@ddd a typo for "sss"^^ddd?
> In [30]http://www.w3.org/TR/rdf-mt/#dtype_entail, for "it is valid to
> add any number of leading zeros to any numeral and still be a correct
> lexical form for xsd:integer", perhaps read "it is possible to add any
> number of leading zeros to any lexical form for xsd:integer without it
> ceasing to be a correct lexical form for xsd:integer"
>
> [30] http://www.w3.org/TR/rdf-mt/#dtype_entail
Left to editors' discretion.
>4. Notes on RDF/XML Syntax Specification (Revised)
>
> RDF/XML Syntax, [31]http://www.w3.org/TR/rdf-syntax-grammar/
>
> [31] http://www.w3.org/TR/rdf-syntax-grammar/
>
>4.1. Manifest typing in the instance (policy)
>
> RDF allows Typed Literals to be given as the object node of arcs.
> These consist of a literal string (with optional language) and a
> datatype RDF URI Reference. This is handled ... with an additional
> rdf:datatype="datatypeURI" attribute on the property element.
>
> We believe there are probably good reasons for using an rdf:datatype
> attribute, instead of re-using the existing xsi:type attribute which
> has (when the type is defined in a schema defined by XML Schema 1.0)
> the same semantics. In particular, rdf:datatype does not assume or
> assert the existence of the type named as a type in a schema defined
> by XML Schema, so it would be problematic to use xsi:type.
> We do fear, however, that users are likely to find this
> near-duplication of the meaning and function of xsi:type confusing. It
> is not clear to us what, if anything, can or should be done to
> minimize this danger.
Recorded as
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-08
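As an illustration of the rdf:datatype attribute discussed in 4.1, the following Python sketch reads it from a property element using the standard library's ElementTree. The sample document, the exterms namespace URI, and the subject URI are assumptions made for the example. Note that the attribute value is a full URI reference, whereas xsi:type carries a QName that must be resolved against in-scope namespace declarations.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical sample document; the exterms namespace and subject are assumed.
doc = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:exterms="http://www.example.org/terms/">
  <rdf:Description rdf:about="http://www.example.org/staffid/85740">
    <exterms:age rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">27</exterms:age>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(doc)
node = root.find(f"{{{RDF_NS}}}Description")  # the node element
prop = node[0]                                 # its single property element

# ElementTree exposes namespaced attributes in {namespace}local form.
datatype = prop.get(f"{{{RDF_NS}}}datatype")
print(datatype)   # http://www.w3.org/2001/XMLSchema#integer
print(prop.text)  # 27
```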
>4.2. QNames (Editorial, but important)
>
> We were unable, on a first reading, to determine whether the default
> namespace declaration, and thus unprefixed names, were or were not
> allowed in order to encode 'RDF URI References'. Indeed the
> introductory prose about QNames (2nd para of
> [32]http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-intro)
> does not seem to connect up with the relevant (?) production in
> [33]http://www.w3.org/TR/rdf-syntax-grammar/#section-Infoset-Grammar,
> which we take to be
> [34]http://www.w3.org/TR/rdf-syntax-grammar/#URI-reference.
> This can and should be cleared up.
>
> [32] http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-intro
> [33] http://www.w3.org/TR/rdf-syntax-grammar/#section-Infoset-Grammar
> [34] http://www.w3.org/TR/rdf-syntax-grammar/#URI-reference
Recorded as
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-09
>4.3. Miscellaneous editorial notes
>
> In
> [35]http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-empty-property-elements,
> the sentence
>
> [35]
> http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-empty-property-elements
>
> When an arc in an RDF Graph points to an object node which has no
> further arcs, which appears in RDF/XML as an empty node element
> sequence such as the pair <rdf:Description rdf:about="...">
> </rdf:Description>, this form can be shortened.
>
> seems less clear than it might be. Different readers prove to have
> different views on what is meant by "the pair <rdf:Description
> rdf:about="..."> </rdf:Description>"; perhaps it can be replaced by
> something like "the empty element <rdf:Description rdf:about="..."/>"
> without loss of precision? Perhaps the sentence could read
>
> When an arc in an RDF Graph points to an object node which has no
> further arcs, which appears in RDF/XML as an empty node element
> such as <rdf:Description rdf:about="..."/>, this form can be
> shortened.
Left to editors' discretion.
>4.4. Normative specification of XML grammar (policy, substantive)
>
> We note with admiration the excellent tutorial introduction to the
> striped syntax in Section 2
> [36]http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax. We are
> less happy with the nature of the syntax, and with the approach taken
> to its normative statement
> [37]http://www.w3.org/TR/rdf-syntax-grammar/#section-Infoset-Grammar.
> As regards the syntax itself, we would much prefer to have seen a move
> to a single canonical syntax with much less variability. With respect,
> the current design suggests that the value of XML has been
> misunderstood. The range of alternative forms of expression provided
> for in the current design make it very difficult to use the broad
> range of generic XML tools (e.g. syntax-directed editors, XSLT) which
> could give so much benefit to RDF users. (More on this below.) At the
> very least we would encourage you to specify a single canonical form,
> probably strictly striped, which could be defined by an XML Schema or
> DTD. We would be happy to work with you to develop a schema for such a
> subset.
> As regards the approach taken to defining the syntax, in our view,
> layering of specs has very high value, and so defining an XML document
> type by way of what is very nearly a character-level BNF is at best a
> missed opportunity and at worst a serious mistake. It obscures the
> important aspects of the document type behind a welter of irrelevant
> detail about e.g. whitespace and start-tag/end-tag matching. It makes
> it very difficult for the reader to actually understand what is and
> isn't actually allowed -- what an RDF/XML document actually looks
> like.
> Not only does this confuse levels and thus readers, it also runs the
> risk of inadvertently defining an XML subset. It also appears, on a
> strict reading, to rule out XML documents not derived from the parsing
> of character streams as possible RDF/XML (so that it would be
> illegitimate to regard a data structure created using a DOM interface,
> for example, as RDF/XML).
> The use of event-triggered data-model construction actions to specify
> the relationship between XML representation and corresponding data
> objects is innovative and compelling, but surely it would be
> straight-forward to associate these events with a pre-order traversal
> of an infoset independently constrained by a DTD, XML Schema schema or
> other appropriate definition of the canonical document type. If
> continued support for alternative forms is considered essential, then
> a two-step approach where the semantics of any non-canonical form is
> defined in terms of a canonical form to which it corresponds would
> still be far simpler than the current approach.
>
> [36] http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax
> [37] http://www.w3.org/TR/rdf-syntax-grammar/#section-Infoset-Grammar
Recorded as
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-10
and
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-11
>4.5. On the relation between RDF and off-the-shelf XML tools (policy,
>substantive)
>
> With some diffidence, we conclude by raising what may be a sensitive
> issue.
> It does not seem to us that the XML serialization of RDF shows RDF to
> advantage. At the level of the underlying graph model, RDF information
> has a simple and regular structure, which appears in the XML
> serialization to be anything but simple and so irregular as to bring
> the words "capricious" and "arbitrary" to the lips of unprejudiced
> observers. Tastes in markup style differ, but we believe that the root
> of the problem is the high degree of variability with which the same
> underlying graph structures may be serialized, according to the rules
> given in this document.
> Owing in part to the variability itself, and in part to the specific
> forms taken by that variability, it is not feasible to write an XML
> Schema schema, or (if the comments in Appendix A.1 are accurate) a
> Relax NG schema, or an XML 1.0 DTD, which defines the set of correct
> serializations of correct RDF graphs. It is not convenient to run XSLT
> processes over arbitrary RDF serializations, nor to query or process
> arbitrary RDF data using XQuery. Arbitrary RDF data is similarly
> inconvenient for other standard XML tools to process.
> There is, as a result, something of a cleft between the RDF community
> and the set of RDF tools on the one hand, and the community of users
> and tools employing what some have called colloquial XML. The parallel
> development of query languages, schema languages, object models, APIs,
> editors, display tools, and so on does offer relatively harmless ways
> for a large number of people to employ their time, but it does not
> seem to us to serve the larger Web community well.
> The cleft between RDF and colloquial XML does not seem to us to be
> required by the RDF data model. A graph in which nodes have certain
> properties and arcs have certain properties is not, in itself, a
> peculiarly difficult structure to render in XML or to process with
> off-the-shelf XML tools. An XML vocabulary in which nodes may appear
> as elements, or as attributes, or as attribute values, or as the
> PCDATA content of elements, and in which property names may appear as
> three of the same four constructs, on the other hand, seems a rather
> less straightforward XML representation of the underlying graph
> structure than most XML vocabularies for graphs have chosen.
> The result is that not just arbitrary RDF data, but data encoded using
> vocabularies defined in RDF terms (for which current W3C work provides
> a number of examples), will be hard to process using off-the-shelf
> tools. We believe this difficulty represents a lost opportunity, and
> we believe the opportunity could readily be seized if the XML
> serialization were modified to capture more of the regularity of the
> RDF data model.
> We are ready to work together with the Working Groups in the Semantic
> Web Activity and with other interested parties to formulate an XML
> serialization which captures the information in the RDF model and
> which is more readily amenable to processing with off-the-shelf XML
> tools.
Recorded as
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-12
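The variability described in 4.5 can be illustrated with a short Python sketch. It assumes two hypothetical RDF/XML documents that state the same triple, once with a property element and once with a property attribute, and a deliberately tiny extractor (toy_triples, not a conforming RDF/XML parser) that handles only those two forms. The example namespace and subject URI are invented.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://www.example.org/terms/"  # hypothetical vocabulary namespace

AS_ELEMENT = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:ex="{EX}">
  <rdf:Description rdf:about="http://www.example.org/doc">
    <ex:creator>Dave</ex:creator>
  </rdf:Description>
</rdf:RDF>"""

AS_ATTRIBUTE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:ex="{EX}">
  <rdf:Description rdf:about="http://www.example.org/doc" ex:creator="Dave"/>
</rdf:RDF>"""


def to_uri(clark_name: str) -> str:
    """Turn ElementTree's {namespace}local name into namespace + local."""
    return clark_name[1:].replace("}", "", 1)


def toy_triples(rdfxml: str) -> set:
    """Extract literal-valued triples from two simple RDF/XML forms only."""
    triples = set()
    for node in ET.fromstring(rdfxml).findall(f"{{{RDF}}}Description"):
        subject = node.get(f"{{{RDF}}}about")
        # Form 1: property written as an attribute on the node element.
        for name, value in node.attrib.items():
            if not name.startswith(f"{{{RDF}}}"):
                triples.add((subject, to_uri(name), value))
        # Form 2: property written as a child element with literal content.
        for child in node:
            triples.add((subject, to_uri(child.tag), child.text or ""))
    return triples


# Same graph, different XML: the triple sets agree ...
print(toy_triples(AS_ELEMENT) == toy_triples(AS_ATTRIBUTE))  # True

# ... but a path expression written for one form misses the other.
path = f".//{{{EX}}}creator"
print(len(ET.fromstring(AS_ELEMENT).findall(path)))    # 1
print(len(ET.fromstring(AS_ATTRIBUTE).findall(path)))  # 0
```

A generic XML tool sees two structurally different documents here, which is the difficulty the comment raises for XSLT, XQuery, and schema validation.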
Received on Wednesday, 12 March 2003 05:17:54 UTC