- From: Brian McBride <bwm@hplb.hpl.hp.com>
- Date: Wed, 12 Mar 2003 10:17:48 +0000
- To: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>, www-rdf-comments@w3.org
- Cc: W3C XML Schema IG <w3c-xml-schema-ig@w3.org>
Michael,

Thank you for these comments. I note particularly the care that has clearly been taken in reviewing the RDFCore documents.

I have indicated below links to our issue tracking document for the issues that you have raised as substantive. Those that are editorial, I have left to the editors to decide whether they consider them within their editorial discretion.

I have in some cases recorded a comment against a different document to the one that inspired it. This reflects my understanding of the document that 'owns' the specification of a particular concept.

The document editors may respond further if they need any clarification of the comment.

Brian

At 14:33 10/03/2003 -0700, C. M. Sperberg-McQueen wrote:
>Colleagues:
>
>With apologies for the delay, I transmit to you herewith the comments
>of the XML Schema Working Group on the various RDF documents published
>in Last Call recently. We congratulate you on the progress of your
>work and hope our comments are useful to you. An HTML version of our
>comments may be found at
>
>http://www.w3.org/XML/Group/2003/03/xml-schema-rdf-notes.html

I'm told this link is to a page which is member-visible but not public.

[...]

>1.1. Design question, complexity (substantive)
>
> The introduction of pairs consisting of a lexical form and a type (or,
> strictly speaking, a lexical form and a type label) seems at first
> glance to complicate the RDF model somewhat. We have had the
> impression that in other parts of RDF, typing is handled by adding
> further arcs and nodes. If the type of a resource is identified by
> having an arc labeled rdf:type from it to (the URI of) its (RDF) type,
> and if the type of an arc is similarly identified by an arc, then
> surely a reason ought to be given for shifting to a different method
> for typing literal strings.
> It seems like a dramatic shift in the
> infrastructure of RDF, from "everything is a node, an arc, or a
> literal value" to "everything is a node, an arc, or a typed literal
> value". Perhaps not quite so dramatic, after all. But the question of
> design consistency remains: why not "everything is a typed node, a
> typed arc, or a typed literal"?

Recorded against concepts as http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-01

>1.2. Whitespace handling (schema-related)
>
> Some members of the XML Schema WG have expressed concern that XML
> Schema's rules for whitespace handling may interfere with expected
> behavior in other contexts. This may be the appropriate place to bring
> this question up.
> In brief, XML Schema's simple types each define a whitespace facet,
> which governs the kind of whitespace pre-processing done by an XML
> Schema processor before the lexical form is checked for type validity.
> Since the point of whitespace normalization is to simplify subsequent
> processing, the lexical spaces of XML Schema's simple types are (like
> those in many programming languages) defined without reference to the
> preceding whitespace normalization. Integers, for example, are
> represented by sequences of decimal digits; sequences containing
> blanks are not legal lexical forms for integers. Indeed, strictly
> speaking it is only after the whitespace pre-processing is done that
> the XML Schema processor can be said to be working with a lexical form
> at all.
> For example, the integer type has a value of collapse for the
> whitespace facet, which means leading and trailing whitespace is
> stripped, and internal whitespace sequences are reduced to a single
> blank (x20) character.
> In an XML document in which the element
> exterms:age is defined as having type xs:integer, the following
> instances of exterms:age will all be type-valid:
>
> <exterms:age>27</exterms:age>
> <exterms:age>
> 27
> </exterms:age>
> <exterms:age> 27 </exterms:age>
> <exterms:age> 2<!--* ha, ha, fooled your full-text indexer!
> *-->7 </exterms:age>
>
> The input information set, in each case, contains a character
> information item for "2" followed by a character information item for
> "7", with character information items for whitespace characters, and a
> comment information item, present in some of the examples. In all
> cases, the lexical form proper is the character sequence "27" (i.e.
> the sequence of characters after white space handling, and ignoring
> comments, processing instructions, entity boundaries, and other
> distractions). This is a legal lexical form for an integer, so all the
> examples are type valid.
> Some members of the XML Schema WG have worried that it may not be
> obvious that the whitespace processing is not part of the process of
> checking lexical forms for type validity, but part of the process of
> extracting the lexical forms from the XML information set presented to
> the processor. If an RDF document contains
>
> <exterms:age> 27 </exterms:age>
>
> and a processor hands the contents of the element to a generic
> type-checker for XML Schema's simple types, saying in effect "this
> purports to be the lexical form of an integer; is that OK?", that type
> checker will be required (if it conforms to the XML Schema spec's
> definition of the simple types) to say "no, the character sequence
> ` 27 ' is not a legal lexical form for an integer."
> It's not clear whether RDF, being type-system neutral, can directly
> address this concern (e.g.
> by specifying that an RDF processor should
> do the appropriate whitespace pre-processing, or by warning users that
> they should not include vagrant whitespace in typed literals), or
> whether it suffices for developers of RDF software with built-in
> support for XML Schema's simple types to deal with it, e.g. by
> performing it themselves before handing the resulting lexical form to
> a type checker.
> As noted, some members of our WG feel that you need to be alerted to
> this as a possible source of confusion and unexpected results. Other
> members of the WG feel that it verges on disrespect to assume that you
> need instruction on this point. We compromised by agreeing to point
> out the issue to you, and to leave you to draw your own conclusions.

Recorded against concepts as http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-02

>2. Notes on RDF Concepts and Abstract Syntax
>
>2.1. Mapping from lexical forms to values (schema-related, terminological)
>
> In [21]http://www.w3.org/TR/rdf-concepts/#section-Datatypes:
>
> [21] http://www.w3.org/TR/rdf-concepts/#section-Datatypes
>
> A datatype mapping is a set of pairs whose first element belongs to
> the lexical space of the datatype, and the second element belongs
> to the value space of the datatype:
>
> We agree that it is useful to define a term to denote such mappings;
> in the interests of inter-specification consistency, we wonder whether
> you would be willing to consider using the term lexical mapping, which
> we are introducing in our forthcoming draft of XML Schema 1.1. The
> term datatype mapping seems unlikely to be usable in the XML Schema
> specification, where it would suggest to some readers a mapping from
> one datatype to another, rather than as here a mapping from lexical
> space to value space. (XML Schema 1.0 got by without a term for this
> concept.)

Recorded as http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-03

>2.2. Values without lexical forms (schema-related, important)
>
> In [22]http://www.w3.org/TR/rdf-concepts/#section-Datatypes:
>
> [22] http://www.w3.org/TR/rdf-concepts/#section-Datatypes
>
> * Each member of the value space may be paired with any number
> (including zero) of members of the lexical space (lexical
> representations for that value).
>
> The provision for values without corresponding lexical forms
> contradicts an assumption to which the XML Schema spec appeals from
> time to time. The lexical space of any simple datatype in XML Schema
> is the domain of the type's lexical mapping; the value space is its
> range. There are no meaningless lexical forms in the lexical space of
> the type, nor are there ineffable values in the value space. By
> eliminating values from the value space (e.g. by setting minimal and
> maximal values), the type definer may indirectly also eliminate
> lexical forms from the lexical space; conversely, by eliminating some
> items from the lexical space (e.g. by setting a pattern), the type
> definer may eliminate items from the value space.
> Are there crucial aspects of RDF which will break if the list item
> quoted above is changed to read "paired with one or more members of
> the lexical space"?

Recorded as http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-04

>2.3. Lexical forms, strings, and character sequences (schema-related,
>editorial)
>
> In [23]http://www.w3.org/TR/rdf-concepts/#section-Datatypes:
>
> [23] http://www.w3.org/TR/rdf-concepts/#section-Datatypes
>
> With one exception, the datatypes used in RDF have a lexical space
> consisting of a set of strings.
>
> Since "string" is used as the local name for a particular simple type
> in the XML Schema namespace, we believe it will be less confusing for
> users, in the long run, if the lexical representations of
> simple-datatype values are described not as "strings" but as
> "character sequences".
> This comment also applies to other uses of the term string to denote
> the members of a lexical space.

Recorded as http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-05

>2.4. Strings for natural-language data (substantive)
>
> In [24]http://www.w3.org/TR/rdf-concepts/#section-Datatypes:
>
> [24] http://www.w3.org/TR/rdf-concepts/#section-Datatypes
>
> * A plain literal is a string combined with an optional language
> identifier. This should be used for plain text in a natural
> language. As recommended in the RDF formal semantics
> [RDF-SEMANTICS], these plain literals are self-denoting.
>
> We do not believe that simple strings are likely to be adequate for
> the representation of arbitrary natural-language text. Even in
> English, natural-language utterances (such as this document) may need
> some degree of inline markup for clarity and adequate presentation; in
> natural-language utterances requiring bidirectional display or ruby,
> the best authorities (including the W3C I18n Working Group) recommend
> the use of markup within the natural-language utterance. We thus
> suggest that you may wish to moderate this recommendation that
> natural-language material be represented by literals.
> This is not an area in which we claim particular technical expertise;
> we merely call it to your attention in the hopes that doing so may be
> useful to you.

Recorded as http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-06

>2.5. Typos and minor editorial notes
>
> In [25]http://www.w3.org/TR/rdf-concepts/#section-Literal-Value, for
> "the datatype mapping is applied to the pair form by the lexical form
> and the language identifier" read "the datatype mapping is applied to
> the pair formed by the lexical form and the language identifier".
> In the same section, for "Such a case, while in error, is not
> syntacticly ill-formed " read "Such a case, while in error, is not
> syntactically ill-formed" (et passim).
> In section [26]http://www.w3.org/TR/rdf-concepts/#section-XMLLiteral,
> for "root element tag" read "root element".
> In the same section, for "XML element content" read "XML data" (the
> term element content is used in some markup-related specs as a
> complement of mixed content to denote the content of elements which
> can contain other elements but cannot contain parsed character data).
>
> [25] http://www.w3.org/TR/rdf-concepts/#section-Literal-Value
> [26] http://www.w3.org/TR/rdf-concepts/#section-XMLLiteral

Left, for now, to editors' discretion.

>3. Notes on RDF Semantics
>
>3.1. The "meaning" of literals (editorial)
>
> The meaning of a literal is principally determined by its character
> string: it either refers to the value mapped from the string by the
> associated datatype, or if no datatype is provided then it refers
> to the literal itself, which is either a unicode character string
> or a pair of a string with a language tag.
>
> Some members of the XML Schema WG are made nervous by the appeal to
> the notion of "meaning" here. [N.B. our task force read this section
> out of context, and were not aware of any foregoing elucidation. So
> this comment may be out of place.] There is also some concern about
> the apparent conflation here of the notions of meaning and reference.
> We wonder whether this discussion would be weakened by replacing
> references to meaning and reference by references to denotation; we
> are inclined to think it would be an improvement, but recognize that
> the RDF Core WG's views may differ.

Left, for now, to editors' discretion.

>3.2. Types as lexical mappings (schema-related)
>
> A datatype is an entity characterized by a set of character strings
> called lexical forms and a mapping from that set to a set of
> values.
>
> We have a couple of reservations concerning this characterization.
> * Elsewhere (e.g.
> in Concepts and Abstract Syntax, section 3.3,
> [27]http://www.w3.org/TR/rdf-concepts/#section-Datatypes), the RDF
> specs say that there may be values in a value space which are not
> in the range of the lexical mapping; we have suggested that if
> possible those statements should be changed, but if they are
> retained, then a datatype cannot be characterized solely by the
> lexical space and the lexical mapping, because such ineffable
> values appear in neither of these.
> * The statement describes (with the exception of the problem just
> noted) simple datatypes, but not the class of complex datatypes
> which can be defined by XML Schema, nor all the types (or
> type-like constructs) definable in various other schema languages
> for XML.
>
> [27] http://www.w3.org/TR/rdf-concepts/#section-Datatypes

Recorded as http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-07

>3.3. Miscellaneous editorial notes
>
> In [28]http://www.w3.org/TR/rdf-mt/#dtype_interp, for "which we will
> refer to as XSD and use the Qname prefix xsd:" read "which we will
> refer to as XSD and denote using the Qname prefix xsd" (or something
> similar).
> In [29]http://www.w3.org/TR/rdf-mt/#dtype_interp:
>
> [28] http://www.w3.org/TR/rdf-mt/#dtype_interp
> [29] http://www.w3.org/TR/rdf-mt/#dtype_interp
>
> For example, XML Schema requires that the value spaces of
> xsd:string and xsd:decimal to be disjoint ...
>
> This sentence is not exactly wrong, but it seems slightly unusual to
> use the verb require here, instead of define or something similar. We
> suggest recasting this as "For example, XML Schema defines the value
> spaces of xsd:string and xsd:decimal as disjoint ..." (Note, for the
> record, that the value spaces of all the primitive simple datatypes of
> XML Schema 1.0 are pairwise disjoint.)
> In ,
>
> any literal of the form "sss"@ttt^^ddd, where ddd is not
> rdf:XMLLiteral, treated as identical to the same literal without
> the language tag, "sss"@ddd
>
> is "sss"@ddd a typo for "sss"^^ddd?
> In [30]http://www.w3.org/TR/rdf-mt/#dtype_entail, for "it is valid to
> add any number of leading zeros to any numeral and still be a correct
> lexical form for xsd:integer", perhaps read "it is possible to add any
> number of leading zeros to any lexical form for xsd:integer without it
> ceasing to be a correct lexical form for xsd:integer"
>
> [30] http://www.w3.org/TR/rdf-mt/#dtype_entail

Left to editors' discretion.

>4. Notes on RDF/XML Syntax Specification (Revised)
>
> RDF/XML Syntax, [31]http://www.w3.org/TR/rdf-syntax-grammar/
>
> [31] http://www.w3.org/TR/rdf-syntax-grammar/
>
>4.1. Manifest typing in the instance (policy)
>
> RDF allows Typed Literals to be given as the object node of arcs.
> These consist of a literal string (with optional language) and a
> datatype RDF URI Reference. This is handled ... with an additional
> rdf:datatype="datatypeURI" attribute on the property element.
>
> We believe there are probably good reasons for using an rdf:datatype
> attribute, instead of re-using the existing xsi:type attribute which
> has (when the type is defined in a schema defined by XML Schema 1.0)
> the same semantics. In particular, rdf:datatype does not assume or
> assert the existence of the type named as a type in a schema defined
> by XML Schema, so it would be problematic to use xsi:type.
> We do fear, however, that users are likely to find this
> near-duplication of the meaning and function of xsi:type confusing. It
> is not clear to us what, if anything, can or should be done to
> minimize this danger.

Recorded as http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-08

>4.2. QNames (Editorial, but important)
>
> We were unable, on a first reading, to determine whether the default
> namespace declaration, and thus unprefixed names, were or were not
> allowed in order to encode 'RDF URI References'. Indeed the
> introductory prose about QNames (2nd para of
> [32]http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-intro)
> does not seem to connect up with the relevant (?) production in
> [33]http://www.w3.org/TR/rdf-syntax-grammar/#section-Infoset-Grammar,
> which we take to be
> [34]http://www.w3.org/TR/rdf-syntax-grammar/#URI-reference.
> This can and should be cleared up.
>
> [32] http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-intro
> [33] http://www.w3.org/TR/rdf-syntax-grammar/#section-Infoset-Grammar
> [34] http://www.w3.org/TR/rdf-syntax-grammar/#URI-reference

Recorded as http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-09

>4.3. Miscellaneous editorial notes
>
> In
> [35]http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-empty-property-elements,
> the sentence
>
> [35]
> http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-empty-property-elements
>
> When an arc in an RDF Graph points to an object node which has no
> further arcs, which appears in RDF/XML as an empty node element
> sequence such as the pair <rdf:Description rdf:about="...">
> </rdf:Description>, this form can be shortened.
>
> seems less clear than it might be. Different readers prove to have
> different views on what is meant by "the pair <rdf:Description
> rdf:about="..."> </rdf:Description>"; perhaps it can be replaced by
> something like "the empty element <rdf:Description rdf:about="..."/>"
> without loss of precision? Perhaps the sentence could read
>
> When an arc in an RDF Graph points to an object node which has no
> further arcs, which appears in RDF/XML as an empty node element
> such as <rdf:Description rdf:about="..."/>, this form can be
> shortened.

Left to editor's discretion

>4.4. Normative specification of XML grammar (policy, substantive)
>
> We note with admiration the excellent tutorial introduction to the
> striped syntax in Section 2
> [36]http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax. We are
> less happy with the nature of the syntax, and with the approach taken
> to its normative statement
> [37]http://www.w3.org/TR/rdf-syntax-grammar/#section-Infoset-Grammar.
> As regards the syntax itself, we would much prefer to have seen a move
> to a single canonical syntax with much less variability. With respect,
> the current design suggests that the value of XML has been
> misunderstood. The range of alternative forms of expression provided
> for in the current design make it very difficult to use the broad
> range of generic XML tools (e.g. syntax-directed editors, XSLT) which
> could give so much benefit to RDF users. (More on this below.) At the
> very least we would encourage you to specify a single canonical form,
> probably strictly striped, which could be defined by an XML Schema or
> DTD. We would be happy to work with you to develop a schema for such a
> subset.
> As regards the approach taken to defining the syntax, in our view,
> layering of specs has very high value, and so defining an XML document
> type by way of what is very nearly a character-level BNF is at best a
> missed opportunity and at worst a serious mistake. It obscures the
> important aspects of the document type behind a welter of irrelevant
> detail about e.g. whitespace and start-tag/end-tag matching. It makes
> it very difficult for the reader to actually understand what is and
> isn't actually allowed -- what an RDF/XML document actually looks
> like.
> Not only does this confuse levels and thus readers, it also runs the
> risk of inadvertently defining an XML subset.
> It also appears, on a
> strict reading, to rule out XML documents not derived from the parsing
> of character streams as possible RDF/XML (so that it would be
> illegitimate to regard a data structure created using a DOM interface,
> for example, as RDF/XML).
> The use of event-triggered data-model construction actions to specify
> the relationship between XML representation and corresponding data
> objects is innovative and compelling, but surely it would be
> straight-forward to associate these events with a pre-order traversal
> of an infoset independently constrained by a DTD, XML Schema schema or
> other appropriate definition of the canonical document type. If
> continued support for alternative forms is considered essential, then
> a two-step approach where the semantics of any non-canonical form is
> defined in terms of a canonical form to which it corresponds would
> still be far simpler than the current approach.
>
> [36] http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax
> [37] http://www.w3.org/TR/rdf-syntax-grammar/#section-Infoset-Grammar

Recorded as http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-10
and http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-11

>4.5. On the relation between RDF and off-the-shelf XML tools (policy,
>substantive)
>
> With some diffidence, we conclude by raising what may be a sensitive
> issue.
> It does not seem to us that the XML serialization of RDF shows RDF to
> advantage. At the level of the underlying graph model, RDF information
> has a simple and regular structure, which appears in the XML
> serialization to be anything but simple and so irregular as to bring
> the words "capricious" and "arbitrary" to the lips of unprejudiced
> observers. Tastes in markup style differ, but we believe that the root
> of the problem is the high degree of variability with which the same
> underlying graph structures may be serialized, according to the rules
> given in this document.
> Owing in part to the variability itself, and in part to the specific
> forms taken by that variability, it is not feasible to write an XML
> Schema schema, or (if the comments in Appendix A.1 are accurate) a
> Relax NG schema, or an XML 1.0 DTD, which defines the set of correct
> serializations of correct RDF graphs. It is not convenient to run XSLT
> processes over arbitrary RDF serializations, nor to query or process
> arbitrary RDF data using XQuery. Arbitrary RDF data is similarly
> inconvenient for other standard XML tools to process.
> There is, as a result, something of a cleft between the RDF community
> and the set of RDF tools on the one hand, and the community of users
> and tools employing what some have called colloquial XML. The parallel
> development of query languages, schema languages, object models, APIs,
> editors, display tools, and so on does offer relatively harmless ways
> for a large number of people to employ their time, but it does not
> seem to us to serve the larger Web community well.
> The cleft between RDF and colloquial XML does not seem to us to be
> required by the RDF data model. A graph in which nodes have certain
> properties and arcs have certain properties is not, in itself, a
> peculiarly difficult structure to render in XML or to process with
> off-the-shelf XML tools. An XML vocabulary in which nodes may appear
> as elements, or as attributes, or as attribute values, or as the
> PCDATA content of elements, and in which property names may appear as
> three of the same four constructs, on the other hand, seems a rather
> less straightforward XML representation of the underlying graph
> structure than most XML vocabularies for graphs have chosen.
> The result is that not just arbitrary RDF data, but data encoded using
> vocabularies defined in RDF terms (for which current W3C work provides
> a number of examples), will be hard to process using off-the-shelf
> tools.
> We believe this difficulty represents a lost opportunity, and
> we believe the opportunity could readily be seized if the XML
> serialization were modified to capture more of the regularity of the
> RDF data model.
> We are ready to work together with the Working Groups in the Semantic
> Web Activity and with other interested parties to formulate an XML
> serialization which captures the information in the RDF model and
> which is more readily amenable to processing with off-the-shelf XML
> tools.

Recorded as http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-12
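
As an illustration of the whitespace point in 1.2 above (xmlsch-02), the following is a minimal Python sketch, not taken from either Working Group's documents. It assumes the xs:integer lexical space is an optional sign followed by decimal digits, and that "collapse" means mapping tab, line feed and carriage return to space, reducing runs of spaces to one, and stripping the ends. The names collapse and is_integer_lexical_form are hypothetical helpers introduced only for this example.

```python
import re

# Assumed lexical space of xs:integer: optional sign, then decimal digits.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def collapse(text: str) -> str:
    """Sketch of whiteSpace=collapse: turn each run of tab/LF/CR/space
    into a single space, then strip leading and trailing spaces."""
    return re.sub(r"[\t\n\r ]+", " ", text).strip(" ")


def is_integer_lexical_form(chars: str) -> bool:
    """Check a character sequence against the lexical space only;
    no whitespace processing happens here (hypothetical type checker)."""
    return INTEGER_LEXICAL.fullmatch(chars) is not None


raw = " 27 "  # element content as it might appear in an RDF/XML document

# Handing the raw content straight to the type checker is rejected ...
print(is_integer_lexical_form(raw))            # False
# ... whereas collapsing first yields the lexical form "27".
print(is_integer_lexical_form(collapse(raw)))  # True
```

Keeping the two steps as separate functions mirrors the distinction drawn in the comment: whitespace processing extracts the lexical form, and only then is that form checked against the lexical space.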
Received on Wednesday, 12 March 2003 05:17:54 UTC