W3C XML Schema Working Group

Comments on RDF documents

ed. Charles Campbell, C. M. Sperberg-McQueen, Henry S. Thompson

10 March 2003

1. Notes on RDF Primer
- 1.1. Design question, complexity (substantive) [xmlsch-01]
- 1.2. Whitespace handling (schema-related) [xmlsch-02]
2. Notes on RDF Concepts and Abstract Syntax
3. Notes on RDF Semantics
4. Notes on RDF/XML Syntax Specification (Revised)

NOTE:

[This document contains notes considered and approved by the W3C XML Schema Working Group and transmitted to the RDF Core Working Group as comments on the last-call drafts of various RDF-related documents, together with the text of various responses to these issues transmitted to the XML Schema WG in different email messages:
Jeremy Carroll to the XML Schema IG, 30 June 2003 (http://lists.w3.org/Archives/Public/www-rdf-comments/2003AprJun/0295.html and http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2003Jun/0196.html)

Dave Beckett to C. M. Sperberg-McQueen, 27 March 2003 (on xmlsch-08) (http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0592.html and http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2003Mar/0143.html)

Dave Beckett to C. M. Sperberg-McQueen, 27 March 2003 (on xmlsch-09) (http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0593.html and http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2003Mar/0144.html)

Dave Beckett to C. M. Sperberg-McQueen, 29 April 2003 (on xmlsch-10) (http://lists.w3.org/Archives/Public/www-rdf-comments/2003AprJun/0112.html and http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2003Apr/0299.html)

Dave Beckett to C. M. Sperberg-McQueen, 29 April 2003 (on xmlsch-11) (http://lists.w3.org/Archives/Public/www-rdf-comments/2003AprJun/0114.html and http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2003Apr/0301.html)

Dave Beckett to C. M. Sperberg-McQueen, 29 April 2003 (on xmlsch-12) (http://lists.w3.org/Archives/Public/www-rdf-comments/2003AprJun/0113.html and http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2003Apr/0300.html)

This version of this document also contains responses from the XML Schema Working Group; these responses were considered and approved at the XML Schema WG teleconference of 3 October 2003 and are herewith transmitted to the RDF Core Working Group as replies to its request for feedback on its resolution of the issues.

$Id: xmlschema.rdf.comments.responses.html,v 1.2 2003/10/03 19:53:05 cmsmcq Exp $

The XML Schema Working Group congratulates the RDF Core Working Group on progressing its several documents to Last Call; we apologize for the late submission of these comments, and hope that they prove helpful.

Our comments include some which bear directly on the use of XML Schema's simple types by RDF, to which we believe you wished us to give particular attention. In the text which follows, these are labeled “schema-related”. Some other comments, in contrast, relate to important and difficult technical and policy questions relating to language design and tool usage; these are labeled “policy”. We hope that you will give these comments very serious consideration, but we do not pretend to any special standing in raising them, other than as representative members of the XML community at large. Finally, there are some other questions which are not directly related to XML Schema or to XML in general, and for which we therefore pretend to no particular expertise or standing, but which we happened to notice and which we call to your attention, as any technically minded reader might do, in the hopes that doing so may be useful to you; these are labeled “substantive” or “editorial” as the case might be.

1. Notes on RDF Primer

RDF Primer, section 2.4 Typed literals http://www.w3.org/TR/rdf-primer/#typedliterals

1.1. Design question, complexity (substantive) [xmlsch-01]

The introduction of pairs consisting of a lexical form and a type (or, strictly speaking, a lexical form and a type label) seems at first glance to complicate the RDF model somewhat. We have had the impression that in other parts of RDF, typing is handled by adding further arcs and nodes. If the type of a resource is identified by having an arc labeled rdf:type from it to (the URI of) its (RDF) type, and if the type of an arc is similarly identified by an arc, then surely a reason ought to be given for shifting to a different method for typing literal strings. It seems like a dramatic shift in the infrastructure of RDF, from “everything is a node, an arc, or a literal value” to “everything is a node, an arc, or a typed literal value”. Perhaps not quite so dramatic, after all. But the question of design consistency remains: why not “everything is a typed node, a typed arc, or a typed literal”?
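The contrast drawn above can be made concrete with a small data-structure sketch. It is illustrative only: the class names, the `types_of` helper and the example.org URIs are hypothetical, and representing URI nodes as bare strings is a simplifying assumption, not anything taken from the RDF specifications.

```python
from typing import List, NamedTuple, Set, Union

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


class TypedLiteral(NamedTuple):
    """A literal typed by pairing: (lexical form, datatype URI)."""
    lexical_form: str
    datatype: str


class Triple(NamedTuple):
    """One arc of the graph; URI nodes are bare strings in this sketch."""
    subject: str
    predicate: str
    obj: Union[str, TypedLiteral]


# Hypothetical example data.
PERSON = "http://example.org/staff/85740"
graph: List[Triple] = [
    # The resource is typed by an additional arc labeled rdf:type ...
    Triple(PERSON, RDF_TYPE, "http://example.org/terms/Person"),
    # ... whereas the literal carries its type label inside the node itself.
    Triple(PERSON, "http://example.org/terms/age", TypedLiteral("27", XSD_INTEGER)),
]


def types_of(triples: List[Triple], resource: str) -> Set[str]:
    """Collect the types of a resource by following its rdf:type arcs."""
    return {t.obj for t in triples
            if t.subject == resource and t.predicate == RDF_TYPE}
```

Finding the type of the resource requires a search of the graph, while the type of the literal is read directly from the pair; that asymmetry is the design inconsistency the comment asks about.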

Response from RDF:

Our resolution is: xmlsch-01 as in 0252 with amendment. i.e.

The RDF Core WG interprets this comment as two questions and a comment:
1) Why is the type of a literal not described using a property arc, as is done for other literals?

2) Having introduced typed literal nodes, why not introduce typed resource nodes and typed property arcs as well?

3) The WG should provide a rationale for this design in the specifications

Regarding question 1:

This would require that literals be allowed as subjects of RDF statements. This is not possible in current RDF/XML and would require considerable change, beyond the scope of the WG, to support it. Further, it introduces problems of non-monotonicity in the semantics. A property whose value is a plain literal is currently taken to denote a sequence of characters. Adding a further statement could change that value to, say, an integer, invalidating previous inferences and breaking a fundamental tenet of RDF.

Regarding question 2:

No requirement justified a change to the notion of a URIREF node or an RDF arc.

Regarding comment 3:

Providing a rationale document to accompany the specifications would certainly be nice to have, but the working group chose to spend its writing resource on explanatory text and formal specification rather than justification. We reject this comment on the grounds that the specifications are not intended to provide a rationale.

Response from XML Schema

On question 1: Thank you; that helps clarify the design.

On question 2: In the final analysis this is your call and we don't plan to lie down in the road over it. For the record, though, we should record that we find your analysis unconvincing. The introduction of typed literals introduces a new idea into RDF, and it is obvious that this new idea has possible applications elsewhere in the design space. Your response amounts to saying that you chose not to work through the design implications of introducing this kind of type labeling, because it seemed possible to get by without such re-thinking. The result is that the new idea will continue to feel incompletely integrated into RDF; it will feel like a patch added as an afterthought rather than an integral part of the design.

On comment 3: We understand your desire not to work your editors to death. Your one-paragraph response to question 1, however, does a good job of clarifying the point that was obscure to us, and we think it may not be beyond the wit of your editors to introduce its substance into the text at some appropriate point.

Overall: we are not wholly convinced by your resolution of this issue but do not wish to appeal to the Director on it.

1.2. Whitespace handling (schema-related) [xmlsch-02]

Some members of the XML Schema WG have expressed concern that XML Schema's rules for whitespace handling may interfere with expected behavior in other contexts. This may be the appropriate place to bring this question up.

In brief, XML Schema's simple types each define a whitespace facet, which governs the kind of whitespace pre-processing done by an XML Schema processor before the lexical form is checked for type validity. Since the point of whitespace normalization is to simplify subsequent processing, the lexical spaces of XML Schema's simple types are (like those in many programming languages) defined without reference to the preceding whitespace normalization. Integers, for example, are represented by sequences of decimal digits; sequences containing blanks are not legal lexical forms for integers. Indeed, strictly speaking it is only after the whitespace pre-processing is done that the XML Schema processor can be said to be working with a lexical form at all.

For example, the integer type has a value of collapse for the whitespace facet, which means leading and trailing whitespace is stripped, and internal whitespace sequences are reduced to a single blank (x20) character. In an XML document in which the element exterms:age is defined as having type xs:integer, the following instances of exterms:age will all be type-valid:

<exterms:age>27</exterms:age>
<exterms:age>
  27
</exterms:age>
<exterms:age>   27  </exterms:age>
<exterms:age>   2<!--* ha, ha, fooled your full-text indexer!
*-->7  </exterms:age>

The input information set, in each case, contains a character information item for “2” followed by a character information item for “7”, with character information items for whitespace characters, and a comment information item, present in some of the examples. In all cases, the lexical form proper is the character sequence “27” (i.e. the sequence of characters after white space handling, and ignoring comments, processing instructions, entity boundaries, and other distractions). This is a legal lexical form for an integer, so all the examples are type valid.

Some members of the XML Schema WG have worried that it may not be obvious that the whitespace processing is not part of the process of checking lexical forms for type validity, but part of the process of extracting the lexical forms from the XML information set presented to the processor. If an RDF document contains

<exterms:age>   27  </exterms:age>

and a processor hands the contents of the element to a generic type-checker for XML Schema's simple types, saying in effect “this purports to be the lexical form of an integer; is that OK?”, that type checker will be required (if it conforms to the XML Schema spec's definition of the simple types) to say “no, the character sequence ‘ 27 ’ is not a legal lexical form for an integer.”
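The two-stage process described above (whitespace pre-processing first, lexical check second) might be sketched as follows. This is a minimal illustration, not an XML Schema processor: the function names are hypothetical, only the integer type is covered, and the comment in the fourth example is assumed to have been dropped already when the character content was extracted.

```python
import re

# Lexical space of xsd:integer: an optional sign followed by decimal digits.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def apply_whitespace_facet(text: str, facet: str) -> str:
    """Normalize text according to an XML Schema whiteSpace facet value."""
    if facet == "preserve":
        return text
    # "replace": every tab, line feed and carriage return becomes a space.
    replaced = re.sub(r"[\t\n\r]", " ", text)
    if facet == "replace":
        return replaced
    if facet == "collapse":
        # "collapse": after "replace", runs of spaces shrink to one space
        # and leading and trailing spaces are removed.
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace facet value: {facet!r}")


def is_integer_lexical_form(candidate: str) -> bool:
    """Strict check: no whitespace handling is performed here."""
    return INTEGER_LEXICAL.fullmatch(candidate) is not None


if __name__ == "__main__":
    for raw in ["27", "\n  27\n", "   27  "]:
        collapsed = apply_whitespace_facet(raw, "collapse")
        print(repr(raw), is_integer_lexical_form(raw),
              repr(collapsed), is_integer_lexical_form(collapsed))
```

Handing the raw element content straight to `is_integer_lexical_form` rejects the padded forms; only after `apply_whitespace_facet(raw, "collapse")` do all of them reduce to the legal lexical form "27".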

It's not clear whether RDF, being type-system neutral, can directly address this concern (e.g. by specifying that an RDF processor should do the appropriate whitespace pre-processing, or by warning users that they should not include vagrant whitespace in typed literals), or whether it suffices for developers of RDF software with built-in support for XML Schema's simple types to deal with it, e.g. by performing it themselves before handing the resulting lexical form to a type checker.

As noted, some members of our WG feel that you need to be alerted to this as a possible source of confusion and unexpected results. Other members of the WG feel that it verges on disrespect to assume that you need instruction on this point. We compromised by agreeing to point out the issue to you, and to leave you to draw your own conclusions.

Response from RDF:

The RDF Core WG resolved: xmlsch-02 addressed by msg-0097 where msg-0097 is http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0097.html and says
PROPOSE RDF Core accepts the comment xmlsch-02 and agrees to add the following test case:
<rdf:Description rdf:about="http://www.example.org/a">
   <eg:prop rdf:datatype="&xsd;int">3</eg:prop>
</rdf:Description>
Does not entail
<rdf:Description rdf:about="http://www.example.org/a">
   <eg:prop rdf:datatype="&xsd;int"> 3 </eg:prop>
</rdf:Description>
Moreover the following comment to be added to concepts:
NOTE: In [XML Schema (part 1)], white space normalization occurs during validation according to the value of the whiteSpace facet. The lexical-to-value mapping used in RDF datatyping occurs after this, so that the whiteSpace facet has no effect in RDF datatyping.

In fact more test cases were desired, and the test cases created are currently awaiting final WG approval and can be found in: http://www.w3.org/2000/10/rdf-tests/rdfcore/xmlsch-02/

The Manifest file describes four tests showing that:
A well-formed typed literal is not related to an ill-formed literal, even if they differ only by whitespace.

A simple test for well-formedness of a typed literal.

An integer with whitespace is ill-formed.

The actual text corresponding to the agreed note is found at the end of section 5 (http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/#section-Datatypes). A certain amount of editorial discretion was taken to consolidate notes concerning your comments.

The full note from the editors draft is:
Note: When the datatype is defined using XML Schema:

...

+ In [XML-SCHEMA1], white space normalization occurs during validation according to the value of the whiteSpace facet. The lexical-to-value mapping used in RDF datatyping occurs after this, so that the whiteSpace facet has no effect in RDF datatyping.
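The consequence of this note for the test case quoted earlier can be sketched as follows. The sketch assumes a hypothetical `literal_value` helper that supports only xsd:int and applies the lexical-to-value mapping to the literal's character sequence exactly as written, with no whitespace pre-processing; it is not taken from any RDF implementation.

```python
import re
from typing import Optional

XSD_INT = "http://www.w3.org/2001/XMLSchema#int"

# Lexical forms of xsd:int: optional sign, then decimal digits.
INT_LEXICAL = re.compile(r"[+-]?[0-9]+")


def literal_value(lexical_form: str, datatype: str) -> Optional[int]:
    """Map a typed literal to its value, or None if it is ill-formed.

    The whiteSpace facet plays no part: the string is checked as given.
    """
    if datatype != XSD_INT:
        raise ValueError("this sketch only knows xsd:int")
    if INT_LEXICAL.fullmatch(lexical_form) is None:
        return None
    value = int(lexical_form)
    # xsd:int is restricted to the 32-bit signed range.
    if -2**31 <= value <= 2**31 - 1:
        return value
    return None
```

Under these assumptions `literal_value("3", XSD_INT)` is 3 while `literal_value(" 3 ", XSD_INT)` is None, so the first literal cannot entail the second.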

Response from XML Schema

Thank you for your reply.

The XML Schema Working Group is in agreement on one point of our reply and divided in our opinion on a second point.

First, we are agreed that the position you sketch out is not a source of logical inconsistency which will render your specification meaningless or logically problematic. It is entirely possible for you to handle whitespace in this way.

On the second point, our views are divided.

A minority of the Working Group believes that you have made a reasonable design choice, given that RDF will only ever be produced by and consumed by software, and that humans and issues of human legibility are not and should not be matters of concern in your design.

A larger portion of the Working Group vigorously disagrees and believes that for RDF processors to treat your two test cases differently is to build into RDF a potential for astonishing users and leading to unexpected results which will haunt you and your users for years to come. In this view, it is not as a matter of compatibility with XML Schema, but as a matter of common-sense concern for your users that you should simply say that the whitespace processing specified for the type in question should be performed by any RDF processor.

Overall: we do not have consensus either to express satisfaction with your resolution of this issue or to raise a formal dissent. In the opinion of our chair, this means there is no formal dissent, but he recommends that this point be listed during the review of formal dissents as an issue on which there was not perfect consensus.

2. Notes on RDF Concepts and Abstract Syntax

2.1. Mapping from lexical forms to values (schema-related, terminological) [xmlsch-03]

In http://www.w3.org/TR/rdf-concepts/#section-Datatypes:

A datatype mapping is a set of pairs whose first element belongs to the lexical space of the datatype, and the second element belongs to the value space of the datatype:

We agree that it is useful to define a term to denote such mappings; in the interests of inter-specification consistency, we wonder whether you would be willing to consider using the term lexical mapping, which we are introducing in our forthcoming draft of XML Schema 1.1. The term datatype mapping seems unlikely to be usable in the XML Schema specification, where it would suggest to some readers a mapping from one datatype to another, rather than as here a mapping from lexical space to value space. (XML Schema 1.0 got by without a term for this concept.)

Response from RDF. [see next section].

2.2. Values without lexical forms (schema-related, important) [xmlsch-04]

In http://www.w3.org/TR/rdf-concepts/#section-Datatypes:

Each member of the value space may be paired with any number (including zero) of members of the lexical space (lexical representations for that value).

The provision for values without corresponding lexical forms contradicts an assumption to which the XML Schema spec appeals from time to time. The lexical space of any simple datatype in XML Schema is the domain of the type's lexical mapping; the value space is its range. There are no meaningless lexical forms in the lexical space of the type, nor are there ineffable values in the value space. By eliminating values from the value space (e.g. by setting minimal and maximal values), the type definer may indirectly also eliminate lexical forms from the lexical space; conversely, by eliminating some items from the lexical space (e.g. by setting a pattern), the type definer may eliminate items from the value space.
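The assumption can be pictured with a toy model in which a lexical mapping is a dictionary from lexical forms to values: the lexical space is its set of keys (the domain) and, under the XML Schema assumption, the value space is exactly its set of values (the range). The helper names and the extra "unknown" value below are hypothetical and serve only to illustrate the point.

```python
import re
from typing import Dict, Hashable, Set

# The lexical mapping of xsd:boolean, written out as a finite set of pairs.
BOOLEAN_MAPPING: Dict[str, bool] = {
    "true": True, "1": True,
    "false": False, "0": False,
}


def ineffable_values(mapping: Dict[str, Hashable],
                     value_space: Set[Hashable]) -> Set[Hashable]:
    """Values in the value space that no lexical form maps to."""
    return value_space - set(mapping.values())


def restrict_by_pattern(mapping: Dict[str, Hashable],
                        pattern: str) -> Dict[str, Hashable]:
    """Keep only the lexical forms matching a pattern facet."""
    regex = re.compile(pattern)
    return {form: value for form, value in mapping.items()
            if regex.fullmatch(form)}
```

With the value space {True, False} there are no ineffable values. Adding a hypothetical third value such as "unknown" to the value space produces one, which is the situation the quoted RDF wording permits. Restricting the lexical space with the pattern `true` leaves a mapping whose range is {True} alone, showing how pruning lexical forms also prunes values.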

Are there crucial aspects of RDF which will break if the list item quoted above is changed to read “paired with one or more members of the lexical space”?

Response from RDF:

We decided:
PROPOSED to clarify xmlsch-03 xmlsch-04 pfps-13 based on the proposal to close in http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Apr/0368.html
i.e.
PROPOSE
xmlsch-03 - we globally use the term lexical-to-value mapping instead of datatype mapping or any other term

xmlsch-04 - we do not change the definition of value space but add a note clarifying the relationship with XML Schema datatypes.

The new text can be found in the editors draft at: http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/#section-Datatypes and reads:
5. Datatypes (Normative)

The datatype abstraction used in RDF is compatible with the abstraction used in XML Schema Part 2: Datatypes [XML-SCHEMA2].

A datatype consists of a lexical space, a value space and a lexical-to-value mapping.

The lexical space of a datatype is a set of Unicode [UNICODE] strings.

The lexical-to-value mapping of a datatype is a set of pairs whose first element belongs to the lexical space of the datatype, and the second element belongs to the value space of the datatype:

Each member of the lexical space is paired with (maps to) exactly one member of the value space. Each member of the value space may be paired with any number (including zero) of members of the lexical space (lexical representations for that value).

A datatype is identified by one or more URI references.

RDF may be used with any datatype definition that conforms to this abstraction, even if not defined in terms of XML Schema.

Certain XML Schema built-in datatypes are not suitable for use within RDF. For example, the QName datatype requires a namespace declaration to be in scope during the mapping, and is not recommended for use in RDF. [RDF-SEMANTICS] contains a more detailed discussion of specific XML Schema built-in datatypes.

Note: When the datatype is defined using XML Schema:
All values correspond to some lexical form, either using the lexical-to-value mapping of the datatype or if it is a union datatype with a lexical mapping associated with one of the member datatypes.

XML Schema facets remain part of the datatype and are used by the XML Schema mechanisms that control the lexical space and the value space; however, RDF does not define a standard mechanism to access these facets.

Response from XML Schema

Thank you; this looks better.

2.3. Lexical forms, strings, and character sequences (schema-related, editorial) [xmlsch-05]

In http://www.w3.org/TR/rdf-concepts/#section-Datatypes:

With one exception, the datatypes used in RDF have a lexical space consisting of a set of strings.

Since “string” is used as the local name for a particular simple type in the XML Schema namespace, we believe it will be less confusing for users, in the long run, if the lexical representations of simple-datatype values are described not as “strings” but as “character sequences”.

This comment also applies to other uses of the term string to denote the members of a lexical space.

Response from RDF:

RESOLVED: do not accept xmlsch-05

Rationale: It feels like a fairly extensive editorial change. Also, in the semantic web activity documents xsd:string is always referred to in its qualified form, and so the possible confusion is diminished.

Response from XML Schema

Thank you; we believe you are making a mistake but we will not insist on our suggestion.

2.4. Strings for natural-language data (substantive) [xmlsch-06]

In http://www.w3.org/TR/rdf-concepts/#section-Datatypes:

A plain literal is a string combined with an optional language identifier. This should be used for plain text in a natural language. As recommended in the RDF formal semantics [RDF-SEMANTICS], these plain literals are self-denoting.

We do not believe that simple strings are likely to be adequate for the representation of arbitrary natural-language text. Even in English, natural-language utterances (such as this document) may need some degree of inline markup for clarity and adequate presentation; in natural-language utterances requiring bidirectional display or ruby, the best authorities (including the W3C I18n Working Group) recommend the use of markup within the natural-language utterance. We thus suggest that you may wish to moderate this recommendation that natural-language material be represented by literals.

This is not an area in which we claim particular technical expertise; we merely call it to your attention in the hopes that doing so may be useful to you.

Response from RDF:

RESOLVED: to accept xmlsch-06, with revised wording as noted
A plain literal is a string combined with an optional language identifier. This may be used for plain text in a natural language. As recommended in the RDF formal semantics [RDF-SEMANTICS], these plain literals are self-denoting.
after other changes the text now reads:[1]

Response from XML Schema

Thank you. This wording is better.

We believe (again, we claim no special expertise here and would defer to the views of the Internationalization Working Group) that you might usefully add a health warning here. For example
... This may, if necessary, be used for plain text in a natural language, but in general this is not recommended; natural language is usually better represented with a more elaborate structure....

We hope that you will be persuaded to add a health warning, but we do not believe this point is worth registering a formal dissent for.

2.5. Typos and minor editorial notes

In http://www.w3.org/TR/rdf-concepts/#section-Literal-Value, for “the datatype mapping is applied to the pair form by the lexical form and the language identifier” read “the datatype mapping is applied to the pair formed by the lexical form and the language identifier”.

In the same section, for “Such a case, while in error, is not syntacticly ill-formed ” read “Such a case, while in error, is not syntactically ill-formed” (et passim).

In section http://www.w3.org/TR/rdf-concepts/#section-XMLLiteral, for “root element tag” read “root element”.

In the same section, for “XML element content” read “XML data” (the term element content is used in some markup-related specs as a complement of mixed content to denote the content of elements which can contain other elements but cannot contain parsed character data).

3. Notes on RDF Semantics

3.1. The “meaning” of literals (editorial)

The meaning of a literal is principally determined by its character string: it either refers to the value mapped from the string by the associated datatype, or if no datatype is provided then it refers to the literal itself, which is either a unicode character string or a pair of a string with a language tag.

Some members of the XML Schema WG are made nervous by the appeal to the notion of “meaning” here. [N.B. our task force read this section out of context, and were not aware of any foregoing elucidation. So this comment may be out of place.] There is also some concern about the apparent conflation here of the notions of meaning and reference. We wonder whether this discussion would be weakened by replacing references to meaning and reference by references to denotation; we are inclined to think it would be an improvement, but recognize that the RDF Core WG's views may differ.

Response from RDF: None (unless overlooked).

Response from XML Schema

We continue to believe this comment important, but are willing to leave this to your editors' judgement.

3.2. Types as lexical mappings (schema-related)

A datatype is an entity characterized by a set of character strings called lexical forms and a mapping from that set to a set of values.

We have a couple of reservations concerning this characterization.

- Elsewhere (e.g. in Concepts and Abstract Syntax, section 3.3, http://www.w3.org/TR/rdf-concepts/#section-Datatypes), the RDF specs say that there may be values in a value space which are not in the range of the lexical mapping; we have suggested that if possible those statements should be changed, but if they are retained, then a datatype cannot be characterized solely by the lexical space and the lexical mapping, because such ineffable values appear in neither of these.
- The statement describes (with the exception of the problem just noted) simple datatypes, but not the class of complex datatypes which can be defined by XML Schema, nor all the types (or type-like constructs) definable in various other schema languages for XML.

Response from RDF: None (unless overlooked).

Response from XML Schema

The wording given in your response to xmlsch-04 (your section 5 of RDF Concepts) seems to address these concerns adequately. Thank you.

3.3. Miscellaneous editorial notes

In http://www.w3.org/TR/rdf-mt/#dtype_interp, for “which we will refer to as XSD and use the Qname prefix xsd:” read “which we will refer to as XSD and denote using the Qname prefix xsd” (or something similar).

In http://www.w3.org/TR/rdf-mt/#dtype_interp:

For example, XML Schema requires that the value spaces of xsd:string and xsd:decimal to be disjoint ...

This sentence is not exactly wrong, but it seems slightly unusual to use the verb require here, instead of define or something similar. We suggest recasting this as “For example, XML Schema defines the value spaces of xsd:string and xsd:decimal as disjoint ...” (Note, for the record, that the value spaces of all the primitive simple datatypes of XML Schema 1.0 are pairwise disjoint.)

In ,

any literal of the form "sss"@ttt^^ddd, where ddd is not rdf:XMLLiteral, treated as identical to the same literal without the language tag, "sss"@ddd

is "sss"@ddd a typo for "sss"^^ddd?

In http://www.w3.org/TR/rdf-mt/#dtype_entail, for “it is valid to add any number of leading zeros to any numeral and still be a correct lexical form for xsd:integer”, perhaps read “it is possible to add any number of leading zeros to any lexical form for xsd:integer without it ceasing to be a correct lexical form for xsd:integer”.
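The point about leading zeros is that the lexical mapping of xsd:integer is many-to-one. A minimal sketch, assuming a hypothetical `integer_value` helper that stands in for the lexical-to-value mapping:

```python
import re

# Lexical forms of xsd:integer: optional sign, then decimal digits.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def integer_value(lexical_form: str) -> int:
    """Map a lexical form of xsd:integer to the integer it denotes."""
    if INTEGER_LEXICAL.fullmatch(lexical_form) is None:
        raise ValueError(f"not a lexical form of xsd:integer: {lexical_form!r}")
    return int(lexical_form)


def with_leading_zeros(lexical_form: str, count: int) -> str:
    """Insert `count` zeros after any sign; the result stays a lexical form."""
    sign = lexical_form[0] if lexical_form[0] in "+-" else ""
    digits = lexical_form[len(sign):]
    return sign + "0" * count + digits
```

Here "7", "07" and "007" are three distinct lexical forms, and `integer_value` maps each of them to the same value, 7.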

4. Notes on RDF/XML Syntax Specification (Revised)

RDF/XML Syntax, http://www.w3.org/TR/rdf-syntax-grammar/

4.1. Manifest typing in the instance (policy) [xmlsch-08]

RDF allows Typed Literals to be given as the object node of arcs. These consist of a literal string (with optional language) and a datatype RDF URI Reference. This is handled ... with an additional rdf:datatype="datatypeURI" attribute on the property element.

We believe there are probably good reasons for using an rdf:datatype attribute, instead of re-using the existing xsi:type attribute which has (when the type is defined in a schema defined by XML Schema 1.0) the same semantics. In particular, rdf:datatype does not assume or assert the existence of the type named as a type in a schema defined by XML Schema, so it would be problematic to use xsi:type.

We do fear, however, that users are likely to find this near-duplication of the meaning and function of xsi:type confusing. It is not clear to us what, if anything, can or should be done to minimize this danger.

Response from RDF

Colleagues,

The RDF Core WG has considered your last call comment captured in http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-08 (raised in section "4.1. Manifest typing in the instance (policy)" of http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0489.html ) and decided http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Mar/0124.html to accept it.

The RDF Core WG agrees that there are good reasons for not using rdf:datatype rather than xsi:type.

We agree with the XML Schema WG that one reason is that RDF is not restricted to using datatypes defined by XML Schema, but allows other datatypes conforming to the XML Schema model for datatypes.

Another reason is that no other RDF/XML attribute takes QNames as arguments. Allowing this in one specific case is also likely to cause confusion.

Whilst RDF Core would have preferred not to introduce a different attribute, its judgement was that the solution proposed in the last call drafts is the best of the options available.

To minimise any confusion, RDF Core has carefully described the correct syntax in both the primer and the RDF/XML syntax documents. We further note that incorrect use of xsi:type where rdf:datatype should be used will be recognised as a syntax error by RDF/XML parsers.

Please reply, copying www-rdf-comments@w3.org, to say whether this response is an acceptable disposition of your comment.
Further, more detailed, informal explanation is given below. We assume familiarity with XML Schema datatypes :)
RDF Datatypes in instance documents are described in general terms in the section “Typed Literals - rdf:datatype” http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-datatyped-literals

RDF datatypes are identified by URI-references and thus to indicate that a piece of RDF/XML is a datatyped literal, you need to give the URI somewhere. Existing (untyped) literals are used like this:
As element content:
    <ex:prop>foo</ex:prop>
As attribute content:
    <ex:Node ... ex:prop="foo" ... />
The latter can be considered as an abbreviation of the former form. The other things that can apply to RDF literals are the in-scope XML language:
As element content:
    <ex:prop xml:lang="en">foo</ex:prop>
As attribute content:
    <ex:Node ... ex:prop="foo" 
      xml:lang="en" ... />
(of course xml:lang can be on any outer element)
So it was natural to allow datatypes by creating an extra attribute in the property element form. Using it in the attribute form would have meant all the attribute values were of the same datatype (not so useful), and this wasn't proposed. Thus a datatyped RDF literal is used in the instance data in the element form with a new rdf:datatype attribute:
    <ex:prop xml:lang="en" 
      rdf:datatype="http://example.org/dt"
      >foo</ex:prop>
    <ex:prop rdf:datatype="http://example.org/dt"
      >foo</ex:prop>
(Note: whether xml:lang values apply to such datatypes, or are involved in the datatype mapping, is another issue; please don't get distracted!)
The example above didn't use an XML Schema datatype URI, since any datatype (identified by a URI) is allowed.
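How a consumer might pick up the rdf:datatype attribute from the element form can be sketched with Python's standard xml.etree.ElementTree module. This is not an RDF/XML parser: the `read_literals` helper, the sample document and the example.org namespace are hypothetical, and only the flat rdf:Description shape shown here is handled.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"
EX_NS = "http://example.org/terms#"  # hypothetical vocabulary namespace

DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ex="{EX_NS}">
  <rdf:Description rdf:about="http://example.org/a">
    <ex:prop rdf:datatype="http://example.org/dt">foo</ex:prop>
    <ex:prop xml:lang="en">foo</ex:prop>
  </rdf:Description>
</rdf:RDF>"""


def read_literals(xml_text: str) -> List[Tuple[Optional[str], Optional[str], Optional[str]]]:
    """Return (text, datatype URI, language) for each property element."""
    root = ET.fromstring(xml_text)
    found = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        for prop in description:
            # ElementTree exposes namespaced attributes as "{namespace}local".
            datatype = prop.get(f"{{{RDF_NS}}}datatype")
            language = prop.get(f"{{{XML_NS}}}lang")
            found.append((prop.text, datatype, language))
    return found
```

Because the attribute value is already a complete URI reference, the consumer needs no namespace context to interpret it.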

Not in any particular order or necessarily complete, but here is a summary of some issues that RDF Core considered for the RDF/XML syntax on encoding datatypes using xsi:type and why the rdf:datatype solution was decided.
1. xsi:type content is an XML Qname not a URI

Thus cannot indicate any arbitrary datatype URI reference, so another attribute would be needed for that case (like rdf:datatypeURI) - adding two attributes would be worse than adding one.

2. XML Qname attribute content in RDF/XML

This would be the first attribute in RDF/XML to take an XML Qname value (a big step). This would require extra explanation so that existing users wouldn't confuse them with those that took URIs.

3. Namespace declarations, prefixes

It would also require instance documents to declare the xsi namespace prefix and have to also check for any namespaces declared inside such xsi:type values and declare those too - again new implementations and explanation needed.

4. xsi:type would be confused with rdf:type

Since the former takes QNames and the latter URI references, it would be possible to get the name wrong and be confused by the resulting errors. Although xsi:type wouldn't be legal everywhere rdf:type was, rdf:type would have been allowed on elements that took xsi:type.

5. Confusing URIs and QNames

A bad choice of namespace prefixes might cause other problems in xsi:type values, causing them to be mistaken for URIs. It would also be more than likely that people would try to use QNames in rdf:type attribute values.

6. xsi:type is illegal in RDF/XML now, unlikely to be used accidentally

If somebody were tempted to use xsi:type, it would likely cause the parsing to fail. It is only ever used as an attribute in XML Schema documents and to use it on a literal in RDF/XML would be something like this:

<ex:prop xsi:type="xsd:string">foo</ex:prop>

which is forbidden by grammar production http://www.w3.org/TR/2003/WD-rdf-syntax-grammar-20030123/#literalPropertyElt
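Points 1 to 3 above turn on the fact that a QName-valued attribute can only be interpreted with the namespace bindings in scope, whereas a URI-valued attribute stands alone. The following Python sketch shows the extra bookkeeping under stated assumptions: the helper names `resolve_qname` and `xsi_types` are hypothetical, as is the ex: namespace URI, and the scope tracking is a simplified illustration rather than a conforming implementation.

```python
import io
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hypothetical instance: the xsi:type example above, with the namespace
# declarations it would need (the ex: namespace URI is invented).
DOC = """\
<ex:prop xmlns:ex="http://example.org/ns#"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema"
         xsi:type="xsd:string">foo</ex:prop>
"""


def resolve_qname(value, in_scope):
    """Expand a QName-valued string to (namespace URI, local part).

    in_scope maps prefix -> namespace URI; the key '' holds the default
    namespace, if any.
    """
    prefix, sep, local = value.partition(":")
    if not sep:
        # Unprefixed name: use the default namespace when one is declared.
        return in_scope.get(""), value
    if prefix not in in_scope:
        raise ValueError(f"undeclared namespace prefix: {prefix!r}")
    return in_scope[prefix], local


def xsi_types(xml_text):
    """Return the expanded name of every xsi:type value in the document."""
    stack = [{}]   # one namespace scope per open element
    pending = {}   # declarations seen just before the next start tag
    found = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start", "end")):
        if event == "start-ns":
            prefix, uri = payload
            pending[prefix] = uri
        elif event == "start":
            scope = {**stack[-1], **pending}
            pending = {}
            stack.append(scope)
            value = payload.get(XSI_TYPE)
            if value is not None:
                found.append(resolve_qname(value, scope))
        else:  # "end": leave the element's namespace scope
            stack.pop()
    return found


expanded = xsi_types(DOC)
```

By contrast, an rdf:datatype value can be read directly from the attribute with no scope stack at all.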

Response from XML Schema

Thank you for your response; we regret to report that we are slightly confused by it.

Is ‘The RDF Core WG agrees that there are good reasons for not using rdf:datatype rather than xsi:type’ a typo for ‘The RDF Core WG agrees that there are good reasons for ~~not~~ using rdf:datatype rather than xsi:type’? If so, then the rest of the reply makes more sense.

We are concerned, however, by point 6 in your informal response. It seems problematic to us to make xsi:type illegal in XML-encoded RDF documents. Among other things, this would make it impossible to validate documents with an XML Schema schema and later process them with an RDF processor.

It is also not clear to us that you or we have fully addressed the question of user confusion. The small string distance between rdf:type and xsi:type is almost certain eventually to confuse any user who must use both. Your analysis seems to us to indicate only that you don't really expect any users to be interested in both RDF and XML Schema; we don't think that such users are likely to be all that rare.

But since we don't have any better solution to urge upon you, all we can do is gloomily predict user confusion and hope for the best.

4.2. QNames (Editorial, but important) [xmlsch-09]

We were unable, on a first reading, to determine whether the default namespace declaration, and thus unprefixed names, were or were not allowed in order to encode 'RDF URI References'. Indeed the introductory prose about QNames (2nd para of [http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-intro]) does not seem to connect up with the relevant (?) production in [http://www.w3.org/TR/rdf-syntax-grammar/#section-Infoset-Grammar], which we take to be [http://www.w3.org/TR/rdf-syntax-grammar/#URI-reference].

This can and should be cleared up.

Response from RDF

Colleagues,

The RDF Core WG has considered your last call comment captured in http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-09 (raised in section "4.2. QNames (Editorial, but important)" of http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0489.html) and decided (http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Mar/0138.html) to accept it, giving the following explanation:
The RDF/XML syntax WD section referred to, paragraph 2 of http://www.w3.org/TR/2003/WD-rdf-syntax-grammar-20030123/#section-Syntax-intro, is in the very first section of the document, which introduces the syntax and is intended as an overview, not as a definition of the grammar.

We accept that this paragraph could be misleading and imply that an XML prefix, and thus only prefixed names, are required.

We propose to amend the text in that paragraph to make it clear that in an XML QName the prefix is optional where there is a default namespace, either by adding a note or by rewording to remove the mention of prefixes.
However, we note that the link [Qnames] in the section above already goes to the following definition of QName:
    QName ::= (Prefix ':')? LocalPart
    Namespaces in XML
    -- http://www.w3.org/TR/1999/REC-xml-names-19990114/#NT-QName
which shows that the prefix part is optional in the current definition of QNames.
This is also mentioned in the errata:
"Names with no colon can be qualified names." Namespaces in XML Errata -- http://www.w3.org/XML/xml-names-19990114-errata#NE10
We also peeked at XML 1.1 CR:
     QName ::= PrefixedName | UnprefixedName
Namespaces in XML 1.1, W3C Candidate Recommendation 18 December 2002 http://www.w3.org/TR/2002/CR-xml-names11-20021218/#NT-QName
which keeps the same distinction.
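The optional prefix in the production quoted above can be illustrated with a small Python sketch. The function name `split_qname` is hypothetical, and the NCName pattern is a deliberately simplified ASCII-only approximation (an assumption for brevity); the real production admits many more characters.

```python
import re

# Simplification: ASCII-only stand-in for NCName. The actual definition in
# Namespaces in XML allows a much larger character repertoire.
NCNAME = r"[A-Za-z_][A-Za-z0-9._-]*"

# QName ::= (Prefix ':')? LocalPart
QNAME = re.compile(rf"(?:({NCNAME}):)?({NCNAME})")


def split_qname(name):
    """Return (prefix, local part); prefix is None for an unprefixed name."""
    match = QNAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a QName: {name!r}")
    return match.group(1), match.group(2)


prefixed = split_qname("rdf:Description")
unprefixed = split_qname("Description")
```

Both calls succeed: the unprefixed name is a legal QName whose prefix part is simply absent.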
Please reply, copying www-rdf-comments@w3.org indicating whether this is an acceptable resolution of the comment.

Response from XML Schema

Thank you for the clarification.

4.3. Miscellaneous editorial notes

In http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-empty-property-elements, the sentence

When an arc in an RDF Graph points to an object node which has no further arcs, which appears in RDF/XML as an empty node element sequence such as the pair <rdf:Description rdf:about="..."> </rdf:Description>, this form can be shortened.

seems less clear than it might be. Different readers prove to have different views on what is meant by “the pair <rdf:Description rdf:about="..."> </rdf:Description>”; perhaps it can be replaced by something like “the empty element <rdf:Description rdf:about="..."/>” without loss of precision? Perhaps the sentence could read

When an arc in an RDF Graph points to an object node which has no further arcs, which appears in RDF/XML as an empty node element such as <rdf:Description rdf:about="..."/>, this form can be shortened.

4.4. Normative specification of XML grammar (policy, substantive) [xmlsch-10, xmlsch-11]

We note with admiration the excellent tutorial introduction to the striped syntax in Section 2 [http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax]. We are less happy with the nature of the syntax, and with the approach taken to its normative statement [http://www.w3.org/TR/rdf-syntax-grammar/#section-Infoset-Grammar].

As regards the syntax itself, we would much prefer to have seen a move to a single canonical syntax with much less variability. With respect, the current design suggests that the value of XML has been misunderstood. The range of alternative forms of expression provided for in the current design makes it very difficult to use the broad range of generic XML tools (e.g. syntax-directed editors, XSLT) which could give so much benefit to RDF users. (More on this below.) At the very least we would encourage you to specify a single canonical form, probably strictly striped, which could be defined by an XML Schema or DTD. We would be happy to work with you to develop a schema for such a subset.

As regards the approach taken to defining the syntax, in our view, layering of specs has very high value, and so defining an XML document type by way of what is very nearly a character-level BNF is at best a missed opportunity and at worst a serious mistake. It obscures the important aspects of the document type behind a welter of irrelevant detail about e.g. whitespace and start-tag/end-tag matching. It makes it very difficult for the reader to actually understand what is and isn't actually allowed -- what an RDF/XML document actually looks like.

Not only does this confuse levels and thus readers, it also runs the risk of inadvertently defining an XML subset. It also appears, on a strict reading, to rule out XML documents not derived from the parsing of character streams as possible RDF/XML (so that it would be illegitimate to regard a data structure created using a DOM interface, for example, as RDF/XML).

The use of event-triggered data-model construction actions to specify the relationship between XML representation and corresponding data objects is innovative and compelling, but surely it would be straightforward to associate these events with a pre-order traversal of an infoset independently constrained by a DTD, XML Schema schema or other appropriate definition of the canonical document type. If continued support for alternative forms is considered essential, then a two-step approach where the semantics of any non-canonical form is defined in terms of a canonical form to which it corresponds would still be far simpler than the current approach.
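To make the suggestion concrete, here is a minimal Python sketch of deriving events from a pre-order traversal of an in-memory tree that was never parsed from a character stream. The event names, the `events` function and the ex: namespace URI are assumptions chosen for illustration; they are not the event vocabulary of the RDF/XML specification.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/ns#"  # hypothetical vocabulary namespace


def events(elem):
    """Yield (kind, ...) tuples in document order for elem and its subtree."""
    yield ("start-element", elem.tag, dict(elem.attrib))
    if elem.text and elem.text.strip():
        yield ("text", elem.text)
    for child in elem:
        yield from events(child)
        # Text following a child belongs to the parent's content.
        if child.tail and child.tail.strip():
            yield ("text", child.tail)
    yield ("end-element", elem.tag)


# Build the tree through the API rather than by parsing characters.
root = ET.Element(f"{{{RDF}}}RDF")
desc = ET.SubElement(
    root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": "http://example.org/thing"}
)
prop = ET.SubElement(desc, f"{{{EX}}}prop")
prop.text = "foo"

trace = list(events(root))
```

Because the traversal works on the tree itself, questions of start-tag and end-tag matching never arise at this level.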

First response from RDF:

Dear Colleagues

The RDF Core WG has considered your last call comment captured in http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-10 (raised in section "4.4. Normative specification of XML grammar (policy, substantive)" of http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0489.html) and decided http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Apr/0361.html to postpone it.

A canonical subset of RDF/XML was considered by the RDF Core WG. However the WG believes that due to the way mixed namespaces are used in RDF/XML it is not possible to define such a subset that:
a) can represent all the RDF graphs that RDF/XML can represent

b) can be described by a DTD or an XML Schema.

An alternative would be to define a new syntax that is describable with a DTD or an XML Schema but doing so is beyond the scope of RDF Core's current charter. We note that the XHTML WG have expressed interest in working on such a syntax and have been encouraged to do so by RDF Core. RDF Core also welcomes XML Schema's offer to help with this work.

We will add this issue to the RDFCore postponed issues list at: http://www.w3.org/2000/03/rdf-tracking/#rdfms-validating-embedded-rdf

Please reply to this email, copying www-rdf-comments@w3.org indicating whether this decision is acceptable.

Thanks

Dave

Second response from RDF

Dear Colleagues

The RDF Core WG has considered your last call comment captured in http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-11 (raised in section "4.4. Normative specification of XML grammar (policy, substantive)" of http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0489.html)

The main points you raised in this comment are:
1) RDF/XML is defined in "what is very nearly a character-level BNF [which] is at best a missed opportunity and at worst a serious mistake."
- obscuring important parts of the document type
- making it very difficult for the reader to actually understand what is and isn't actually allowed
- confusing layers

RDF/XML is entirely layered on the XML Infoset as defined in Syntax Data Model http://www.w3.org/TR/rdf-syntax-grammar/#section-Data-Model and is not defined at the character-level.

All XML detail is handled by the XML specifications, not this document - deployed RDF/XML applications are entirely built on standard XML tools. In layering on the XML infoset, we leave only the important parts of RDF/XML that users and application writers need be concerned about - elements, attributes, whitespace and text.

It would have been a mistake to gloss over where, say, the whitespace was significant and where it was ignored - which was one problem with the original RDF M&S specification.

2) Rules out XML documents not parsed from character streams (such as DOM)

This was explicitly called out:
This model illustrates one way to create a representation of an RDF Graph from an RDF/XML document. It does not mandate any implementation method - any other method that results in a representation of the same RDF Graph may be used.

In particular: ...
This specification does not require the use of [XPATH] or [SAX2]

http://www.w3.org/TR/rdf-syntax-grammar/#section-Data-Model

If a DOM interface can provide the very few (4) XML Infoset Infoitems that are needed here, it is not ruled out.

3) Suggests a two-step approach first mapping to canonical RDF form constrained by DTD or XML Schema

An approach using a mapping to a canonical RDF written in XML is related to issue xmlsch-10 where we explain why we didn't feel we could do this under the current charter. It certainly would have been useful and helped.

The model and grammar used here closely matches how many RDF/XML apps were written, in a token matching style that can be used with standard syntax lexers and grammar generators. This approach has proved suitable after other implementor feedback.

The RDF Core Working Group has decided: http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Apr/0361.html that the explanation above answers your comment as a clarification.

Please reply to this email, copying www-rdf-comments@w3.org indicating whether this decision is acceptable.

Thanks

Dave

Response from XML Schema

Thank you.

We realize that this is a difficult area, but we believe that it would be a mistake for W3C to move forward with a new version of the RDF specifications without undertaking the work of a revision of the syntax.

We regret that we must dissent formally from your resolution of this issue. The current mismatch between RDF syntax and off-the-shelf XML tools has not become easier to bear as time goes on; we believe it must be addressed.

4.5. On the relation between RDF and off-the-shelf XML tools (policy, substantive) [xmlsch-12]

With some diffidence, we conclude by raising what may be a sensitive issue.

It does not seem to us that the XML serialization of RDF shows RDF to advantage. At the level of the underlying graph model, RDF information has a simple and regular structure, which appears in the XML serialization to be anything but simple and so irregular as to bring the words “capricious” and “arbitrary” to the lips of unprejudiced observers. Tastes in markup style differ, but we believe that the root of the problem is the high degree of variability with which the same underlying graph structures may be serialized, according to the rules given in this document.

Owing in part to the variability itself, and in part to the specific forms taken by that variability, it is not feasible to write an XML Schema schema, or (if the comments in Appendix A.1 are accurate) a Relax NG schema, or an XML 1.0 DTD, which defines the set of correct serializations of correct RDF graphs. It is not convenient to run XSLT processes over arbitrary RDF serializations, nor to query or process arbitrary RDF data using XQuery. Arbitrary RDF data is similarly inconvenient for other standard XML tools to process.

There is, as a result, something of a cleft between the RDF community and the set of RDF tools on the one hand, and the community of users and tools employing what some have called colloquial XML on the other. The parallel development of query languages, schema languages, object models, APIs, editors, display tools, and so on does offer relatively harmless ways for a large number of people to employ their time, but it does not seem to us to serve the larger Web community well.

The cleft between RDF and colloquial XML does not seem to us to be required by the RDF data model. A graph in which nodes have certain properties and arcs have certain properties is not, in itself, a peculiarly difficult structure to render in XML or to process with off-the-shelf XML tools. An XML vocabulary in which nodes may appear as elements, or as attributes, or as attribute values, or as the PCDATA content of elements, and in which property names may appear as three of the same four constructs, on the other hand, seems a rather less straightforward XML representation of the underlying graph structure than most XML vocabularies for graphs have chosen.
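The following Python sketch illustrates the variability described above with two serializations of the same single statement, one using a property element and one using a property attribute. The ex: vocabulary, the subject URI and the function names are hypothetical, and the extractor handles only these two forms; it is not an RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_TITLE = "{http://example.org/ns#}title"  # hypothetical property

AS_ELEMENT = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/ns#">
  <rdf:Description rdf:about="http://example.org/thing">
    <ex:title>foo</ex:title>
  </rdf:Description>
</rdf:RDF>
"""

AS_ATTRIBUTE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/ns#">
  <rdf:Description rdf:about="http://example.org/thing" ex:title="foo"/>
</rdf:RDF>
"""


def to_uri(clark_name):
    """Turn ElementTree's '{namespace}local' form into namespace + local."""
    namespace, _, local = clark_name[1:].partition("}")
    return namespace + local


def plain_literal_triples(xml_text):
    """Collect (subject, property URI, literal) triples from the two
    forms shown above only; every other RDF/XML form is ignored."""
    triples = set()
    root = ET.fromstring(xml_text)
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for name, value in desc.attrib.items():
            if not name.startswith(f"{{{RDF}}}"):   # property attribute
                triples.add((subject, to_uri(name), value))
        for child in desc:                          # property element
            triples.add((subject, to_uri(child.tag), child.text or ""))
    return triples


same_graph = plain_literal_triples(AS_ELEMENT) == plain_literal_triples(AS_ATTRIBUTE)

# A path expression written for one form finds nothing in the other.
element_hits = len(ET.fromstring(AS_ELEMENT).findall(f".//{EX_TITLE}"))
attribute_hits = len(ET.fromstring(AS_ATTRIBUTE).findall(f".//{EX_TITLE}"))
```

The two documents yield the same triple, yet a query keyed to the element form misses the attribute form entirely, so a generic tool must know every permitted form in order to see the graph.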

The result is that not just arbitrary RDF data, but data encoded using vocabularies defined in RDF terms (for which current W3C work provides a number of examples), will be hard to process using off-the-shelf tools. We believe this difficulty represents a lost opportunity, and we believe the opportunity could readily be seized if the XML serialization were modified to capture more of the regularity of the RDF data model.

We are ready to work together with the Working Groups in the Semantic Web Activity and with other interested parties to formulate an XML serialization which captures the information in the RDF model and which is more readily amenable to processing with off-the-shelf XML tools.

Response from RDF:

Dear Colleagues

The RDF Core WG has considered your last call comment captured in http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-12 raised in (XML Schema) section "4.5. On the relation between RDF and off-the-shelf XML tools (policy, substantive)" http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0489.html and (Butler) http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0531.html and decided http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Apr/0361.html to postpone it.

The main points we felt you raised in this comment are:
1) RDF/XML
- doesn't match the RDF graph model well
- many ways to write things (elements, attributes, attribute values, ...)
- cannot write a W3C XML Schema, Relax NG schema, XML 1.0 DTD
- "not convenient" to use XSLT, use XQuery, other XML tools

We know and could give you more problems. However we felt we couldn't fix it all due to the charter constraint:
[[The RDF Core WG is neither chartered to develop a new RDF syntax, ...]]

http://www.w3.org/2001/sw/RDFCoreWGCharter

Although, we note, most of the XML technologies mentioned above are successfully used with RDF/XML.

So we propose to postpone dealing with this in this WG, recording your comments for any future work.

2) RDF and XML need not be on different paths
- models, QLs, APIs, editors, tools
- this cleft is not required

We encourage work to help integrate better but recognise this is heading into larger web architecture issues.

3) Propose that the XML serialization be modified to capture more of the regularity of the RDF data model, and offer help.

The WG notes your offer of help and has asked the semantic web coordination group to carry it forward.

We will add this issue to the RDFCore postponed issues list at: http://www.w3.org/2000/03/rdf-tracking/#rdfms-validating-embedded-rdf

Please reply to this email, copying www-rdf-comments@w3.org indicating whether this decision is acceptable.

Thanks

Dave

Response from XML Schema

None. (Note: owing to an error in preparing this version of this document, the RDF Core response on this issue was missed and the XML Schema Working Group did not have it in front of them when we approved our responses. The editor of this document has unilaterally suppressed the response originally drafted and approved, since it was predicated on false assumptions.)