Re: XML Schema WG comments on RDF documents from Jeremy Carroll on 2003-06-30 (www-rdf-comments@w3.org from April to June 2003)

From: Jeremy Carroll <jjc@hpl.hp.com>
Date: Mon, 30 Jun 2003 23:10:44 +0300
To: www-rdf-comments@w3.org, cmsmcq@acm.org, w3c-xml-schema-ig@w3.org
Message-Id: <200306302310.44994.jjc@hpl.hp.com>
This is a combo reply to the following points in your message:
http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0489.html
http://www.w3.org/XML/Group/2003/03/xml-schema-rdf-notes.html

 
1.1. Design question, complexity (substantive)
1.2. Whitespace handling (schema-related)
 
2.1. Mapping from lexical forms to values (schema-related, terminological)
2.2. Values without lexical forms (schema-related, important)
2.3. Lexical forms, strings, and character sequences (schema-related, 
editorial)
2.4. Strings for natural-language data (substantive)
2.5. Typos and minor editorial notes

While you made the first two comments against the RDF primer, the RDF Core WG 
took them as against our design, and it fell to the concepts editors to lead 
the group's efforts to address them.

We assigned issue identifiers as follows:
1.1. Design question, complexity (substantive)
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-01
1.2. Whitespace handling (schema-related)
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-02
 
2.1. Mapping from lexical forms to values (schema-related, terminological)
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-03
2.2. Values without lexical forms (schema-related, important)
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-04
2.3. Lexical forms, strings, and character sequences (schema-related, 
editorial)
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-05
2.4. Strings for natural-language data (substantive)
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-06
2.5. Typos and minor editorial notes
No ID assigned; I considered these on my own.

===

The resolutions for the first two issues are found in our minutes of the 9th 
May:
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0138
The resolutions for issues xmlsch-03 and xmlsch-04 are found in our minutes of 
the 2nd May:
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0031
The resolutions for issues xmlsch-05 and xmlsch-06 are found in our minutes of 
the 16th May:
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0199

The latest editors' draft, which addresses all the last call issues, is:
http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/
The changes sections and the IDs mentioned may help you see how your comments 
have helped us.

===

Blow by blow account:

xmlsch-01 1.1. Design question, complexity (substantive)
++++++++++++++++++++++++++++++++++++++++++++++
you said:
[[
1.1. Design question, complexity (substantive)
 The introduction of pairs consisting of a lexical form and a type (or, 
strictly speaking, a lexical form and a type label) seems at first glance to 
complicate the RDF model somewhat. We have had the impression that in other 
parts of RDF, typing is handled by adding further arcs and nodes. If the type 
of a resource is identified by having an arc labeled rdf:type from it to (the 
URI of) its (RDF) type, and if the type of an arc is similarly identified by 
an arc, then surely a reason ought to be given for shifting to a different 
method for typing literal strings. It seems like a dramatic shift in the 
infrastructure of RDF, from "everything is a node, an arc, or a literal 
value" to "everything is a node, an arc, or a typed literal value". Perhaps 
not quite so dramatic, after all. But the question of design consistency 
remains: why not "everything is a typed node, a typed arc, or a typed 
literal"?
]] 


Our resolution is:
xmlsch-01 as in 0252 with amendment.
i.e.
[[
The RDF Core WG interprets this comment as two questions and a comment:

   1)  Why is the type of a literal not described using a property arc, as 
is done for other literals?

   2)  Having introduced typed literal nodes, why not introduce typed 
resource nodes and typed property arcs as well

   3)  The WG should provide a rationale for this design in the specifications

Regarding question 1:

This would require that literals be allowed as subjects of RDF 
statements.  This is not possible in current RDF/XML and would require 
considerable change, beyond the scope of the WG, to  support it.    Further 
it introduces problems of non-monotonicity in the semantics.  A property 
whose value is plain literal is currently taken to denote a sequence 
characters.  Adding a further statement could change that value to, say an 
integer, invalidating previous inferences and breaking a fundamental tenet 
of RDF.

Regarding question 2:

No requirement justified a change to the notion of a URIREF node or an RDF 
arc.

Regarding comment 3:

Providing a rationale document to accompany the specifications would 
certainly be nice to have, but the working group chose to spend its writing 
resource on explanatory text and formal specification
rather than justification.  We reject this comment on the grounds that the 
specifications are not intended to provide a rationale.
]]
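
The non-monotonicity point under question 1 can be illustrated with a small 
sketch. It assumes a hypothetical design, one the WG did not adopt, in which a 
literal's type is given by a separate triple with the literal as subject. The 
triple representation and the denotation function are illustrative only.

```python
# Hypothetical design (NOT what RDF does): a literal's datatype is stated by
# a separate triple whose subject is the literal itself.
def denotation(graph, literal):
    """Return what the literal denotes under the hypothetical design."""
    if (literal, "rdf:type", "xsd:integer") in graph:
        return int(literal)
    # With no typing triple, a plain literal denotes its own character sequence.
    return literal


g1 = {("ex:a", "ex:age", "27")}
g2 = g1 | {("27", "rdf:type", "xsd:integer")}

before = denotation(g1, "27")  # the string "27"
after = denotation(g2, "27")   # the integer 27
```

Here g2 only adds a statement to g1, yet the literal's denotation changes, so 
a conclusion drawn from g1 about the value of ex:age need not survive in g2.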

xmlsch-02 1.2. Whitespace handling (schema-related)
+++++++++++++++++++++++++++++++++++++++++++
you wrote:
[[
1.2. Whitespace handling (schema-related)
 Some members of the XML Schema WG have expressed concern that XML Schema's 
rules for whitespace handling may interfere with expected behavior in other 
contexts. This may be the appropriate place to bring this question up. 
In brief, XML Schema's simple types each define a whitespace facet, which 
governs the kind of whitespace pre-processing done by an XML Schema processor 
before the lexical form is checked for type validity. Since the point of 
whitespace normalization is to simplify subsequent processing, the lexical 
spaces of XML Schema's simple types are (like those in many programming 
languages) defined without reference to the preceding whitespace 
normalization. Integers, for example, are represented by sequences of decimal 
digits; sequences containing blanks are not legal lexical forms for integers. 
Indeed, strictly speaking it is only after the whitespace pre-processing is 
done that the XML Schema processor can be said to be working with a lexical 
form at all. 
For example, the integer type has a value of collapse for the whitespace 
facet, which means leading and trailing whitespace is stripped, and internal 
whitespace sequences are reduced to a single blank (x20) character. In an XML 
document in which the element exterms:age is defined as having type 
xs:integer, the following instances of exterms:age will all be type-valid: 
<exterms:age>27</exterms:age>
<exterms:age>
  27
</exterms:age>
<exterms:age>   27  </exterms:age>
<exterms:age>   2<!--* ha, ha, fooled your full-text indexer!
*-->7  </exterms:age>
 The input information set, in each case, contains a character information 
item for "2" followed by a character information item for "7", with character 
information items for whitespace characters, and a comment information item, 
present in some of the examples. In all cases, the lexical form proper is the 
character sequence "27" (i.e. the sequence of characters after white space 
handling, and ignoring comments, processing instructions, entity boundaries, 
and other distractions). This is a legal lexical form for an integer, so all 
the examples are type valid. 
Some members of the XML Schema WG have worried that it may not be obvious that 
the whitespace processing is not part of the process of checking lexical 
forms for type validity, but part of the process of extracting the lexical 
forms from the XML information set presented to the processor. If an RDF 
document contains 
<exterms:age>   27  </exterms:age>
 and a processor hands the contents of the element to a generic type-checker 
for XML Schema's simple types, saying in effect "this purports to be the 
lexical form of an integer; is that OK?", that type checker will be required 
(if it conforms to the XML Schema spec's definition of the simple types) to 
say "no, the character sequence '   27  ' is not a legal lexical form for an 
integer." 
It's not clear whether RDF, being type-system neutral, can directly address 
this concern (e.g. by specifying that an RDF processor should do the 
appropriate whitespace pre-processing, or by warning users that they should 
not include vagrant whitespace in typed literals), or whether it suffices for 
developers of RDF software with built-in support for XML Schema's simple 
types to deal with it, e.g. by performing it themselves before handing the 
resulting lexical form to a type checker. 
As noted, some members of our WG feel that you need to be alerted to this as a 
possible source of confusion and unexpected results. Other members of the WG 
feel that it verges on disrespect to assume that you need instruction on this 
point. We compromised by agreeing to point out the issue to you, and to leave 
you to draw your own conclusions. 
]]

The RDF Core WG resolved:
xmlsch-02 addressed by msg-0097
where msg-0097 is
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0097.html
and says

***
PROPOSE RDF Core accepts the comment xmlsch-02 and agree to add the
following test case:


<rdf:Description rdf:about="http://www.example.org/a">
   <eg:prop rdf:datatype="&xsd;int">3</eg:prop>
</rdf:Description>

Does not entail

<rdf:Description rdf:about="http://www.example.org/a">
   <eg:prop rdf:datatype="&xsd;int"> 3 </eg:prop>
</rdf:Description>

Moreover the following comment to be added to concepts:

[[
NOTE: In [XML Schema (part 1)], white space normalization occurs
during validation according to the value of the whiteSpace
facet. The lexical-to-value mapping used in RDF datatyping
occurs after this, so that the whiteSpace facet has no
effect in RDF datatyping.
]]
***
In fact more test cases were desired, and the test cases created are currently 
awaiting final WG approval and can be found in:
http://www.w3.org/2000/10/rdf-tests/rdfcore/xmlsch-02/
The Manifest file describes four tests, which cover the following:
+ A well-formed typed literal is not related to an ill-formed literal, even if 
they differ only by whitespace.
+ A simple test for well-formedness of a typed literal.
+ An integer with whitespace is ill-formed.

The actual text corresponding to the agreed note is found at the end of 
section 5:
http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/#section-Datatypes
A certain amount of editorial discretion was taken to consolidate the notes 
concerning your comments.

The full note from the editors' draft is:

[[

Note: When the datatype is defined using XML Schema: 

...

+ In [XML-SCHEMA1], white space normalization occurs during validation 
according to the value of the whiteSpace facet. The lexical-to-value mapping 
used in RDF datatyping occurs after this, so that the whiteSpace facet has no 
effect in RDF datatyping. 

]]
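
As a rough illustration of the note above, here is a minimal sketch in Python. 
It assumes a simplified lexical space for xsd:int (optional sign, decimal 
digits, 32-bit range); the function names collapse and int_value are 
hypothetical and not taken from any RDF or XML Schema library.

```python
import re

# Assumption: simplified lexical space of xsd:int (optional sign, then
# decimal digits); the 32-bit range is checked on the resulting value.
_INT_LEXICAL = re.compile(r"[+-]?[0-9]+")


def collapse(text):
    """Approximate whiteSpace=collapse: tab, LF and CR become spaces, runs
    of spaces shrink to one, leading and trailing spaces are removed."""
    return re.sub(r"[ \t\n\r]+", " ", text).strip(" ")


def int_value(lexical):
    """Hypothetical lexical-to-value mapping for xsd:int.

    Returns the integer, or None if the string is not a legal lexical form.
    No whitespace processing is done here.
    """
    if _INT_LEXICAL.fullmatch(lexical) is None:
        return None
    value = int(lexical)
    if not -2**31 <= value <= 2**31 - 1:
        return None
    return value


raw = " 3 "
# RDF datatyping hands the raw string straight to the mapping.
as_rdf_sees_it = int_value(raw)
# An XML Schema validator would normalize whitespace first.
after_xsd_preprocessing = int_value(collapse(raw))
```

In this sketch the raw string " 3 " is rejected by the mapping, while the 
collapsed string is accepted, which is the distinction the test case for 
xmlsch-02 relies on.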

xmlsch-03 2.1. Mapping from lexical forms to values
+++++++++++++++++++++++++++++++++++++++++
xmlsch-04 2.2. Values without lexical forms 
+++++++++++++++++++++++++++++++++++
You wrote:

[[
2.1. Mapping from lexical forms to values (schema-related, terminological)
In http://www.w3.org/TR/rdf-concepts/#section-Datatypes: 
A datatype mapping is a set of pairs whose first element belongs to the 
lexical space of the datatype, and the second element belongs to the value 
space of the datatype: 
We agree that it is useful to define a term to denote such mappings; in the 
interests of inter-specification consistency, we wonder whether you would be 
willing to consider using the term lexical mapping, which we are introducing 
in our forthcoming draft of XML Schema 1.1. The term datatype mapping seems 
unlikely to be usable in the XML Schema specification, where it would suggest 
to some readers a mapping from one datatype to another, rather than as here a 
mapping from lexical space to value space. (XML Schema 1.0 got by without a 
term for this concept.) 

2.2. Values without lexical forms (schema-related, important)
In http://www.w3.org/TR/rdf-concepts/#section-Datatypes: 


Each member of the value space may be paired with any number (including zero) 
of members of the lexical space (lexical representations for that value).
 The provision for values without corresponding lexical forms contradicts an 
assumption to which the XML Schema spec appeals from time to time. The 
lexical space of any simple datatype in XML Schema is the domain of the 
type's lexical mapping; the value space is its domain. There are no 
meaningless lexical forms in the lexical space of the type, nor are there 
ineffable values in the value space. By eliminating values from the value 
space (e.g. by setting minimal and maximal values), the type definer may 
indirectly also eliminate lexical forms from the lexical space; conversely, 
by eliminating some items from the lexical space (e.g. by setting a pattern), 
the type definer may eliminate items from the value space. 
Are there crucial aspects of RDF which will break if the list item quoted 
above is changed to read "paired with one or more members of the lexical 
space"? 
]]

We decided:
[[
PROPOSED to clarify xmlsch-03 xmlsch-04 pfps-13
  based on the proposal to close in
    http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Apr/0368.html
]]
i.e.
[[
PROPOSE
 - xmlsch-03 - we globally use the term lexical-to-value mapping instead of 
datatype mapping or any other term
 - xmlsch-04 - we do not change the definition of value space but add a note 
clarifying the relationship with XML Schema datatypes.
]]

The new text can be found in the editors' draft at:
http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/#section-Datatypes
and reads:
[[
5. Datatypes (Normative)

 The datatype abstraction used in RDF is compatible with the abstraction used 
in XML Schema Part 2: Datatypes [XML-SCHEMA2].

 A datatype consists of a lexical space, a value space and a lexical-to-value 
mapping. 

The lexical space of a datatype is a set of Unicode [UNICODE] strings.

 The lexical-to-value mapping of a datatype is a set of pairs whose first 
element belongs to the lexical space of the datatype, and the second element 
belongs to the value space of the datatype: 

Each member of the lexical space is paired with (maps to) exactly one member 
of the value space. 
Each member of the value space may be paired with any number (including zero) 
of members of the lexical space (lexical representations for that value). 

A datatype is identified by one or more URI references. 

RDF may be used with any datatype definition that conforms to this 
abstraction, even if not defined in terms of XML Schema. 

Certain XML Schema built-in datatypes are not suitable for use within RDF. For 
example, the QName datatype requires a namespace declaration to be in scope 
during the mapping, and is not recommended for use in RDF. [RDF-SEMANTICS] 
contains a more detailed discussion of specific XML Schema built-in 
datatypes. 


Note: When the datatype is defined using XML Schema: 

+ All values correspond to some lexical form, either using the 
lexical-to-value mapping of the datatype or if it is a union datatype with a 
lexical mapping associated with one of the member datatypes. 
+ XML Schema facets remain part of the datatype and are used by the XML Schema 
mechanisms that control the lexical space and the value space; however, RDF 
does not define a standard mechanism to access these facets.
]]
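
The datatype abstraction quoted above (lexical space, value space, 
lexical-to-value mapping) can be sketched as follows. The Datatype class and 
its method names are assumptions made for illustration; xsd:boolean is used 
because its lexical space is small enough to list as explicit pairs.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, List


@dataclass(frozen=True)
class Datatype:
    """Illustrative model: a URI, a value space, and a set of
    (lexical form, value) pairs stored as a dict."""
    uri: str
    value_space: FrozenSet[Any]
    l2v: Dict[str, Any]

    def __post_init__(self):
        # The second element of every pair must belong to the value space.
        for lexical, value in self.l2v.items():
            if value not in self.value_space:
                raise ValueError(f"{lexical!r} maps outside the value space")

    def value_of(self, lexical: str) -> Any:
        """Each lexical form maps to exactly one value; others are ill-formed."""
        if lexical not in self.l2v:
            raise ValueError(f"{lexical!r} is not in the lexical space")
        return self.l2v[lexical]

    def lexical_forms_of(self, value: Any) -> List[str]:
        """A value may have any number of lexical forms, including zero."""
        return sorted(lex for lex, v in self.l2v.items() if v == value)


XSD_BOOLEAN = Datatype(
    uri="http://www.w3.org/2001/XMLSchema#boolean",
    value_space=frozenset({True, False}),
    l2v={"true": True, "1": True, "false": False, "0": False},
)
```

Storing the mapping as a dict enforces "exactly one value per lexical form" 
by construction, while leaving the value space free to contain members that 
no lexical form reaches.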

xmlsch-05 2.3. Lexical forms, strings, and character sequences
+++++++++++++++++++++++++++++++++++++++++++++++++++

Your comment:
[[
2.3. Lexical forms, strings, and character sequences (schema-related, 
editorial)
In http://www.w3.org/TR/rdf-concepts/#section-Datatypes: 
With one exception, the datatypes used in RDF have a lexical space consisting 
of a set of strings. 
Since "string" is used as the local name for a particular simple type in the 
XML Schema namespace, we believe it will be less confusing for users, in the 
long run, if the lexical representations of simple-datatype values are 
described not as "strings" but as "character sequences". 
This comment also applies to other uses of the term string to denote the 
members of a lexical space.
]]

RESOLVED: do not accept xmlsch-05
Rationale:
It feels like a fairly extensive editorial change. Also, in the Semantic Web
Activity documents xsd:string is always referred to in its qualified form, so
the possible confusion is diminished.

xmlsch-06 Strings for natural-language data
+++++++++++++++++++++++++++++++++++

Your comment:
[[
2.4. Strings for natural-language data (substantive)
In http://www.w3.org/TR/rdf-concepts/#section-Datatypes: 


A plain literal is a string combined with an optional language identifier. 
This should be used for plain text in a natural language. As recommended in 
the RDF formal semantics [RDF-SEMANTICS], these plain literals are 
self-denoting. 
We do not believe that simple strings are likely to be adequate for the 
representation of arbitrary natural-language text. Even in English, 
natural-language utterances (such as this document) may need some degree of 
inline markup for clarity and adequate presentation; in natural-language 
utterances requiring bidirectional display or ruby, the best authorities 
(including the W3C I18n Working Group) recommend the use of markup within the 
natural-language utterance. We thus suggest that you may wish to moderate 
this recommendation that natural-language material be represented by 
literals.
This is not an area in which we claim particular technical expertise; we 
merely call it to your attention in the hopes that doing so may be useful to 
you.
]]

RESOLVED: to accept xmlsch-06, with revised wording as noted
[[
A plain literal is a string combined with an optional language
         identifier. This may be used for plain text
         in a natural language. As recommended in the RDF formal semantics
         [RDF-SEMANTICS], these plain literals are self-denoting.
]]
after other changes the text now reads:


Finally you made the following minor editorial comments:

[[
In http://www.w3.org/TR/rdf-concepts/#section-Literal-Value, for "the datatype 
mapping is applied to the pair form by the lexical form and the language 
identifier" read "the datatype mapping is applied to the pair formed by the 
lexical form and the language identifier".
]]
Text has vanished in other changes.
[[
 In the same section, for "Such a case, while in error, is not syntacticly 
ill-formed " read "Such a case, while in error, is not syntactically 
ill-formed" (et passim).
]]
done.
[[
In section http://www.w3.org/TR/rdf-concepts/#section-XMLLiteral, for "root 
element tag" read "root element".
]]
This text has gone; however, new text referring to a start tag and an end tag 
is now in place, specifically:
"when embedded between an arbitrary XML start tag and an end tag form a 
document"
[[
In the same section, for "XML element content" read "XML data" (the term 
element content is used in some markup-related specs as a complement of mixed 
content to denote the content of elements which can contain other elements 
but cannot contain parsed character data).
]]
done.


Thank you for all your comments, and your detailed review.
They have been very helpful.

Please reply to this email, copying www-rdf-comments@w3.org indicating
whether these decisions are acceptable (please clearly identify those which 
are not).

Jeremy on behalf of RDF Core WG
Received on Monday, 30 June 2003 17:10:58 UTC