Re: XML Schema WG comments on RDF documents from C. M. Sperberg-McQueen on 2003-10-03 (www-rdf-comments@w3.org from October to December 2003)

From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
Date: 03 Oct 2003 22:30:16 +0200
To: Jeremy Carroll <jjc@hpl.hp.com>
Cc: www-rdf-comments@w3.org, w3c-xml-schema-ig@w3.org
Message-Id: <1065213015.5824.498.camel@michael.hit.uib.no>

Colleagues,

thank you for your response to our comments.  A full account
of our formal responses to your responses is attached to
http://lists.w3.org/Archives/Public/www-rdf-comments/2003OctDec/0011.html
For the sake of those who are trying to track these particular issues
using the email archives, our responses on these topics are given
below.

-C. M. Sperberg-McQueen
 for the XML Schema WG

On Mon, 2003-06-30 at 22:10, Jeremy Carroll wrote:
> This is a combo reply to the following points in your message:
> http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0489.html
> http://www.w3.org/XML/Group/2003/03/xml-schema-rdf-notes.html
> 
>  
> 1.1. Design question, complexity (substantive)
> 1.2. Whitespace handling (schema-related)
>  
> 2.1. Mapping from lexical forms to values (schema-related, terminological)
> 2.2. Values without lexical forms (schema-related, important)
> 2.3. Lexical forms, strings, and character sequences (schema-related, 
> editorial)
> 2.4. Strings for natural-language data (substantive)
> 2.5. Typos and minor editorial notes
> 
> While you made the first two comments against the RDF primer, the RDF Core WG 
> took them as against our design, and it fell to the concepts editors to lead 
> the group's efforts to address them.
> 
> We assigned issue identifiers as follows:
> 1.1. Design question, complexity (substantive)
> http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-01
> 1.2. Whitespace handling (schema-related)
> http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-02
>  
> 2.1. Mapping from lexical forms to values (schema-related, terminological)
> http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-03
> 2.2. Values without lexical forms (schema-related, important)
> http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-04
> 2.3. Lexical forms, strings, and character sequences (schema-related, 
> editorial)
> http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-05
> 2.4. Strings for natural-language data (substantive)
> http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-06
> 2.5. Typos and minor editorial notes
> No id, considered by myself alone.
> 
> ===
> 
> The resolutions for the first two issues are found in our minutes of the 9th 
> May:
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0138
> The resolutions for issues xmlsch-03 and xmlsch-04 are found in our minutes
> of the 2nd May:
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0031
> The resolutions for issues xmlsch-05 and xmlsch-06 are found in our minutes
> of the 16th May:
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0199
> 
> The latest editors' draft, which addresses all the last call issues, is:
> http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/
> The changes sections and the IDs mentioned may help you see how your comments 
> have helped us.
> 
> ===
> 
> Blow-by-blow account:
> 
> xmlsch-01 1.1. Design question, complexity (substantive)
> ++++++++++++++++++++++++++++++++++++++++++++++
> you said:
> [[
> 1.1. Design question, complexity (substantive)
>  The introduction of pairs consisting of a lexical form and a type (or, 
> strictly speaking, a lexical form and a type label) seems at first glance to 
> complicate the RDF model somewhat. We have had the impression that in other 
> parts of RDF, typing is handled by adding further arcs and nodes. If the type 
> of a resource is identified by having an arc labeled rdf:type from it to (the 
> URI of) its (RDF) type, and if the type of an arc is similarly identified by 
> an arc, then surely a reason ought to be given for shifting to a different 
> method for typing literal strings. It seems like a dramatic shift in the 
> infrastructure of RDF, from "everything is a node, an arc, or a literal 
> value" to "everything is a node, an arc, or a typed literal value". Perhaps 
> not quite so dramatic, after all. But the question of design consistency 
> remains: why not "everything is a typed node, a typed arc, or a typed 
> literal"?
> ]] 
> 
> 
> Our resolution is:
> xmlsch-01 as in 0252 with amendment.
> i.e.
> [[
> The RDF Core WG interprets this comment as two questions and a comment:
> 
>    1)  Why is the type of a literal not described using a property arc, as
> is done for resources?
> 
>    2)  Having introduced typed literal nodes, why not introduce typed
> resource nodes and typed property arcs as well?
> 
>    3)  The WG should provide a rationale for this design in the specifications
> 
> Regarding question 1:
> 
> This would require that literals be allowed as subjects of RDF
> statements.  This is not possible in current RDF/XML and would require
> considerable change, beyond the scope of the WG, to support it.  Further,
> it introduces problems of non-monotonicity in the semantics.  A plain
> literal used as the value of a property is currently taken to denote a
> sequence of characters.  Adding a further statement could change that value
> to, say, an integer, invalidating previous inferences and breaking a
> fundamental tenet of RDF.

On question 1: Thank you; that helps clarify the design.

> Regarding question 2:
> 
> No requirement justified a change to the notion of a URIREF node or an RDF 
> arc.

On question 2: In the final analysis this is your call, and we don't
plan to lie down in the road over it. For the record, though, we
find your analysis unconvincing. The
introduction of typed literals introduces a new idea into RDF, and it
is obvious that this new idea has possible applications elsewhere in
the design space. Your response amounts to saying that you chose not
to work through the design implications of introducing this kind of
type labeling, because it seemed possible to get by without such
re-thinking.  The result is that the new idea will continue to feel
incompletely integrated into RDF; it will feel like a patch added as
an afterthought rather than an integral part of the design.

> Regarding comment 3:
> 
> Providing a rationale document to accompany the specifications would 
> certainly be nice to have, but the working group chose to spend its writing 
> resources on explanatory text and formal specification
> rather than justification.  We reject this comment on the grounds that the 
> specifications are not intended to provide a rationale.
> ]]

On comment 3: We understand your desire not to work your editors to
death. Your one-paragraph response to question 1, however, does a good
job of clarifying the point that was obscure to us, and we think it
may not be beyond the wit of your editors to introduce its substance
into the text at some appropriate point.

Overall: we are not wholly convinced by your resolution of this issue
but do not wish to appeal to the Director on it.


> xmlsch-02 1.2. Whitespace handling (schema-related)
> +++++++++++++++++++++++++++++++++++++++++++
> you wrote:
> [[
> 1.2. Whitespace handling (schema-related)
>  Some members of the XML Schema WG have expressed concern that XML Schema's 
> rules for whitespace handling may interfere with expected behavior in other 
> contexts. This may be the appropriate place to bring this question up. 
> In brief, XML Schema's simple types each define a whitespace facet, which 
> governs the kind of whitespace pre-processing done by an XML Schema processor 
> before the lexical form is checked for type validity. Since the point of 
> whitespace normalization is to simplify subsequent processing, the lexical 
> spaces of XML Schema's simple types are (like those in many programming 
> languages) defined without reference to the preceding whitespace 
> normalization. Integers, for example, are represented by sequences of decimal 
> digits; sequences containing blanks are not legal lexical forms for integers. 
> Indeed, strictly speaking it is only after the whitespace pre-processing is 
> done that the XML Schema processor can be said to be working with a lexical 
> form at all. 
> For example, the integer type has a value of collapse for the whitespace 
> facet, which means leading and trailing whitespace is stripped, and internal 
> whitespace sequences are reduced to a single blank (x20) character. In an XML 
> document in which the element exterms:age is defined as having type 
> xs:integer, the following instances of exterms:age will all be type-valid: 
> <exterms:age>27</exterms:age>
> <exterms:age>
>   27
> </exterms:age>
> <exterms:age>   27  </exterms:age>
> <exterms:age>   2<!--* ha, ha, fooled your full-text indexer!
> *-->7  </exterms:age>
>  The input information set, in each case, contains a character information 
> item for "2" followed by a character information item for "7", with character 
> information items for whitespace characters, and a comment information item, 
> present in some of the examples. In all cases, the lexical form proper is the 
> character sequence "27" (i.e. the sequence of characters after white space 
> handling, and ignoring comments, processing instructions, entity boundaries, 
> and other distractions). This is a legal lexical form for an integer, so all 
> the examples are type valid. 
> Some members of the XML Schema WG have worried that it may not be obvious that 
> the whitespace processing is not part of the process of checking lexical 
> forms for type validity, but part of the process of extracting the lexical 
> forms from the XML information set presented to the processor. If an RDF 
> document contains 
> <exterms:age>   27  </exterms:age>
>  and a processor hands the contents of the element to a generic type-checker 
> for XML Schema's simple types, saying in effect "this purports to be the 
> lexical form of an integer; is that OK?", that type checker will be required 
> (if it conforms to the XML Schema spec's definition of the simple types) to 
> say "no, the character sequence '   27  ' is not a legal lexical form for an 
> integer." 
> It's not clear whether RDF, being type-system neutral, can directly address 
> this concern (e.g. by specifying that an RDF processor should do the 
> appropriate whitespace pre-processing, or by warning users that they should 
> not include vagrant whitespace in typed literals), or whether it suffices for 
> developers of RDF software with built-in support for XML Schema's simple 
> types to deal with it, e.g. by performing it themselves before handing the 
> resulting lexical form to a type checker. 
> As noted, some members of our WG feel that you need to be alerted to this as a 
> possible source of confusion and unexpected results. Other members of the WG 
> feel that it verges on disrespect to assume that you need instruction on this 
> point. We compromised by agreeing to point out the issue to you, and to leave 
> you to draw your own conclusions. 
> ]]
> 
> The RDF Core WG resolved:
> xmlsch-02 addressed by msg-0097
> where msg-0097 is
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0097.html
> and says
> 
> ***
> PROPOSE RDF Core accepts the comment xmlsch-02 and agrees to add the
> following test case:
> 
> 
> <rdf:Description rdf:about="http://www.example.org/a">
>    <eg:prop rdf:datatype="&xsd;int">3</eg:prop>
> </rdf:Description>
> 
> Does not entail
> 
> <rdf:Description rdf:about="http://www.example.org/a">
>    <eg:prop rdf:datatype="&xsd;int"> 3 </eg:prop>
> </rdf:Description>
> 
> Moreover the following comment to be added to concepts:
> 
> [[
> NOTE: In [XML Schema (part 1)], white space normalization occurs
> during validation according to the value of the whiteSpace
> facet. The lexical-to-value mapping used in RDF datatyping
> occurs after this, so that the whiteSpace facet has no
> effect in RDF datatyping.
> ]]
> ***
> In fact more test cases were desired, and the test cases created are currently 
> awaiting final WG approval and can be found in:
> http://www.w3.org/2000/10/rdf-tests/rdfcore/xmlsch-02/
> The Manifest file describes four tests showing that:
> + A well-formed typed literal is not related to an ill-formed literal, even if
> they differ only by whitespace.
> + A simple test for well-formedness of a typed literal.
> + An integer with whitespace is ill-formed.
> 
> The actual text corresponding to the agreed note is found at the end of 
> section 5
> http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/#section-Datatypes
> A certain amount of editorial discretion was exercised in consolidating the
> notes concerning your comments.
> 
> The full note from the editors draft is:
> 
> [[
> 
> Note: When the datatype is defined using XML Schema: 
> 
> ...
> 
> + In [XML-SCHEMA1], white space normalization occurs during validation 
> according to the value of the whiteSpace facet. The lexical-to-value mapping 
> used in RDF datatyping occurs after this, so that the whiteSpace facet has no 
> effect in RDF datatyping. 
> 
> ]]


Thank you for your reply.

The XML Schema Working Group is in agreement on one point of our reply
and divided in our opinion on a second point.

First, we are agreed that the position you sketch out is not a source
of logical inconsistency which will render your specification
meaningless or logically problematic. It is entirely possible for you
to handle whitespace in this way.

On the second point, our views are divided.

A minority of the Working Group believes that you have made a
reasonable design choice, given that RDF will only ever be produced by
and consumed by software, and that humans and issues of human
legibility are not and should not be matters of concern in your
design.

A larger portion of the Working Group vigorously disagrees and
believes that for RDF processors to treat your two test cases
differently is to build into RDF a potential for astonishing users and
leading to unexpected results which will haunt you and your users for
years to come.  In this view, it is not as a matter of compatibility
with XML Schema, but as a matter of common-sense concern for your
users that you should simply say that the whitespace processing
specified for the type in question should be performed by any RDF
processor.
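To make the disagreement concrete, here is a minimal Python sketch (not taken from either specification) contrasting the two readings of <exterms:age>   27  </exterms:age>. The hypothetical function rdf_style hands the element content straight to the lexical-to-value mapping, as the RDF Core resolution specifies; the hypothetical schema_style first applies the whiteSpace="collapse" facet, as an XML Schema validator would. The sketch assumes the XML parser has already discarded comments and delivered the element's character data as a single string.

```python
import re
from typing import Optional

# Lexical space of xsd:integer (XML Schema Part 2): an optional leading sign
# followed by one or more ASCII decimal digits. No whitespace is allowed.
_INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def collapse(text: str) -> str:
    """Apply whiteSpace="collapse": map tab, LF and CR to space, squeeze runs
    of spaces to a single space, and strip leading and trailing spaces."""
    spaced = re.sub(r"[\t\n\r]", " ", text)
    return re.sub(r" +", " ", spaced).strip(" ")


def integer_value(lexical: str) -> Optional[int]:
    """Lexical-to-value mapping for xsd:integer; None if `lexical` is not
    in the lexical space (i.e. the typed literal would be ill-formed)."""
    if _INTEGER_LEXICAL.fullmatch(lexical) is None:
        return None
    return int(lexical)


def rdf_style(content: str) -> Optional[int]:
    """Hypothetical: element content is used as the lexical form unchanged."""
    return integer_value(content)


def schema_style(content: str) -> Optional[int]:
    """Hypothetical: the whiteSpace facet is applied first, as a validator would."""
    return integer_value(collapse(content))


if __name__ == "__main__":
    for content in ["27", "\n  27\n", "   27  "]:
        print(repr(content), rdf_style(content), schema_style(content))
```

Under the first reading "   27  " is ill-formed, which is why the xsd:int test case does not entail its whitespace variant; under the second reading both forms denote 27. The same holds for xsd:int, whose lexical space is a subset of that of xsd:integer.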

Overall: we do not have consensus either to express satisfaction with
your resolution of this issue or to raise a formal dissent. In the
opinion of our chair, this means there is no formal dissent, but he
recommends that this point be listed during the review of formal
dissents as an issue on which there was not perfect consensus.


> xmlsch-03 2.1. Mapping from lexical forms to values
> +++++++++++++++++++++++++++++++++++++++++
> xmlsch-04 2.2. Values without lexical forms 
> +++++++++++++++++++++++++++++++++++
> You wrote:
> 
> [[
> 2.1. Mapping from lexical forms to values (schema-related, terminological)
> In http://www.w3.org/TR/rdf-concepts/#section-Datatypes: 
> A datatype mapping is a set of pairs whose first element belongs to the 
> lexical space of the datatype, and the second element belongs to the value 
> space of the datatype: 
> We agree that it is useful to define a term to denote such mappings; in the 
> interests of inter-specification consistency, we wonder whether you would be 
> willing to consider using the term lexical mapping, which we are introducing 
> in our forthcoming draft of XML Schema 1.1. The term datatype mapping seems 
> unlikely to be usable in the XML Schema specification, where it would suggest 
> to some readers a mapping from one datatype to another, rather than as here a 
> mapping from lexical space to value space. (XML Schema 1.0 got by without a 
> term for this concept.) 
> 
> 2.2. Values without lexical forms (schema-related, important)
> In http://www.w3.org/TR/rdf-concepts/#section-Datatypes: 
> 
> 
> Each member of the value space may be paired with any number (including zero) 
> of members of the lexical space (lexical representations for that value).
>  The provision for values without corresponding lexical forms contradicts an 
> assumption to which the XML Schema spec appeals from time to time. The 
> lexical space of any simple datatype in XML Schema is the domain of the
> type's lexical mapping; the value space is its range. There are no
> meaningless lexical forms in the lexical space of the type, nor are there 
> ineffable values in the value space. By eliminating values from the value 
> space (e.g. by setting minimal and maximal values), the type definer may 
> indirectly also eliminate lexical forms from the lexical space; conversely, 
> by eliminating some items from the lexical space (e.g. by setting a pattern), 
> the type definer may eliminate items from the value space. 
> Are there crucial aspects of RDF which will break if the list item quoted 
> above is changed to read "paired with one or more members of the lexical 
> space"? 
> ]]
> 
> We decided:
> [[
> PROPOSED to clarify xmlsch-03 xmlsch-04 pfps-13
>   based on the proposal to close in
>     http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Apr/0368.html
> ]]
> i.e.
> [[
> PROPOSE
>  - xmlsch-03 - we globally use the term lexical-to-value mapping instead of 
> datatype mapping or any other term
>  - xmlsch-04 - we do not change the definition of value space but add a note
> clarifying the relationship with XML Schema datatypes.
> ]]
> 
> The new text can be found in the editors draft at:
> http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/#section-Datatypes
> and reads:
> [[
> 5. Datatypes (Normative)
> 
>  The datatype abstraction used in RDF is compatible with the abstraction used 
> in XML Schema Part 2: Datatypes [XML-SCHEMA2].
> 
>  A datatype consists of a lexical space, a value space and a lexical-to-value 
> mapping. 
> 
> The lexical space of a datatype is a set of Unicode [UNICODE] strings.
> 
>  The lexical-to-value mapping of a datatype is a set of pairs whose first 
> element belongs to the lexical space of the datatype, and the second element 
> belongs to the value space of the datatype: 
> 
> Each member of the lexical space is paired with (maps to) exactly one member 
> of the value space. 
> Each member of the value space may be paired with any number (including zero) 
> of members of the lexical space (lexical representations for that value). 
> 
> A datatype is identified by one or more URI references. 
> 
> RDF may be used with any datatype definition that conforms to this 
> abstraction, even if not defined in terms of XML Schema. 
> 
> Certain XML Schema built-in datatypes are not suitable for use within RDF. For 
> example, the QName datatype requires a namespace declaration to be in scope 
> during the mapping, and is not recommended for use in RDF. [RDF-SEMANTICS] 
> contains a more detailed discussion of specific XML Schema built-in 
> datatypes. 
> 
> 
> Note: When the datatype is defined using XML Schema: 
> 
> + All values correspond to some lexical form, either using the
> lexical-to-value mapping of the datatype or, if it is a union datatype, using
> a lexical mapping associated with one of the member datatypes.
> + XML Schema facets remain part of the datatype and are used by the XML Schema 
> mechanisms that control the lexical space and the value space; however, RDF 
> does not define a standard mechanism to access these facets.
> ]]

Thank you; this looks better.
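The definition quoted above treats a lexical-to-value mapping as a set of pairs. The Python sketch below writes out the mapping for xsd:boolean that way and checks the two constraints: each lexical form maps to exactly one value, while a value may have several lexical forms. The second, hypothetical datatype (not an XML Schema type) keeps False in its value space after its lexical space has lost "false" and "0"; RDF's "any number (including zero)" wording admits it, whereas under XML Schema's assumptions False would drop out of the value space too. All names here are illustrative.

```python
from collections import defaultdict

# xsd:boolean's lexical-to-value mapping written out as a set of pairs
# (lexical form, value). "true" and "1" are two lexical forms of one value.
BOOLEAN_LEXICAL = {"true", "false", "1", "0"}
BOOLEAN_VALUES = {True, False}
BOOLEAN_L2V = {("true", True), ("1", True), ("false", False), ("0", False)}

# Hypothetical datatype (not an XML Schema type): the lexical space has been
# cut down to {"true", "1"}, but False has been left in the value space.
RESTRICTED_LEXICAL = {"true", "1"}
RESTRICTED_L2V = {("true", True), ("1", True)}


def is_lexical_to_value_mapping(pairs, lexical_space, value_space):
    """True if every pair stays inside the two spaces and every member of the
    lexical space is paired with exactly one member of the value space."""
    images = defaultdict(set)
    for lexical, value in pairs:
        if lexical not in lexical_space or value not in value_space:
            return False
        images[lexical].add(value)
    return all(len(images[lexical]) == 1 for lexical in lexical_space)


def lexical_forms(pairs, value):
    """All lexical representations of `value` (possibly more than one)."""
    return {lexical for lexical, v in pairs if v == value}


def values_without_lexical_forms(pairs, value_space):
    """Values that no lexical form maps to: allowed by the RDF wording
    ("including zero"), excluded by XML Schema's assumptions."""
    return value_space - {value for _, value in pairs}
```

For the restricted datatype, values_without_lexical_forms returns {False}: a value that RDF tolerates but that XML Schema's model has no room for.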


> xmlsch-05 2.3. Lexical forms, strings, and character sequences
> +++++++++++++++++++++++++++++++++++++++++++++++++++
> 
> Your comment:
> [[
> 2.3. Lexical forms, strings, and character sequences (schema-related, 
> editorial)
> In http://www.w3.org/TR/rdf-concepts/#section-Datatypes: 
> With one exception, the datatypes used in RDF have a lexical space consisting 
> of a set of strings. 
> Since "string" is used as the local name for a particular simple type in the 
> XML Schema namespace, we believe it will be less confusing for users, in the 
> long run, if the lexical representations of simple-datatype values are 
> described not as "strings" but as "character sequences". 
> This comment also applies to other uses of the term string to denote the 
> members of a lexical space.
> ]]
> 
> RESOLVED: do not accept xmlsch-05
> Rationale:
> It feels like a fairly extensive editorial change. Also, in the Semantic Web
> activity documents xsd:string is always referred to in its qualified form, so
> the possibility of confusion is diminished.

Thank you; we believe you are making a mistake but we will not insist
on our suggestion.


> xmlsch-06 Strings for natural-language data
> +++++++++++++++++++++++++++++++++++
> 
> Your comment:
> [[
> 2.4. Strings for natural-language data (substantive)
> In http://www.w3.org/TR/rdf-concepts/#section-Datatypes: 
> 
> 
> A plain literal is a string combined with an optional language identifier. 
> This should be used for plain text in a natural language. As recommended in 
> the RDF formal semantics [RDF-SEMANTICS], these plain literals are 
> self-denoting. 
> We do not believe that simple strings are likely to be adequate for the 
> representation of arbitrary natural-language text. Even in English, 
> natural-language utterances (such as this document) may need some degree of 
> inline markup for clarity and adequate presentation; in natural-language 
> utterances requiring bidirectional display or ruby, the best authorities 
> (including the W3C I18n Working Group) recommend the use of markup within the 
> natural-language utterance. We thus suggest that you may wish to moderate 
> this recommendation that natural-language material be represented by 
> literals.
> This is not an area in which we claim particular technical expertise; we 
> merely call it to your attention in the hopes that doing so may be useful to 
> you.
> ]]
> 
> RESOLVED: to accept xmlsch-06, with revised wording as noted
> [[
> A plain literal is a string combined with an optional language
>          identifier. This may be used for plain text
>          in a natural language. As recommended in the RDF formal semantics
>          [RDF-SEMANTICS], these plain literals are self-denoting.
> ]]
> after other changes the text now reads:

Thank you. This wording is better.

We believe (again, we claim no special expertise here and would defer
to the views of the Internationalization Working Group) that you might
usefully add a health warning here. For example:

  ...  This may, if necessary, be used for plain text in a natural
  language, but in general this is not recommended; natural language
  is usually better represented with a more elaborate structure....

We hope that you will be persuaded to add a health warning, but we do
not believe this point is worth registering a formal dissent for.
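A plain literal as described in the revised wording can be modelled as a string paired with an optional language identifier. The Python sketch below is an illustration, not RDF's definition: the class name is hypothetical, it assumes language tags compare case-insensitively (as RFC 3066 language tags do) and so normalizes them to lowercase, and it shows that markup placed inside a plain literal is simply more characters, which is the concern behind the suggested health warning.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlainLiteral:
    """A plain literal: a string combined with an optional language identifier.

    Plain literals are self-denoting, so two of them denote the same thing
    exactly when their strings and language identifiers are the same.
    """

    text: str
    lang: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumption: language tags are case-insensitive (as in RFC 3066),
        # so normalize them to lowercase before comparison and hashing.
        if self.lang is not None:
            object.__setattr__(self, "lang", self.lang.lower())


# Markup inside a plain literal is not structure, just more characters.
marked_up = PlainLiteral("<b>bold</b>", "en")
```

A consumer of marked_up sees an eleven-character string; nothing in the literal itself marks <b> as an element rather than text, which is why bidirectional or ruby markup sits awkwardly in a plain literal.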


...
> 
> Please reply to this email, copying www-rdf-comments@w3.org indicating
> whether these decisions are acceptable (please clearly identify those which 
> are not).
> 
> Jeremy on behalf of RDF Core WG
> 
>
Received on Friday, 3 October 2003 16:31:21 UTC