- From: Jeremy Carroll <jjc@hpl.hp.com>
- Date: Mon, 30 Jun 2003 23:10:44 +0300
- To: www-rdf-comments@w3.org, cmsmcq@acm.org, w3c-xml-schema-ig@w3.org
This is a combo reply to the following points in your message:

http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0489.html
http://www.w3.org/XML/Group/2003/03/xml-schema-rdf-notes.html

1.1. Design question, complexity (substantive)
1.2. Whitespace handling (schema-related)
2.1. Mapping from lexical forms to values (schema-related, terminological)
2.2. Values without lexical forms (schema-related, important)
2.3. Lexical forms, strings, and character sequences (schema-related, editorial)
2.4. Strings for natural-language data (substantive)
2.5. Typos and minor editorial notes

While you made the first two comments against the RDF primer, the RDF Core WG took them as against our design, and it fell to the concepts editors to lead the group's efforts to address them.

We assigned issue identifiers as follows:

1.1. Design question, complexity (substantive)
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-01

1.2. Whitespace handling (schema-related)
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-02

2.1. Mapping from lexical forms to values (schema-related, terminological)
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-03

2.2. Values without lexical forms (schema-related, important)
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-04

2.3. Lexical forms, strings, and character sequences (schema-related, editorial)
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-05

2.4. Strings for natural-language data (substantive)
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-06

2.5. Typos and minor editorial notes
No id, considered by myself alone.
===

The resolutions for the first two issues are found in our minutes of the 9th May:
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0138

The resolutions for issues xmlsch-03 and xmlsch-04 are found in our minutes of the 2nd May:
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0031

The resolutions for issues xmlsch-05 and xmlsch-06 are found in our minutes of the 16th May:
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0199

The latest editors' draft, which has all the last call issues addressed, is:
http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/

The changes sections and the IDs mentioned may help you see how your comments have helped us.

===

Blow by blow account:

xmlsch-01 1.1. Design question, complexity (substantive)
++++++++++++++++++++++++++++++++++++++++++++++

you said:
[[
1.1. Design question, complexity (substantive)

The introduction of pairs consisting of a lexical form and a type (or, strictly speaking, a lexical form and a type label) seems at first glance to complicate the RDF model somewhat. We have had the impression that in other parts of RDF, typing is handled by adding further arcs and nodes. If the type of a resource is identified by having an arc labeled rdf:type from it to (the URI of) its (RDF) type, and if the type of an arc is similarly identified by an arc, then surely a reason ought to be given for shifting to a different method for typing literal strings.

It seems like a dramatic shift in the infrastructure of RDF, from "everything is a node, an arc, or a literal value" to "everything is a node, an arc, or a typed literal value". Perhaps not quite so dramatic, after all. But the question of design consistency remains: why not "everything is a typed node, a typed arc, or a typed literal"?
]]

Our resolution is:

xmlsch-01 as in 0252 with amendment.

i.e.
[[
The RDF Core WG interprets this comment as two questions and a comment:

1) Why is the type of a literal not described using a property arc, as is done for other literals?

2) Having introduced typed literal nodes, why not introduce typed resource nodes and typed property arcs as well?

3) The WG should provide a rationale for this design in the specifications

Regarding question 1: This would require that literals be allowed as subjects of RDF statements. This is not possible in current RDF/XML and would require considerable change, beyond the scope of the WG, to support it. Further, it introduces problems of non-monotonicity in the semantics. A property whose value is a plain literal is currently taken to denote a sequence of characters. Adding a further statement could change that value to, say, an integer, invalidating previous inferences and breaking a fundamental tenet of RDF.

Regarding question 2: No requirement justified a change to the notion of a URIREF node or an RDF arc.

Regarding comment 3: Providing a rationale document to accompany the specifications would certainly be nice to have, but the working group chose to spend its writing resource on explanatory text and formal specification rather than justification. We reject this comment on the grounds that the specifications are not intended to provide a rationale.
]]

xmlsch-02 1.2. Whitespace handling (schema-related)
+++++++++++++++++++++++++++++++++++++++++++

you wrote:
[[
1.2. Whitespace handling (schema-related)

Some members of the XML Schema WG have expressed concern that XML Schema's rules for whitespace handling may interfere with expected behavior in other contexts. This may be the appropriate place to bring this question up.

In brief, XML Schema's simple types each define a whitespace facet, which governs the kind of whitespace pre-processing done by an XML Schema processor before the lexical form is checked for type validity.
Since the point of whitespace normalization is to simplify subsequent processing, the lexical spaces of XML Schema's simple types are (like those in many programming languages) defined without reference to the preceding whitespace normalization. Integers, for example, are represented by sequences of decimal digits; sequences containing blanks are not legal lexical forms for integers. Indeed, strictly speaking it is only after the whitespace pre-processing is done that the XML Schema processor can be said to be working with a lexical form at all.

For example, the integer type has a value of collapse for the whitespace facet, which means leading and trailing whitespace is stripped, and internal whitespace sequences are reduced to a single blank (x20) character. In an XML document in which the element exterms:age is defined as having type xs:integer, the following instances of exterms:age will all be type-valid:

  <exterms:age>27</exterms:age>
  <exterms:age> 27 </exterms:age>
  <exterms:age> 27 </exterms:age>
  <exterms:age> 2<!--* ha, ha, fooled your full-text indexer! *-->7 </exterms:age>

The input information set, in each case, contains a character information item for "2" followed by a character information item for "7", with character information items for whitespace characters, and a comment information item, present in some of the examples. In all cases, the lexical form proper is the character sequence "27" (i.e. the sequence of characters after white space handling, and ignoring comments, processing instructions, entity boundaries, and other distractions). This is a legal lexical form for an integer, so all the examples are type valid.

Some members of the XML Schema WG have worried that it may not be obvious that the whitespace processing is not part of the process of checking lexical forms for type validity, but part of the process of extracting the lexical forms from the XML information set presented to the processor.
If an RDF document contains

  <exterms:age> 27 </exterms:age>

and a processor hands the contents of the element to a generic type-checker for XML Schema's simple types, saying in effect "this purports to be the lexical form of an integer; is that OK?", that type checker will be required (if it conforms to the XML Schema spec's definition of the simple types) to say "no, the character sequence ' 27 ' is not a legal lexical form for an integer."

It's not clear whether RDF, being type-system neutral, can directly address this concern (e.g. by specifying that an RDF processor should do the appropriate whitespace pre-processing, or by warning users that they should not include vagrant whitespace in typed literals), or whether it suffices for developers of RDF software with built-in support for XML Schema's simple types to deal with it, e.g. by performing it themselves before handing the resulting lexical form to a type checker.

As noted, some members of our WG feel that you need to be alerted to this as a possible source of confusion and unexpected results. Other members of the WG feel that it verges on disrespect to assume that you need instruction on this point. We compromised by agreeing to point out the issue to you, and to leave you to draw your own conclusions.
]]

The RDF Core WG resolved:

xmlsch-02 addressed by msg-0097

where msg-0097 is
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0097.html
and says

***
PROPOSE RDF Core accepts the comment xmlsch-02 and agree to add the following test case:

  <rdf:Description rdf:about="http://www.example.org/a">
    <eg:prop rdf:datatype="&xsd;int">3</eg:prop>
  </rdf:Description>

Does not entail

  <rdf:Description rdf:about="http://www.example.org/a">
    <eg:prop rdf:datatype="&xsd;int"> 3 </eg:prop>
  </rdf:Description>

Moreover the following comment to be added to concepts:

[[
NOTE: In [XML Schema (part 1)], white space normalization occurs during validation according to the value of the whiteSpace facet.
The lexical-to-value mapping used in RDF datatyping occurs after this, so that the whiteSpace facet has no effect in RDF datatyping.
]]
***

In fact more test cases were desired, and the test cases created are currently awaiting final WG approval and can be found in:
http://www.w3.org/2000/10/rdf-tests/rdfcore/xmlsch-02/

The Manifest file describes four tests showing that:

+ A well-formed typed literal is not related to an ill-formed literal, even if they differ only by whitespace.
+ A simple test for well-formedness of a typed literal.
+ An integer with whitespace is ill-formed.

The actual text corresponding to the agreed note is found at the end of section 5:
http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/#section-Datatypes

A certain amount of editorial discretion was taken to consolidate notes concerning your comments. The full note from the editors' draft is:

[[
Note: When the datatype is defined using XML Schema:
...
+ In [XML-SCHEMA1], white space normalization occurs during validation according to the value of the whiteSpace facet. The lexical-to-value mapping used in RDF datatyping occurs after this, so that the whiteSpace facet has no effect in RDF datatyping.
]]

xmlsch-03 2.1. Mapping from lexical forms to values
+++++++++++++++++++++++++++++++++++++++++
xmlsch-04 2.2. Values without lexical forms
+++++++++++++++++++++++++++++++++++

You wrote:
[[
2.1. Mapping from lexical forms to values (schema-related, terminological)

In http://www.w3.org/TR/rdf-concepts/#section-Datatypes:

  A datatype mapping is a set of pairs whose first element belongs to the lexical space of the datatype, and the second element belongs to the value space of the datatype:

We agree that it is useful to define a term to denote such mappings; in the interests of inter-specification consistency, we wonder whether you would be willing to consider using the term lexical mapping, which we are introducing in our forthcoming draft of XML Schema 1.1.
The term datatype mapping seems unlikely to be usable in the XML Schema specification, where it would suggest to some readers a mapping from one datatype to another, rather than as here a mapping from lexical space to value space. (XML Schema 1.0 got by without a term for this concept.)

2.2. Values without lexical forms (schema-related, important)

In http://www.w3.org/TR/rdf-concepts/#section-Datatypes:

  Each member of the value space may be paired with any number (including zero) of members of the lexical space (lexical representations for that value).

The provision for values without corresponding lexical forms contradicts an assumption to which the XML Schema spec appeals from time to time. The lexical space of any simple datatype in XML Schema is the domain of the type's lexical mapping; the value space is its domain. There are no meaningless lexical forms in the lexical space of the type, nor are there ineffable values in the value space. By eliminating values from the value space (e.g. by setting minimal and maximal values), the type definer may indirectly also eliminate lexical forms from the lexical space; conversely, by eliminating some items from the lexical space (e.g. by setting a pattern), the type definer may eliminate items from the value space.

Are there crucial aspects of RDF which will break if the list item quoted above is changed to read "paired with one or more members of the lexical space"?
]]

We decided:
[[
PROPOSED to clarify xmlsch-03 xmlsch-04 pfps-13 based on the proposal to close in
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Apr/0368.html
]]

i.e.
[[
PROPOSE
- xmlsch-03 - we globally use the term lexical-to-value mapping instead of datatype mapping or any other term
- xmlsch-04 - we do not change the definition of value space but add a note clarifying the relationship with XML Schema datatypes.
]]

The new text can be found in the editors' draft at:
http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/#section-Datatypes
and reads:

[[
5. Datatypes (Normative)

The datatype abstraction used in RDF is compatible with the abstraction used in XML Schema Part 2: Datatypes [XML-SCHEMA2].

A datatype consists of a lexical space, a value space and a lexical-to-value mapping.

The lexical space of a datatype is a set of Unicode [UNICODE] strings.

The lexical-to-value mapping of a datatype is a set of pairs whose first element belongs to the lexical space of the datatype, and the second element belongs to the value space of the datatype:

+ Each member of the lexical space is paired with (maps to) exactly one member of the value space.
+ Each member of the value space may be paired with any number (including zero) of members of the lexical space (lexical representations for that value).

A datatype is identified by one or more URI references.

RDF may be used with any datatype definition that conforms to this abstraction, even if not defined in terms of XML Schema.

Certain XML Schema built-in datatypes are not suitable for use within RDF. For example, the QName datatype requires a namespace declaration to be in scope during the mapping, and is not recommended for use in RDF. [RDF-SEMANTICS] contains a more detailed discussion of specific XML Schema built-in datatypes.

Note: When the datatype is defined using XML Schema:

+ All values correspond to some lexical form, either using the lexical-to-value mapping of the datatype or if it is a union datatype with a lexical mapping associated with one of the member datatypes.
+ XML Schema facets remain part of the datatype and are used by the XML Schema mechanisms that control the lexical space and the value space; however, RDF does not define a standard mechanism to access these facets.
]]

xmlsch-05 2.3. Lexical forms, strings, and character sequences
+++++++++++++++++++++++++++++++++++++++++++++++++++

Your comment:
[[
2.3. Lexical forms, strings, and character sequences (schema-related, editorial)

In http://www.w3.org/TR/rdf-concepts/#section-Datatypes:

  With one exception, the datatypes used in RDF have a lexical space consisting of a set of strings.

Since "string" is used as the local name for a particular simple type in the XML Schema namespace, we believe it will be less confusing for users, in the long run, if the lexical representations of simple-datatype values are described not as "strings" but as "character sequences". This comment also applies to other uses of the term string to denote the members of a lexical space.
]]

RESOLVED: do not accept xmlsch-05

Rationale: It feels like a fairly extensive editorial change. Also, in the semantic web activity documents xsd:string is always referred to in its qualified form, so the possible confusion is diminished.

xmlsch-06 Strings for natural-language data
+++++++++++++++++++++++++++++++++++

Your comment:
[[
2.4. Strings for natural-language data (substantive)

In http://www.w3.org/TR/rdf-concepts/#section-Datatypes:

  A plain literal is a string combined with an optional language identifier. This should be used for plain text in a natural language. As recommended in the RDF formal semantics [RDF-SEMANTICS], these plain literals are self-denoting.

We do not believe that simple strings are likely to be adequate for the representation of arbitrary natural-language text. Even in English, natural-language utterances (such as this document) may need some degree of inline markup for clarity and adequate presentation; in natural-language utterances requiring bidirectional display or ruby, the best authorities (including the W3C I18n Working Group) recommend the use of markup within the natural-language utterance. We thus suggest that you may wish to moderate this recommendation that natural-language material be represented by literals.
This is not an area in which we claim particular technical expertise; we merely call it to your attention in the hopes that doing so may be useful to you.
]]

RESOLVED: to accept xmlsch-06, with revised wording as noted

[[
A plain literal is a string combined with an optional language identifier. This may be used for plain text in a natural language. As recommended in the RDF formal semantics [RDF-SEMANTICS], these plain literals are self-denoting.
]]

After other changes the text now reads:

Finally, you made the following minor editorial comments:

[[
In http://www.w3.org/TR/rdf-concepts/#section-Literal-Value, for "the datatype mapping is applied to the pair form by the lexical form and the language identifier" read "the datatype mapping is applied to the pair formed by the lexical form and the language identifier".
]]

Text has vanished in other changes.

[[
In the same section, for "Such a case, while in error, is not syntacticly ill-formed " read "Such a case, while in error, is not syntactically ill-formed" (et passim).
]]

Done.

[[
In section http://www.w3.org/TR/rdf-concepts/#section-XMLLiteral, for "root element tag" read "root element".
]]

This text has gone; however, new text with start tag and end tag is now in place, specifically:
"when embedded between an arbitrary XML start tag and an end tag form a document"

[[
In the same section, for "XML element content" read "XML data" (the term element content is used in some markup-related specs as a complement of mixed content to denote the content of elements which can contain other elements but cannot contain parsed character data).
]]

Done.

Thank you for all your comments, and your detailed review. They have been very helpful.

Please reply to this email, copying www-rdf-comments@w3.org, indicating whether these decisions are acceptable (please clearly identify those which are not).

Jeremy
on behalf of RDF Core WG
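
Illustration only, not part of the WG resolution or the approved test cases: the xmlsch-02 point can be sketched in a few lines of Python. The function names collapse and is_integer_lexical_form are hypothetical, and the regular expression is a simplified stand-in for the xs:integer lexical space (an optional sign followed by decimal digits). The sketch assumes that an RDF processor checks the literal's character sequence as given, while an XML Schema processor applies the whiteSpace facet first.

```python
import re

# Simplified stand-in for the xs:integer lexical space (an assumption of this
# sketch): an optional sign followed by one or more decimal digits.
_INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def collapse(text: str) -> str:
    """Whitespace pre-processing in the style of whiteSpace="collapse".

    Tabs, line feeds and carriage returns become spaces, runs of spaces are
    reduced to one space, and leading and trailing spaces are removed.
    """
    replaced = re.sub(r"[\t\n\r]", " ", text)
    return re.sub(r" +", " ", replaced).strip(" ")


def is_integer_lexical_form(chars: str) -> bool:
    """Check a character sequence as given, with no whitespace pre-processing."""
    return _INTEGER_LEXICAL.fullmatch(chars) is not None


# RDF datatyping view: the literal's characters are checked as they stand.
rdf_view = is_integer_lexical_form(" 27 ")

# XML Schema validation view: the whiteSpace facet is applied first.
schema_view = is_integer_lexical_form(collapse(" 27 "))
```

Under these assumptions rdf_view is False and schema_view is True, which mirrors the note above: the whiteSpace facet has no effect in RDF datatyping, so a typed literal with surrounding whitespace is ill-formed.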
Received on Monday, 30 June 2003 17:10:58 UTC