Re: more problems with closures from Peter F. Patel-Schneider on 2003-06-02 (www-rdf-comments@w3.org from April to June 2003)

From: Peter F. Patel-Schneider <pfps@research.bell-labs.com>
Date: Mon, 02 Jun 2003 10:57:55 -0400 (EDT)
To: phayes@ai.uwf.edu
Cc: www-rdf-comments@w3.org
Message-Id: <20030602.105755.50044411.pfps@research.bell-labs.com>
From: pat hayes <phayes@ai.uwf.edu>
Subject: Re: more problems with closures
Date: Sun, 1 Jun 2003 19:59:12 -0500

[...]


> >Also, as the canonical form of an XML document is some sort of string,
> 
> That is unfortunately a controversial claim. On some views of the 
> matter, XML documents and strings are distinct classes.  Therefore, 
> the MT deliberately allows the possibility that XML documents and the 
> character strings of plain literals can be distinct, so the following 
> entailment is not considered to be valid without some other 
> antecedents. In a word: plain literals and XML literals might be 
> disjoint sets in some interpretations.


From http://www.w3.org/TR/rdf-concepts/

5. XML Content within an RDF Graph (Normative)

RDF provides for XML content as a possible literal value. This typically
originates from the use of rdf:parseType="Literal" in the RDF/XML Syntax
[RDF-SYNTAX]. 

Such content is indicated in an RDF graph using a typed literal whose
datatype is a special built-in datatype, rdf:XMLLiteral. 

As part of the definition of this datatype, an ancillary definition is used.

The XML document corresponding to a pair ( str, lang ) is formed as follows:

Concatenate the five strings:

   1. "<rdf-wrapper xml:lang='"
   2. lang
   3. "'>"
   4. str
   5. "</rdf-wrapper>"

Encode the resulting Unicode string in UTF-8 to form the corresponding XML document.

No escaping is applied. The choice of rdf-wrapper is fixed but arbitrary.

The XML document corresponding to a string str is formed as the XML
document corresponding to the pair (str, ""). 
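
As an illustration only (this is not part of the quoted text), the
concatenation steps above can be sketched in Python. The function name
xml_document_for is hypothetical; the wrapper strings and the UTF-8
encoding step are taken from the definition just quoted.

```python
def xml_document_for(s: str, lang: str = "") -> bytes:
    """Build the XML document corresponding to the pair (s, lang).

    A bare string is treated as the pair (s, ""), as in the quoted text.
    """
    # Concatenate the five strings; no escaping is applied to s or lang.
    wrapped = "<rdf-wrapper xml:lang='" + lang + "'>" + s + "</rdf-wrapper>"
    # Encode the resulting Unicode string in UTF-8.
    return wrapped.encode("utf-8")


if __name__ == "__main__":
    print(xml_document_for("a <b>c</b>", "en"))
    print(xml_document_for("plain text"))
```

Note that the result is, before the final encoding step, nothing more
than a Unicode string.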

Using this, the datatype rdf:XMLLiteral is defined as follows.

The datatype URI
    is http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral.
The value space
    is the set of all XML documents that:

        * Have root element tag: <rdf-wrapper>
        * Have no attributes on the root element other than xml:lang
        * are Canonical XML [XML-C14N] (with comments).

The lexical space
    contains all pairs ( string, lang ) where lang is any language
    identifier [RFC-3066] in lowercase, and string is well-balanced,
    self-contained XML element content [XML], for which the XML document
    corresponding to the pair is a well-formed XML document [XML] that also
    conforms to XML Namespaces [XML-NS]. 
    It also contains all strings string which are well-balanced,
    self-contained XML element content [XML], and for which the
    corresponding XML document is a well-formed XML document [XML] that
    also conforms to XML Namespaces [XML-NS]. 
The mapping
    is defined as the function that maps a pair or string to the canonical
    form [XML-C14N] (with comments) of the corresponding XML document. 
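
A rough sketch of this mapping, again for illustration only. It
assumes Python 3.8 or later and uses xml.etree.ElementTree.canonicalize,
which implements Canonical XML 2.0 rather than the [XML-C14N] version the
text cites, so its output is a stand-in for the real canonical form and
may differ from it in detail. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def xml_document_for(s: str, lang: str = "") -> bytes:
    # Wrapper document as defined in the quoted text; no escaping applied.
    wrapped = "<rdf-wrapper xml:lang='" + lang + "'>" + s + "</rdf-wrapper>"
    return wrapped.encode("utf-8")


def xml_literal_value(s: str, lang: str = "") -> str:
    """Map a lexical form to an approximation of its canonical form.

    Assumption: C14N 2.0 (what the standard library offers) is used in
    place of C14N 1.0; comments are kept, matching "with comments".
    Ill-formed content makes the parser raise ET.ParseError, which
    roughly corresponds to the pair lying outside the lexical space.
    """
    document = xml_document_for(s, lang)
    return ET.canonicalize(document, with_comments=True)


if __name__ == "__main__":
    value = xml_literal_value("a <b>c</b>", "en")
    # The value comes back as an ordinary Python str.
    print(type(value).__name__, value)
```

In this sketch the canonical form is returned as an ordinary string,
which is the observation the rest of this message relies on.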



6.5 RDF Literals

A literal in an RDF graph contains three components called:

    * The lexical form being a Unicode [UNICODE] string in Normal Form C [NFC].
    * The language identifier as defined by [RFC-3066], normalized to lowercase.
    * The datatype URI being an RDF URI reference.

The lexical form is present in all RDF literals; the language identifier
and the datatype URI may be absent from an RDF literal. 

A plain literal is one in which the datatype URI is absent.
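
The three components can be pictured with a small data structure. This
is a minimal sketch, not anything defined by the quoted text: the class
name Literal and its field names are hypothetical, and Normal Form C is
not enforced here.

```python
from dataclasses import dataclass
from typing import Optional

XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"


@dataclass(frozen=True)
class Literal:
    lexical_form: str                 # always present
    language: Optional[str] = None    # may be absent
    datatype: Optional[str] = None    # may be absent

    def __post_init__(self):
        # Language identifiers are normalized to lowercase.
        if self.language is not None:
            object.__setattr__(self, "language", self.language.lower())

    @property
    def is_plain(self) -> bool:
        # A plain literal is one in which the datatype URI is absent.
        return self.datatype is None


if __name__ == "__main__":
    print(Literal("chat", "FR"))
    print(Literal("<b>c</b>", datatype=XML_LITERAL).is_plain)
```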


It sure looks to me as if XML Literals and plain literals have
intersecting value spaces.  This is reinforced by
http://www.w3.org/TR/REC-xml 

2 Documents

[Definition: A data object is an XML document if it is well-formed, as
defined in this specification. A well-formed XML document may in addition
be valid if it meets certain further constraints.] 

2.1 Well-Formed XML Documents

[Definition: A textual object is a well-formed XML document if:]

   1.  Taken as a whole, it matches the production labeled document.

[ A whole bunch of wording and grammar that all bottoms out in the fact that
  a document is a sequence of Unicode characters. ]



peter
Received on Monday, 2 June 2003 10:58:05 UTC