Re: more problems with closures from pat hayes on 2003-06-02 (www-rdf-comments@w3.org from April to June 2003)

From: pat hayes <phayes@ai.uwf.edu>
Date: Mon, 2 Jun 2003 11:47:27 -0500
To: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>
Cc: www-rdf-comments@w3.org
Message-Id: <p05210624bb012a5eaf78@[10.0.100.24]>
>From: pat hayes <phayes@ai.uwf.edu>
>Subject: Re: more problems with closures
>Date: Sun, 1 Jun 2003 19:59:12 -0500
>
>[...]
>
>
>>  >Also, as the canonical form of an XML document is some sort of string,
>>
>>  That is unfortunately a controversial claim. On some views of the
>>  matter, XML documents and strings are distinct classes.  Therefore,
>>  the MT deliberately allows the possibility that XML documents and the
>>  character strings of plain literals can be distinct, so the following
>>  entailment is not considered to be valid without some other
>>  antecedents. In a word: plain literals and XML literals might be
>>  disjoint sets in some interpretations.
>
>
>From http://www.w3.org/TR/rdf-concepts/
>
>5. XML Content within an RDF Graph (Normative)
>
>RDF provides for XML content as a possible literal value. This typically
>originates from the use of rdf:parseType="Literal" in the RDF/XML Syntax
>[RDF-SYNTAX].

Notice the term of art 'XML content'. It was chosen to be 
noncommittal about exactly what that content IS.

>
>Such content is indicated in an RDF graph using a typed literal whose
>datatype is a special built-in datatype, rdf:XMLLiteral.
>
>As part of the definition of this datatype, an ancillary definition is used.
>
>The XML document corresponding to a pair ( str, lang ) is formed as follows:
>
>Concatenate the five strings:
>
>    1. "<rdf-wrapper xml:lang='"
>    2. lang
>    3. "'>"
>    4. str
>    5. "</rdf-wrapper>"
>
>Encode the resulting Unicode string in UTF-8 to form the 
>corresponding XML document.

Notice the term of art "XML document". That might or might not be 
identifiable with a character string.
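
For concreteness, the construction quoted above can be sketched roughly 
as follows. This is only an illustration in Python, not anything taken 
from the specs: the function name xml_document_for is made up for the 
example, and only the five-string concatenation and the UTF-8 step come 
from the quoted text.

```python
def xml_document_for(s: str, lang: str) -> bytes:
    """Build the wrapper for a (str, lang) pair as in the quoted text.

    Hypothetical helper name. No escaping is applied to either argument.
    """
    wrapped = "".join([
        "<rdf-wrapper xml:lang='",  # 1
        lang,                       # 2
        "'>",                       # 3
        s,                          # 4
        "</rdf-wrapper>",           # 5
    ])
    # Encode the resulting Unicode string in UTF-8.
    return wrapped.encode("utf-8")


doc = xml_document_for("<b>foo</b>", "en")
print(doc)
```

What comes out is a sequence of bytes. Whether that sequence *is* the 
XML document, or merely encodes it, is exactly the question the term of 
art leaves open.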

>No escaping is applied. The choice of rdf-wrapper is fixed but arbitrary.
>
>The XML document corresponding to a string str is formed as the XML
>document corresponding to the pair (str, "").
>
>Using this, the datatype rdf:XMLLiteral is defined as follows.
>
>The datatype URI
>     is http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral.
>The value space
>     is the set of all XML documents that:
>
>         * Have root element tag: <rdf-wrapper>
>         * Have no attributes on the root element other than xml:lang
>         * are Canonical XML [XML-C14N] (with comments).
>
>The lexical space
>     contains all pairs ( string, lang ) where lang is any language
>     identifier [RFC-3066] in lowercase, and string is well-balanced,
>     self-contained XML element content [XML], for which the XML document
>     corresponding to the pair is a well-formed XML document [XML] that also
>     conforms to XML Namespaces [XML-NS].
>     also contains all strings string which are well-balanced,
>     self-contained XML element content [XML], and for which the
>     corresponding XML document is a well-formed XML document [XML] that
>     also conforms to XML Namespaces [XML-NS].
>The mapping
>     is defined as the function that maps a pair or string to the canonical
>     form [XML-C14N] (with comments) of the corresponding XML document.
>
>
>
>6.5 RDF Literals
>
>A literal in an RDF graph contains three components called:
>
>     * The lexical form being a Unicode [UNICODE] string in Normal
>       Form C [NFC].
>     * The language identifier as defined by [RFC-3066], normalized
>       to lowercase.
>     * The datatype URI being an RDF URI reference.
>
>The lexical form is present in all RDF literals; the language identifier
>and the datatype URI may be absent from an RDF literal.
>
>A plain literal is one in which the datatype URI is absent.
>
>
>It sure looks to me as if XML Literals and plain literals have an
>intersecting value space.

I agree, it looks that way to me too, and to some other members of 
the RDF WG; but to some members of the XML WG, and others in the XML 
community, it apparently does not look that way, and some members of 
the RDF WG feel sympathetic to the other interpretation. Rather than 
take sides on this apparently deeply contentious issue - which, 
between ourselves, seems to me to be rooted in clashing philosophies 
of mathematics - I would prefer to have the MT be agnostic on the 
matter. There are those, for example, who assert with vehemence that 
Unicode character strings in plain literals must be considered to be 
distinct from elements of the value space of xsd:string; so the MT 
does not support any entailment of the form

aaa ppp "foo" .
|=
aaa ppp "foo"^^xsd:string .
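
To see why staying agnostic blocks this entailment, here is a toy 
sketch in Python. It is not the RDF MT itself: the class names, the 
two interpretations, and the single-property extension are all 
simplifying assumptions made up for illustration. The point is only 
that an entailment has to hold in every permitted interpretation, and 
one interpretation that keeps the two kinds of value apart is enough 
to defeat it.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple


@dataclass(frozen=True)
class Plain:
    lexical: str


@dataclass(frozen=True)
class Typed:
    lexical: str
    datatype: str


# Two one-triple graphs: (subject, property, object).
G1 = {("aaa", "ppp", Plain("foo"))}
G2 = {("aaa", "ppp", Typed("foo", "xsd:string"))}


class Interpretation:
    """Toy interpretation (hypothetical): one property, names denote themselves."""

    def __init__(self, denote_typed: Callable, extension: FrozenSet[Tuple]):
        self.denote_typed = denote_typed  # how a typed literal gets its value
        self.extension = extension        # (subject, object) value pairs for ppp

    def value(self, term):
        if isinstance(term, Plain):
            return term.lexical           # a plain literal denotes its string
        if isinstance(term, Typed):
            return self.denote_typed(term)
        return term

    def satisfies(self, graph) -> bool:
        return all((self.value(s), self.value(o)) in self.extension
                   for s, _p, o in graph)


def entails(g1, g2, interpretations) -> bool:
    """g1 entails g2 if every interpretation satisfying g1 also satisfies g2."""
    return all(i.satisfies(g2) for i in interpretations if i.satisfies(g1))


ext = frozenset({("aaa", "foo")})
# One view: the value of "foo"^^xsd:string just is the string "foo".
identifying = Interpretation(lambda t: t.lexical, ext)
# The other view: the value is tagged by its type, so it differs from "foo".
separating = Interpretation(lambda t: (t.datatype, t.lexical), ext)

print(entails(G1, G2, [identifying]))
print(entails(G1, G2, [identifying, separating]))
```

With only the identifying interpretation allowed, the entailment goes 
through; once the separating one is also allowed, it fails.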


>This is reinforced by
>http://www.w3.org/TR/REC-xml
>
>2 Documents
>
>[Definition: A data object is an XML document if it is well-formed, as
>defined in this specification. A well-formed XML document may in addition
>be valid if it meets certain further constraints.]
>
>2.1 Well-Formed XML Documents
>
>[Definition: A textual object is a well-formed XML document if:]
>
>    1.  Taken as a whole, it matches the production labeled document.
>
>[ A whole bunch of wording and grammar that all bottom out to the fact that
>   a document is a sequence of Unicode characters. ]
>

None of this prose is relevant to the other point of view, since the 
identity of a thing is not, on that view, considered in isolation; 
but rather is seen to be a function of the inherent 'type' it is 
viewed as being. On this view, for example, the real number zero and 
the integer zero are distinct entities, and maybe even the 
double-length real number zero and the octal number zero.  Look, 
don't shoot the messenger: I'm just telling you what they say. This 
is the way that many 'strongly typed' systems work (e.g. Specware), and 
it is also justified by topos theory (which has been touted as a 
rival to set theory for foundations-of-mathematics work, as I expect 
you know), where one classifies things in terms of morphisms and 
categories rather than by 
using sets. On this view, all of the set-theoretical way of talking 
that we find so natural (eg a relation is a set of pairs, that kind 
of thing) is artificial and ontologically suspect, and the 'realist' 
idea that things in sets just are what they are, is ridiculous. On 
this other view, things have no identity in themselves: they are 
always seen as being of some type, and the type that they are viewed 
as being makes them distinct from anything (even the 'same' thing) of 
any other type. So the fact that an XML document is defined as being 
a Unicode character string is NOT, on this view, sufficient for one 
to conclude that an XML document can actually be *identical* to the 
corresponding Unicode character string: merely by describing it as an 
XML document, you have thereby automatically given it an XML 
identity which renders it distinct from the (isomorphic, but 
non-identical) non-XML string.
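
A rough way to picture the difference between the two views, in 
Python. This is my own toy rendering and not anything from Specware, 
topos theory, or the XML specs; the class name Viewed and the type 
labels are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewed:
    """A thing together with the type it is viewed as being (hypothetical)."""
    type_name: str
    content: str


chars = "<b>foo</b>"

# 'Realist' set-theoretic view: a string is just that string.
same_characters = (chars == "<b>foo</b>")

# Strongly typed view: the type is part of the identity.
as_string = Viewed("unicode-string", chars)
as_xml = Viewed("xml-document", chars)

isomorphic = (as_string.content == as_xml.content)  # same characters
identical = (as_string == as_xml)                   # compared with their types

print(same_characters, isomorphic, identical)
```

On the typed view the two values carry the same characters but compare 
as different things, which is the isomorphic-but-non-identical 
situation described above.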

I really don't want to get into this debate. If people want to use 
RDF to describe a strongly typed vision of the universe, that is fine 
with me. My general semantic philosophy is, when faced with genuine 
controversy, find a way to be agnostic rather than take sides. So, in 
short, the RDF MT does not support any identities which can be 
inferred merely from a 'social' reading of English prose in a 
specification document. The specification has to explicitly say what 
is identical to what; and the XML Schema part 2 spec is quite clear 
that the value spaces of distinct datatypes are non-overlapping.

Pat

-- 
---------------------------------------------------------------------
IHMC					(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola              			(850)202 4440   fax
FL 32501           				(850)291 0667    cell
phayes@ai.uwf.edu	          http://www.coginst.uwf.edu/~phayes
s.pam@ai.uwf.edu   for spam
Received on Monday, 2 June 2003 12:47:30 UTC