- From: pat hayes <phayes@ai.uwf.edu>
- Date: Mon, 2 Jun 2003 11:47:27 -0500
- To: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>
- Cc: www-rdf-comments@w3.org
>From: pat hayes <phayes@ai.uwf.edu>
>Subject: Re: more problems with closures
>Date: Sun, 1 Jun 2003 19:59:12 -0500
>
>[...]
>
>
>> >Also, as the canonical form of an XML document is some sort of string,
>>
>> That is unfortunately a controversial claim. On some views of the
>> matter, XML documents and strings are distinct classes. Therefore,
>> the MT deliberately allows the possibility that XML documents and the
>> character strings of plain literals can be distinct, so the following
>> entailment is not considered to be valid without some other
>> antecedents. In a word: plain literals and XML literals might be
>> disjoint sets in some interpretations.
>
>
>From http://www.w3.org/TR/rdf-concepts/
>
>5. XML Content within an RDF Graph (Normative)
>
>RDF provides for XML content as a possible literal value. This typically
>originates from the use of rdf:parseType="Literal" in the RDF/XML Syntax
>[RDF-SYNTAX].

Notice the term of art 'XML content'. That has been chosen to be noncommittal about exactly what that IS.

>
>Such content is indicated in an RDF graph using a typed literal whose
>datatype is a special built-in datatype, rdf:XMLLiteral.
>
>As part of the definition of this datatype, an ancillary definition is used.
>
>The XML document corresponding to a pair ( str, lang ) is formed as follows:
>
>Concatenate the five strings:
>
>   1. "<rdf-wrapper xml:lang='"
>   2. lang
>   3. "'>"
>   4. str
>   5. "</rdf-wrapper>"
>
>Encode the resulting Unicode string in UTF-8 to form the
>corresponding XML document.

Notice the term of art "XML document". That might or might not be identifiable with a character string.

>No escaping is applied. The choice of rdf-wrapper is fixed but arbitrary.
>
>The XML document corresponding to a string str is formed as the XML
>document corresponding to the pair (str, "").
>
>Using this, the datatype rdf:XMLLiteral is defined as follows.
>
>The datatype URI
>  is http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral.
>The value space
>  is the set of all XML documents that:
>
>   * Have root element tag: <rdf-wrapper>
>   * Have no attributes on the root element other than xml:lang
>   * are Canonical XML [XML-C14N] (with comments).
>
>The lexical space
>  contains all pairs ( string, lang ) where lang is any language
>  identifier [RFC-3066] in lowercase, and string is well-balanced,
>  self-contained XML element content [XML], for which the XML document
>  corresponding to the pair is a well-formed XML document [XML] that also
>  conforms to XML Namespaces [XML-NS].
>  also contains all strings string which are well-balanced,
>  self-contained XML element content [XML], and for which the
>  corresponding XML document is a well-formed XML document [XML] that
>  also conforms to XML Namespaces [XML-NS].
>The mapping
>  is defined as the function that maps a pair or string to the canonical
>  form [XML-C14N] (with comments) of the corresponding XML document.
>
>
>
>6.5 RDF Literals
>
>A literal in an RDF graph contains three components called:
>
>   * The lexical form being a Unicode [UNICODE] string in Normal
>     Form C [NFC].
>   * The language identifier as defined by [RFC-3066], normalized
>     to lowercase.
>   * The datatype URI being an RDF URI reference.
>
>The lexical form is present in all RDF literals; the language identifier
>and the datatype URI may be absent from an RDF literal.
>
>A plain literal is one in which the datatype URI is absent.
>
>
>It sure looks to me as if XML Literals and plain literals have an
>intersecting value space.

I agree, it looks that way to me too, and to some other members of the RDF WG; but to some members of the XML WG, and others in the XML community, it apparently does not look that way, and some members of the RDF WG feel sympathetic to the other interpretation.
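For readers following the quoted RDF Concepts text, the construction of the "corresponding XML document" and the rdf:XMLLiteral lexical-to-value mapping can be sketched in Python. This is only an illustration under stated assumptions, not part of either specification: the function names are hypothetical, the sketch does not check that lang is a lowercase RFC 3066 tag, and Python's `xml.etree.ElementTree.canonicalize` (Python 3.8+) implements Canonical XML 2.0, which differs in some details from the Canonical XML 1.0 [XML-C14N] cited by the datatype definition. Returning a Python `str` is a modelling convenience of the sketch, not a position on whether an XML document is identical to a character string.

```python
import xml.etree.ElementTree as ET


def corresponding_xml_document(s: str, lang: str = "") -> bytes:
    """Hypothetical helper: build the 'corresponding XML document' for (s, lang).

    Concatenates the five strings listed in RDF Concepts section 5 and
    encodes the result in UTF-8. No escaping is applied, as the quoted
    text specifies. A bare string s is treated as the pair (s, "").
    """
    text = "<rdf-wrapper xml:lang='" + lang + "'>" + s + "</rdf-wrapper>"
    return text.encode("utf-8")


def xml_literal_value(s: str, lang: str = "") -> str:
    """Hypothetical helper: approximate the rdf:XMLLiteral mapping.

    Canonicalizes the corresponding document with comments kept.
    Assumption: Python's C14N 2.0 stands in for XML-C14N 1.0 here.
    Raises ET.ParseError if the document is not well-formed or violates
    XML Namespaces (two ways a pair can fall outside the lexical space).
    """
    doc = corresponding_xml_document(s, lang).decode("utf-8")
    return ET.canonicalize(doc, with_comments=True)


if __name__ == "__main__":
    print(corresponding_xml_document("<b>hi</b>", "en"))
    print(xml_literal_value("<br/>"))
```

Under this sketch, "<br/>" and "<br></br>" are distinct lexical forms that map to the same canonical document, because canonical XML always writes empty elements as start-tag/end-tag pairs.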
Rather than take sides on this apparently deeply contentious issue - which, between ourselves, seems to me to be rooted in clashing philosophies of mathematics - I would prefer to have the MT be agnostic on the matter. There are those, for example, who assert with vehemence that Unicode character strings in plain literals must be considered to be distinct from elements of the value space of xsd:string; so the MT does not support any entailment of the form

aaa ppp "foo" .
|=
aaa ppp "foo"^^xsd:string .

>This is reinforced by
>http://www.w3.org/TR/REC-xml
>
>2 Documents
>
>[Definition: A data object is an XML document if it is well-formed, as
>defined in this specification. A well-formed XML document may in addition
>be valid if it meets certain further constraints.]
>
>2.1 Well-Formed XML Documents
>
>[Definition: A textual object is a well-formed XML document if:]
>
>   1. Taken as a whole, it matches the production labeled document.
>
>[ A whole bunch of wording and grammar that all bottom out to the fact that
>  a document is a sequence of Unicode characters. ]
>

None of this prose is relevant to the other point of view, since on that view the identity of a thing is not considered in isolation, but is rather seen as a function of the inherent 'type' it is viewed as being. On this view, for example, the real number zero and the integer zero are distinct entities, and maybe even the double-length real number zero and the octal number zero. Look, don't shoot the messenger: I'm just telling you what they say. This is the way that many 'strongly typed' systems work (eg Specware) and it is also justified by topos theory (which has been touted as a rival to set theory for FOM work, as I expect you know), where one classifies things in terms of morphisms and categories rather than by using sets.
On this view, all of the set-theoretical way of talking that we find so natural (eg a relation is a set of pairs, that kind of thing) is artificial and ontologically suspect, and the 'realist' idea that things in sets just are what they are is ridiculous. On this other view, things have no identity in themselves: they are always seen as being of some type, and the type that they are viewed as being makes them distinct from anything (even the 'same' thing) of any other type. So the fact that an XML document is defined as being a Unicode character string is NOT, on this view, sufficient for one to conclude that an XML document can actually be *identical* to the corresponding Unicode character string: merely by describing it as an XML document, you have thereby automatically given it an XML identity which renders it distinct from the (isomorphic, but non-identical) non-XML string.

I really don't want to get into this debate. If people want to use RDF to describe a strongly typed vision of the universe, that is fine with me. My general semantic philosophy is, when faced with genuine controversy, find a way to be agnostic rather than take sides. So, in short, the RDF MT does not support any identities which can be inferred merely from a 'social' reading of English prose in a specification document. The specification has to explicitly say what is identical to what; and the XML Schema part 2 spec is quite clear that the value spaces of distinct datatypes are non-overlapping.

Pat

--
---------------------------------------------------------------------
IHMC                      (850)434 8903 or (650)494 3973   home
40 South Alcaniz St.      (850)202 4416   office
Pensacola                 (850)202 4440   fax
FL 32501                  (850)291 0667   cell
phayes@ai.uwf.edu         http://www.coginst.uwf.edu/~phayes
s.pam@ai.uwf.edu          for spam
Received on Monday, 2 June 2003 12:47:30 UTC