Re: The X Datatype Proposal from Pat Hayes on 2001-11-13 (w3c-rdfcore-wg@w3.org from November 2001)

From: Pat Hayes <phayes@ai.uwf.edu>
Date: Tue, 13 Nov 2001 17:44:04 -0600
To: Patrick.Stickler@nokia.com
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <p05101037b81713d40390@[65.212.118.147]>
>           Definition of X Proposal, with examples

....

>GLOSSARY OF TERMS
>
>value space
>
>         An abstract set of entities sharing common properties
>         (very loose definition)
>
>value
>
>         A member of a value space
>
>representation space
>
>         A set of concrete representations mapping to values in a
>         value space which facilitate automated operations
>         in terms of those values -- e.g. the reification of
>         a value space within a computer system

If I follow you, this is what I was calling a datatype mapping, i.e. a 
mapping from a domain of lexical literal forms into a set of literal 
values; an example might be the standard mapping from decimal 
numerals to natural numbers, right?
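
To make that example concrete, here is a minimal Python sketch of such 
a mapping. It assumes the lexical space is the set of non-empty strings 
of ASCII decimal digits; the names DECIMAL_NUMERAL and 
decimal_to_natural are purely illustrative and appear in no proposal.

```python
import re

# Assumed lexical space: non-empty strings of ASCII decimal digits.
DECIMAL_NUMERAL = re.compile(r"[0-9]+")


def decimal_to_natural(lexical_form: str) -> int:
    """Map a decimal numeral (a lexical form) to the natural number it denotes."""
    if not DECIMAL_NUMERAL.fullmatch(lexical_form):
        raise ValueError(f"not in the lexical space: {lexical_form!r}")
    return int(lexical_form)


# Two distinct lexical forms denote one value, so this lexical
# space is not canonical in the glossary's sense.
assert decimal_to_natural("37") == decimal_to_natural("037") == 37
```

The mapping itself is the function; the lexical space is its domain and 
the value space is its range.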

>representation
>
>         Within a representation space, a concrete representation
>         of a value in the corresponding value space
>
>canonical representation space
>
>         A representation space where each value in the value space
>         has only one possible representation in the representation
>         space (the internal representation space of a computer system
>         is a canonical representation space)
>
>lexical space
>
>         A set of concrete lexical representations (strings) which
>         represent values in a specific value space, defined in
>         terms of a lexical grammar
>
>lexical form
>
>         Within a lexical space, a concrete lexical representation
>         (string) of a value in the corresponding value space, which
>         is valid according to the defined lexical grammar
>
>canonical lexical space
>
>         A lexical space where each value in the value space
>         has only one possible representation in the lexical space

I fail to follow the distinction between 'representation' and 
'lexical' in your usage.

>data type
>
>         An explicit lexical space whose members map to
>         values in an explicit value space
>
>(RDF) literal
>
>         A string
>
>typed (RDF) literal
>
>         A lexical form
>
>local type
>
>         A data type associated directly with an occurrence of a
>         value serving as the object of a statement

1. I do not know what 'associated directly' means.
2. Why is the datatype - a *lexical* space - associated with the 
occurrence of a *value* ?? I find this puzzling. For example, suppose 
the value in question is the prime number 37. What would it mean to 
associate a lexical space with 37?

>global type
>
>         A data type associated globally with all occurrences of a
>         value serving as the object of a statement having a particular
>         predicate (i.e. via an rdfs:range definition)
>
>descriptive range
>
>         A range definition for a particular predicate defining a global
>         type for all values of that predicate
>
>prescriptive range
>
>         A range constraint for a particular predicate defining a global
>         type which all local types for all values must be equivalent to
>         (either identical to, or a subclass of, the defined range class)

I see no difference here between prescriptive and descriptive. The 
former seems to be the same as the latter with the proviso added 
that everything must be consistent; but that is a vacuous condition 
in an assertional language.

>node
>
>         The basic construct of an RDF graph, per this proposal
>
>node facet
>
>         A primitive property of a graph node serving as the
>         label of an arc

?? What about two different arcs coming out of a single node? I don't 
see any utility to this idea of a 'facet'.

>
>arc
>
>         A named relation between two nodes, from the
>         perspective of one node (source node) towards the
>         other (target node), corresponding to a facet
>
>LNode
>
>         A node representing a resource labeled by an RDF Literal
>
>UNode
>
>         A node representing a resource labeled by a URI Reference
>
>SNode
>
>         A node representing an RDF Statement

Interesting, I was not aware there were any such nodes.

>
>BNode
>
>         A node representing an anonymous resource with no label
>
>qualifying statement
>
>         A statement where the subject is represented by an SNode
>
>statement qualification
>
>         A limitation on the applicability of a statement for
>         certain processes; such as scope, source, authority,
>         or authentication
>
>literal match
>
>         The binding of a statement to a query where the statement and
>         query are expressed in the same vocabulary and in terms of the
>         same data typing scheme

We don't really have any notion of 'query' yet, other than in terms 
of entailment.

>inferred match
>
>         The binding of a statement to a query where the statement and
>         query are not expressed in the same vocabulary and/or in terms
>         of the same data typing scheme but which are deemed equivalent
>         according to rdfs:subClassOf or rdfs:subPropertyOf relations
>         between those vocabularies
>
>level 0 graph
>
>         A maximal representation of an X Proposal graph
>         where every node from every statement is distinct, and
>         having no compression whatsoever
>
>level 1 merge
>
>         A transformation on a level 0 graph such that all UNodes with
>         identical uriref labels and SNodes where subject, predicate,
>         and object nodes are all UNodes with identical uriref labels
>         respectively are merged
>
>level 1 graph
>
>         A graph which is derived from a level 0 graph by means
>         of a level 1 merge, either virtually or destructively
>
>======================================================================
>
>PROPOSAL IN A NUTSHELL
>
>Assumptions and Assertions:
>
>The representation and interpretation of data types should be:
>a. consistent
>b. explicitly defined by the RDF specification
>c. as neutral as possible with regards to data type scheme
>d. compatible with XML Schema data types
>
>The solution adopted must:
>a. not deviate significantly from the present specification, either
>    with regards to XML serialization or graph representation
>b. be sufficiently future proof to allow for extension to address
>    known or future issues with minimal impact to existing systems
>
>No interpretation of data types will be provided by RDF. Any
>interpretation of RDF encoded knowledge based on a defined correlation
>between an RDF node and a particular data type is application
>specific and beyond the scope of RDF.  RDF will only concern itself
>with the specification of relationships between nodes and types,
>and the preservation of such information for interpretation in
>contexts outside the scope of RDF, not the interpretation itself.
>
>Typed literals constitute lexical forms within a given lexical
>space and which map to values in a given value space.
>
>The proper interpretation of a typed literal requires both the
>lexical form and the identity of the lexical and value space for
>which the lexical form is expressed.

It also requires the mapping between them; what you called the 
representation space and I earlier called the datatype mapping.

>Separation of a lexical form from either the lexical space or
>value space for which it was originally expressed renders it
>uninterpretable in a reliable manner.

That isn't obvious.

>The rdfs:range property may function as either prescriptive
>or descriptive, depending on the presence or absence of a local
>type for the object of a statement.

Again, I fail to see the meaning of this distinction.

>In order for rdfs:range to function prescriptively, there must
>be both:
>a. a range value defined for the property of a statement
>b. a local type defined for the object of the statement
>
>In the absence of a local type, and in the presence of a range
>definition for a given property, the type of the object of a statement
>is taken to be that defined as the range of the property.

And in the presence of a local type, it is taken to be the local 
type, provided that is consistent with the range statement, right? 
The inferences involved are the same in both cases: all the 
information that can be obtained about the datatype of the literal, 
by any means, local or global, is combined, provided it is 
consistent. (If it isn't consistent, something is wrong.)
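
A rough Python sketch of that single inference, under loudly stated 
assumptions: the SUBCLASS_OF table and the type names in it are 
hypothetical, and two types that are unrelated by subclassing are 
simply treated as inconsistent, which is a simplification.

```python
# Hypothetical subclass relation between datatype classes (child -> parent).
SUBCLASS_OF = {"xsd:integer": "xsd:decimal"}


def is_subclass(sub, sup):
    """True if sub equals sup or reaches it by following SUBCLASS_OF links."""
    while sub != sup:
        if sub not in SUBCLASS_OF:
            return False
        sub = SUBCLASS_OF[sub]
    return True


def combined_type(local_type, range_type):
    """Combine all available type information about a literal.

    Returns the most specific known type, or None if nothing is known.
    Raises ValueError if the local and global information disagree.
    """
    known = [t for t in (local_type, range_type) if t is not None]
    if not known:
        return None
    if len(known) == 1:
        return known[0]
    if is_subclass(local_type, range_type):
        return local_type
    if is_subclass(range_type, local_type):
        return range_type
    raise ValueError(f"{local_type} is inconsistent with range {range_type}")
```

Note that the function never needs to know whether the range was meant 
prescriptively or descriptively; the same combination step covers both.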

>
>Query processes, while not explicitly defined by the RDF specification,
>should be taken into account with regards to the representation and
>interpretation of RDF encoded knowledge.
>
>Query processes which employ inference based on rdfs:subPropertyOf
>relations may bind objects to predicates which are superordinate to
>the predicate of the original statement.
>
>Query processes which employ inference based on rdfs:subClassOf
>relations may bind literals to types which are superordinate to
>the type originally defined for the literals.
>
>Query processes which bind a non-locally typed literal to a superordinate
>predicate different from that of the original statement and which
>may have a range defined which differs from the range defined
>for the original predicate effectively separate the lexical form
>embodied in that literal from the lexical space for which it was
>originally expressed, rendering it uninterpretable in a reliable
>manner.

Again, that begs some important questions.

>
>Query processes which bind a locally typed literal to a superordinate
>type different from that originally defined for the literal effectively
>separate the lexical form embodied in that literal from the lexical
>space for which it was originally expressed, rendering it uninterpretable
>in a reliable manner.
>
>----------------------------------------------------------------------
>
>Conclusions:
>
>In the absence of a local type, range may be descriptive.
>
>In the absence of a local type, range cannot be prescriptive.
>
>In the presence of a local type, range may be prescriptive.
>
>We MUST impose the requirement that all data type classes
>define a value space that is a proper subset of the value
>space of all superordinate data type classes.
>
>We CANNOT impose the requirement that all data type classes
>define a lexical space that is a proper subset of the lexical
>space of all superordinate data type classes.
>
>The reliable interpretation of non-locally typed literals
>by rdfs:range definitions requires the absolute persistent
>preservation of the binding between predicate and object per the
>original statement.
>
>The reliable interpretation of locally typed literals
>requires the absolute persistent preservation of the binding
>between object and type per the original statement.
>
>----------------------------------------------------------------------
>
>Proposed Solution:
>
>The basis for the graph representation, and all operations and
>interpretations, should be the explicit reification of the
>statement.

NO!!  I refuse to have anything to do with a proposal that requires 
global reification just to handle literals. It is unworkable, 
impossibly baroque, incompatible with all known uses of RDF 
(including DAML) and with XML, and semantically confused.

>An RDF graph should represent the statements which
>constitute knowledge,

Quite.  Not statements that *describe* the statements that represent knowledge.

There is a well-known dodge referred to in Krep circles as 'escaping 
to the metalevel'. When things get awkward, just *describe the 
syntax* rather than trying to get the meaning straight.  Syntax is 
usually better-behaved than meanings, so it will be easier. However, 
this doesn't solve the problems, it just takes out a kind of 
intellectual loan. In order to be of actual inferential use, 
something is going to have to figure out what to actually DO with the 
expressions that you are now describing.

>and the present RDF graph model should be
>seen as a higher level resource-centric view or interpretation
>of that underlying statement-centric graph.
>
>Thus, rather than the present graph representation:
>
>    [urn:foo] --- urn:someProperty ---> "bar"
>
>we should have instead, for every statement, a canonical
>underlying representation as follows:
>
>       [ ]
>        |
>        ---- ID ----------> 1
>        |
>        ---- type --------> SNode
>        |
>        ---- subject -----> [ ]
>        |                    |
>        |                    ------ ID ------> 2
>        |                    |
>        |                    ------ type ----> UNode
>        |                    |
>        |                    ------ label ---> <urn:foo>
>        |
>        ---- predicate ---> [ ]
>        |                    |
>        |                    ------ ID ------> 3
>        |                    |
>        |                    ------ type ----> UNode
>        |                    |
>        |                    ------ label ---> <urn:someProperty>
>        |
>        ---- object ------> [ ]
>                             |
>                             ------ ID ------> 4
>                             |
>                             ------ type ----> LNode
>                             |
>                             ------ label ---> "bar"

I rest my case.
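
For anyone who wants the arithmetic, a small Python sketch that expands 
one statement into the arcs drawn above. The tuple layout and the 
reify helper are assumptions made for illustration only; the object is 
taken to be a literal, as in the diagram.

```python
from itertools import count


def reify(subject, predicate, obj, ids=None):
    """Expand one statement into the arcs of the reified form.

    Each arc is a (source_id, arc_label, target) tuple. Node IDs are
    generated sequentially; obj is assumed to be a literal (an LNode).
    """
    if ids is None:
        ids = count(1)
    arcs = []

    def node(kind, label=None):
        # Every node carries an ID arc and a type arc; labeled nodes
        # carry a label arc as well.
        node_id = next(ids)
        arcs.append((node_id, "ID", node_id))
        arcs.append((node_id, "type", kind))
        if label is not None:
            arcs.append((node_id, "label", label))
        return node_id

    s_node = node("SNode")
    arcs.append((s_node, "subject", node("UNode", subject)))
    arcs.append((s_node, "predicate", node("UNode", predicate)))
    arcs.append((s_node, "object", node("LNode", obj)))
    return arcs


arcs = reify("urn:foo", "urn:someProperty", "bar")
print(len(arcs))  # prints 14: one arc in the present graph becomes fourteen
```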

Pat


-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Tuesday, 13 November 2001 18:44:58 UTC