RE: Cutting the Patrician datatype knot from Patrick.Stickler@nokia.com on 2001-11-22 (www-rdf-interest@w3.org from November 2001)

From: <Patrick.Stickler@nokia.com>
Date: Thu, 22 Nov 2001 14:27:19 +0200
To: pfps@research.bell-labs.com, www-rdf-interest@w3.org
Cc: joint-committee@daml.org
Message-ID: <2BF0AD29BC31FE46B78877321144043114C0C8@trebe003.NOE.Nokia.com>
> -----Original Message-----
> From: ext Peter F. Patel-Schneider 
> [mailto:pfps@research.bell-labs.com]
> Sent: 21 November, 2001 20:34
> To: www-rdf-interest@w3.org
> Cc: joint-committee@daml.org
> Subject: Cutting the Patrician datatype knot
> 
> 
> Hi:
> 
> Here is my Thanksgiving turkey for you all. :-)


I think the WG is already "stuffed" with DT proposals ;-)


> Suppose one decided that nodes in an RDF graph were one of
> 	1/ URIs
> 	2/ blank nodes
> 	3/ data values
> 	4/ text (untidy)

Why not let text nodes be tidy if they don't map to any
values? I.e., if they don't themselves denote a resource,
then why worry whether they have other interpretations in
other contexts? They're just strings in this case, right?
So go ahead and tidy them up.
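
To make the tidy/untidy distinction concrete, here is a minimal
Python sketch. The Graph class, its tidy flag, and the node-id
scheme are all hypothetical, invented for illustration only. With
tidy text, two occurrences of the same string share one node; with
untidy text, each occurrence gets a node of its own.

```python
import itertools


class Graph:
    """Toy graph (hypothetical), only to illustrate tidy vs untidy text nodes."""

    def __init__(self, tidy):
        self.tidy = tidy
        self.triples = []
        self._ids = itertools.count(1)
        self._interned = {}  # string -> node, used only when tidy

    def text_node(self, text):
        if self.tidy:
            # Tidy: identical strings map to one shared node.
            if text not in self._interned:
                self._interned[text] = ("text", next(self._ids), text)
            return self._interned[text]
        # Untidy: every occurrence is a distinct node.
        return ("text", next(self._ids), text)

    def add(self, s, p, text):
        node = self.text_node(text)
        self.triples.append((s, p, node))
        return node


tidy = Graph(tidy=True)
a = tidy.add("ex:jenny", "ex:age", "10")
b = tidy.add("ex:film", "ex:title", "10")

untidy = Graph(tidy=False)
c = untidy.add("ex:jenny", "ex:age", "10")
d = untidy.add("ex:film", "ex:title", "10")

print(a == b)  # tidy graph: both statements point at the same node
print(c == d)  # untidy graph: two separate nodes for the same string
```

If the strings carry no interpretation of their own, sharing the
node loses nothing, which is the point of the question above.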

> and that interpretations mapped
> 	1/ URIs into resources [as before]
> 	2/ blank nodes into ... [as before]
> 	3/ data values into themselves!
> 	4/ text into arbitrary literal values!
> 
> Then a datatype scheme for the model theory is quite simple, 
> 
> 	Let DT be a collection of datatypes.
> 	For d in DT let DTC(d) be a set, the extension of d.
> 
> The model theory for datatypes is also quite simple.
> 
> 	For d in DT  ICEXT(d) = DTC(d)
> 	For d in DT  ICEXT(rdfs:Literal) >= DTC(d)
> 
> 
> An RDF/XML serialization of an RDF graph element of the form
> 	< s , p , v > for v a data value
> is of the form
> 	<... s ...>
> 	  ...
> 	  <p xsi:type="du">x</p>
> 	  ...
> 	</...>

Unfortunately, this isn't legal RDF/XML. It'd have to
be something like:

     ...
     <p xsi:type="du" rdf:value="x"/>
     ...

which gives us the graph:

     s --p--> _:1 --xsi:type---> "foo:du"
                |
                ----rdf:value--> "x"

So essentially, this is the DAML idiom (and very similar
to the DC idiom) but using xsi:type instead of rdf:type, 
right?

Though, why use xsi:type rather than rdf:type? Are we saying
that a typed literal resource is a different kind of resource
than a typed non-literal resource, and hence the typing is
declared differently? Are we sure we want to say that? And
are we adopting the full semantics attributed by the XML Schema
spec to xsi:type? What are the implications for broader statements
about XML Schema constructs in general in RDF if we use it for
typing literal resources?

What is the difference between:

     <p xsi:type="foo:du" rdf:value="x"/>
     <p rdf:type="foo:du" rdf:resource="foo:du:x"/>

I.e. why would a literal resource be typed by an xsi:type property
when a non-literal resource is typed by an rdf:type property? In
both cases, it is the resource denoted by the node that bears
the typing property, so why not use the same mechanism? Do we
then also need to add an xsi:range in addition to rdfs:range?

Also, the xsi:type attribute value will be interpreted as a literal
not a URI Ref by an RDF parser. I.e. for the above you get

     s --p--> _:1 --xsi:type---> "foo:du"
     |          |
     |          ----rdf:value--> "x"
     |
     ----p--> <foo:du:x> --rdf:type--> <foo:du>

Note that in this case, the type value of xsi:type is treated
as something different than the value of rdf:type (then again,
maybe this is what you are trying to do...?)
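
The literal-versus-URI-Ref difference can be sketched in a few lines
of Python. This is a toy model of the parser behavior described
above, not a real RDF/XML parser; the function name and the tuple
encoding of nodes are assumptions made for the example.

```python
import itertools

_blank = itertools.count(1)


def empty_property_triples(subject, prop, attrs):
    """Toy model of an empty property element such as
    <p xsi:type="foo:du" rdf:value="x"/>; attrs maps attribute
    names to their string values."""
    attrs = dict(attrs)
    # rdf:resource names the object node; otherwise use a blank node.
    if "rdf:resource" in attrs:
        obj = ("uri", attrs.pop("rdf:resource"))
    else:
        obj = ("blank", "_:%d" % next(_blank))
    triples = [(subject, prop, obj)]
    for name, value in attrs.items():
        # Only rdf:type has its value read as a URI reference;
        # every other attribute value becomes a literal.
        kind = "uri" if name == "rdf:type" else "literal"
        triples.append((obj, name, (kind, value)))
    return triples


xsi = empty_property_triples(
    "s", "p", {"xsi:type": "foo:du", "rdf:value": "x"})
rdf = empty_property_triples(
    "s", "p", {"rdf:type": "foo:du", "rdf:resource": "foo:du:x"})

print(xsi[1][2])  # ('literal', 'foo:du')
print(rdf[1][2])  # ('uri', 'foo:du')
```

The same string "foo:du" ends up as two different kinds of node,
depending only on which typing attribute carried it.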

If we use rdf:type instead of xsi:type, we get something
perhaps more consistent, both insofar as literal and
non-literal resources are concerned, as well as in the
treatment of type values as URI Refs by the RDF parser:

     <p rdf:type="foo:du" rdf:value="x"/>
     <p rdf:type="foo:du" rdf:resource="foo:du:x"/>


     s --p--> _:1 --rdf:type--------------|
     |          |                         |
     |          ----rdf:value--> "x"      |
     |                                    v
     ----p--> <foo:du:x> --rdf:type--> <foo:du>

Both objects of the 'p' property are nodes denoting
values and both value nodes are typed, and the literal
resource value node has the extra information about
its lexical form, which is needed for literal resources
but not for non-literal resources -- but otherwise, it
is a consistent representation and consistent treatment
of data typing for both literal and non-literal resources.

Eh?

> Thus in the serialization we need access to the 
> lexical-to-value mapping,
> but not in the model theory.
> 
> An RDF/XML serialization of an RDF graph element of the form
> 	< s , p , t > for t some text
> is of the form
> 	<... s ...>
> 	  ...
> 	  <p>t'</p>
> 	  ...
> 	</...>
> where t' is the appropriate XML version of t.
> 
> 
> What is lost?  
> 
> Well, the ability to provide the lexical-to-data mapping once, as in
> 
> 	<Property rdf:about="age">
> 	  <rdfs:range rdf:resource="xsd:integer">
> 	</Property>
>
> and the related ability to do anything useful with
> 
> 	<Person>
> 	  <age>10</age>
> 	</Person>
 
How so? Since type is being ascribed to the object of a statement,
and thus to the node (not the literal), why doesn't rdfs:range 
work as expected?

I.e., the range constraint as defined above *implies* the
following knowledge

   <Person>
      <age rdf:type="xsd:integer" rdf:value="10"/>
   </Person>

even if it is not defined as such in the explicit statement.

Right?

And the actual assertion of implied statements based on rdfs:range
constraints could result in a modification of the graph itself
to accommodate the expanded, explicitly typed idiom.
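
A minimal sketch of such a rewrite, in Python. The function name,
the tuple encoding of literals, and the dict standing in for
rdfs:range statements are all assumptions for illustration; this is
one possible expansion, not a defined algorithm.

```python
import itertools


def expand_by_range(triples, ranges):
    """Rewrite  s p "lit"  into the explicitly typed idiom
    whenever the property p has a declared range.

    triples: list of (s, p, o); a literal object is ("literal", text)
    ranges:  dict property -> datatype, standing in for rdfs:range
    """
    blank = itertools.count(1)
    out = []
    for s, p, o in triples:
        is_literal = isinstance(o, tuple) and o[0] == "literal"
        if is_literal and p in ranges:
            node = "_:%d" % next(blank)
            out.append((s, p, node))
            out.append((node, "rdf:value", o))
            out.append((node, "rdf:type", ranges[p]))
        else:
            out.append((s, p, o))  # nothing implied; keep as stated
    return out


graph = [("ex:person", "age", ("literal", "10"))]
expanded = expand_by_range(graph, {"age": "xsd:integer"})
for t in expanded:
    print(t)
```

The lexical form "10" is carried along untouched; only the shape of
the graph changes.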

Thus, the two idioms (DAML/DC and P respectively):

1)    X PROP [ rdf:value "LIT" ; rdf:type TYPE ] .

2)    X PROP "LIT" .
      PROP rdfs:range TYPE .

are semantically synonymous.

They both define the pairing ("LIT",TYPE) which is
(I believe) the agreed denotation of a value in the
value space of a given data type for a lexical form
(literal).
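
As a rough check of that claim, the following Python sketch extracts
the pairing from each idiom and compares the results. The pairings
function and the triple encoding are hypothetical, and TYPE is
treated as a name (not a literal) in both idioms.

```python
def pairings(triples):
    """Collect (subject, property, lexical form, datatype) tuples
    from either idiom. Literals are ("literal", text); every other
    node is a plain string name."""
    ranges = {s: o for s, p, o in triples if p == "rdfs:range"}
    values = {s: o[1] for s, p, o in triples if p == "rdf:value"}
    types = {s: o for s, p, o in triples if p == "rdf:type"}
    pairs = set()
    for s, p, o in triples:
        if p in ("rdfs:range", "rdf:value", "rdf:type"):
            continue
        if isinstance(o, tuple) and o[0] == "literal":
            # Idiom 2: the type comes from the property's range.
            if p in ranges:
                pairs.add((s, p, o[1], ranges[p]))
        elif o in values and o in types:
            # Idiom 1: type and lexical form hang off the value node.
            pairs.add((s, p, values[o], types[o]))
    return pairs


idiom1 = [
    ("X", "PROP", "_:1"),
    ("_:1", "rdf:value", ("literal", "LIT")),
    ("_:1", "rdf:type", "TYPE"),
]
idiom2 = [
    ("X", "PROP", ("literal", "LIT")),
    ("PROP", "rdfs:range", "TYPE"),
]

print(pairings(idiom1) == pairings(idiom2))
```

Both graphs reduce to the single pairing ("LIT", TYPE) for X and
PROP, which is all the equivalence claim needs.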


> However, some of both of these can be regained by employing 
> XML Schemas,
> i.e., taking any XML Schema information in an XML document 
> and using that
> to determine the actual datavalue for literals.

I'm not sure we'd like to have to do that. It could be (rightly)
seen as an unreasonable burden on an RDF system to have to use
an XML Schema parser/component just to be able to make sense of 
typed data literals, especially if the system wishes simply to 
make inferences about type relations and never interpret the 
literal values' lexical forms themselves.
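
For instance, a system could decide whether a typed value satisfies
a range constraint purely from type relations, as in this Python
sketch. The subclass_of table is a hypothetical stand-in for
whatever type hierarchy the system has been told about; the lexical
form is never parsed.

```python
def is_subtype(sub, sup, subclass_of):
    """True if sub equals sup or reaches it via subclass_of links."""
    seen = set()
    while sub is not None and sub not in seen:
        if sub == sup:
            return True
        seen.add(sub)
        sub = subclass_of.get(sub)
    return False


# Hypothetical hierarchy, stated as plain data; no schema processor.
subclass_of = {"xsd:long": "xsd:integer", "xsd:integer": "xsd:decimal"}

typed_value = {"rdf:value": "10", "rdf:type": "xsd:long"}

# The range check looks only at the type, never at "10" itself.
ok = is_subtype(typed_value["rdf:type"], "xsd:decimal", subclass_of)
print(ok)
```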
 
> Also, if anyone comes up with an acceptable (i.e., acceptable to both
> Pat and Patrick :-) as well as others) method for working 
> with text, i.e.,
> text nodes that do not get a type from XML Schema 
> information, then it can
> be added to the proposal.
> 
> 
> What is gained?
> 
> Better conformance with XML and XML Schema!

Seems like the P+DAML dual idiom approach has equal "conformance"
to XML and XML Schema (or maybe I've missed something, again ;-)

> Fewer messages on rdf-core-wg!!!

Let's hope!  Where's that wish-bone...?  ;-)

> PS:  The name of this proposal is PFPS (or, if you really need to save
>      bits, PS).

Fair enough.

And the name of the revised P+DC+DAML multi-idiom proposal as
outlined immediately above should probably be PFFTTTHT!!!  
(or just '!' for short ;-)

Happy Turkey Day Y'all! 

(not that it means much here in Finland...)

Cheers,

Patrick

--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com
Received on Thursday, 22 November 2001 07:36:48 UTC