RE: Cutting the Patrician datatype knot from Peter F. Patel-Schneider on 2001-11-22 (www-rdf-interest@w3.org from November 2001)

From: Peter F. Patel-Schneider <pfps@research.bell-labs.com>
Date: Thu, 22 Nov 2001 09:08:29 -0500
To: Patrick.Stickler@nokia.com
Cc: www-rdf-interest@w3.org, joint-committee@daml.org
Message-Id: <20011122090829Y.pfps@research.bell-labs.com>
From: Patrick.Stickler@nokia.com
Subject: RE: Cutting the Patrician datatype knot
Date: Thu, 22 Nov 2001 14:27:19 +0200

> > -----Original Message-----
> > From: ext Peter F. Patel-Schneider 

[...]

> > Suppose one decided that nodes in an RDF graph were one of
> > 	1/ URIs
> > 	2/ blank nodes
> > 	3/ data values
> > 	4/ text (untidy)
> 
> Why not let text nodes be tidy if they don't map to any
> values? I.e. if they don't themselves denote a resource,
> then why worry if they have other interpretations in
> other contexts? They're just strings in this case, right?
> so go ahead and tidy them up.

Precisely to leave the door open for a better handling of them.  This is
why I injected my comment into rdf-core-wg quite a while ago---the model
theory was making literals tidy, which prevented any possibility of having
the lexical-to-data mapping of a literal depend on its type.

And no, text is not just strings, at least not as far as XML Schema is
concerned.
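
To make the point concrete, here is a minimal Python sketch of a lexical-to-data mapping that depends on the type.  The table and the function name are invented for this illustration and are not part of any RDF or XML Schema API; the Python constructors only approximate the real XML Schema mappings.

```python
from decimal import Decimal

# Hypothetical lexical-to-data maps, keyed by XML Schema datatype name.
# The Python constructors only approximate the real XML Schema mappings.
LEXICAL_TO_DATA = {
    "xsd:string": str,
    "xsd:integer": int,
    "xsd:decimal": Decimal,
}

def data_value(datatype, lexical):
    """Map a lexical form to a data value under the given datatype."""
    return LEXICAL_TO_DATA[datatype](lexical)

# The same text denotes different values under different types, so a
# single shared (tidy) node for the text "010" cannot carry both readings.
as_string = data_value("xsd:string", "010")
as_integer = data_value("xsd:integer", "010")
```

If every occurrence of "010" were merged into one node, there would be no place left to record which of these two values a particular occurrence denotes.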

> > An RDF/XML serialization of an RDF graph element of the form
> > 	< s , p , v > for v a data value
> > is of the form
> > 	<... s ...>
> > 	  ...
> > 	  <p xsi:type="du">x</p>
> > 	  ...
> > 	</...>
> 
> Unfortunately, this isn't legal RDF/XML. It'd have to
> be something like:
> 
>      ...
>      <p xsi:type="du" rdf:value="x"/>
>      ...

I'm proposing making it legal.

> which gives us the graph:
> 
>      s --p--> _:1 --xsi:type---> "foo:du"
>                 |
>                 ----rdf:value--> "x"
> 
> So essentially, this is the DAML idiom (and very similar
> to the DC idiom) but using xsi:type instead of rdf:type, 
> right?

No.  I'm proposing that the end result be much more like

	s --p--> xsd:du:x
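
A rough Python sketch of that mapping, using only the standard library, might look like the following.  The function name `typed_triples`, the sample document, and the representation of a data value as a (datatype, lexical form) pair are assumptions made for this illustration; they are not taken from any RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Sample document in the proposed (extended) syntax; illustrative only.
DOC = """
<Person xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        rdf:about="John">
  <age xsi:type="xsd:integer">10</age>
</Person>
"""

def typed_triples(xml_text):
    """Return (subject, property, data value) triples for typed content.

    The object is a single data value node, written here as a
    (datatype, lexical form) pair, rather than a blank node carrying
    separate type and value arcs.
    """
    subject_el = ET.fromstring(xml_text.strip())
    subject = subject_el.get("{%s}about" % RDF)
    result = []
    for prop in subject_el:
        datatype = prop.get("{%s}type" % XSI)
        if datatype is not None:  # untyped text is left out of this sketch
            result.append((subject, prop.tag, (datatype, prop.text)))
    return result

triples = typed_triples(DOC)
```

Here `triples` holds one triple whose object is the pair for xsd:integer and "10", which plays the role of the node written xsd:du:x above.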

> Though, why use xsi:type rather than rdf:type? Are we saying
> that a typed literal resource is a different kind of resource
> than a typed non-literal resource, and hence the typing is
> declared differently? Are we sure we want to say that? And
> are we adopting the full semantics attributed by the XML Schema
> spec to xsi:type? What are the implications for broader statements
> about XML Schema constructs in general in RDF if we use it for
> typing literal resources?

Because the xsi:type plays a much different role than rdf:type.  Yes,
yes. Yes.  Maybe, although all you need for this to go through is to
understand the primitive (and, maybe, the built-in) XML Schema datatypes.

I'm not aware of any problems with the rest of XML Schema.  You should
still be able to use xsd:integer as a class, for example.

> What is the difference between:
> 
>      <p xsi:type="foo:du" rdf:value="x"/>
>      <p rdf:type="foo:du" rdf:resource="foo:du:x"/>
> 
> I.e. why would a literal resource be typed by an xsi:type property
> when a non-literal resource is typed by an rdf:type property? In
> both cases, it is the resource denoted by the node that bears
> the typing property, so why not use the same mechanism? Do we
> then also need to add an xsi:range in addition to rdfs:range?

Because literals are, indeed, different from non-literals.   Because the
typing of literals also provides the lexical-to-data mapping.  No,
rdfs:range works fine to type the object position of a property, it just
does not have any lexical-to-data mapping implications, which are what are
causing the current problems.

> Also, the xsi:type attribute value will be interpreted as a literal
> not a URI Ref by an RDF parser. I.e. for the above you get
> 
>      s --p--> _:1 --xsi:type---> "foo:du"
>      |          |
>      |          ----rdf:value--> "x"
>      |
>      ----p--> <foo:du:x> --rdf:type--> <foo:du>
> 
> Note that in this case, the type value of xsi:type is treated
> as something different than the value of rdf:type (then again,
> maybe this is what you are trying to do...?)

Note that I *am* extending the syntax of RDF/XML, so *I* get to say how
this extended syntax maps to RDF graphs!

> If we use rdf:type instead of xsi:type, we get something
> perhaps more consistent, both insofar as literal and
> non-literal resources are concerned, as well as in the
> treatment of type values as URI Refs by the RDF parser:
> 
>      <p rdf:type="foo:du" rdf:value="x"/>
>      <p rdf:type="foo:du" rdf:resource="urn:foo:x"/>
> 
> 
>      s --p--> _:1 --rdf:type--------------|
>      |          |                         |
>      |          ----rdf:value--> "x"      |
>      |                                    v
>      ----p--> <foo:du:x> --rdf:type--> <foo:du>
> 
> Both objects of the 'p' property are nodes denoting
> values and both value nodes are typed, and the literal
> resource value node has the extra information about
> its lexical form, which is needed for literal resources
> but not for non-literal resources -- but otherwise, it
> is a consistent representation and consistent treatment
> of data typing for both literal and non-literal resources.
> 
> Eh?

But, again, literals are different from non-literals.  For example,
consider the following

	<foo> 
	  <rdfs:range rdf:resource="xsd:integer"/>
	</foo>

	<Person rdf:about="John">
          <foo xsi:type="xsd:decimal">7</foo>
        </Person>

This is perfectly fine.
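
The reason it is fine can be sketched in Python: the range check is made on the data value, not on the lexical form or on the declared type.  The helper names below are hypothetical, and Python's Decimal stands in for the xsd:decimal value space.

```python
from decimal import Decimal

def decimal_value(lexical):
    """Lexical-to-data map for xsd:decimal (approximated with Decimal)."""
    return Decimal(lexical)

def in_integer_value_space(value):
    """True if the value is a decimal with no fractional part.

    In XML Schema, integer is derived from decimal, so the integers are
    exactly those decimal values that have no fractional part.
    """
    return isinstance(value, Decimal) and value == value.to_integral_value()

# <foo xsi:type="xsd:decimal">7</foo> with rdfs:range xsd:integer
seven = decimal_value("7")
range_ok = in_integer_value_space(seven)

# A decimal with a fractional part would violate the same range.
range_violated = not in_integer_value_space(decimal_value("7.5"))
```

The xsi:type supplies the lexical-to-data mapping, and rdfs:range then only has to ask whether the resulting value belongs to the class.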

Using rdf:type gets into the problems with clashing mappings.

For example

		 ----rdf:type----->[string union integer]
		 |
      s --p--> _:1 --rdf:type----->[integer union string]
                 |                        
                 ----rdf:value--> "7"     

results in uncertainty over whether s's p is 7 (the integer) or ``7'' (the
string).
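
The clash can be reproduced with a small Python sketch.  It assumes that a union type tries its member types in order and takes the first one that accepts the lexical form, which is how I read the XML Schema rule; the function names are invented for this illustration.

```python
def as_integer(lexical):
    """Lexical-to-data map for integer; raises ValueError on bad input."""
    return int(lexical)

def as_string(lexical):
    """Lexical-to-data map for string; every lexical form is accepted."""
    return lexical

def union(*members):
    """Build a union mapping that tries each member mapping in order."""
    def mapping(lexical):
        for member in members:
            try:
                return member(lexical)
            except ValueError:
                continue
        raise ValueError("no member type accepts %r" % lexical)
    return mapping

string_union_integer = union(as_string, as_integer)
integer_union_string = union(as_integer, as_string)

# The node _:1 carries both types, so both mappings apply to "7".
candidates = [string_union_integer("7"), integer_union_string("7")]
```

The two mappings disagree on the same lexical form: one yields the string and the other the integer, so the graph alone cannot say which value s's p has.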

 
[..]

> > 
> > Well, the ability to provide the lexical-to-data mapping once, as in
> > 
> > 	<Property rdf:about="age">
> > 	  <rdfs:range rdf:resource="xsd:integer"/>
> > 	</Property>
> >
> > and the related ability to do anything useful with
> > 
> > 	<Person>
> > 	  <age>10</age>
> > 	</Person>
>  
> How so? Since type is being ascribed to the object of a statement,
> and thus to the node (not the literal), why doesn't rdfs:range 
> work as expected?

Because text has no type.  Text is text.  Text is not string. Text is not
integer.  This would, in fact, be invalid, just as if you said

 	<Person>
 	  <age xsi:type="xsd:string">10</age>
 	</Person>

> I.e., the range constraint as defined above *implies* the
> following knowledge
> 
>    <Person>
>       <age rdf:type="xsd:integer" rdf:value="10"/>
>    </Person>
> 
> even if it is not defined as such in the explicit statement.
> 
> Right?

Not in this proposal.

> And the actual assertion of implied statements based on rdfs:range
> constraints could result in a modification of the graph itself
> to accommodate the expanded, explicitly typed idiom.
> 
> Thus, the two idioms (DAML/DC and P respectively):
> 
> 1)    X PROP [ rdf:value "LIT" ; rdf:type "TYPE" ] .
> 
> 2)    X PROP "LIT" .
>       PROP rdfs:range TYPE .
> 
> are semantically synonymous.
> 
> They both define the pairing ("LIT",TYPE) which is
> (I believe) the agreed denotation of a value in the
> value space of a given data type for a lexical form
> (literal).

If the datatyping scheme cooperates, then the rdf:type solution works well.
If it doesn't, then you probably need some other mechanism for providing the
lexical-to-data map.

> > However, some of both of these can be regained by employing 
> > XML Schemas,
> > i.e., taking any XML Schema information in an XML document 
> > and using that
> > to determine the actual datavalue for literals.
> 
> I'm not sure we'd like to have to do that. It could be (rightly)
> seen as an unreasonable burden on an RDF system to have to use
> an XML Schema parser/component just to be able to make sense of 
> typed data literals, especially if the system wishes simply to 
> make inferences about type relations and never interpret the 
> literal values' lexical forms themselves.

I am totally mystified by this comment.  First, one could just say
that only the primitive and built-in types are supported for now.  Second,
so what?  XML Schema is a W3C Recommendation.  Are RDF, and certainly
RDF/XML, not part of the World-Wide Web?  Is the RDF Core WG not chartered
to ``build upon XML Schema datatypes to the fullest extent practical and
appropriate''?  

Now you might argue (as Dan Connolly does) that it is impractical to build
on full XML Schema datatypes because the code for XML Schema (datatypes) is
too big.  I don't buy that argument, but it is a permissible one.  However,
your argument is fundamentally different, and, in my view, counter to the
RDF Core WG charter.

> > Also, if anyone comes up with an acceptable (i.e., acceptable to both
> > Pat and Patrick :-) as well as others) method for working 
> > with text, i.e.,
> > text nodes that do not get a type from XML Schema 
> > information, then it can
> > be added to the proposal.
> > 
> > 
> > What is gained?
> > 
> > Better conformance with XML and XML Schema!
> 
> Seems like the P+DAML dual idiom approach has equal "conformance"
> to XML and XML Schema (or maybe I've missed something, again ;-)

No.  The DAML+OIL approach is somewhat similar, but does not fit as well into
XML and XML Schema documents.  You might consider this to be a
stripped-down version of the DAML+OIL approach recast into XML Schema
clothing.  

One of our requirements in DAML+OIL was to come up with syntactically-correct
RDF/XML.  In fact, one of the proposals for DAML+OIL was to use precisely
the syntax I'm proposing here.  This was shot down because it was not
syntactically-correct RDF/XML.  The RDF Core WG is, I think, free to
(slightly) extend RDF/XML syntax in this way, so that they can indeed use
``XML Schema datatypes to the fullest extent that is practical and
appropriate''.

[...]

> Cheers,
> 
> Patrick

Gobble, gobble.

peter
Received on Thursday, 22 November 2001 09:09:14 UTC