RE: RDF specifications from Patrick.Stickler@nokia.com on 2001-12-17 (www-rdf-interest@w3.org from December 2001)

From: <Patrick.Stickler@nokia.com>
Date: Mon, 17 Dec 2001 03:19:45 +0200
To: pfps@research.bell-labs.com
Cc: www-rdf-interest@w3.org
Message-ID: <A03E60B17132A84F9B4BB5EEDE57957B160B43@trebe006.NOE.Nokia.com>
> How can RDF exist without some sort of API?  

This is like asking how XML can exist without an API. Of
course, it can't, per se. But that API need not be a native
component of the representation. Thus, we have SAX, DOM, etc.
as APIs for XML and we should, I fully agree, have a standardized
API for RDF -- but it need not be defined by RDF.
 
> RDF has to be able to pass data to applications
> somehow?  How is this to be done?

No. RDF does not "pass" anything to any application. It is
akin to XML. It is only the representation of knowledge, not
any specific procedures or functions for the manipulation 
or interpretation of that knowledge.

I agree that a standardized RDF API is sorely needed, but
this is a layer above RDF, not part of RDF itself (taking
the very strict meaning of "RDF" as referring to the three
components: (a) graph model, (b) serialization, and (c) core
vocabulary/ontology).

The API is not part of that strict, core definition of RDF.
And the key point is that the API should not define or dictate
how data typing is expressed in the graph, though it may
provide useful abstractions or higher level views of data
typing knowledge.

> > That's out of
> > scope insofar as how we are going to capture that data typing
> > knowledge in the RDF graph itself, in a way that is independent of
> > any data typing scheme or any particular representation used by
> > any given platform.
> > 
> > I will reiterate, RDF should not use XML Schema data types natively
> > in the graph.
> 
> I just don't understand this approach.  If RDF is going to have datatypes
> at all, then it has to have some understanding of the datatypes, otherwise
> entailment cannot be performed.

This is perhaps the point of disconnect. RDF should not (and I expect
will not) *have* data types. RDF will provide consistent, explicit,
and standardized mechanisms for associating data types with literals
(or any resource). It is then up to the application to infer and
execute all mappings from lexical forms to values, and that execution
will differ from platform to platform, as the internal canonical
representations for values are specific to each platform.

RDF will provide the knowledge required to test entailment, but does
not itself "perform" such entailment; i.e. an RDF parser or RDFS
validator is not going to say anything about it, any more than it
will say whether two URIs actually denote the same "thing".

 
> A formalism that allows various datatyping schemes to be incorporated
> has to solve all the problems that a formalism that just
> incorporates a particular datatyping scheme, and more.

Again, RDF is not "incorporating" multiple data typing schemes, it is
supporting arbitrary data typing schemes by providing a generic method
of associating the identity of specific data types to resources. Those
are two very different things. Simple typed values are not explicit in
the RDF graph, only the knowledge which uniquely identifies those
values. Because RDF is a platform-neutral representation that must work
with an unrestricted range of languages, platforms, etc., you cannot
have any explicit, internalized, canonical representation for values in
the RDF graph.
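
As a rough sketch of what I mean (Python is only my choice for
illustration; the ex: names, the Literal wrapper, and typed_pairings
are all hypothetical and not anything defined by RDF), the graph below
holds only lexical forms and data type identifiers. The pairings can be
read off the graph without the graph ever containing an integer:

```python
from typing import NamedTuple

RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


class Literal(NamedTuple):
    """Hypothetical wrapper: a literal is just its lexical form (a string)."""
    lexical_form: str


# The graph stores strings and identifiers only, never native values.
graph = {
    ("ex:age", RDFS_RANGE, XSD_INTEGER),
    ("ex:John", "ex:age", Literal("10")),
}


def typed_pairings(triples):
    """Pair each literal with the data type named by its property's range."""
    ranges = {s: o for s, p, o in triples if p == RDFS_RANGE}
    return {
        (o.lexical_form, ranges[p])
        for s, p, o in triples
        if isinstance(o, Literal) and p in ranges
    }
```

What an application then does with the pairing ("10", xsd:integer) is
its own business and sits outside the graph.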

> In particular, how are you going to determine whether 
> 
> 	age rdfs:range xsd:integer .
> 	John age "10" .
> 
> entails
> 
> 	John age "010" .

Just as any application which is going to utilize knowledge expressed
in RDF must have some basic "understanding" of the semantics attributed
to the ontologies used, likewise any application which is going to
manipulate typed literals expressed in RDF must have some basic
"understanding" of the data types associated with those literals.

So, an application which knows about the data type xsd:integer will
know how to map those two lexical forms to values in its own internal
canonical value space corresponding to the abstract value space for
xsd:integer, and once they are mapped to internal values, it can compare
them.
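
To illustrate with a minimal sketch (Python chosen arbitrarily; the
registry and the function name are hypothetical, and Python's int()
merely stands in for one platform's lexical-to-value mapping for
xsd:integer):

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Hypothetical application-side registry: data type identifier -> function
# mapping a lexical form into this platform's own value space.
# Assumption: int() stands in for the xsd:integer mapping; it accepts a few
# strings (e.g. with surrounding whitespace) that xsd:integer does not.
LEXICAL_TO_VALUE = {XSD_INTEGER: int}


def same_value(datatype: str, lexical_a: str, lexical_b: str) -> bool:
    """Compare two lexical forms by the values they map to."""
    to_value = LEXICAL_TO_VALUE[datatype]
    return to_value(lexical_a) == to_value(lexical_b)
```

Here same_value(XSD_INTEGER, "10", "010") is true even though the two
strings differ, and another platform is free to use a different
internal representation to reach the same answer.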


> > > In this case RDF does not have datatypes.  
> > 
> > BINGO! RDF itself should not *have* datatypes. It should provide
> > a generic mechanism for allowing data types to be associated with
> > literals (or really, any resource).
> 
> But then you haven't done anything.  If you truly believe this, then why
> are you arguing for PDU?

PDU does not add data types to RDF. PDU is (a) a view of how data typing
can be defined in terms of a pairing of lexical form and data type
identity which uniquely and unambiguously identifies a specific value
in the value space of that data type and (b) a set of idioms which all
synonymously provide for defining such pairings in the RDF graph, each
with benefits in particular areas: P providing for local definition,
D(AML) providing for local explicit definition, and U providing for
local definition which facilitates maximal graph compression.

In fact, the S proposal can be interpreted as a fourth idiom which
defines these pairings by means of property constructs.

> > The PDU proposal accomplishes this fully, and is also the way folks
> > are doing data typing in RDF now. And it works.
> 
> I dispute all three of these claims.  First, I have recently sent you a
> message concerning difficulties in PDU.

I've not yet gotten to that, but will reply separately.

> Second, PDU places restrictions on data typing that are not enforced by
> current RDF and, I expect, are not followed by many uses of data typing
> in RDF.

What restrictions?

> Third, PDU does not provide a full account of what datatypes mean in RDF.

While the documentation may at the moment be a bit sparse, and
thus you have a (temporarily) valid criticism, the conceptual
foundation of PDU, namely the view of data typing based on pairings
of lexical form and data type which imply the mapping of lexical
form to value, is compatible with (and I would venture to say is)
the general consensus of what data typing "means" in RDF, and
will have (and likely already has) full treatment in the MT.

Pat and others are of course free to correct me here if I'm wrong.

The latest Data Typing draft produced by Sergey is analogous to
the PDU view, though it does not mention (yet ;-) PDU pairings.

The attached graphic should help to illustrate these relations.

I understand that you would like the RDF data typing solution
to address all three levels explicitly (including the Application
Value Space), but it's my assertion that this would be contrary
to the generic, application-neutral purpose of RDF.

> > > Because RDF does not understand the conventions no RDF document
> > > should mention them.
> > 
> > I don't follow your argument here.
> 
> If something is not defined by RDF then no RDF document should talk about
> that thing.  (For example, if datatypes are not defined in RDF then no RDF
> document should be talking about datatyping.)

If you are proposing that RDF should add mechanisms to actually
define (a) the canonical representation for all values for all
data types and (b) the algorithms needed to define the actual
mappings from lexical space to value space, then RDF is quickly
going to become terribly limited and impractical. XML doesn't
define the semantics of XHTML or SOAP, why should RDF define 
the semantics of a particular data typing scheme?

> And, by the way, XML did make changes to <p>.  Just try to pass an
> old-style HTML document through an XML parser and see how many errors you
> get.

Ummm, no. The mis-usage of <p> as an empty element in HTML was
both bad SGML and bad XML. XML did not change anything. Browsers
are still free to interpret non-well-formed XML just as they were
free to interpret non-well-formed SGML, but that doesn't mean
such encodings are correct.

> If "2+5" denotes '7' then RDF needs to know it so that it can determine
> that
> 
> 	John age "2+5" .
> 
> entails
> 
> 	John age "7" .

Well, if we are presuming the typing

   age rdfs:range xsd:integer .

then "2+5" is not a legal lexical form for that data type. If you
replaced "2+5" in your example with the earlier "010", then I do
understand what you are trying to accomplish, but I still feel
that the graph itself is the wrong level at which to do this.
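
For what it's worth, here is a minimal sketch of that legality check as
an application might do it (Python for illustration only; the names are
hypothetical, and the pattern assumes the xsd:integer lexical space of
an optional sign followed by decimal digits):

```python
import re

# Assumed lexical space of xsd:integer: optional sign, then decimal digits.
XSD_INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def is_legal_integer_lexical_form(lexical: str) -> bool:
    """Return True if the string is a legal xsd:integer lexical form."""
    return XSD_INTEGER_LEXICAL.fullmatch(lexical) is not None
```

With this check an expression such as "2+5" is rejected before any
mapping to a value is attempted, while "010" and "7" are accepted.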

Just as you need a higher level of processing logic to validate
RDFS constraints, catch and deal with contradictions, handle
statements of equivalence or subclassing, etc., you also need that
higher level to deal with entailment between typed literal values.
What is crucial is that all the knowledge you need to do that is
expressed in the graph. The mechanisms, vocabulary, and
representations employed in the graph itself should be application
neutral, generic, flexible, and as future proof as possible. Adding
native data types with explicit internalized canonical representations
of typed values in the graph defeats that purpose, and IMO would
substantially reduce the utility of RDF.

Regards,

Patrick

--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com
Attachments

application/octet-stream attachment: data-types.png
Received on Sunday, 16 December 2001 20:19:58 UTC