RE: RDF specifications

First a couple of general notes and then some specific arguments.

This discussion was started it by my stating that the interface between RDF
and applications did not have to pass constructed XML datatypes, but
instead could just use the primitive one.  In this context, one has to
assume the existence of the interface.

Also, I don't understand how the interface between RDF and applications is
considered to be outside of RDF.  If you don't have a standard interface
then how can anyone consider RDF to be a standard?


From: Patrick.Stickler@nokia.com
Subject: RE: RDF specifications
Date: Mon, 17 Dec 2001 03:19:45 +0200

> > How can RDF exist without some sort of API?  
> 
> This is like asking how XML can exist without an API. Of
> course, it can't, per se. But that API need not be a native
> component of the representation. Thus, we have SAX, DOM, etc.
> as APIs for XML and we should, I fully agree, have a standardized
> API for RDF -- but it need not be defined by RDF.

How can I then write applications that can use different implementations of
RDF?

> > RDF has to be able to pass data to applications
> > somehow?  How is this to be done?
> 
> No. RDF does not "pass" anything to any application. It is
> akin to XML. It is only the representation of knowledge, not
> any specific procedures or functions for the manipulation 
> or interpretation of that knowledge.

OK, RDF is not any specific procedures or functions.  No standard should
dictate the specific procedures or functions.  However, a standard is
certainly concerned with the manipulation or interpretation of the entities
in its domain.  Knowledge representation standards are no exception to this
general statement.

For example, if RDF(S) did not manipulate or interpret the knowledge that
it is given then it wouldn't be able to
1/ do any inheritance, including determining that a resource belongs to the
   superclasses of its types
2/ do anything with domain or range information
along with many other manipulations and interpretations of the knowledge
that are required by the RDF and RDFS specifications.

> I agree that a standardized RDF API is sorely needed, but
> this is a layer above RDF, not part of RDF itself (taking
> the very strict meaning of "RDF" referring to the three
> components (a) graph model, (b) serialization, and (c) core
> vocabulary/ontology.

This strict meaning of ``RDF'' doesn't tell me anything about what you
think RDF(S) should do.  It doesn't even mention taking a serialization and
constructing the RDF graph that corresponds to that serialization.  Further
it doesn't mention any required processing for RDF graphs.  Does your
mention of core vocabulary/ontology commit an RDF(S) implementation to
perform all the inferences alluded to in the RDF recommendation and the
RDFS candidate recommendation?  Or, perhaps, you mean that an RDF(S)
implementation is committed to determine entailment in the new RDF model
theory?

> The API is not part of that strict, core definition of RDF.
> And the key point is that the API should not define or dictate
> how data typing is expressed in the graph, though it may
> provide useful abstractions or higher level views of data
> typing knowledge.

Sure, but then it is not valid to argue that an approach to RDF datatyping
is too complex for applications just because it requires RDF to handle XML
Schema datatypes.  The API could hide this complexity as long as it
provided access to the information relevant to applications.



> > > That's out of
> > > scope insofar as how we are going to capture that data typing
> > > knowledge in the RDF graph itself, in a way that is independent of
> > > any data typing scheme or any particular representation used by
> > > any given platform.
> > > 
> > > I will reiterate, RDF should not use XML Schema data types natively
> > > in the graph.
> > 
> > I just don't understand this approach.  If RDF is going to 
> > have datatypes
> > at all, then it has to have some understanding of the 
> > datatypes, otherwise
> > entailment cannot be performed.
> 
> This is perhaps the point of disconnect. RDF should not (and I expect
> will not) *have* data types. RDF will provide consistent, explicit,
> and standardized mechanisms for associating data types with literals
> (or any resource). It is then up to the application to infer and
> execute all mappings from lexical forms to values, and that execution
> will differ from platform to platform as the internal canonical
> representations for values are specific to platform.
> 
> RDF will provide the knowledge required to test entailment, but does
> not itself "perform" such entailment -- i.e. an RDF parser or RDFS
> validator is not going to say anything about that, no more so than
> whether two URIs actually denote the same "thing".

If RDF is not required to perform entailment then why is Pat Hayes
producing a document that defines RDF(S) entailment?

> > A formalism that allows various datatyping schemes to be incorporated
> > has to solve all the problems that a formalism that just
> > incorporates a particular datatyping scheme, and more.
> 
> Again, RDF is not "incorporating" multiple data typing schemes, it is
> supporting arbitrary data typing schemes by providing a generic method
> of associating the identity of specific data types to resources. Those
> are two very different things. Simple typed values are not explicit in
> the RDF graph, only the knowledge which uniquely identifies those
> values. Because RDF is a platform neutral representation that must work
> with a unrestricted range of languages, platforms, etc. you cannot
> have any explicit, internalized, canonical representation for values in
> the RDF graph.

How can RDF work with an unrestricted range of languages, platforms, etc.?
This is just *not* possible.  (To take an extreme example, my first
programming was done on an early programmable calculator.  There is no way
that RDF could have worked on this platform nor with the programming
language for this platform.)  

So lets take another extremely impoverished platform and language: a Turing
machine.  It certainly is possible to build an implementation of RDF(S) for
a Turing machine that has an explicit, internalized, and canonical
representation for values in an RDF graph.   If it is possible on a Turing
machine, then why can't it be done on other, reasonable languages and
platforms?

As long as I build an implementation of RDF(S) that acts the right way,
whatever that is, then I get to choose my internal data structures.  This
goes for how I represent resource identifiers, how I represent edges in the
graph, and for how I represent literals, whether literals are untyped or
part of datatyping scheme.  As long as I can present an interface to
datatyped literals that meets the specification, I am free to do whatever I
want internally.

If the RDF specification *only* requires that an implementation present
datatyped literals as pairs consisting of a type and a lexicalization, then
I can build an implementation that will only present primitive XML Schema
datatypes. 


> > In particular, how are you going to determine whether 
> > 
> > 	age rdfs:range xsd:integer .
> > 	John age "10" .
> > 
> > entails
> > 
> > 	John age "010" .
> 
> Just as any application which is going to utilize knowledge expressed
> in RDF must have some basic "understanding" of the semantics attributed
> to the ontologies used, likewise any application which is going to
> manipulate typed literals expressed in RDF must have some basic
> "understanding" of the data types associated with those literals.
> 
> So, an application which knows about the data type xsd:integer will
> know how to map those two lexical forms to values in its own internal
> canonoical value space corresponding to the abstract value space for
> xsd:integer and once they are mapped to internal values, it can compare 
> them. 
> 
> > > > In this case RDF does not have datatypes.  
> > > 
> > > BINGO! RDF itself should not *have* datatypes. It should provide
> > > a generic mechanism for allowing data types to be associated with
> > > literals (or really, any resource).
> > 
> > But then you haven't done anything.  If you truely believe 
> > this, then why
> > are you arguing for PDU?  
> 
> PDU does not add data types to RDF. PDU is (a) a view of how data typing
> can be defined in terms of a pairing of lexical form and data type
> identity which uniquely and unambigously identifies a specific value
> in the value space of that data type and (b) a set of idioms which all
> synonomously provide for defining such pairings in the RDF graph, each
> with benefits in particular areas: P providing for [global] definition,
> D(AML) providing for local explicit definition, and U providing for
> local definition which facilitates in maximal graph compression.
> 
> In fact, the S proposal can be interpreted as a fourth idiom which
> defines these pairings by means of property constructs.

I just do not believe that this view of datatyping is a supportable one.
To convince me you will have to present a fully-worked out view of how PDU
is going to work, including how it interacts with the model theory of RDF.


> > > The PDU proposal accomplishes this fully, and is also the way folks
> > > are doing data typing in RDF now. And it works.
> > 
> > I dispute all three of these claims.  First, I have recently 
> > sent you a
> > message concerning difficulties in PDU.  
> 
> I've not yet gotten to that, but will reply separately.
> 
> > Second, PDU places 
> > restrictions on
> > data typing that are not enforced by current RDF and, I 
> > expect, are not
> > followed by many uses of data typing in RDF.  
> 
> What restrictions?

That the blank nodes that represent literals can only have one rdf:type and
one rdf:value.

> > Third, PDU does 
> > not provide a
> > full account of what datatypes mean in RDF.
> 
> While the documentation may at the moment be a bit sparse, and
> thus, you have a (temporarily) valid criticism, the conceptual
> foundation of PDU, namely the view of data typing based on pairings
> of lexical form and data type which infer the mapping of lexical
> form to value, is compatible with (and I would venture to say is)
> the general concensus of what data typing "means" in RDF, and
> will have (and likely already has) full treatment in the MT.
> 
> Pat and others are of course free to correct me here if I'm wrong.

I just do not see how the view that you espouse can be reconciled with the
RDF model theory.  I may be wrong, but I haven't seen any indication that
this reconciliation is possible.

> The latest Data Typing draft produced by Sergey is analogous to
> the PDU view, though it does not mention (yet ;-) PDU pairings.
> 
> The attached graphic should help to illustrate these relations.
> 
> I understand that you would like the RDF data typing solution
> to address all three levels explicitly (including the Application
> Value Space) and it's my assertion that this would be contrary
> to the generic, application neutral purpose of RDF.

First of all I don't believe that the attached graphic provides a
reasonable way of looking at the problem, at least as far as RDF is
concerned.  Second, I don't have any problem with RDF working only in the
abstract value space, at least conceptually.  However, the abstract value
space, if I am understanding your graphic correctly, must have in it a
theory of equality, and thus RDF would have to abide by that theory.


> > > > Because 
> > > > RDF does not
> > > > understand the conventions no RDF document should mention them.
> > > 
> > > I don't follow your argument here.
> > 
> > If something is not defined by RDF then no RDF document 
> > should talk about
> > that thing.  (For example, if datatypes are not defined in 
> > RDF then no RDF
> > document should be talking about datatyping.)
> 
> If you are proposing that RDF should add mechanisms to actually
> define (a) the canonical representation for all values for all
> data types and (b) the algorithms needed to define the actual
> mappings from lexical space to value space, then RDF is quickly
> going to become terribly limited and impractical. XML doesn't
> define the semantics of XHTML or SOAP, why should RDF define 
> the semantics of a particular data typing scheme?

If RDF claims to handle datatyping, then it must abide by the semantics of
datatyping schemes, at least so far as to be able to determine entailment.
If it doesn't, then it is not handling that scheme correctly.

[...]

> > If "2+5" denotes '7' then RDF needs to know it so that it can 
> > determine
> > that 
> > 
> > 	John age "2+5" .
> > 
> > entails
> > 
> > 	John age "7" .
> 
> Well, if we are presuming the typing
> 
>    age rdfs:range xsd:integer .
> 
> then "2+7" is not a legal lexical form for that data type. Though
> if you replaced "2+7" in your example with the earlier "010", then
> I do understand what you are trying to accomplish, but I also
> still feel that the graph itself is the wrong level to do this.
> 
> Just as you need a higher level of processing logic to validate
> RDFS constraints, catch and deal with contradictions, handle
> statements of equivalence or subclassing, etc. you also need that
> higher level of dealing with entailment of typed data literal
> values. What is important/crucial is that you have all the knowledge
> you need to do that expressed in the graph. 


Here is where we have a basic disagreement.
I say that if you claim
1/ to be doing reasonable entailment
2/ to be handling a datatype scheme
then you *must* have

age rdfs:range xsd:integer.
John age "010".

entail

John age "10".

and, moreover, this has to happen within RDF.

If it doesn't happen within RDF then RDF is not doing entailment.


Your claim that this is not necessary, because ``you have all the knowledge
you need to do [it] expressed in the graph'', is just the same as claiming
that you don't need to entail

John rdf:type Person.

from

Student rdfs:subClassOf Person.
John rdf:type Student.

> The mechanisms, vocabulary,
> and representations employed in the graph itself should be application
> neutral, generic, flexible, and as future proof as possible, and adding
> native data types with explicit internalized canonical representations
> of typed values in the graph defeats that purpose, and IMO would
> substantially reduce the utility of RDF.

I don't see how this follows at all.  How can the presence of datatype
values, canonicalized or not, in the RDF graph do any of this?


> Regards,
> 
> Patrick

Peter F. Patel-Schneider
Bell Labs Research

Received on Monday, 17 December 2001 09:23:13 UTC