RE: Literals (Re: model theory for RDF/S)

From: Patrick.Stickler@nokia.com
Subject: RE: Literals (Re: model theory for RDF/S)
Date: Tue, 2 Oct 2001 15:01:51 +0300 

> > [Peter Patel-Schneider]
> > What I see in this proposal is a general mechanism for providing
> > special cases for RDF.  An RDF processor would have to understand, and
> > parse, all sorts of different syntax.
> > 
> > Consider the situation with a hypothetical integer scheme.  If an RDF
> > processor is given
> > 
> > int:5 #loves #Susan .
> > 
> > and 
> > 
> > int:05 #loves #Jackie .
> > 
> > then it has to understand that int:5 and int:05 are the same
> > and respond to a query about the loves of 5 that it #loves both
> > #Susan and #Jackie.
> 
> Well, not really (IMO)...
> 
> It is true that int:5 and int:05 would technically constitute
> different URIs and hence different resources, but that's how
> RDF does things, eh? Different URI, different resource. I'm
> sure we don't want to shift that foundational pillar...  ;-)

If RDF treats int:5 and int:05 differently then it doesn't understand
integers.
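
To make the difference concrete, here is a minimal sketch in Python.  The
function names are made up for illustration, and int: is still only a
hypothetical scheme; the point is just that the two notions of "same"
disagree on these two URIs.

```python
def same_uri(a: str, b: str) -> bool:
    """Plain RDF view: two URIs are the same resource only if the strings match."""
    return a == b


def same_int_literal(a: str, b: str) -> bool:
    """Scheme-aware view: compare the integers that two int: URIs encode."""
    prefix = "int:"  # hypothetical scheme from this thread
    if not (a.startswith(prefix) and b.startswith(prefix)):
        raise ValueError("expected two int: URIs")
    return int(a[len(prefix):]) == int(b[len(prefix):])


opaque_equal = same_uri("int:5", "int:05")
value_equal = same_int_literal("int:5", "int:05")
print(opaque_equal, value_equal)  # False True
```

A processor that only has same_uri keeps #Susan and #Jackie apart; one that
understands integers needs something like same_int_literal.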

> For a generalized, consistent, global representation of a given
> data type, though, one would expect that there would
> be constraints defined which prohibit semantically vacuous variant
> forms, such as above. So yes, you bring up a very valid requirement
> for e.g. an int: scheme, that we wouldn't get int:00000000005, etc.
> but that's an issue for the particular scheme, not the methodology of
> URI encoded literals itself, I think (apart from specifying it as
> an expected quality of every such scheme to not have semantically
> vacuous variant forms).

So then you need special-purpose parsers for each scheme. More than that,
you need at most one literal for each value in the datatype, which is
certainly not the normal way of doing things.
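
As a rough sketch of what "a special-purpose parser for each scheme" amounts
to, consider the following Python.  The canonical form chosen for int: (no
leading zeros, no plus sign, no "-0") and the names CANONICAL_CHECKERS and
is_canonical are my assumptions for illustration, not part of any proposal.

```python
import re

# Assumed canonical form for the hypothetical int: scheme:
# no leading zeros, no plus sign, and no "-0".
CANONICAL_INT = re.compile(r"0|-?[1-9][0-9]*")

# One entry per scheme; every new literal scheme would need its own checker.
CANONICAL_CHECKERS = {
    "int": lambda body: CANONICAL_INT.fullmatch(body) is not None,
}


def is_canonical(uri: str) -> bool:
    """Return True if the URI is the single permitted spelling of its value."""
    scheme, sep, body = uri.partition(":")
    if not sep or scheme not in CANONICAL_CHECKERS:
        raise ValueError(f"no checker registered for {uri!r}")
    return CANONICAL_CHECKERS[scheme](body)


print(is_canonical("int:5"), is_canonical("int:05"))  # True False
```

Every additional scheme means another entry in that table, which is exactly
the scheme-specific knowledge the processor was supposed to be able to avoid.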

> And on a practical level, one would not necessarily expect URI 
> encoded literals to act as the subject of statements, or to
> serve as indirect identifiers of other resources, even if technically
> they could be coerced to do so (and regardless of whether they
> were guaranteed to be free of semantically vacuous variant forms).

Even if they are just objects of triples, you can run into problems.

Consider

#Susan #favorite-integer int:05 .
#Susan #favorite-integer int:5 .

how is an RDF query system supposed to respond when asked about Susan's
favorite integers?

> > Similarly, consider the situation with a hypothetical scheme for web
> > pages.  These are supposed to represent actual web pages, not URIs.
> > If an RDF processor is given
> > 
> > wp://www-db.research.bell-labs.com/user/pfps #loves #Susan .
> > 
> > and
> > 
> > wp://db.bell-labs.com/user/pfps #loves #Jackie .
> > 
> > then an RDF processor has to understand that these two are the same
> 
> Not at all. 
> 
> URIs to an RDF processor are just opaque, globally unique identifiers. 
> An RDF processor does not, and should not have to understand anything 
> about any URI, insofar as the semantics of the URI or URI scheme themselves 
> are concerned (I stress that last clause of the assertion, re-read as
> needed). 

Again, if the RDF processor does not understand something about the URI
scheme then it has not captured anything about the URI scheme.  If these
strange URIs are not given some sort of theory, and the part of that theory
that makes a difference to RDF is not followed by an RDF processor, then
you have not captured the meaning of the URI scheme.  (You may think that
RDF is so expressively impoverished that it doesn't need to know about any
part of the theory, but this is just not so.)

> Granted, a given RDF application will generally need to know about or infer 
> (RDF defined) semantics attributed to a given URI according to one or more 
> ontologies, or dereference a given URI to access additional data, 
> but that is beyond the scope of RDF proper and even in the case of
> dereferencing a URI, the RDF application itself does not have to understand
> anything about the URI, only how to interact with a dereferencing agent
> which itself understands the URI.
> 
> And furthermore, since URLs are URIs, this issue is present in 
> RDF's present incarnation. 
> 
> E.g.
> 
>    http://www-db.research.bell-labs.com/user/pfps #loves #Susan .
> 
>    and
>   
>    http://db.bell-labs.com/user/pfps #loves #Jackie .
> 
> Same problem. Right?
>
> The use of URI encoded literals does not itself introduce any such problem
> nor complicate the problem (even if it does not solve the problem). 


Not at all.  URLs have the (potentially strange) characteristic that two
different URLs that map to the same ``place'' are different objects.  There is
nothing theoretically wrong with this way of looking at URLs.  You may say
that it is incorrect, and if you convince enough people some W3C
recommendations may have to change, but this view of URLs is internally
consistent.

However, if you want a different URI scheme, like wp:, that has a different
meaning, and you want to have RDF abide by this meaning, then RDF
processors may have to do something different for this URI scheme.

> If two URI strings differ in any way, then they represent different RDF
> resources, and mechanisms such as daml:equivalentTo or similar must be
> employed to address such relationships between resources. I.e.
> 
>    <rdf:Description rdf:about="http://www-db.research.bell-labs.com/user/pfps">
>       <daml:equivalentTo rdf:resource="http://db.bell-labs.com/user/pfps"/>
>    </rdf:Description>
> 
> Right?

Yes, in DAML+OIL you can equate two different URIs, via equivalentTo.
However, that is certainly not how DAML+OIL handles datatypes, like
integers.  A DAML+OIL (March 2001) processor has to understand a portion of
XML Schema, not just the syntax but also the semantics.
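
To illustrate the contrast, here is a small sketch of what explicit
equivalence statements buy you.  It uses a union-find structure; the class
and method names are hypothetical and this is not how any DAML+OIL processor
is claimed to work.

```python
class Equivalences:
    """Tracks which URIs have been explicitly declared equivalent (union-find)."""

    def __init__(self):
        self._parent = {}

    def find(self, uri):
        """Return the representative URI of the class containing uri."""
        self._parent.setdefault(uri, uri)
        root = uri
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node straight at the root.
        while self._parent[uri] != root:
            self._parent[uri], uri = root, self._parent[uri]
        return root

    def declare_equivalent(self, a, b):
        """Record one equivalentTo-style statement between two URIs."""
        self._parent[self.find(a)] = self.find(b)

    def same_resource(self, a, b):
        return self.find(a) == self.find(b)


eq = Equivalences()
eq.declare_equivalent(
    "http://www-db.research.bell-labs.com/user/pfps",
    "http://db.bell-labs.com/user/pfps",
)
pages_merged = eq.same_resource(
    "http://db.bell-labs.com/user/pfps",
    "http://www-db.research.bell-labs.com/user/pfps",
)
ints_merged = eq.same_resource("int:5", "int:05")
print(pages_merged, ints_merged)  # True False
```

The two web page URIs are merged because somebody said so.  Nobody has said
anything about int:5 and int:05, and for a datatype like the integers there
is no finite set of such statements that would cover every variant spelling.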

> > As far as I can see, no matter how you do it, any scheme for providing
> > different semantic domains, be they integers or whatever, will require
> > special purpose parsing and special purpose understanding in RDF.  The
> > situation only becomes more complex in more-powerful representation
> > systems.
> 
> Yet that is the present situation with RDF as it is now defined, and I
> don't see how the proposed approach introduces additional complexity
> or in any way makes it more difficult or involved for RDF applications
> to function within the context of the fundamental "one-URI, one-Resource" 
> philosophy embodied in RDF.

It is true that you can make a consistent view of all this from this
``RDF'' viewpoint, but you do have to be a bit careful.  In particular, if
you want to allow RDF to be consistent with different URI schemes, you have
to modify the "one-URI, one-Resource" philosophy to a "one-URI, possibly
one-Resource".  

This has consequences with respect to the *meaning* of answers returned by
an RDF query system.  In particular, the notion of cardinality becomes more
than a little suspect.  (Consider asking how many favorite integers Susan
has.  If an RDF query system answers two, then it is not just incomplete,
it is wrong.  This can be fixed by some rather technical means, but must be
done carefully.)
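
A small sketch of the cardinality problem, using the two triples about
Susan from earlier in this message.  The helper names are hypothetical and
int: remains a made-up scheme.

```python
triples = [
    ("#Susan", "#favorite-integer", "int:05"),
    ("#Susan", "#favorite-integer", "int:5"),
]


def objects_of(subject, predicate):
    """Return the distinct object URIs for a subject/predicate pair."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def int_value(uri):
    """Interpret a hypothetical int: URI as the integer it encodes."""
    return int(uri.split(":", 1)[1])


favorites = objects_of("#Susan", "#favorite-integer")
naive_count = len(favorites)                          # counts URI strings
value_count = len({int_value(o) for o in favorites})  # counts integers
print(naive_count, value_count)  # 2 1
```

Counting URIs gives two; counting integers gives one.  Only the second
answer is right if the objects are supposed to be integers.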

> It does, though, make the serialization simpler, as (a) it reduces the
> number of variant syntactic realizations, (b) removes (either real or
> perceived) inconsistencies between the way contracted forms are mapped
> to the graph for resource versus literal objects, and (c) it removes
> all of the presently needed discussion about what literals are and
> how they differ from other resources.

Perhaps, but it does mean that RDF moves further away from the XML / XML
Schema way of representing literals and datatypes.

> To me, that's simpler and more consistent (but I'll happily and humbly
> admit that I suffer from sporadic ignorance and may very well be playing
> out in la'la land on this one ;-)
> 
> Regards,
> 
> Patrick

There is nothing technically wrong with the URI philosophy in RDF nor with
the treatment of literals in RDF, at least nothing wrong that can't be
fixed by the above patch, but if you stick with this philosophy I don't
think that you can claim to be representing anything besides uninterpreted
URIs.  Moreover, there are certain consequences of this approach that need
to be analyzed.

Peter F. Patel-Schneider

Received on Tuesday, 2 October 2001 08:43:38 UTC