RE: Literals (Re: model theory for RDF/S) from Patrick.Stickler@nokia.com on 2001-10-03 (www-rdf-logic@w3.org from October 2001)

From: <Patrick.Stickler@nokia.com>
Date: Wed, 3 Oct 2001 10:25:04 +0300
To: pfps@research.bell-labs.com
Cc: drew.mcdermott@yale.edu, www-rdf-logic@w3.org
Message-ID: <2BF0AD29BC31FE46B78877321144043114C003@trebe003.NOE.Nokia.com>
> -----Original Message-----
> From: ext Peter F. Patel-Schneider 
> [mailto:pfps@research.bell-labs.com]
> Sent: 02 October, 2001 22:52
> To: Stickler Patrick (NRC/Tampere)
> Cc: drew.mcdermott@yale.edu; www-rdf-logic@w3.org
> Subject: RE: Literals (Re: model theory for RDF/S)
> 
> 
> From: Patrick.Stickler@nokia.com
> Subject: RE: Literals (Re: model theory for RDF/S)
> Date: Tue, 2 Oct 2001 22:19:03 +0300 
> 
> > > > > It is true that you can make a consistent view of all 
> > > this from this
> > > > > ``RDF'' viewpoint, but you do have to be a bit careful.  In 
> > > > > particular, if
> > > > > you want to allow RDF to be consistent with different URI 
> > > > > schemes, you have
> > > > > to modify the "one-URI, one-Resource" philosophy to a 
> > > > > "one-URI, possibly
> > > > > one-Resource".  
> > > > 
> > > > Maybe, but that is yet another issue.
> > > 
> > > No!!!  If RDF is to have any support for URI schemes that have a built-in
> > > semantics that maps different URIs into the same semantic object, then it
> > > will HAVE to admit the possibility that different URIs map into the same
> > > resource.  To do anything other is to be WRONG!
> > 
> > Well, I'm not arguing here that this mapping shouldn't be done.
> >
> > But should it be at the RDF level specifically? Or is this something
> > that might be more effectively or optimally addressed in RDFS or
> > higher?
> > 
> > The problem with opening that Pandora's Box of understanding some
> > URIs, is that suddenly, to be fair, you must understand *all* URIs.
> > And since there are not (to my admittedly poor knowledge) any current
> > proposals about how the semantics of URI Schemes would be defined
> > in a generic, portable, and RDF-compatible manner, I don't see how
> > it benefits us to start shifting around fundamental pillars such
> > as URI opaqueness unless we are darn sure that there are other pillars
> > first in place to hold things up, eh?
> > 
> > It is perhaps true that the "one URI, one resource" view could be
> > too simplistic or even naive, and it likely warrants scrutiny, but
> > RDF also seems to get a lot of mileage out of it, so I don't see it
> > as being without value or at least some fundamental degree of
> > validity -- even if it needs to evolve in time to something more
> > comprehensive.
> 
> I think that you are not understanding my point.  If RDF is going to be
> compatible with---not even understand, but just be compatible with---any
> scheme that identifies URIs, then it cannot require that different URIs
> denote different resources, or even require that different literals denote
> different literal values.  Instead RDF has to admit the possibility that
> different URIs denote the same thing.

I actually do think that I'm following you (though I could
be wrong ;-)  

I think we are simply having a conflict of terminology...

I don't believe that RDF precludes that two URIs would represent
the same "thing" -- it simply does not provide a built-in
mechanism for defining equivalence between those two URIs. Perhaps 
it should, but in any case, one should not ever lose the distinction 
between those different URIs (representing the same thing) at the
lowest nuts-n-bolts level (i.e. the graph), insofar as their separate 
participation in different statements is concerned -- as the statements 
may originate from specific contexts using specific vocabularies, and
that is information that should not be compromised as it may be
significant in one way or another to various applications.

After all, *you* may say that "5" and "05" are equivalent, but *I*
may disagree ;-)

The foundation layer of RDF should not force me to accept anyone's views
about anything, insofar as general statements about the universe are 
concerned, no?  RDF simply provides the framework within which to make
such statements and to evaluate such statements. The bottom layer
should be totally neutral with regards to *what* the URIs might
denote and any relationships between those URIs (or indirectly, 
between the things they denote).

Thus, I simply do not see URI equivalence as belonging at the
RDF layer, in the graph (though perhaps at the RDFS layer).

After all, we are really talking about synonyms in one or more
vocabularies. The relationship 'URI1 equivalentTo URI2' is of the same 
class of relations as 'URI1 subClassOf URI2' or 'URI1 subPropertyOf URI2',
no? URIs which encode resources such as typed data literals are an
interesting case, but not necessarily any different from any other
case of two or more URIs denoting the same "thing". It may be *significant*
that one application says 'int:5' and another says 'int:5.0' and
another says 'int:000005'. Or if you like: 'int:5', 'xsd:float:5.0', and
'foo:intPadTo4:0005', eh?

URIs are not themselves the "things" they denote (usually ;-). They are 
symbols which denote objects in an explicit symbol system. If two symbols 
denote the same object, then, sure, let's define them as equivalent -- but 
not at the loss of distinction between the two symbols themselves (as that 
distinction may be significant for some operations) -- and again, equivalence
may be a contextualized opinion, not a "fact" of the universe.

I absolutely agree that if I have two statements

   #foo #hasValue int:5 .
   #foo #hasValue xsd:float:5.0 .

and I "know" that (i.e. have a set of rules which define that) 'int:5',
'xsd:float:5.0', and 'foo:intPadTo4:0005' all represent the same
"value", then a query corresponding to the statement "template":

   #foo #hasValue foo:intPadTo4(X)

should give me only one answer, and represented in terms of the
specified URI scheme, namely 'foo:intPadTo4:0005'.
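
To make that query behaviour concrete, here is a minimal sketch in Python. It
is only an illustration under stated assumptions: the PARSE and RENDER tables,
split_uri and query are hypothetical names, not part of RDF or of any library,
and the rule that all three schemes map onto one numeric value is exactly the
kind of "opinion" that would live in a layer above the graph.

```python
from fractions import Fraction

# The graph keeps every statement exactly as it was asserted.
STATEMENTS = [
    ("#foo", "#hasValue", "int:5"),
    ("#foo", "#hasValue", "xsd:float:5.0"),
]

# Hypothetical per-scheme rules: lexical form -> abstract value ...
PARSE = {
    "int": lambda text: Fraction(int(text)),
    "xsd:float": lambda text: Fraction(text),
    "foo:intPadTo4": lambda text: Fraction(int(text)),
}

# ... and abstract value -> lexical form, for schemes we can answer in.
RENDER = {
    "foo:intPadTo4": lambda value: f"{int(value):04d}",
}


def split_uri(uri):
    """Split 'xsd:float:5.0' into ('xsd:float', '5.0')."""
    scheme, _, lexical = uri.rpartition(":")
    return scheme, lexical


def query(subject, predicate, answer_scheme):
    """Answer '#foo #hasValue scheme(X)' with values rendered in one scheme."""
    answers = set()
    for s, p, o in STATEMENTS:
        if (s, p) != (subject, predicate):
            continue
        scheme, lexical = split_uri(o)
        if scheme not in PARSE:
            continue  # no rule for this scheme, so no opinion about its value
        value = PARSE[scheme](lexical)
        answers.add(f"{answer_scheme}:{RENDER[answer_scheme](value)}")
    return answers


print(query("#foo", "#hasValue", "foo:intPadTo4"))
```

Both statements stay in STATEMENTS untouched; the collapse to a single answer
happens only in the query layer.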

Do such mechanisms belong at the core RDF layer? I don't think so.

Do such mechanisms belong somewhere in the official W3C RDF set of
standards such that everyone does it the same way, etc? Absolutely!
Likely in RDFS...

Do such mechanisms need to have any specific understanding of any
particular URI scheme?  Not at all.

Will it be easy to define such mechanisms in an efficient yet
generic manner, in as low an RDF layer as possible, without
discriminating against any particular data type scheme or
URI scheme? Probably not ;-)  but I strongly feel that should be
our goal, and that such a goal is achievable with reasonable effort.

> > > ... Note also that the RDF model theory does not
> > > embody the "one-URI, one-Resource" philosophy.
> > 
> > I'm not sure I get this from the specs. Or is this part of the
> > recent work on "clarifying" ;-)  the specs?
> 
> This is the recent (excellent) RDF model theory that Pat Hayes has
> produced.  In this model theory there is no requirement that different URIs
> denote different resources or that different literals denote different
> literal values.

But those URIs and literal values retain their full identity, distinct
from any other URIs or literal values, right?

Does this mean that in the (new?) RDF graph model, a resource node
can have multiple labels? 

If so, then how does one keep track of which of the synonymous
symbols denoting the resource participates in a particular statement about,
or referencing, that resource? Hmmmm.....   Houston, we may have a problem...

Pat? Anyone? Comments?   Starting to worry here...   ;-)

> > > Note further that if RDF
> > > retains the "one-URI, one-Resource" philosophy then daml:equivalentTo is
> > > not very useful as an extension to RDF, as it will introduce
> > > inconsistencies unless applied to equal URIs.
> > 
> > I'm not a KR person, per se, so I'm on slippery ground here, but...
> > 
> > From the perspective of semiotics and the three way distinction
> > between objects, concepts, and symbols (à la Sowa), does not RDF deal with
> > symbols and not objects?
> > 
> > I.e. even if a given resource (object) can have multiple identities
> > (symbols/URIs), RDF itself is a symbol system, and since objects
> > themselves have no realization in RDF which could serve as an 
> > explicit point of intersection for equivalent symbols, if we wish to
> > treat multiple symbols as equivalent, we must explicitly define
> > that equivalence with mechanisms such as daml:equivalentTo.
> > 
> > It may very well be that two URLs dereference to the same byte string,
> > but is that within the scope of the RDF conceptual model, or is that
> > a characteristic of the world that needs to be modelled at a higher level?
> 
> But if RDF requires that different URLs denote/dereference
> to/represent/... different things then extensions of RDF are forever stuck
> with that decision.

True, but only at the level at which that distinction is maintained.
The distinction can become transparent at higher levels (possibly
the next immediately higher level).

You have to keep separate the concepts of "node" and "some thing 
out in the universe represented by that node". These are not the
same.

A graph node is a point of reference for us to talk about "things" in
the universe (concrete or abstract). These nodes are named by
symbols (and URIs are adopted opaquely, simply for their quality
of being globally unique "for free"). Often, there arises synonymy
in the language(s) used to talk about things, but that doesn't mean
that the symbols (URIs) which are synonymous are either 'wrong' or
must be irretrievably merged in the graph such that we can't 
differentiate between them in the actual statements constituting our 
collective knowledge about the universe. 

In fact, it may be that those cases of synonymy are very "interesting"
in various ways (I guess my computational linguistics background is
leaking through here a bit ;-)

Some forms of symbol equivalence may be just semantically vacuous
lexical variation (e.g. "5" vs. "00005", etc.) and for practical
reasons, we should try to reduce or eliminate such cases; but
many forms of symbol equivalence will occur because more than one
vocabulary/language is being used to talk about the same "things",
and we cannot simply say that everyone has to use the same language.
And of course there are other sources for synonymy of symbols, e.g.
for human convenience -- such as abbreviations or shorthand notations
for fully normalized, but more cumbersome forms, etc. Yet the
preservation of the variant form used may be very important (e.g.
for data mining, auto-analysis of user traits/preferences, etc).

We cannot lose the distinction between which language or which
specific symbols are used to say something about some "thing".

What has to exist are mechanisms for equating and otherwise relating
equivalent symbols -- and at *some* level of interaction with the
knowledge base, those equivalences should be fully transparent to 
applications (but yet still explicit and distinct at a lower level for
other applications that need to know the gory details).
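
As a rough sketch of what I mean (Python, purely illustrative): the graph
below never merges nodes, equivalence is just another statement in it, and two
hypothetical lookup functions give the "gory details" view and the transparent
view. The vocabulary names, the use of the label daml:equivalentTo as a plain
string, and the function names are all assumptions made for the example.

```python
EQUIVALENT_TO = "daml:equivalentTo"  # used here only as a predicate label

# Every symbol keeps its own node; equivalence is stated *in* the graph.
TRIPLES = [
    ("vocabA:author", EQUIVALENT_TO, "vocabB:creator"),
    ("#doc1", "vocabA:author", "#patrick"),
    ("#doc2", "vocabB:creator", "#peter"),
]


def equivalence_classes(triples):
    """Build a union-find over the equivalence statements; return its find()."""
    parent = {}

    def find(symbol):
        parent.setdefault(symbol, symbol)
        while parent[symbol] != symbol:
            parent[symbol] = parent[parent[symbol]]  # path halving
            symbol = parent[symbol]
        return symbol

    for s, p, o in triples:
        if p == EQUIVALENT_TO:
            parent[find(s)] = find(o)
    return find


def match_exact(triples, predicate):
    """Low-level view: only statements that used this very symbol."""
    return [t for t in triples if t[1] == predicate]


def match_transparent(triples, predicate):
    """Higher-level view: statements using any symbol declared equivalent."""
    find = equivalence_classes(triples)
    target = find(predicate)
    return [
        t for t in triples
        if t[1] != EQUIVALENT_TO and find(t[1]) == target
    ]


print(match_exact(TRIPLES, "vocabA:author"))
print(match_transparent(TRIPLES, "vocabA:author"))
```

The exact view still tells you which vocabulary each statement was made in;
the transparent view is derived from the graph rather than baked into it.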

Eh?

> > > > Or rather, it's a question about at what functional layer
> > > > you wish to add that "patch" and address the URI equivalence
> > > > issue.
> > > 
> > > No, the problem is if you require that different URIs map to different
> > > resources, then you can't patch the problem.  If you remain uncommitted,
> > > then there is the possibility of a patch.
> > 
> > Uhhh, no. Again, it's not about not allowing that patch at all, ever,
> > at any layer, no matter what -- but at *which* layer it *should* be
> > applied. Being against it at one layer is not the same as being against
> > it at any layer.
> 
> Again.  If you require that DIFFERENT URIs denote DIFFERENT things you
> cannot later turn around and allow any two different URIs to denote the
> same thing.  You have already said that they denote different things, and
> this would introduce a contradiction.

That's not what is being required (at least I don't think so ;-)

I said that different URIs denote different resources, with 'resource'
meaning a node in an RDF graph, not a "web resource" or other "thing"
in the universe. I probably was not sufficiently clear in that regard.

Different URIs can denote the same "thing"; they just can't denote
the same node in the graph (according to my present, though possibly
flawed understanding of the RDF conceptual model) -- but any 
equivalence between multiple URIs that denote the same "thing"
needs to be defined in terms of the graph, not in the fundamental
realization of the graph itself.
 
> > I'm not against such a URI equivalence mechanism, I just don't (at the
> > moment) see it belonging at the fundamental RDF layer. Maybe it *does*
> > belong there, but I'm not convinced.
> 
> This has nothing to do with whether RDF has any mechanism for equivalence
> or difference.  It has to do with whether anything built on top of RDF can
> ever have a non-trivial theory of URI or literal equivalence.

I just don't see how maintaining the distinction between different
symbols (URIs) in the graph in any way precludes having a set of
(standardized, possibly official RDF) mechanisms which allow a
"non-trivial theory of URI or literal equivalence".

In fact, preserving the integrity of the "one URI, one resource node"
principle allows for more than one such theory to be defined and
employed for RDF encoded knowledge -- making the core RDF standard
more future proof and useful to different communities who may view
such matters differently.
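
A small Python sketch of why keeping the symbols distinct leaves room for more
than one theory: the same three literals are grouped differently depending on
which (hypothetical) equivalence rule a community plugs in. The two theories
and the helper name are assumptions made up for the illustration.

```python
# The literals as they appear in the graph, each kept as a distinct symbol.
LITERALS = ["5", "05", "00005"]


def numeric_theory(a, b):
    """One community's view: literals are equal if they name the same integer."""
    return int(a) == int(b)


def lexical_theory(a, b):
    """Another community's view: only identical strings are equal."""
    return a == b


def distinct_values(literals, same):
    """Keep one representative per equivalence class under the given theory."""
    representatives = []
    for literal in literals:
        if not any(same(literal, r) for r in representatives):
            representatives.append(literal)
    return representatives


print(distinct_values(LITERALS, numeric_theory))
print(distinct_values(LITERALS, lexical_theory))
```

Had the graph itself collapsed "5" and "05" into one node, the second theory
could no longer be expressed at all.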

Again, it's not about *not* providing such mechanisms or solutions,
but whether they belong at the "bottom" core layer -- i.e. in the
graph model itself rather than a layer defined in terms of the graph
model.

Cheers,

Patrick

--
Patrick Stickler                      Phone:  +358 3 356 0209
Senior Research Scientist             Mobile: +358 50 483 9453
Nokia Research Center                 Fax:    +358 7180 35409
Visiokatu 1, 33720 Tampere, Finland   Email:  patrick.stickler@nokia.com
Received on Wednesday, 3 October 2001 03:25:36 UTC