RE: Literals (Re: model theory for RDF/S)

From: Patrick.Stickler@nokia.com
Subject: RE: Literals (Re: model theory for RDF/S)
Date: Wed, 3 Oct 2001 10:25:04 +0300 

> > -----Original Message-----
> > From: ext Peter F. Patel-Schneider 
> > [mailto:pfps@research.bell-labs.com]
> > Sent: 02 October, 2001 22:52
> > To: Stickler Patrick (NRC/Tampere)
> > Cc: drew.mcdermott@yale.edu; www-rdf-logic@w3.org
> > Subject: RE: Literals (Re: model theory for RDF/S)
> > 
> > 
> > From: Patrick.Stickler@nokia.com
> > Subject: RE: Literals (Re: model theory for RDF/S)
> > Date: Tue, 2 Oct 2001 22:19:03 +0300 
> > 
> > > > > > It is true that you can make a consistent view of all this from this
> > > > > > ``RDF'' viewpoint, but you do have to be a bit careful.  In particular, if
> > > > > > you want to allow RDF to be consistent with different URI schemes, you have
> > > > > > to modify the "one-URI, one-Resource" philosophy to a "one-URI, possibly
> > > > > > one-Resource".
> > > > > 
> > > > > Maybe, but that is yet another issue.
> > > > 
> > > > No!!!  If RDF is to have any support for URI schemes that have a built-in
> > > > semantics that maps different URIs into the same semantic object, then it
> > > > will HAVE to admit the possibility that different URIs map into the same
> > > > resource.  To do anything else is to be WRONG!
> > > 
> > > Well, I'm not arguing here that this mapping shouldn't be done.
> > >
> > > But should it be at the RDF level specifically? Or is this something
> > > that might be more effectively or optimally addressed in RDFS or
> > > higher?
> > > 
> > > The problem with opening that Pandora's Box of understanding some
> > > URIs is that suddenly, to be fair, you must understand *all* URIs.
> > > And since there are not (to my admittedly poor knowledge) any current
> > > proposals about how the semantics of URI Schemes would be defined
> > > in a generic, portable, and RDF-compatible manner, I don't see how
> > > it benefits us to start shifting around fundamental pillars such
> > > as URI opaqueness unless we are darn sure that there are other pillars
> > > first in place to hold things up, eh?
> > > 
> > > It is perhaps true that the "one URI, one resource" view could be
> > > too simplistic or even naive, and it likely warrants scrutiny, but
> > > RDF also seems to get a lot of mileage out of it, so I don't see it
> > > as being without value or at least some fundamental degree of
> > > validity -- even if it needs to evolve in time to something more
> > > comprehensive.
> > 
> > I think that you are not understanding my point.  If RDF is going to be
> > compatible with---not even understand, but just be compatible with---any
> > scheme that identifies URIs, then it cannot require that different URIs
> > denote different resources, or even require that different literals denote
> > different literal values.  Instead RDF has to admit the possibility that
> > different URIs denote the same thing.
> 
> I actually do think that I'm following you (though I could
> be wrong ;-)  
> 
> I think we are simply having a conflict of terminology...
> 
> I don't believe that RDF precludes that two URIs would represent
> the same "thing" -- it simply does not provide a built-in
> mechanism for defining equivalence between those two URIs. 

Fine.

> Perhaps 
> it should, but in any case, one should not ever lose the distinction 
> between those different URIs (representing the same thing) at the
> lowest nuts-n-bolts level (i.e. the graph), insofar as their separate 
> participation in different statements is concerned -- as the statements 
> may originate from specific contexts using specific vocabularies, and
> that is information that should not be compromised as it may be
> significant in one way or another to various applications.

This is a bit less fine, but perfectly reasonable.  (To see how it is a bit
less fine, you need to look at the data model for XML, which attempts to
keep all information from the input files, even whitespace.  You have to be
careful not to retain ``bits'' that have no meaning.)

> After all, *you* may say that "5" and "05" are equivalent, but *I*
> may disagree ;-)

Sure.

> The foundation layer of RDF should not force me to accept anyone's views
> about anything, insofar as general statements about the universe are
> concerned, no?  RDF simply provides the framework within which to make
> such statements and to evaluate such statements. The bottom layer
> should be totally neutral with regard to *what* the URIs might
> denote and any relationships between those URIs (or indirectly,
> between the things they denote).
> 
> Thus, I simply do not see URI equivalence as belonging at the
> RDF layer, in the graph (though perhaps at the RDFS layer).
> 
> After all, we are really talking about synonyms in one or more
> vocabularies. The relationship 'URI1 equivalentTo URI2' is of the same 
> class of relations as 'URI1 subClassOf URI2' or 'URI1 subPropertyOf URI2',
> no? URIs which encode resources such as typed data literals are an
> interesting case, but not necessarily any different from any other
> case of two or more URIs denoting the same "thing". It may be *significant*
> that one application says 'int:5' and another says 'int:5.0' and
> another says 'int:000005'. Or if you like: 'int:5', 'xsd:float:5.0', and
> 'foo:intPadTo4:0005', eh?

This again is fine, but, again, is getting a bit close to retaining
irrelevant bits.  If RDF supports URI schemes that have a built-in notion
of equality, such as the hypothetical(?) int, then int:5 and int:05 are the
same, just as < and &lt; are the same when they appear in Unicode strings
(assuming that they indeed both represent <, that is, which I am not sure
of, not knowing very much about Unicode).
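
To make the point about scheme-level equality concrete, here is a minimal Python sketch. It assumes a hypothetical `int:` scheme (as above, not a registered URI scheme) whose lexical forms are an optional minus sign followed by decimal digits; the symbols `int:5` and `int:05` stay distinct as strings while denoting the same integer.

```python
import re

INT_PREFIX = "int:"  # hypothetical scheme from this discussion, not a registered URI scheme


def int_denotation(uri: str) -> int:
    """Return the integer that a hypothetical 'int:' URI denotes."""
    if not uri.startswith(INT_PREFIX):
        raise ValueError(f"not an int: URI: {uri!r}")
    lexical = uri[len(INT_PREFIX):]
    # Assumed lexical space: optional minus sign followed by decimal digits.
    if not re.fullmatch(r"-?[0-9]+", lexical):
        raise ValueError(f"bad int: lexical form: {lexical!r}")
    # int() ignores leading zeros, so '5', '05' and '000005' all map to 5.
    return int(lexical, 10)


def same_denotation(a: str, b: str) -> bool:
    """Equality by denotation rather than by spelling."""
    return int_denotation(a) == int_denotation(b)


print("int:5" == "int:05")                 # False: the symbols differ
print(same_denotation("int:5", "int:05"))  # True: the values coincide
```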

> URIs are not themselves the "things" they denote (usually ;-). They are 
> symbols which denote objects in an explicit symbol system. If two symbols 
> denote the same object, then, sure, let's define them as equivalent -- but 
> not at the loss of distinction between the two symbols themselves (as that 
> distinction may be significant for some operations) -- and again,
> equivalence may be a contextualized opinion, not a "fact" of the universe.

OK, now we are getting into the nuts and bolts of systems, not logics.  As
far as (most) logics are concerned, if two symbols denote the same thing,
then there is no way of determining which one was used, from within the
logic.  However, it is possible to design a system that allows access to
several interfaces, one to the logical level and one to the symbolic
level.  At the logical level, there may be no way of determining which
symbol is being used, to the point that queries should probably return sets
of symbols in many cases, and not just single symbols.  At the symbolic
level, there may indeed be a difference between the two symbols.  There may
also be other levels, such as a proof-theoretic level, where the
distinction also makes sense.
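
A rough illustration of the two interfaces described above, as a Python sketch. The `Store` class, its methods, and the denotation table are hypothetical; the point is only that a symbolic-level query reports the exact symbols used, while a logical-level query compares by denotation and therefore returns a set of symbols per answer.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Store:
    # Symbolic level: statements kept exactly as written, one tuple per triple.
    triples: list[tuple[str, str, str]] = field(default_factory=list)
    # Hypothetical interpretation table: symbol -> the thing it denotes.
    # Symbols missing from the table are assumed to denote themselves.
    denotation: dict[str, object] = field(default_factory=dict)

    def denote(self, symbol: str) -> object:
        return self.denotation.get(symbol, symbol)

    def symbolic_objects(self, s: str, p: str) -> list[str]:
        """Objects of matching triples, compared and reported symbol by symbol."""
        return [o for (s2, p2, o) in self.triples if s2 == s and p2 == p]

    def logical_objects(self, s: str, p: str) -> list[set[str]]:
        """Compare by denotation; each answer is the set of symbols for one thing."""
        groups: dict[object, set[str]] = {}
        for s2, p2, o in self.triples:
            if self.denote(s2) == self.denote(s) and self.denote(p2) == self.denote(p):
                groups.setdefault(self.denote(o), set()).add(o)
        return list(groups.values())


store = Store(
    triples=[("#foo", "#hasValue", "int:5"), ("#foo", "#hasValue", "xsd:float:5.0")],
    denotation={"int:5": 5, "xsd:float:5.0": 5},
)
print(store.symbolic_objects("#foo", "#hasValue"))  # ['int:5', 'xsd:float:5.0']
print(store.logical_objects("#foo", "#hasValue"))   # one set holding both symbols
```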

> I absolutely agree that if I have two statements
> 
>    #foo #hasValue int:5 .
>    #foo #hasValue xsd:float:5.0 .
> 
> and I "know" that (i.e. have a set of rules that define that) 'int:5',
> 'xsd:float:5.0', and 'foo:intPadTo4:0005' all represent the same
> "value", then a query corresponding to the statement "template":
> 
>    #foo #hasValue foo:intPadTo4(X)
> 
> should give me only one answer, and represented in terms of the
> specified URI scheme, namely 'foo:intPadTo4:0005'.

See above.
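
The query described above can be sketched in Python under several assumptions: the prefixes `int:`, `xsd:float:` and `foo:intPadTo4:` follow the illustrative notation used in this thread (they are not real URI schemes), the decoders and the encoder stand in for the "set of rules", and values are compared as exact rationals (a simplification; real xsd:float values are binary floating point). Because both stored symbols decode to the same value, the answer collapses to a single result rendered in the requested scheme.

```python
from fractions import Fraction

# Hypothetical decoders: URI prefix -> maps a lexical form to an exact number.
# Exact rationals are a simplifying assumption, not xsd:float semantics.
DECODERS = {
    "int:": lambda lex: Fraction(int(lex)),
    "xsd:float:": lambda lex: Fraction(lex),
    "foo:intPadTo4:": lambda lex: Fraction(int(lex)),
}


def decode(uri: str) -> Fraction:
    for prefix, parse in DECODERS.items():
        if uri.startswith(prefix):
            return parse(uri[len(prefix):])
    raise ValueError(f"no rule for {uri!r}")


def encode_int_pad_to_4(value: Fraction) -> str:
    # Hypothetical encoder for the scheme named in the query template.
    if value.denominator != 1:
        raise ValueError(f"{value} is not an integer")
    return f"foo:intPadTo4:{int(value):04d}"


def query(triples, subject, predicate, encode):
    """Answer 'subject predicate X' with equal values collapsed, rendered by encode."""
    values = {decode(o) for (s, p, o) in triples if s == subject and p == predicate}
    return sorted(encode(v) for v in values)


triples = [
    ("#foo", "#hasValue", "int:5"),
    ("#foo", "#hasValue", "xsd:float:5.0"),
]
print(query(triples, "#foo", "#hasValue", encode_int_pad_to_4))  # ['foo:intPadTo4:0005']
```

Note that the stored triples themselves are untouched; only the answer is normalized, which keeps the per-statement symbols available to applications that care about them.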

> Do such mechanisms belong at the core RDF layer? I don't think so.
>
> Do such mechanisms belong somewhere in the official W3C RDF set of
> standards such that everyone does it the same way, etc? Absolutely!
> Likely in RDFS...
> 
> Do such mechanisms need to have any specific understanding of any
> particular URI scheme?  Not at all.
> 
> Will it be easy to define such mechanisms in an efficient yet
> generic manner in as low an RDF layer as possible and which
> does not discriminate against any particular data type scheme or
> URI scheme? Probably not ;-)  but I strongly feel that should be
> our goal, and that such a goal is achievable with reasonable effort.

I agree, except, perhaps, with not placing any of this stuff at the RDF
layer.  

> > > > ... Note also that the RDF model theory does not
> > > > embody the "one-URI, one-Resource" philosophy.
> > > 
> > > I'm not sure I get this from the specs. Or is this part of the
> > > recent work on "clarifying" ;-)  the specs?
> > 
> > This is the recent (excellent) RDF model theory that Pat Hayes has
> > produced.  In this model theory there is no requirement that different URIs
> > denote different resources or that different literals denote different
> > literal values.
> 
> But those URIs and literal values retain their full identity, distinct
> from any other URIs or literal values, right?

URIs and literals are on the ``symbol layer'' and thus retain their
identity on that layer even if they denote the same resource or literal
value.  However, in the model theory, there is no way of determining which
URI was used to reference a resource, which is how it should be.

> Does this mean that in the (new?) RDF graph model, a resource node
> can have multiple labels? 

A graph node is also in the symbol layer, and thus has a single label.  A
resource, on the other hand, is in the logical layer.  Many graph nodes, or
one, or none, can map to a resource.
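
The node/resource distinction can be shown with plain Python dictionaries. The node ids, URIs and resource names are hypothetical placeholders: each node keeps its own single label, and the interpretation `I` maps two nodes onto one resource, one node onto another, and none onto a third.

```python
from __future__ import annotations

# Symbol layer: every graph node carries exactly one label (a URI).
# All node ids, URIs and resource names here are hypothetical placeholders.
node_label = {
    "n1": "http://example.org/a",
    "n2": "http://example.org/b",
    "n3": "http://example.org/c",
}

# Logical layer: a set of resources and an interpretation I from nodes to resources.
resources = {"R1", "R2", "R3"}
I = {"n1": "R1", "n2": "R1", "n3": "R2"}


def nodes_for(resource: str) -> set[str]:
    """All graph nodes that I maps onto the given resource (its preimage)."""
    return {node for node, r in I.items() if r == resource}


for r in sorted(resources):
    labels = sorted(node_label[n] for n in nodes_for(r))
    print(r, len(labels), labels)
# R1 2 ['http://example.org/a', 'http://example.org/b']
# R2 1 ['http://example.org/c']
# R3 0 []
```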

> If so, then how does one maintain the distinction between which synonymous
> symbol denoting the resource participates in a particular statement about,
> or referencing that resource? Hmmmm.....   Houston, we may have a problem...

At the logical layer you can't.   At other layers you may be able to.

> Pat? Anyone? Comments?   Starting to worry here...   ;-)
> 
> > > > Note further that if RDF retains the "one-URI, one-Resource" philosophy
> > > > then daml:equivalentTo is not very useful as an extension to RDF, as it
> > > > will introduce inconsistencies unless applied to equal URIs.
> > > 
> > > I'm not a KR person, per se, so I'm on slippery ground here, but...
> > > 
> > > From the perspective of semiotics and the three-way distinction
> > > between objects, concepts, and symbols (à la Sowa), does RDF not deal
> > > with symbols rather than objects?
> > > 
> > > I.e. even if a given resource (object) can have multiple identities
> > > (symbols/URIs), RDF itself is a symbol system, and since objects
> > > themselves have no realization in RDF which could serve as an 
> > > explicit point of intersection for equivalent symbols, if we wish to
> > > treat multiple symbols as equivalent, we must explicitly define
> > > that equivalence with mechanisms such as daml:equivalentTo.
> > > 
> > > It may very well be that two URLs dereference to the same byte string,
> > > but is that within the scope of the RDF conceptual model, or is that
> > > a characteristic of the world that needs to be modelled at a higher level?
> > 
> > But if RDF requires that different URLs denote/dereference
> > to/represent/... different things then extensions of RDF are forever stuck
> > with that decision.
> 
> True, but only at the level at which that distinction is maintained.
> The distinction can become transparent at higher levels (possibly
> the next immediately higher level).
> 
> You have to keep separate the concepts of "node" and "some thing 
> out in the universe represented by that node". These are not the
> same.

Yes, but this is not the point.  The point is that if the denotation
mapping is injective at the lowest level, there is no way of changing that
at upper levels.   This is very similar to what would happen if the RDF
model theory mandated that IR and LV were disjoint, or mandated that LV was
a subset of IR.  No upper level could undo this.
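
The injectivity argument can be checked by brute force over a tiny finite domain. In this hypothetical Python sketch, `models` enumerates every interpretation of two placeholder URIs (`ex:a`, `ex:b`) into three domain elements, keeps those satisfying an equivalence assertion of the kind daml:equivalentTo would make, and optionally enforces a lower-level requirement that the mapping be injective. With injectivity required, no interpretation survives, so the equivalence can only be inconsistent; without it, the assertion is satisfiable.

```python
from itertools import product


def models(uris, domain, equalities, injective):
    """Yield every interpretation I: uris -> domain that satisfies the constraints."""
    for values in product(domain, repeat=len(uris)):
        I = dict(zip(uris, values))
        # Lower layer (optional): distinct URIs must denote distinct objects.
        if injective and len(set(values)) < len(values):
            continue
        # Upper layer: every asserted equivalence must hold under I.
        if all(I[a] == I[b] for a, b in equalities):
            yield I


uris = ["ex:a", "ex:b"]                 # hypothetical placeholder URIs
domain = ["d1", "d2", "d3"]             # hypothetical three-element domain
eq = [("ex:a", "ex:b")]                 # an upper layer's equivalence assertion

print(len(list(models(uris, domain, eq, injective=True))))   # 0
print(len(list(models(uris, domain, eq, injective=False))))  # 3
```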

> A graph node is a point of reference for us to talk about "things" in
> the universe (concrete or abstract). These nodes are named by
> symbols (and URIs are adopted opaquely, simply for their quality
> of being globally unique "for free"). Often, there arises synonymy
> in the language(s) used to talk about things, but that doesn't mean
> that the symbols (URIs) which are synonymous are either 'wrong' or
> must be irretrievably merged in the graph such that we can't 
> differentiate between them in the actual statements constituting our 
> collective knowledge about the universe. 
> 
> In fact, it may be that those cases of synonymy are very "interesting"
> in various ways (I guess my computational linguistics background is
> leaking through here a bit ;-)
> 
> Some forms of symbol equivalence may be just semantically vacuous
> lexical variation (e.g. "5" vs. "00005", etc.) and for practical
> reasons, we should try to reduce or eliminate such cases; but
> many forms of symbol equivalence will occur because more than one
> vocabulary/language is being used to talk about the same "things",
> and we cannot simply say that everyone has to use the same language.
> And of course there are other sources of synonymy of symbols, e.g.
> for human convenience -- such as abbreviations or shorthand notations
> for fully normalized, but more cumbersome forms, etc.; yet the
> preservation of the variant form used may be very important (e.g.
> for data mining, auto-analysis of user traits/preferences, etc).
> 
> We cannot lose the distinction between which language or which
> specific symbols are used to say something about some "thing".
> 
> What has to exist are mechanisms for equating and otherwise relating
> equivalent symbols -- and at *some* level of interaction with the
> knowledge base, those equivalences should be fully transparent to 
> applications (but yet still explicit and distinct at a lower level for
> other applications that need to know the gory details).
> 
> Eh?

No.  At some level you may want to keep some of this.  However, I would be
very upset if RDF required one to keep all of the ``bits'' in the input
file.  If the RDF model has any advantage over the XML model it has to do
with the lack of a requirement to be able to reconstitute the entire input.

> > > > > Or rather, its a question about at what functional layer
> > > > > you wish to add that "patch" and address the URI equivalence
> > > > > issue.
> > > > 
> > > > No, the problem is if you require that different URIs map to different
> > > > resources, then you can't patch the problem.  If you remain uncommitted,
> > > > then there is the possibility of a patch.
> > > 
> > > Uhhh, no. Again, it's not about not allowing that patch at all, ever,
> > > at any layer, no matter what -- but at *which* layer it *should* be
> > > applied. Being against it at one layer is not the same as being against
> > > it at any layer.
> > 
> > Again.  If you require that DIFFERENT URIs denote DIFFERENT things you
> > cannot later turn around and allow any two different URIs to denote the
> > same thing.  You have already said that they denote different things, and
> > this would introduce a contradiction.
> 
> That's not what is being required (at least I don't think so ;-)
> 
> I said that different URIs denote different resources, with 'resource'
> meaning a node in an RDF graph, not a "web resource" or other "thing"
> in the universe. I probably was not sufficiently clear in that regard.

Please read the model theory.  Resource has a particular meaning there.

> Different URIs can denote the same "thing", they just can't denote
> the same node in the graph (according to my present, though possibly
> flawed understanding of the RDF conceptual model) -- but any 
> equivalence between multiple URIs that denote the same "thing"
> needs to be defined in terms of the graph, not in the fundamental
> realization of the graph itself.

Again, this does not make sense in the model theory.  URIs label nodes,
nodes denote (map into via I) resources or literal values.

> > > I'm not against such a URI equivalence mechanism, I just don't (at the
> > > moment) see it belonging at the fundamental RDF layer. Maybe it *does*
> > > belong there, but I'm not convinced.
> > 
> > This has nothing to do with whether RDF has any mechanism for equivalence
> > or difference.  It has to do with whether anything built on top of RDF can
> > ever have a non-trivial theory of URI or literal equivalence.
> 
> I just don't see how maintaining the distinction between different
> symbols (URIs) in the graph in any way precludes having a set of
> (standardized, possibly official RDF) mechanisms which allow a
> "non-trivial theory of URI or literal equivalence".
> 
> In fact, preserving the integrity of the "one URI, one resource node"
> principle allows for more than one such theory to be defined and
> employed for RDF encoded knowledge -- making the core RDF standard
> more future-proof and useful to different communities who may view
> such matters differently.

Again, you need to use the correct terminology.

> Again, it's not about *not* providing such mechanisms or solutions,
> but whether they belong at the "bottom" core layer -- i.e. in the
> graph model itself rather than a layer defined in terms of the graph
> model.

No.  It is about providing a flexible ``bottom'' core layer, i.e., one that
can support various upper layers.

> Cheers,
> 
> Patrick


Peter F. Patel-Schneider

Received on Wednesday, 3 October 2001 06:11:32 UTC