RE: Literals (Re: model theory for RDF/S) from Pat Hayes on 2001-10-04 (www-rdf-logic@w3.org from October 2001)

From: Pat Hayes <phayes@ai.uwf.edu>
Date: Thu, 4 Oct 2001 18:21:53 -0500
To: Patrick.Stickler@nokia.com
Cc: www-rdf-logic@w3.org
Message-Id: <p0510100db7e29989c76e@[205.160.76.189]>
>>  >But that would really have to be defined as a more general rule than
>>  >just statements about specific instances of integers, otherwise, we'd
>>  >have to define an infinite number of statements to be sure they all
>>  >were known to any system trying to ask, is this resource "odd", etc.
>>
>>  No, you wouldn't, that's the point. If we can define classes of
>>  literal values in terms of properties of the literals themselves,
>>  then we have, potentially, a new way to determine membership in a
>>  class. Right now, the only way to determine membership in anything in
>>  RDF is to check to see if there are assertions in the graph which
>>  entail an explicit assertion that the thing is a member of the class
>>  (or container, etc.). But to find out if some literal is in
>>  odd-integer, you don't even need to look at the RDF graph, you just
>>  need to divide it by 2 and see if there's a remainder. You can get
>>  infinitely much assertional bang for your buck.
>
>Right. I.e., we're not going to make a potentially
>infinite set of statements like
>
>   int:1 rdf:type foo:odd .
>   int:3 rdf:type foo:odd .
>   int:5 rdf:type foo:odd .
>   ...
>
>But rather capture the essence of "oddness" via a function just
>as you describe.
>
>But one would still like to know that a given literal value is
>an integer to be sure that the application of that function is
>valid.

True.
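
As a rough illustration of the two steps just discussed (first check that the literal really is an integer, then compute oddness from the literal itself rather than from the graph), here is a minimal Python sketch. The function names are hypothetical, and treating a literal as a plain decimal string is an assumption made only for this example; nothing here is part of RDF or the MT.

```python
def is_integer_literal(lexical: str) -> bool:
    """Return True if the lexical form parses as a base-10 integer.

    Assumption: Python's int() decides what counts as an integer numeral
    (it also tolerates surrounding whitespace and a leading sign).
    """
    try:
        int(lexical, 10)
        return True
    except ValueError:
        return False


def is_odd_integer(lexical: str) -> bool:
    """Decide membership in a hypothetical 'odd-integer' class.

    No graph lookup is involved: membership is computed from the literal,
    but only after confirming that the function applies to it at all.
    """
    if not is_integer_literal(lexical):
        return False
    # Python's % yields a non-negative remainder for a positive divisor,
    # so this also works for negative numerals.
    return int(lexical, 10) % 2 == 1
```

A single function of this kind stands in for the unbounded list of 'int:N rdf:type foo:odd' statements quoted above.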

>>  >
>>  >I.e., rather than define explicit facts, we would define functions
>>  >from which we could infer the facts implicitly only as needed.
>>  >
>>  >>  ... If RDF had the ability to
>>  >>  assert properties of literals, the expressive power of the
>>  >>  language would be quite radically increased.
>>  >
>>  >Well, I guess I'm not sure that that is the case with literals
>>  >given a URI representation -- as then the literals are no longer
>>  >literals but resources,
>>
>>  No, wait: they are resources, but they are still the values of
>>  literals, and that is (well, might be) enough to give us a different
>>  kind of handle on them.
>
>I'm not sure I quite follow what you mean by "resources being the
>values of literals" here. A property value (object) is either a
>literal or a resource, and if we say 'int:5' instead of "5" then
>the value is a resource, not a literal.

I understand 'literal' to be a syntactic category, eg a numeral might 
be a kind of literal, a character string might be another. Literals 
are used as labels in an RDF graph, they are on a par with URIs. They 
are a species of name (rather a special kind of name, but...). 
Literal values and resources are the things named by literals and 
URIs respectively; the literal values named by numerals are integers, 
for example. (It is possible that some kinds of literals might be 
their own literal values, eg character strings might be described 
this way; that is permitted, but not mandated.) However, the model 
theory is deliberately agnostic as to the 'true nature' of resources 
and literal values, and in particular it does not insist that 
resources and literal values must be disjoint sets. It is quite 
reasonable, it seems to me, to have a URI denote an integer, for example, 
even if there are also numerals used as literals elsewhere in the 
graph. So a literal value may well also be a resource in the universe 
IR.  Now, one may want to forbid such overlapping of the semantic 
spaces, and treat URIs and literals as strongly typed; that is done 
in DAML+OIL, and the RDF model theory permits this as an extension. 
If you do assume that, however, it is possible to utter falsehoods in 
RDFS, eg the following triple is an explicit denial of such strong 
typing:

_:xxx rdf:type rdfs:Literal .
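
To make the overlap concrete, here is a toy Python sketch of one interpretation in which a URI and a numeral name the same integer. Everything in it is an assumption for illustration: the variable names, the tiny universes, and the use of Python sets and dicts to stand in for IR, the literal values, and the denotation mappings. It is not the MT itself, only one way of picturing it.

```python
# A toy interpretation; all names and values here are hypothetical.
IR = {5, "some-document"}   # the universe of resources
LV = {5, 7}                 # the literal values

URI_DENOTES = {"int:5": 5}            # what each URI names
LITERAL_DENOTES = {"5": 5, "7": 7}    # what each literal names


def same_denotation(uri: str, literal: str) -> bool:
    """True if the URI and the literal name the same thing."""
    return URI_DENOTES[uri] == LITERAL_DENOTES[literal]


def is_strongly_typed(resources: set, literal_values: set) -> bool:
    """The optional extra condition: resources and literal values are disjoint."""
    return resources.isdisjoint(literal_values)
```

With these definitions same_denotation("int:5", "5") holds and is_strongly_typed(IR, LV) does not, which is the situation the unextended MT permits. Requiring is_strongly_typed to hold for every interpretation is the kind of added semantic condition that would rule this interpretation out.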


>
>Whether a given RDF based application applies some interpretation
>to either representation is outside the scope of the MT, right?
>In either case, they are just opaque logical constants -- though
>of different types.

No, not necessarily of different types. That is left open.

>
>So, if I choose to only use resource (URI) values in all of my
>RDF graphs, even for typed data values that other folks might
>represent as literals, I've not required any changes to the
>current MT semantics, right?

No changes, but you have made an extra assumption. The MT in its 
present form would permit interpretations that violate this rule, so 
you would need to impose some extra semantic conditions to enforce it 
somehow, and that will change the notion of entailment.

>A resource is a resource is a
>resource, and "no assumptions are made about the nature of
>resources". Right? So such a treatment would not require any
>increase in the expressive power of the language.

Well, it would if you wanted to insist on that constraint and have 
that insistence reflected in the notion of valid consequence. The MT 
permits many things that it does not enforce. It can be extended to 
enforce them, but one does need to state the extension conditions 
precisely; the MT does not do it for free. Think of it as a kind of 
foundation on which you can build different kinds of semantic 
buildings. Even as a foundation it isn't perfect. (Parts of it have 
no rebar and can only support a certain limited weight, and there are 
still a few cracks.)

>
>>  >so how does that actually increase the
>>  >expressive power of RDF, per se -- since the semantics of the
>>  >data typing is in the URI and RDF doesn't understand the semantics
>>  >of URI schemes...?
>>
>>  You keep assuming that literals have already been assimilated to
>>  URIs, and I'm trying to point out that doing that loses something
>>  important. Sure, once you've lost it, it's not there.
>
>I'm not making that assumption. Have literals if you like. I was
>simply trying to clarify that if I use resources where otherwise
>I might have used literals, and there exist no literals in any
>of my graphs, then those portions of the MT concerning literals
>are not relevant to my graphs (which have no literals).

Ah, I see. Yes, if I follow you here, that is correct and can be 
proved in the current MT (under a few reasonable assumptions, eg that 
the graph is finite).

>
>Taking that a step further, if everyone used resources rather
>than literals, then, logically, there would not be any need for
>special treatment of literals in the MT. Right?
>
>*** Disclaimer: I'm not proposing doing away with literals! Though
>*** I'm perhaps using that as a way to explore the relation between
>*** URI labeled resources and literal "resources" in general.

Oh, go on, propose it. Someone needs to start the ball rolling :-)

>
>What I'm not clear on is what you feel is being lost by using
>a URI to represent a typed data value rather than a literal.
>I.e., how does using an approach such as '#a #b int:5 .' lose
>anything in comparison to '#a #b "5" .' per se?  Does not the
>URI form provide greater potential for defining or inferring
>knowledge about the value?

I may simply not have been following the point properly. Coming from 
logic, I have an acute sense of the difference between information 
which is conveyed as part of the very syntax of a language, and that 
conveyed by making assertions in the language. This seems like a very 
sharp and important distinction to me. My understanding of the 
proposal was that the syntactic encoding of, say, integers implicit 
in the notion of literal was to be abandoned and replaced by an 
assertional encoding in RDF triples. That may be a good idea, but it 
does potentially throw away a lot of valuable properties implicit in 
the syntactic typing of literals. However, if this proposal is better 
thought of as one to introduce a more uniform notion of syntactic 
typing for URIs in general, then I'm all for it. Sorry if my 
ignorance is a barrier to communication.

>Granted, in order to have "extra" knowledge about the actual
>data type used, we need to either interpret the URI scheme (which
>is outside the scope of the MT) or employ mechanisms such as
>the (now unfortunately deprecated) rdf:aboutEachPrefix, e.g.

If aboutEachPrefix was only used in this way it wouldn't be so bad, 
but it got all mixed up with containers.

Pat
-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Thursday, 4 October 2001 19:22:02 UTC