RE: Literals (Re: model theory for RDF/S) from Patrick.Stickler@nokia.com on 2001-10-08 (www-rdf-logic@w3.org from October 2001)

From: <Patrick.Stickler@nokia.com>
Date: Mon, 8 Oct 2001 07:42:38 +0300
To: conen@gmx.de, phayes@ai.uwf.edu
Cc: www-rdf-logic@w3.org
Message-ID: <2BF0AD29BC31FE46B78877321144043114C019@trebe003.NOE.Nokia.com>
> -----Original Message-----
> From: ext Wolfram Conen [mailto:conen@gmx.de]
> Sent: 05 October, 2001 17:06
> To: Stickler Patrick (NRC/Tampere); phayes@ai.uwf.edu
> Cc: www-rdf-logic@w3.org
> Subject: Re: Literals (Re: model theory for RDF/S)
> 
> 
> Patrick, Pat, and interested readers,
> 
> allow me to add a tiny aspect to the voluminous and interesting
> discussion led by you and others.

Fire away...
 
> Patrick.Stickler@nokia.com wrote:
> > 
> > ...
> > A string is a string is a string and the only thing we can
> > deduce from it is that it is a sequence of bytes. Whether
> > that sequence of bytes corresponds to a lexical representation
> > of a numerical magnitude which we wish to classify as 'integer'
> > is a matter of interpretation of the string. No such knowledge
> > exists 'inherent' in the string itself, explicitly or
> > implicitly.
> > 
> > So, any data type knowledge about a given literal value *must*
> > be asserted in one way or another externally from that literal
> > itself. Either by an anonymous node construct such as
> > 
> >    <rdf:Description>
> >       <rdf:value>5</rdf:value>
> >       <rdf:type rdf:resource="#integer"/>
> >    </rdf:Description>
> > 
> > ...
> > 
> > ... one has to take deliberate steps to define
> > the data typing in a way that can be utilized as knowledge
> > explicit within the graph...
> 
> 
> Hm, doesn't that allow for a nice and RDFish solution to the problem of
> attaching properties to literals that can also be carried to the
> semantics? Ok, let me try to demonstrate how this may solve the problem
> of "Literals can not be in the subject position". 
> 
> To start, let me add an ID to the RDF expression above (to ease
> reference to the RDF spec mainly, and to avoid the otherwise surfacing
> necessity to explain why I do not buy into the treatment of anonymous
> nodes in the MT (short for Pat's RDF model theory))
> 
>    <rdf:Description rdf:ID="five">
>        <rdf:value>5</rdf:value>
>    </rdf:Description>
> 
> 
> The RDF spec says the following with respect to ID'd resources: 
> 
> "Such a resource might be a surrogate, or proxy, for some other
> physical resource that does not have a recognizable URI."
> 
> Let's try to capture this spirit.
> 
> Now, I will fix the interpretation of Literals as follows: if E is a
> literal, I(E) = E (or XL(E) if we fix XL to be E ;). (Pat, I need some
> more emails to explain my motivation, so don't be too furious about this
> now - the motivation is somewhat like the following: it considers
> "literals" as the Urmutter of resources [in fact the only type of thing
> that can be given a name in an RDF graph AND whose "content" is also
> available ("literally") immediately], it is a way to communicate (parts
> of) the mapping between resource names (URIs) and their actual "content"
> from within RDF documents, which allows us to ground (parts of) RDF in a
> "processible" universe. Ah, ok, much explanation is still missing here,
> but this is not the place to further advance this.)
> 
> Now, by giving the literal a "name", it is "promoted" to a resource (in
> the sense of RDF: a resource is something that we have given a URI
> name). We can now use the resource name in the RDF graph like any other
> resource name - but more, we can even substitute the content for the
> name along the extension of rdf:value (which we will first have to
> capture appropriately) in the interpretation (and, in this respect, the
> "literal" resource becomes a pretty "real" part of RDF statements). I
> will do this in a two stage process:
> 
> Given a tidy, ground RDF graph E where all nodes that have outgoing
> edges are labeled with URIs and all other nodes with URIs or Literals.
> 
> First (give an interpretation I' for the part of E that is relevant for
> rdf:value)
> * map each literal l to itself by defining XL'(l) = l.
> * map each URI u to itself by defining IS'(u) = u. Set IR' to be the set
> of all URIs in the graph, and LV' the set of all literals in the graph.
> * Define IEXT' for rdf:value as follows: IEXT'(I(rdf:value)) =
> IEXT'(rdf:value) = {<x,y>  |  <x rdf:value y> is in E }  (should be done
> with labeled edges, triple notation considered a shorthand). Note:
> assume that the inverse of rdf:value is injective.
> * Determine I' as in the MT, with IS' and XL' instead of IS/XL (and
> IEXT' restricted to rdf:value instead of IEXT)
> 
> 
> Second (give a (partially fixed) interpretation for E)
> * continue to map literals to themselves (XL(l) = l).
> * construct from the extension of rdf:value a mapping IS as follows:
> IS(x) = y iff there exists <x,y> in IEXT'(rdf:value), otherwise not fixed
> as usual. (hm, one may want to throw out the (already "used")
> rdf:value-related stuff from graph and interpretation because it reduces
> to identity and no longer seems to be useful)
> * Determine I as in MT.
> 
> 
> Now, the triples resulting from 
>     <rdf:Description rdf:ID="five">
>        <rdf:value>5</rdf:value>
>        <rdf:type rdf:resource="#integer"/>
>     </rdf:Description>,
> i.e. (leaving out reifications)
> 	<five rdf:value "5">
> 	<five rdf:type #integer>
> will have a fixed part in each interpretation, let us denote this as
> follows (to make it obvious)
> 	<"5" I(rdf:type) I(#integer)>   
> (that is: the above triple set resp. the corresponding subgraph becomes
> true, if <"5" I(#integer)> is in the extension of I(rdf:type) for a
> given interpretation I).
> 
> 
> Ok, now we can "RDFishly" (IMHO) say what we want about literals (and it
> even has some sort of "accessible meaning" ;). Ah, sorry, looks ugly and
> does probably not communicate my intentions perfectly - but maybe it
> adds a hopefully not too unreasonable opinion to the spectrum.
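
For readers following along, the two-stage construction quoted above can be
sketched roughly as follows. This is only an illustration under stated
assumptions, not Wolfram's or the MT's definition: triples are plain Python
tuples of strings, the function names (value_extension, build_is,
fixed_part) are hypothetical, and only the subject position is substituted,
as in the quoted <five rdf:type #integer> example.

```python
RDF_VALUE = "rdf:value"  # property name as written in the quoted triples


def value_extension(triples):
    """Stage 1: IEXT'(rdf:value) = {(x, y) | <x rdf:value y> is in E}."""
    return {(s, o) for (s, p, o) in triples if p == RDF_VALUE}


def build_is(extension):
    """Stage 2: fix IS(x) = y for every (x, y) in IEXT'(rdf:value)."""
    mapping = {}
    for name, literal in extension:
        # A name may carry only one value (the quoted injectivity assumption).
        if mapping.setdefault(name, literal) != literal:
            raise ValueError(f"{name!r} has more than one rdf:value")
    return mapping


def fixed_part(triples, mapping):
    """Substitute literal content for named subjects.

    The already "used" rdf:value triples are dropped; predicates and
    objects are left as names, still to be interpreted by I.
    """
    return [(mapping.get(s, s), p, o) for (s, p, o) in triples if p != RDF_VALUE]


graph = [("five", "rdf:value", "5"), ("five", "rdf:type", "#integer")]
IS = build_is(value_extension(graph))
print(fixed_part(graph, IS))  # [('5', 'rdf:type', '#integer')]
```

The printed triple corresponds to the fixed part <"5" I(rdf:type)
I(#integer)> in the quoted text.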

The approach you outline above, at first glance, seems comparable in
functionality to the URI encoding of literals, in that we end up with an
actual URI-labeled resource for every literal value, which can participate
as the subject of statements, etc. However, it has a few drawbacks which
IMO make it less preferable than simple URI encoding of literals:

1. It does not scale to data types with infinitely many values. The
advantage of using a representation such as a URI is that it can have
defined for it a standardized set of interpretation rules which apply to
all instances, and which are known irrespective of which instances are
encountered. To use the methodology suggested above would require that we
both know about all possible integer values beforehand and define the
needed semantics for each one. Clearly that is not possible in practice.

2. A URI encoding is far more concise, both in the syntax and in the graph,
which makes it more tractable for human consumption, albeit at the expense
of some knowledge being implicit in the URI scheme itself (though per #1
above, this is unavoidable).

3. The extra level of indirect resource labels for literals (the URIs) is
grounded in an explicit URI scheme which has a single, standardized
definition and thereby provides a point of intersection for the broad,
global interchange of knowledge using that scheme, rather than every
Dick, Jane, and Mary rolling their own names to associate additional
semantics with literal values.

Granted, one could see the named node approach above as implicitly present
in any URI encoded literal, such that one could envision a URI-aware
processor working in conjunction with the RDF parser, defining for each
URI literal seen on input exactly such a set of explicit statements about
the URI literal, based on the known URI scheme, to alleviate the need for
any higher level "native" knowledge regarding such URI schemes. E.g.

   "int:5"  -->  <rdf:Description rdf:about="int:5">
                    <rdf:value>5</rdf:value>
                    <rdf:type rdf:resource="#integer"/>
                 </rdf:Description>

The URI encoding provides global consistency. The automated expansion of
implicit to explicit knowledge per Wolfram's suggestions maintains a fully
URI-agnostic processing space compatible with any "generic" RDF
application.
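
As a rough sketch of the expansion step described above (not a definitive
implementation), a URI-aware component could look something like the
following. Everything here is assumed for illustration: the "int:" scheme is
just the example used in this message and not a registered scheme, the
SCHEMES table and the expand function are hypothetical names, and triples
are modelled as plain tuples.

```python
import re

# Hypothetical registry: scheme name -> (datatype resource, lexical form).
# One rule per scheme covers every instance, so no integer value needs to
# be enumerated in advance.
SCHEMES = {
    "int": ("#integer", re.compile(r"[+-]?[0-9]+")),
}


def expand(uri):
    """Turn a URI-encoded literal into explicit statements.

    Returns an empty list for URIs whose scheme is not in the registry,
    so ordinary resources pass through untouched.
    """
    scheme, sep, lexical = uri.partition(":")
    if not sep or scheme not in SCHEMES:
        return []
    datatype, pattern = SCHEMES[scheme]
    if not pattern.fullmatch(lexical):
        raise ValueError(f"{lexical!r} is not a valid lexical form for {scheme}:")
    return [
        (uri, "rdf:value", lexical),
        (uri, "rdf:type", datatype),
    ]


print(expand("int:5"))
# [('int:5', 'rdf:value', '5'), ('int:5', 'rdf:type', '#integer')]
```

A generic RDF application would only ever see the resulting statements and
needs no knowledge of the scheme itself.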

The two pieces of the overall solution which are "new" to (and external
to, I will stress) RDF are (a) the standardized URI scheme(s) and their
known lexical form and semantics, and (b) the system component which, for
an explicit body of knowledge (at some point in processing), expands the
implicit knowledge embodied in the URI instances and URI scheme into
explicit, asserted statements.

These two components could be bundled in what would be a methodology for
strongly typed data literals, and with a well defined ontology for
defining the semantics of those URI schemes, the expansion component could
be highly genericized and suitable for use with any arbitrary URI scheme
defined in terms of that ontology, providing a flexible and portable
solution all around.

Eh?

Cheers,

Patrick
Received on Monday, 8 October 2001 00:42:46 UTC