Re: Literals

> From: Patrick.Stickler@nokia.com
> Subject: RE: Literals (Re: model theory for RDF/S)
> Date: Tue, 2 Oct 2001 13:09:35 +0300 
> 
> > > -----Original Message-----
> > > From: ext Drew McDermott [mailto:drew.mcdermott@yale.edu]
> > > Sent: 02 October, 2001 00:26
> > > To: www-rdf-logic@w3.org
> > > Subject: Re: Literals (Re: model theory for RDF/S)
> > > 
> > > 
> > > 
> > >    [Patrick Stickler]
> > >    > > I would myself love to see a data type URI approach by which 
> > >    > > otherwise "literal" values could be defined as instances of a 
> > >    > > given data type URI. E.g.
> > >    > > 
> > >    > >    dt:integer:5
> > >    > >    dt:token:en
> > >    > >    dt:date:2001-09-27
> > >    > >    dt:time:2000-11-01T17:32:20Z
> > >    > >    dt:float:38829.11883292
> > >    > >    ...
> > >    > 
> > >    > So would I...
> > > 
> > >    Anyone else think this would be a good idea to pursue?
> > > 
> > > Yes, although it's not clear to me how we are to interpret
> > > dt:date:2001-09-27.  Is the idea that 'date' is a resource (namespace?
> > > URI?) identifying a convention for how the literal is to be parsed and
> > > internalized?
> > 
> > Basically yes. It follows a similar, though not exactly equivalent,
> > approach to e.g. defining how a urn:doi:10.9882 or isbn:#### URI 
> > would be structured and interpreted.
> 
> [..]
> 
> > [...] The idea is to eliminate the need for the concept of literals
> > entirely from RDF, such that *everything* is a resource, period. How
> > any particular resource is interpreted, the semantics associated with
> > any class of resource, and the constraints placed on the identifier
> > schemes used to represent any given resource can then all be handled
> > in a consistent manner -- and, the serialization can be simplified since
> > no syntactic forms for differentiating between "literals" and resources,
> > nor mapping logic from those serializations (or condensed forms of
> > serializations) to graph representations are necessary.
> > 
> > More simplicity and consistency in both the semantics and syntax.
> 
> I am unclear as to how this proposal would provide more simplicity or
> consistency in either the semantics or the syntax of RDF.  What I see in
> this proposal is a method for providing a general mechanism for providing
> special cases for RDF.  An RDF processor would have to understand, and
> parse, all sorts of different syntax.
> 
> Consider the situation with a hypothetical integer scheme.  If an RDF
> processor is given
> 
> int:5 #loves #Susan .
> 
> and 
> 
> int:05 #loves #Jackie .
> 
> then it has to understand that int:5 and int:05 are the same
> and respond to a query about the loves of 5 that it #loves both #Susan and
> #Jackie.
> 
> Similarly, consider the situation with a hypothetical scheme for web
> pages.  These are supposed to represent actual web pages, not URIs.  If an
> RDF processor is given  
> 
> wp://www-db.research.bell-labs.com/user/pfps #loves #Susan .
> 
> and
> 
> wp://db.bell-labs.com/user/pfps #loves #Jackie .
> 
> then an RDF processor has to understand that these two are the same web
> page and respond that either one #loves both #Susan and #Jackie,
> independent of whether RDF treats different URIs that map to the ``same''
> place as the same.  However, given 
> 
> wp://research.bell-labs.com/user/pfps #loves #Sandy .
> 
> it has to understand that this is a different web page (even though,
> suppose, it has the same content as the previous web page).
> 
> As far as I can see, no matter how you do it, any scheme for providing
> different semantic domains, be they integers or whatever, will require
> special purpose parsing and special purpose understanding in RDF.  The
> situation only becomes more complex in more-powerful representation
> systems.
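
To make the int:5 / int:05 point concrete, here is a minimal Python sketch of what a processor would need, assuming a hypothetical `int:` scheme whose lexical part is a base-10 integer. The `Store` class and `node_key` helper are made up for illustration. Nodes are keyed by the value the URI denotes, so the two statements land on one subject. Every additional scheme (the wp:// case, say) would need its own rule in `node_key`.

```python
# Hypothetical sketch: an RDF-ish store that keys nodes by the *value* a typed
# URI denotes rather than by the URI string. The "int:" scheme and its parse
# rule are assumptions for illustration only.
from collections import defaultdict


def node_key(uri: str):
    """Return the identity key for a node.

    For the hypothetical "int:" scheme the key is the integer value, so
    int:5 and int:05 collapse to one node. Any other URI is compared as a
    plain string; this is where per-scheme knowledge would have to be added.
    """
    if uri.startswith("int:"):
        return ("int", int(uri[len("int:"):], 10))
    return ("uri", uri)


class Store:
    def __init__(self):
        # subject key -> predicate -> set of object URIs
        self._triples = defaultdict(lambda: defaultdict(set))

    def add(self, s: str, p: str, o: str) -> None:
        self._triples[node_key(s)][p].add(o)

    def objects(self, s: str, p: str) -> set:
        return set(self._triples[node_key(s)][p])


store = Store()
store.add("int:5", "#loves", "#Susan")
store.add("int:05", "#loves", "#Jackie")
print(sorted(store.objects("int:5", "#loves")))  # ['#Jackie', '#Susan']
```

Without the `int:` rule in `node_key`, the two URIs are just different strings and the query returns only `#Susan`.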

And how else could we address this?  I see at least two clear and
simple approaches:

1) Have only byte-string literals in the RDF graph, as nodes denoting
   finite sequences of integers in the range 0...255.  Staying away
   from characters makes equality clear.  If you want to encode a date
   or a number or a character string in a byte-string-literal, you
   must say so in the RDF graph.  For example

      :ThirdAnnualMeetingOfPhilosophersGuild time:startTime _:t
      _:t time:EncodingAsUnixTimeOrderedInDecendingSignificance 0x3bb9acbf 
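
A minimal Python sketch of how a consumer might act on that explicit encoding statement. The decoder function is hypothetical; it assumes the four bytes are an unsigned big-endian count of seconds since the Unix epoch, which is one reading of the property name in the example.

```python
# Hypothetical decoder for the byte-string literal above. Assumes: 4 bytes,
# most significant first, unsigned seconds since 1970-01-01T00:00:00Z.
from datetime import datetime, timezone

# The literal as the graph holds it: a finite sequence of integers in 0..255.
raw = bytes([0x3B, 0xB9, 0xAC, 0xBF])


def decode_unix_time_big_endian(b: bytes) -> datetime:
    """Interpret b as an unsigned big-endian count of seconds since the epoch."""
    seconds = int.from_bytes(b, byteorder="big", signed=False)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Equality of literals is plain byte equality; no parsing is involved.
assert raw == bytes.fromhex("3bb9acbf")

start = decode_unix_time_big_endian(raw)
print(start.isoformat())  # 2001-10-02T12:02:07+00:00
```

The literal itself is still compared byte for byte; the date only appears once a consumer applies the encoding that the graph names.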

Alternatively, we can avoid literals in the RDF graph entirely:

2) Say that "literals" are a technique which RDF encoding languages
   can use to convey some kinds of information in a compact manner,
   but that the RDF graph only represents information with agreed-upon
   symbols.  For example in some RDF language with integer literals,
   we might say

       :Jim :ageInYears 3

   That would parse to an RDF graph without literals: the graph would
   express the number 3 in terms of some standard RDF symbols, such as
   the Peano-style "numbers:zero" and "numbers:nextGreaterInteger":

       :Jim :ageInYears _:x
       numbers:zero numbers:nextGreaterInteger _:one
       _:one numbers:nextGreaterInteger _:two
       _:two numbers:nextGreaterInteger _:x

   Obviously, this technique isn't a great way to communicate integers,
   but it does allow communication using integers between agents who
   share only an RDF encoding language without integers.

   You can argue that any decent standard language would allow integer
   literals, but where do you draw the line?  This basic technique of
   encoding the information of the literal in the RDF graph allows us
   to communicate clearly about objects which are not so likely to be
   literals, like numbers greater than 2^32, or dates, or all sorts of
   more domain-specific information.  It works for character strings
   and XML infosets, too.

   You can view an API as a language, and as such, one would expect
   native types and possibly all serializable classes to be treated as
   literals (i.e. passed around much like symbols, but in an
   RDF-interoperable way).

    -- sandro

Received on Tuesday, 2 October 2001 10:29:04 UTC