Re: big issue (2001-09-28#13)

With reference to...

[1] Sergey's message:
   http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Sep/0444.html

[2] Some concerns expressed about DLs and literals-as-resources:
   http://lists.w3.org/Archives/Public/www-rdf-logic/2001Sep/0077.html
Specifically:
[[[
Peter F. Patel-Schneider:
 > >DAML+OIL depends somewhat on the separation between resources and
 > >literals.  Some Description Logics may break severely if their separation
 > >between abstract (resources) and concrete (literals) domains is breached.
 >
 > Right, that is what worries me. I recall this being a sticking point
 > in the DAML discussions for some people, so I presume it is fairly
 > critical there also, no?

Right now, it is probably the case that the theory of XML Schema datatypes
is weak enough and the constructs that use them in DAML+OIL are also weak
enough that no undecidabilities would arise if literals were also
resources.  (Implementation headaches do arise, however!)  If you want to
have a stronger theory for datatypes or more DAML+OIL constructs that use
them, you can easily introduce undecidabilites.  Combining two formalisms
requires great care!
]]]

[3] DanC's thoughts on literal values:
   http://www.w3.org/2001/01/ct24

[4] A comment by Peter Patel-Schneider about literals:
   http://lists.w3.org/Archives/Public/www-rdf-interest/2001Sep/0135.html

[5] My exchange with Brian about literals and strings:
   http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Sep/0445.html
and
   http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Oct/0001.html

[6] The currently-published model theory:
   http://www.w3.org/TR/2001/WD-rdf-mt-20010925/


I am concerned that Sergey's approach may be introducing more problems than 
it solves... I'm having a hard time getting my head around the 
implications, so, instead, I'm going to stand back and try another tack, 
taking a somewhat different view than Sergey.


1.  Inspired by [5], distinguish between "strings" and "literals":

- a string is a sequence of UCS/Unicode codepoints.

- a literal is, informally, that kind of RDF object value which is 
specified by a string and possibly some additional information (such as a 
language tag).

I think that a "literal" in this sense exists only in the context of some 
concrete syntax, and its nature is somewhat dependent on that syntax.


2.  The model theory [6] presumes:
     XL : literals -> LV
          -- (fixed mapping from literals to literal values in the
          --  domain of interpretation)
     IS : V -> IR
          -- (mapping from the vocabulary of URIs used to resources in
          --  the domain of interpretation)
but does not make presumptions about the nature of LV, or whether there is 
any overlap between LV and IR.  Exhibit [2] suggests that there might be 
problems if LV and IR are not disjoint, but that such problems don't arise 
if the data structuring primitives are weak enough and/or constructs that 
use them are weak enough.
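
To make the two mappings concrete, here is a toy Python sketch. It is not taken from [6]: the names Resource and overlap, the example.org URIs, and the choice of letting XL return the string itself are all assumptions made for illustration. It only shows that, for a finite interpretation, "do LV and IR overlap?" is a question about two sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Opaque stand-in for a member of IR (hypothetical helper)."""
    name: str


def XL(literal: str) -> str:
    """Fixed mapping from literals to literal values.

    Toy assumption: the literal value is the string itself.
    """
    return literal


# IS: vocabulary of URIs -> IR (made-up example URIs)
IS = {
    "http://example.org/Subject": Resource("Subject"),
    "http://example.org/property": Resource("property"),
}


def overlap(literals, IS):
    """Return the elements shared by LV and IR in this toy interpretation."""
    LV = {XL(lit) for lit in literals}
    IR = set(IS.values())
    return LV & IR
```

With the IS shown, the overlap is empty. It becomes non-empty as soon as some URI is mapped by IS to a plain string that is also a literal value.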

I'm not enough of a logician to know what might constitute "weak enough", 
but I have an intuition that one source of problems might be if the same 
structure that is expressed within the data type of a literal can also be 
expressed using "simpler" literal values related by RDF properties.  That 
would, I think, require the subsumption computation to examine the internal 
structure of literals.

It seems to me, then, that the structure of literals should be, in some 
sense, atomic or opaque, and composite structures should be expressed using 
RDF relations.  Any value (in the domain of interpretation) that can be 
expressed in terms of relationships between other values should not be 
admissible as a literal value.

This rules out having an LV which is a composition of a string and a 
language tag.

[[[Trouble is, it also seems to rule out anything but individual 
characters, as a string of length >1 can be expressed as a concatenation of 
other strings.  I think this is a purely lexical/syntactic issue, but I'm 
on shaky ground here.]]]


3. Inspired partly by [3], I suggest that literal attributes (xml:lang, 
maybe others in future) are handled by some kind of syntactic 
transformation when constructing the RDF graph, rather than being 
represented somehow within graph literal nodes.  Thus, within the RDF graph 
syntax, "literals" are simply "strings".

Example:

     <Subject>
        <property xml:lang="en-US">Property string</property>
     </Subject>

might yield a graph like this:

     [Subject] --property--> [  ] --xml:lang--> "en-US"
                             [  ] --property--> "Property string"

or, following DanC's lead [3], figure 1:

     [Subject] --???--> [  ] --xml:lang---> "en-US"
     [       ]          [  ] --rdf:value--> "Property string"
     [       ]
     [       ] --property--> "Property string"


The details of the transformation aren't fixed;  the key idea is that the 
transformation to graph form reduces all literals to "string" form.
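
One possible shape for such a transformation, as a minimal Python sketch. It assumes triples are plain tuples, blank nodes are strings of the form "_:bN", and the string is attached to the intermediate node with rdf:value. The function names are hypothetical, and this is just one of the variants discussed above, not a proposal for the actual parser behaviour.

```python
from itertools import count

# Hypothetical blank-node generator: "_:b1", "_:b2", ...
_blank_ids = count(1)


def new_blank_node() -> str:
    return f"_:b{next(_blank_ids)}"


def literal_to_triples(subject, prop, text, lang=None):
    """Turn one property element into triples whose literals are bare strings.

    Without a language tag, the string is the object directly.
    With one, an intermediate blank node carries the string (rdf:value)
    and the tag (xml:lang) as two separate string-valued properties.
    """
    if lang is None:
        return [(subject, prop, text)]
    node = new_blank_node()
    return [
        (subject, prop, node),
        (node, "xml:lang", lang),
        (node, "rdf:value", text),
    ]


# Usage: every object is now either a node name or an unadorned string.
for triple in literal_to_triples("Subject", "property", "Property string", lang="en"):
    print(triple)
```

The point of the sketch is only that no object in the output is a (string, language) pair; the language tag has become an ordinary statement in the graph.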


4. Wrapping up

The upshot of this is that a literal value (in LV) is always a string 
without additional adornment.  For RDF graph syntax, the XL mapping can be 
an identity mapping.  Any deeper interpretation of a literal (a string in a 
given language, a number, etc.) is in the interpretation of some resource 
for which that literal is an rdf:value.

Then:

- Do LV and IR overlap?  It seems to me unclear how one would exclude a 
mapping in IS from some URI to a Unicode string in LV;  e.g. 
<data:text/plain;charset=utf-8,Property%20string>.  I think this could be 
resolved either way.  If disjointness of IR and LV is required, then the 
above example might map to something like:

    [ ] --rdf:value----------> "Property string"
    [ ] --meta:content-type--> [Content-type:text/plain]

- Does overlapping resources with the very simple domain of Unicode strings 
for literals cause problems for description logics?  I don't know.

- Does it make sense for literals to have properties; e.g.
   "Property string" --length--> "15"
I think any such properties would be trivial, in the sense that they can 
always be determined by examination of the literal itself.  So if such 
properties are prohibited, no expressive power is lost.
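
The last point can be illustrated with a small Python sketch. The DERIVABLE table and the is_redundant function are hypothetical names, and "length" is taken to mean the number of Unicode codepoints, which matches the definition of a string in section 1 and is what Python's len() returns for a str.

```python
# Properties of a string that can be computed from the string alone.
# Values are rendered as strings, since literals here are just strings.
DERIVABLE = {
    "length": lambda s: str(len(s)),  # len() counts codepoints of a str
}


def is_redundant(subject: str, prop: str, obj: str) -> bool:
    """True if the statement follows from inspecting the literal itself."""
    compute = DERIVABLE.get(prop)
    return compute is not None and compute(subject) == obj


print(is_redundant("Property string", "length", "15"))
```

A statement that passes this check adds nothing a consumer could not work out unaided, which is the sense in which prohibiting such statements costs no expressive power.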


#g

Received on Monday, 1 October 2001 06:40:46 UTC