Re: Datatypes, literals, syntax from Sampo Syreeni on 2002-08-01 (www-rdf-logic@w3.org from August 2002)

From: Sampo Syreeni <decoy@iki.fi>
Date: Fri, 2 Aug 2002 02:51:26 +0300 (EEST)
To: Geoff Chappell <geoff@sover.net>
cc: <www-rdf-logic@w3.org>
Message-ID: <Pine.SOL.4.30.0208020204020.21349-100000@kruuna.Helsinki.FI>
On 2002-08-02, Geoff Chappell uttered to Sampo Syreeni:

>It strikes me that it is legitimate to pack langid into literals because
>the langid is really a statement about the string/label and not the
>thing that it denotes.

Huh? But that's *exactly* what it is. The literal string alone is by no means an
unambiguous label for a given literal; the language tag is precisely an extra
attribute which is necessary in order both to disambiguate which literal we are
talking about *and* to interpret the string value coherently. Consider:

<s,p,o1>
<s,p,o2>

where

o1==("aho","fi",false)
o2==("aho","ja",false) .

You have two strings which are precisely equivalent in the literal sense,
but which clearly mean two entirely different things in the languages
denoted. (Assume away the trouble with hiragana vs. romaji for Japanese,
for the sake of the example.) I would contend that such a difference constitutes
what is properly called a semantic distinction. The situation wouldn't
really be different if we kept the language identical and used the parse types
"xsd:decimal" and "xsd:string" instead.
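To make the identity claim concrete, here is a minimal Python sketch that models a literal as the tuple used above. The class name `Literal` and its field names are illustrative assumptions, and the third component is assumed here to be an optional parse-type name (with `None` standing in for the `false` in the tuples above). Equality compares the whole tuple, not just the string.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal model: identity is the whole tuple, not just the string."""
    lexical: str                       # the literal string itself
    lang: Optional[str] = None         # language tag, e.g. "fi" or "ja"
    parse_type: Optional[str] = None   # e.g. "xsd:string"; None for a plain literal


o1 = Literal("aho", "fi")
o2 = Literal("aho", "ja")

# Same string, different language: two distinct literals.
print(o1.lexical == o2.lexical)  # True
print(o1 == o2)                  # False

# Same string and language, different parse type: also distinct.
n = Literal("1001", parse_type="xsd:decimal")
s = Literal("1001", parse_type="xsd:string")
print(n == s)                    # False
```

Because the dataclass is frozen, instances are hashable, so the same comparison governs set and dictionary membership.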

AFAICT, the part having to do with subtyping relations within XSD is well
beyond basic RDF, just as rdfs:subPropertyOf isn't supposed to be
understood by RDF-only parsers. I would tend to think that two lexically
equal literal strings should be treated as RDF-unequal if they have
different languages and/or different parse types (even given that parse types
include all XSD data types), and should only be treated as equal at the higher
level handled by XSD-aware APIs. After all, that's what is being done now to
anonymous nodes with daml:UniqueProperty and the like, and to
identical string values with different parse types and/or languages.
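The two-level arrangement proposed above can be sketched as two separate equality functions: a basic RDF one that compares the full literal tuple, and a higher XSD-aware one that compares values. This is a minimal sketch under loud assumptions: the names `rdf_equal`, `xsd_value`, `xsd_equal` and `NUMERIC_TYPES` are hypothetical, and mapping a few numeric types onto Python's `Decimal` is a simplification, not how XSD value spaces are actually defined.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Literal:
    lexical: str
    lang: Optional[str] = None
    parse_type: Optional[str] = None


def rdf_equal(a: Literal, b: Literal) -> bool:
    """Basic RDF level: equal only if string, language and parse type all match."""
    return a == b


# Assumed subset of XSD numeric types that share one value space here.
# Real XSD value spaces are more involved; this is only an illustration.
NUMERIC_TYPES = {"xsd:decimal", "xsd:integer", "xsd:int"}


def xsd_value(lit: Literal) -> object:
    """Map numeric literals to a Decimal value; leave everything else as the term."""
    if lit.parse_type in NUMERIC_TYPES:
        return Decimal(lit.lexical)
    return lit


def xsd_equal(a: Literal, b: Literal) -> bool:
    """Higher, XSD-aware level: equal if both literals map to the same value."""
    return xsd_value(a) == xsd_value(b)


as_integer = Literal("1001", parse_type="xsd:integer")
as_int = Literal("1001", parse_type="xsd:int")   # xsd:int is derived from xsd:integer
as_string = Literal("1001", parse_type="xsd:string")

print(rdf_equal(as_integer, as_int))     # False: different parse types
print(xsd_equal(as_integer, as_int))     # True: same numeric value
print(xsd_equal(as_integer, as_string))  # False: a string is not a number
```

The point of the split is that an RDF-only processor never needs the subtyping table; only the XSD-aware layer consults it.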

>By the same token, it seems to make some sense to pack a datatype into a
>literal as long as it is only saying something about the string (i.e.
>"10" is in the lexical space of xsd:integer) but seems odd for that
>packed statement to be saying anything about the value denoted by that
>string

On the contrary. "aho" is in the lexical space of both (romanized)
Japanese and Finnish, yet the difference needs to be made in order to
express both values for a single property on a given subject. There is a
clear difference here, in both semantic and RDF-equality terms, just as
there would be between xsd:integer"1001" and
xsd:string"1001". It's kind of a special case, I grant you, but it's elegance
I'm after.
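The practical consequence for a single subject and property can be shown by treating a graph as a set of triples. In this minimal sketch (the helper `build_graph` and the plain-string subject and predicate are hypothetical stand-ins), keying objects by the bare string collapses the statements, while keying them by the full literal keeps every value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    lexical: str
    lang: Optional[str] = None
    parse_type: Optional[str] = None


def build_graph(triples, key):
    """Build a graph as a set of (s, p, key(o)) triples."""
    return {(s, p, key(o)) for s, p, o in triples}


triples = [
    ("s", "p", Literal("aho", lang="fi")),
    ("s", "p", Literal("aho", lang="ja")),
    ("s", "p", Literal("1001", parse_type="xsd:integer")),
    ("s", "p", Literal("1001", parse_type="xsd:string")),
]

# Keyed by the bare string, the four statements collapse into two.
by_string = build_graph(triples, key=lambda o: o.lexical)
# Keyed by the full literal, all four values survive.
by_literal = build_graph(triples, key=lambda o: o)

print(len(by_string))   # 2
print(len(by_literal))  # 4
```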

>(assuming of course that literals can denote things other than
>themselves).

They can, of course. Otherwise textual encodings of anything other than
literal strings would be meaningless.

>Otherwise what's the distinction between statements packed inside
>literals, and statements represented in the graph?

A derivative of the one that is currently being made between resources and
literals, of course. Literals are an artifact of us wanting to represent
attributes separately from relations. They call for extra data, like
language and parse type, which aren't present in the case of normal
resources because *every* distinguishing feature of a resource can be
assumed to be represented by its name. The same doesn't hold for literals,
which may very well represent anything at all. That's why we get language
and parse type, but also quite a number of extra features we might want to
talk about.

>I guess if rdf evolves some sort of quoting mechanism, we wouldn't need
>to pack things within literals at all (at least not as a way of making
>statements about the string).

The trouble is, language and parse type are part of the identity of a
string. ("aho","fi",0)!=("aho","ja",0), so you cannot represent "aho" in
the graph and just talk about it separately from its other attributes.
IOW, you cannot name the Finnish "aho" separately from the Japanese one
without referring to the language. That is also a distinction which arises
solely out of the semantic difference between the two strings, much like
the difference between xsd:integer"1001" and xsd:string"1001".

If there were no literals, we could always assume that any difference in
identity would be encapsulated by the name of the object (that's pretty
much the definition of a "name", after all), but when we refer to objects
by their content (like we do with literals), any distinctive attribute
whatsoever will have to be represented. Granting an open type mechanism is
one way to accomplish precisely that. (If you want a distinction, you make
it by allocating a new type.) Without it, there's Inelegance and Badness.
(I.e. a literal might very well share all the currently defined
attributes, but might *still* be different because of a characteristic
that is not defined. Currently, one example of such a characteristic is the fact
that one literal might be an xsd:integer and another an xsd:string.)
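One way to picture the open type mechanism is a literal whose identity is its string plus an open set of attributes, contrasted with a resource identified by its name alone. This is a hypothetical sketch, not any RDF API: the classes `Resource` and `Literal`, the `make` helper and the attribute name `type` are assumptions. A distinction the fixed model cannot express becomes expressible once a new attribute is allocated for it.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Resource:
    """A resource is identified by its name alone."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal identified by its content plus an open set of
    distinguishing attributes, stored as (name, value) pairs."""
    lexical: str
    attrs: FrozenSet[Tuple[str, str]] = frozenset()

    @classmethod
    def make(cls, lexical: str, **attrs: str) -> "Literal":
        # Order of keyword arguments does not matter: a frozenset is unordered.
        return cls(lexical, frozenset(attrs.items()))


# With no extra attributes, only the string identifies the literal.
plain = Literal.make("1001")
# Allocating a new attribute ("type") makes the distinction expressible.
as_integer = Literal.make("1001", type="xsd:integer")
as_string = Literal.make("1001", type="xsd:string")

print(as_integer == as_string)                           # False
print(as_integer == Literal.make("1001", type="xsd:integer"))  # True
print(plain == as_integer)                               # False
print(Resource("http://example.org/x") == Resource("http://example.org/x"))  # True
```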
-- 
Sampo Syreeni, aka decoy - mailto:decoy@iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
Received on Thursday, 1 August 2002 19:51:30 UTC