Re: Dataypes, literals, syntax from Geoff Chappell on 2002-08-02 (www-rdf-logic@w3.org from August 2002)

From: Geoff Chappell <geoff@sover.net>
Date: Thu, 1 Aug 2002 23:40:04 -0400
To: "Sampo Syreeni" <decoy@iki.fi>
Cc: <www-rdf-logic@w3.org>
Message-ID: <086a01c239d6$4a15ea50$825ec6d1@goat1>
----- Original Message -----
From: "Sampo Syreeni" <decoy@iki.fi>
To: "Geoff Chappell" <geoff@sover.net>
Cc: <www-rdf-logic@w3.org>
Sent: Thursday, August 01, 2002 7:51 PM
Subject: Re: Dataypes, literals, syntax


>
> On 2002-08-02, Geoff Chappell uttered to Sampo Syreeni:
>
> >It strikes me that it is legitimate to pack langid into literals because
> >the langid is really a statement about the string/label and not the
> >thing that it denotes.
>
> Huh? But that's *exactly* what it is. The literal string is by no means an
> unambiguous label for a given literal, but precisely an extra attribute
> which is necessary in order to both disambiguate which literal we are
> talking about

If we're talking about (as opposed to with) literals, I guess we're taking
the position that literals are tidy, i.e. that they denote themselves? Sure,
in that case an associated langid is making a statement about the thing the
literal denotes, since the thing it denotes is itself. (Or did I
misunderstand you?)

>*and* to interpret the string value coherently. Consider:
>
> <s,p,o1>
> <s,p,o2>
>
> where
>
> o1==("aho","fi",false)
> o2==("aho","ja",false) .

As names, these two things (rdf literals) are clearly different. But _taking
the untidy position_, absent other information, how do I know that these two
names don't refer to the same object? Or how about the case:

    o1==("aho","fi",false)
    o2==("aho","fi",false) .

Can I assume that the two names do refer to the same object, given that many
(most?) words have multiple senses? I guess we'd have to assume so if
langids were put on equal footing with datatypes, since datatypes are assumed
to functionally bind a lexical representation to a value.
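
To make the two cases above concrete, here is a minimal Python sketch. It is
only an illustration under stated assumptions: the names Literal, Occurrence,
tidy and make_untidy are hypothetical, the third slot of the triple is carried
along without assuming what it means, and the "thing-A"/"thing-B" denotations
are made up. Under the tidy reading a literal denotes itself; under the untidy
reading the denotation is fixed per occurrence, so equality of names tells us
nothing about equality of the things named.

```python
from typing import Callable, Dict, Hashable, NamedTuple


class Literal(NamedTuple):
    """A literal written as in this thread: (string, langid, flag)."""
    lexical: str
    lang: str
    flag: bool  # third slot of the triple; its meaning is not assumed here


class Occurrence(NamedTuple):
    """A literal as it appears in one particular statement (hypothetical)."""
    statement_id: int
    literal: Literal


def tidy(occ: Occurrence) -> Hashable:
    # Tidy reading: every occurrence of a literal denotes the literal itself.
    return occ.literal


def make_untidy(table: Dict[int, Hashable]) -> Callable[[Occurrence], Hashable]:
    # Untidy reading: the denotation is looked up per occurrence, so it may
    # differ between two occurrences of the very same literal.
    return lambda occ: table[occ.statement_id]


a = Occurrence(1, Literal("aho", "fi", False))
b = Occurrence(2, Literal("aho", "ja", False))
c = Occurrence(3, Literal("aho", "fi", False))

# One made-up untidy interpretation among many possible ones.
untidy = make_untidy({1: "thing-A", 2: "thing-A", 3: "thing-B"})

names_ab_equal = a.literal == b.literal   # different langid, different names
names_ac_equal = a.literal == c.literal   # identical triples, same name

tidy_ab_same = tidy(a) == tidy(b)
tidy_ac_same = tidy(a) == tidy(c)

untidy_ab_same = untidy(a) == untidy(b)   # different names, same object
untidy_ac_same = untidy(a) == untidy(c)   # same name, different objects
```

With tidy(), sameness of denotation follows sameness of names exactly. With
this particular untidy table it goes the other way in both cases, which is the
ambiguity I'm asking about.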

>
> You have two strings which are precisely equivalent in the literal sense,
> but which clearly mean two entirely different things in the languages
> denoted. (Assume away the trouble with hiragana vs. romaji for Japanese,
> for the sake of an example.) I would contend such a difference constitutes
> what is properly called a semantic distinction. The situation wouldn't
> really be different if we substituted identical languages and parse types
> "xsd:decimal" and "xsd:string".
>
> AFAICT, the part having to do with subtyping relations within XSD is well
> beyond basic RDF, just as rdfs:subPropertyOf isn't supposed to be
> understood by RDF-only parsers. I would tend to think that two lexically
> equal literal strings should be treated as RDF-inequal if they had
> separate language and/or separate parse type

I'd agree if tidy literals are the rule, disagree otherwise (assuming that
RDF-inequal is a measure of the inequality of the things the literals
denote, not the literals themselves).
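
A small Python sketch of the two levels of equality being discussed, offered
as an illustration rather than as anyone's actual API. TypedLiteral, value_of,
term_equal and value_equal are hypothetical names, and Python's int, str and
Decimal are only rough stand-ins for the xsd:integer, xsd:string and
xsd:decimal lexical-to-value mappings (they accept some strings the real
lexical spaces would reject).

```python
from decimal import Decimal
from typing import NamedTuple

# Assumed stand-ins for the lexical-to-value mapping of each datatype.
LEXICAL_TO_VALUE = {
    "xsd:integer": int,
    "xsd:string": str,
    "xsd:decimal": Decimal,
}


class TypedLiteral(NamedTuple):
    lexical: str
    datatype: str


def value_of(lit: TypedLiteral):
    # A datatype functionally binds a lexical form to a value.
    return LEXICAL_TO_VALUE[lit.datatype](lit.lexical)


def term_equal(a: TypedLiteral, b: TypedLiteral) -> bool:
    # Equality of the literals themselves (string plus datatype).
    return a == b


def value_equal(a: TypedLiteral, b: TypedLiteral) -> bool:
    # Equality of the things the literals denote, as a datatype-aware
    # layer would see it.
    return value_of(a) == value_of(b)


int_1001 = TypedLiteral("1001", "xsd:integer")
str_1001 = TypedLiteral("1001", "xsd:string")
int_01001 = TypedLiteral("01001", "xsd:integer")
dec_1001 = TypedLiteral("1001.0", "xsd:decimal")

same_string_terms = term_equal(int_1001, str_1001)      # datatypes differ
same_string_values = value_equal(int_1001, str_1001)    # 1001 vs "1001"

padded_terms = term_equal(int_1001, int_01001)          # strings differ
padded_values = value_equal(int_1001, int_01001)        # both denote 1001

cross_type_terms = term_equal(int_1001, dec_1001)
cross_type_values = value_equal(int_1001, dec_1001)     # integer vs decimal
```

If "RDF-inequal" means term_equal is false, then all three pairs are unequal.
If it is meant to measure the things denoted, only the integer/string pair is.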

>(even given that parse types
> include all XSD data types), and only be treated as equal at the higher
> level handled by XSD-aware APIs.

>After all, that's what's being done to
> anonymous nodes with daml:UniqueProperty's and the like, now, or with
> identical string values with different parse types and/or languages.
>
> >By the same token, it seems to make some sense to pack a datatype into a
> >literal as long as it is only saying something about the string (i.e.
> >"10" is in the lexical space of xsd:integer) but seems odd for that
> >packed statement to be saying anything about the value denoted by that
> >string
>
> On the contrary. "aho" is both in the lexical space of (romanized)
> Japanese and Finnish, yet the difference needs to be made in order to be
> able to express both values for a single property on a given subject. There is a
> clear difference, both in the semantic and RDF-equality terms, here, as
> there would be if we were talking about xsd:integer"1001" and
> xsd:string"1001". Kind of a special case, I grant that, but it's elegance
> I'm after.
>
> >(assuming of course that literals can denote things other than
> >themselves).
>
> They can, of course. Otherwise textual encodings of anything other than
> literal strings would be meaningless.

I guess I'm a bit confused about whether you're arguing for or against tidy
literals. In most places you seem to take a tidy stance, but here it sounds
otherwise. Is it fair to say that you want a literal to be able to be an
unambiguous referrer by definition (by always affixing a datatype/context)?
If so, why not just use a uri scheme?

>
> >Otherwise what's the distinction between statements packed inside
> >literals, and statements represented in the graph?
>
> A derivative of the one that is currently being made between resources and
> literals, of course. Literals are an artifact of us wanting to represent
> attributes separately from relations. They call for extra data, like
> language and parse type, which aren't present in the case of normal
> resources because *every* distinguishing feature of a resource can be
> assumed to be represented by its name. The same doesn't hold for literals
> which may very well represent anything at all. That's why we get language
> and parse type, but also quite a number of extra features we might want to
> talk about.
>
> >I guess if rdf evolves some sort of quoting mechanism, we wouldn't need
> >to pack things within literals at all (at least not as a way of making
> >statements about the string).
>
> The trouble is, language and parse type are part of the identity of a
> string. ("aho","fi",0)!=("aho","ja",0), so you cannot represent "aho" in
> the graph and just talk about it separately from its other attributes.
> IOW, you cannot name the Finnish "aho" separately from the Japanese one
> without referring to the language. That is also a distinction which arises
> solely out of the semantic difference between the two strings, much like
> the difference between xsd:integer"1001" and xsd:string"1001".
>
> If there were no literals, we could always assume that any difference in
> identity would be encapsulated by the name of the object (that's pretty
> much the definition of a "name", after all),

That seems to me a better definition of uriref under rdf than of name (i.e.
urirefs are assumed to be unambiguous though not necessarily unique names).

>but when we refer to objects
> by their content (like we do with literals), any distinctive attribute
> whatsoever will have to be represented. Granting an open type mechanism is
> one way to accomplish precisely that. (If you want a distinction, you make
> it by allocating a new type.) Without it, there's Inelegance and Badness.
> (I.e. a literal might very well share all the currently defined
> attributes, but might *still* be different because of a characteristic
> not defined. Currently one example of such a characteristic is the fact
> that one literal might be an xsd:integer and another an xsd:string.)
> --
> Sampo Syreeni, aka decoy - mailto:decoy@iki.fi, tel:+358-50-5756111
> student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
> openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2

--geoff
Received on Thursday, 1 August 2002 23:10:37 UTC