Re: Datatypes, literals, syntax

On 2002-08-01, Geoff Chappell uttered to Sampo Syreeni:

>If we're talking about (as opposed to with) literals, I guess we're
>taking the position that literals are tidy? i.e. that they denote
>themselves?

Emm... As far as syntax goes, I'm in the tidy camp. However, I do not
agree with the interpretation of tidy as "denoting itself". I view
tidiness purely as a syntactic construct, that is, one consequence of
constraining the model to a set of triples.

>> o1==("aho","fi",false)
>> o2==("aho","ja",false) .
>
>As names, these two things (rdf literals) are clearly different. But _taking
>the untidy position_, absent other information, how do I know that these two
>names don't refer to the same object?

You don't. Again, I would take this from the point of view of the layer
cake. At this level we're dealing with syntax, so if the kind of extension
I propose were to be made, the current two parse types would have to be
taken as datatypes with datatype mappings taking strings with certain
languages to themselves. They would make the literals tidy (as in "denote
themselves"), whereas other datatypes perhaps wouldn't. However, at a
higher level you might well treat certain syntactically unequal things as
equal. "ad hoc" as a Latin string and as an English string are two
different things as far as syntax (the base RDF data model) goes, but if
something above in the cake actually understood the semantics (one such
thing being XSD), it might well want to treat them as equal. The point is
that both have to be expressible separately in case the upper layers take
a different viewpoint and decide to treat them as unequal.

More generally, I view equality as an equivalence relation like any other.
A given layer in the cake may define equality for its own needs by an
arbitrary rule, within the constraint that its equality relation is
compatible with the one used in the layer immediately below. Even if we
have the most carefully defined datatype mappings in XSD, I still think
one only has to go a layer up before discovering that, indeed,
xsd:decimal"1001" and xsd:string"1001" might be treated as equal. So, from
my point of view, there is no such thing as strict ambiguity, just levels
of differentiation between objects.
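
To make the layering concrete, here is a minimal Python sketch. It is not
anything defined by RDF or XSD: the Literal type, the datatype labels and
the three equality functions are hypothetical names chosen for
illustration, and lexical forms are assumed to be valid. The only thing it
tries to show is that each layer's equality may be coarser than, but never
contradicts, the one below it.

```python
from decimal import Decimal, InvalidOperation
from typing import NamedTuple


class Literal(NamedTuple):
    lexical: str    # the character string as written
    datatype: str   # hypothetical datatype label, e.g. "xsd:decimal"


def syntactic_equal(a: Literal, b: Literal) -> bool:
    # Bottom layer: names are equal only if string and datatype both match.
    return a == b


def xsd_value(lit: Literal):
    # Middle layer (simplified assumption): decimals map to numbers,
    # every other datatype maps the string to itself.
    if lit.datatype == "xsd:decimal":
        return (lit.datatype, Decimal(lit.lexical))
    return (lit.datatype, lit.lexical)


def xsd_equal(a: Literal, b: Literal) -> bool:
    return xsd_value(a) == xsd_value(b)


def app_key(lit: Literal):
    # Hypothetical upper layer: ignore the datatype and compare
    # numerically whenever the string parses as a finite number.
    try:
        value = Decimal(lit.lexical)
    except InvalidOperation:
        return lit.lexical
    return value if value.is_finite() else lit.lexical


def app_equal(a: Literal, b: Literal) -> bool:
    return app_key(a) == app_key(b)


a = Literal("1001", "xsd:decimal")
b = Literal("1001", "xsd:string")
c = Literal("1001.0", "xsd:decimal")
print(syntactic_equal(a, c), xsd_equal(a, c), app_equal(a, c))  # False True True
print(syntactic_equal(a, b), xsd_equal(a, b), app_equal(a, b))  # False False True
```

Anything equal at a lower layer stays equal further up; the upper layers
only merge more names together.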

>Or how about the case:
>
>    o1==("aho","fi",false)
>    o2==("aho","fi",false) .
>
>Can I assume that the two names do refer to the same object given that
>many (most?) words have multiple senses. I guess we'd have to assume
>that if langids were put on equal footing with datatypes since datatypes
>are assumed to functionally bind a lexical representation to a value.

Yes. I would tend to think that a proper untidy interpretation of the
above example is that current parse types always bind to entire sets of
meanings, whereas with other datatypes, we might get something more
conventional.

>I'd agree if tidy literals are the rule, disagree otherwise (assuming
>that RDF-inequal is a measure of the inequality of the things the
>literals denote, not the literals themselves).

To summarize, I think RDF in itself should only be concerned with equality
between names, whereas higher layers might want to treat syntactically
unequal literals as equal. XSD certainly does this. But trying to define
a general notion of semantic equivalence is, I think, pretty much
impossible. The best that can be done is to go with the best information
we have at each layer. With XSD, the problem is easy enough, since
there's already a well-defined type hierarchy.

>I guess I'm a bit confused whether you're arguing for or against tidy
>literals. Most places you seem to take a tidy stance, but here it sounds
>otherwise. Is it fair to say that you want a literal to be able to be an
>unambiguous referrer by definition (by always affixing a
>datatype/context)?

Yes. I also tend to think that literals should carry their datatype around
to facilitate type checking. As you'd guess, I'm not a big fan of the two
implicit typing idioms in the current datatyping WD.

>if so, why not just use a uri scheme?

Because if ambiguity is what we want, we want to be able to express that
unambiguously. Taking the current parse types as signalling a datatype
mapping which takes the string-language pair to itself, you get tidiness
(in the sense of literals denoting themselves) and ambiguity. Using
another datatype will get you unambiguity and untidiness (in the sense of
literals not denoting themselves). Nevertheless, the model will be tidy
(in the sense of not having duplicate labels).
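
As a rough illustration of the three senses above, consider the following
Python sketch. Everything in it is an assumption made for the example: the
two-field Literal, the "int" datatype label and both mapping functions are
hypothetical, and the identity mapping is only my reading of what the
current parse types would amount to.

```python
from typing import Hashable, NamedTuple


class Literal(NamedTuple):
    lexical: str   # the string part, e.g. "aho"
    tag: str       # language id ("fi") or hypothetical datatype label ("int")


def identity_mapping(lit: Literal) -> Hashable:
    # Assumed reading of the current parse types: the string-language
    # pair denotes itself. Tidy as "denotes itself", but it says nothing
    # about which sense of the word is meant, so it stays ambiguous.
    return lit


def integer_mapping(lit: Literal) -> Hashable:
    # A hypothetical ordinary datatype: the literal denotes a number,
    # not itself, and the number it denotes is unambiguous.
    return int(lit.lexical)


fi = Literal("aho", "fi")

# The model as a set of labels: duplicates collapse, so the model is
# tidy (no duplicate labels) whichever mapping is in use.
labels = {fi, Literal("aho", "fi"), Literal("01", "int"), Literal("1", "int")}

print(identity_mapping(fi) == fi)                   # True
print(integer_mapping(Literal("01", "int"))
      == integer_mapping(Literal("1", "int")))      # True
print(len(labels))                                  # 3
```

The two "int" labels remain distinct names in the model even though they
denote the same value, which is the untidy-but-unambiguous case.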

Second, it's about elegance. It seems to me that language and parse type
are doing part of what datatypes are supposed to do, so layering them on
top of each other just seems off, especially since the datatyping WD
implicitly acknowledges this in saying that only the string value will be
seen by XSD.

(BTW, in my perfect world there would be no literals. I'd do precisely as
you suggest, and use urirefs for everything.)
-- 
Sampo Syreeni, aka decoy - mailto:decoy@iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2

Received on Friday, 2 August 2002 08:02:11 UTC