Re: Subjects as Literals from Sampo Syreeni on 2010-07-07 (public-lod@w3.org from July 2010)

From: Sampo Syreeni <decoy@iki.fi>
Date: Wed, 7 Jul 2010 05:51:46 +0300 (EEST)
To: Pat Hayes <phayes@ihmc.us>
cc: Linked Data community <public-lod@w3.org>, Semantic Web <semantic-web@w3.org>
Message-ID: <Pine.LNX.4.64.1007070454510.8363@lakka.kapsi.fi>
On 2010-07-05, Pat Hayes wrote:

> This objection strikes me as completely wrong-headed. Of course 
> literals are machine processable.

What precisely does "Sampo" as a plain literal mean to a computer? Do 
give me the fullest semantics you can. As in, is it the Finnish Sampo as 
in me, my neighbour, or what would be roughly translated as "cornucopia" 
in some languages? You could of course answer that it's just a 
literal, but then you'd be saying precisely the same thing I did: that 
sort of thing has only axiomatic semantics, lacking the real world 
denotation which is needed if we want to actually apply this stuff to 
something tangible.

So what is it? As opposed to me as an OID (I don't think the URI 
namespace registration went through yet): 1.3.6.1.4.1.12798.1.2049 ? I 
mean, if your semweb killer app ordered that, the user should mostly 
receive a no-thanks for hairy mail prostitution. If they ordered the 
third kind of Sampo -- they should probably receive hard psychedelics 
instead. (And yes, I know this is rather concrete bound. I think it 
should be, too.)

> Well, nobody is suggesting allowing literals as predicates [...]

Why? Is there a lesson to be learnt there?

> But it is easy to give 'ridiculous' examples for any syntactic 
> possibility. I can write apparent nonsense using nothing but URIs, but 
> this is not an argument for disallowing URIs in RDF.

In fact it could be. Whatever format you accept, you should be liberal 
with, but at the same time you should always have an unambiguous, safe, 
productive and well-documented interpretation for it all.

> This is WRONG. The type specifiers *completely* disambiguate the text 
> in the body of the literal.

A language signifier tacked onto a plain literal doesn't, as I just 
showed. An integer annotation on a number just says it's a number, not 
what unit it perhaps carries; those are two completely different kinds 
of numbers, carrying different operational semantics. With literals, 
typing has come up but it hasn't been fully integrated with the rest of 
the RDF grammar; you can still say things like 'ten(integer) much-likes 
"Sampo"@fi' without any usual type system catching the error.

I'd say that's pretty far from well defined semantics. Even in the 
simplest, axiomatic sense. The literal is then the primary culprit -- 
otherwise you and others have done a swell job in tightening it up.
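
To make the point concrete, here is a toy sketch in Python. It is not 
any real RDF library: the Literal and Triple classes, the well_typed 
check and the example.org predicate are all made up for illustration, 
and literals are allowed in the subject position only because that is 
the proposal under discussion. The check does what a datatype 
annotation lets you do, which is validate each literal against its own 
type, and nothing more.

```python
from dataclasses import dataclass
from typing import Optional

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass(frozen=True)
class Literal:
    """Toy literal: lexical form plus optional datatype or language tag."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


@dataclass(frozen=True)
class Triple:
    """Toy triple; subject and object may be a URI string or a Literal."""
    subject: object
    predicate: str
    obj: object


def well_typed(t: Triple) -> bool:
    """Validate each literal against its own datatype only.

    int() is a rough stand-in for the xsd:integer lexical check; it is
    slightly more permissive than the real thing.
    """
    for term in (t.subject, t.obj):
        if isinstance(term, Literal) and term.datatype == XSD_INTEGER:
            try:
                int(term.lexical)
            except ValueError:
                return False
    return True


# Ten metres and ten seconds: the datatype alone cannot tell them apart.
ten_metres = Literal("10", XSD_INTEGER)
ten_seconds = Literal("10", XSD_INTEGER)

# 'ten(integer) much-likes "Sampo"@fi' (hypothetical predicate URI).
nonsense = Triple(
    Literal("10", XSD_INTEGER),
    "http://example.org/much-likes",
    Literal("Sampo", lang="fi"),
)
```

In this sketch ten_metres == ten_seconds holds, and well_typed(nonsense) 
returns True: every literal is fine on its own terms, and nothing ties 
the types to what the predicate expects.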

> For plain literals, the meaning of the literal is the string itself, a 
> unique string of characters.

That I know too.

>> With Schema derived or otherwise strictly derived types, the level of 
>> disambiguation can be the same as or even better than with URI's, 
>> true. But then that goes the other way around, too: URI's could take 
>> the place of any such precise type.
>
> No, they cannot. For numbers, for example, one would need infinitely many 
> URIs; but in any case, why bother creating all these URIs?

There are just as many URI's in abstract as there are integers. Just 
take oid:integer:1 and go right past oid:integer:<googol> if necessary. 
Certainly even today the practical maximum GET string over plain HTTP 
runs to thousands of digits of potential numerical capacity, quite 
without the need to compress further.
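
As a rough sketch of that mapping, in Python: the "oid:integer:" prefix 
is only the hypothetical one used above, not a registered scheme, and 
mint and parse are illustrative names. The point is just that the 
correspondence between non-negative integers and such names is one to 
one and trivially computable in both directions.

```python
PREFIX = "oid:integer:"  # hypothetical prefix from the text, not a registered scheme


def mint(n: int) -> str:
    """Return the name for a non-negative integer under the toy scheme."""
    if n < 0:
        raise ValueError("this sketch only names non-negative integers")
    return f"{PREFIX}{n}"


def parse(name: str) -> int:
    """Recover the integer from a name produced by mint()."""
    if not name.startswith(PREFIX):
        raise ValueError("not a name in the toy scheme")
    digits = name[len(PREFIX):]
    # Accept plain ASCII decimal digits only.
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("malformed integer part")
    return int(digits)


googol = 10 ** 100
googol_name = mint(googol)
```

Here parse(googol_name) gives back googol, and the name itself is a 
little over a hundred characters long, well within what gets passed 
around in practice.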

In theory, it can be argued that we can think about only so many 
discrete concepts. As long as they are discrete, they can be enumerated, 
and as long as the number stays finite, we could just give all of them 
separate numbers. Then just tack them onto a very big namespace prefix, 
like my number above. Theoretically it's easy; in practice you'd like 
the kind of hierarchical namespace that URI's and OID's buy you. But 
even so, naming something like 10^100 discrete objects would still be 
easy. And then !!!:

> We have (universally understood) names for the numbers already, called 
> numerals. For dates, times and so forth, there are many formats in use 
> throughout human societies, of course. That is WHY the work of 
> establishing datatype standards work was done. To ignore all this, to 
> reject a widely accepted standard, and advocate reversion to a 
> home-made URI scheme seems to me to be blatantly irresponsible.

What I want is for more stuff to be standardized and its format 
shared. That is *squarely* my problem here: RDF literals invite misuse. 
Perhaps if we banned plain literals, it would be better. But right now, 
few people type their literals well, and the typing mechanism even 
invites people to treat typed values as separate from the rest of the 
triple oriented data model. Which is extra work; which means your 
typical lazy nerd won't like it enough to implement it properly.

Personally, I'd like to see data standardized as broadly as possible. 
I'd like to have broad datasets out there, with well defined semantics. 
That is pretty much why I then oppose literals within the semantic web: 
they encourage sloppy typing which can kill the whole deal. Especially 
if we start to allow them all-round.
-- 
Sampo Syreeni, aka decoy - decoy@iki.fi, http://decoy.iki.fi/front
+358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
Received on Wednesday, 7 July 2010 02:52:45 UTC