Re: Subjects as Literals from Pat Hayes on 2010-07-08 (semantic-web@w3.org from July 2010)

From: Pat Hayes <phayes@ihmc.us>
Date: Thu, 8 Jul 2010 12:06:15 -0500
To: Sampo Syreeni <decoy@iki.fi>
Cc: Linked Data community <public-lod@w3.org>, Semantic Web <semantic-web@w3.org>
Message-Id: <20D2CAB4-B064-454D-A20C-203051B6E608@ihmc.us>

On Jul 6, 2010, at 9:51 PM, Sampo Syreeni wrote:

> On 2010-07-05, Pat Hayes wrote:
>
>> This objection strikes me as completely wrong-headed. Of course  
>> literals are machine processable.
>
> What precisely does "Sampo" as a plain literal mean to a computer?  
> Do give me the fullest semantics you can.

In RDF, it means the five-character string ess-ay-em-pee-oh, in that  
order. It does not mean anything else. This meaning is fixed by the  
RDF specification documents themselves. BTW, these are Unicode  
characters, so consult the Unicode documentation for more detail on  
what exactly is meant by a "character" (it is surprisingly  
complicated, and makes fascinating reading.)

> As in, is it the Finnish Sampo as in me, my neighbour, or what would  
> be roughly translated as "cornucopia" in some languages?

As you did not specify any language tag, the characters are presumed  
to be in the English ("Latin") alphabet. Technically, the characters
are all in Unicode plane 0.
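
A minimal sketch of the two points above, in Python: the plain literal is nothing more than a sequence of five Unicode characters, and each of them has a code point in plane 0. The variable names are illustrative only and come from no RDF library.

```python
import unicodedata

# The lexical form of the plain literal under discussion.
literal = "Sampo"

# A Unicode plane is a block of 65,536 code points, so the plane
# number is the code point shifted right by 16 bits.
for ch in literal:
    code_point = ord(ch)
    plane = code_point >> 16
    print(f"U+{code_point:04X}  plane {plane}  {unicodedata.name(ch)}")

code_points = [ord(ch) for ch in literal]
planes = {cp >> 16 for cp in code_points}
```

Nothing in this listing says whether the string names a person, a neighbour, or a cornucopia; it is only the string.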

> You could of course just answer that it's just a literal, but then  
> you'd be telling precisely the same thing I did: that sort of thing  
> has only axiomatic semantics, lacking the real world denotation  
> which is needed if we want to actually apply this stuff to something  
> tangible.

Not at all. Character strings may not be 'tangible', but they are real  
things in the world. Being tangible isn't a necessary condition for  
being real. The world comprises many things, probably more kinds of  
thing than any of us are capable of imagining at any given moment (the  
'horatio principle': it is a mistake to want to exclude things from  
the universe of everyone else's discourse, or to presume that one's  
own ontological biases are going to be universally shared by others.)

> So what is it? As opposed to me as an OID (I don't think the URI  
> namespace registration went through yet): 1.3.6.1.4.1.12798.1.2049 ?  
> I mean, if your semweb killer app ordered that, the user should  
> mostly receive a no-thanks for hairy mail prostitution. If they  
> ordered the third kind of Sampo -- they should probably receive hard  
> psychedelics instead. (And yes, I know this is rather concrete  
> bound. I think it should be, too.)
>
>> Well, nobody is suggesting allowing literals as predicates [...]
>
> Why? Is there a lesson to be learnt there?

Only that the world in general probably isn't ready yet for this kind  
of generalized logic. It is being used by specialists and those who  
really need it, like the security agencies (who have been using it for  
several years now).

>> But it is easy to give 'ridiculous' examples for any syntactic  
>> possibility. I can write apparent nonsense using nothing but URIs,  
>> but this is not an argument for disallowing URIs in RDF.
>
> In fact it could be. Whatever format you accept, you should be  
> liberal with, but at the same time you should always have an  
> unambiguous, safe, productive and well-documented interpretation for  
> it all.
>
>> This is WRONG. The type specifiers *completely* disambiguate the  
>> text in the body of the literal.
>
> A language signifier tacked onto a plain literal doesn't, as I just  
> showed.

Actually it does. The literal denotes the string, no more and no less.

> An integer annotation on a number just says it's a number

And that ends the matter, right there. A number is a real thing in the  
world, it is the denotation of a numeral. It doesn't "carry" anything  
else. If you want to talk about numbers of zlotys, or numbers of  
centimeters, then you need ontologies of zlotys and centimeters (or,  
perhaps, new datatypes for these things.)
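
As a rough illustration of this, assuming a toy model in Python: a typed literal pairs a lexical form (the numeral) with a datatype, and the datatype's lexical-to-value mapping fixes the number it denotes. The `TypedLiteral` class, the `LEXICAL_TO_VALUE` table, and the dictionaries carrying units are all hypothetical names made up for this sketch, not part of any RDF toolkit.

```python
from dataclasses import dataclass

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Lexical-to-value mappings keyed by datatype URI. Python's int() is
# only a stand-in here; it accepts a few forms xsd:integer does not.
LEXICAL_TO_VALUE = {XSD_INTEGER: int}


@dataclass(frozen=True)
class TypedLiteral:
    lexical_form: str  # the numeral, e.g. "10"
    datatype: str      # URI naming the datatype

    def value(self):
        """Return the value this literal denotes under its datatype."""
        return LEXICAL_TO_VALUE[self.datatype](self.lexical_form)


# Two different numerals that denote the same number.
ten = TypedLiteral("10", XSD_INTEGER)
also_ten = TypedLiteral("010", XSD_INTEGER)

# The unit is not part of the number. It has to be stated separately,
# here with a made-up record standing in for an ontology of units.
price = {"amount": ten.value(), "unit": "zloty"}
length = {"amount": also_ten.value(), "unit": "centimeter"}
```

In this sketch the two amounts are the very same number; only the surrounding description distinguishes ten zlotys from ten centimeters.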

> , not what unit it perhaps carries; those are two completely  
> different kinds of numbers, carrying different operational semantics.

No, they are not different kinds of *numbers*. There is only one kind  
of number, AKA the natural numbers (I'm ignoring reals, rationals, and
complex numbers.)

> With literals, typing has come up but it hasn't been fully  
> integrated with the rest of the RDF grammar; you can still say  
> things like 'ten(integer) much-likes "Sampo"@fi' without any usual  
> type system catching the error.

Literal types don't check 'errors' in RDF. (Though this one ought to
be caught by any RDF parser, in fact.) This is a complicated issue in
the design of RDF, one which absorbed a great deal of the WG's time.
It's probably not relevant to go into this here; it has to do with
keeping RDF monotonic. I can wax lyrical on this if you really want me  
to.

>
> I'd say that's pretty far from well defined semantics. Even in the  
> simplest, axiomatic sense. The literal is then the primary culprit  
> -- otherwise you and others have done a swell job in tightening it up.
>
>> For plain literals, the meaning of the literal is the string  
>> itself, a unique string of characters.
>
> That I know too.

Well then, isn't that unambiguous enough for you?

>
>>> With Schema derived or otherwise strictly derived types, the level  
>>> of disambiguation can be the same as or even better than with  
>>> URI's, true. But then that goes the other way around, too: URI's  
>>> could take the place of any such precise type.
>>
>> No, they cannot. For numbers, for example, one would need  
>> infinitely many URIs; but in any case, why bother creating all  
>> these URIs?
>
> There are just as many URI's in abstract as there are integers. Just  
> take oid:integer:1 and go right past oid:integer:<googol> if  
> necessary. Certainly even today the practical maximum GET strings  
> over even HTTP go right up to thousands of digits of potential
> numerical capacity, quite without the need to compress further.
>
> In theory, it can be argued that we can think about only such many  
> discrete concepts. As long as they are discrete, they can be  
> enumerated, and as long as the number stays finite, we could just  
> give all of them separate numbers. Then just tack them onto a very  
> big namespace prefix, like my number above. Theoretically it's easy;  
> in practice you'd like the kind of hierarchical namespace that URI's
> and OID's buy you. But still, naming something like 10^100 discrete  
> objects would still be easy.

Of course, but then you are presuming that your URI scheme obeys the
rules of a datatyped literal, but it doesn't. If I see the URI
sampo:thingie.567, who tells me that I should apply the decimal rules
for figuring out that this means five hundred and sixty seven? And
even if you can put some weird PHP script at the end of sampo:thingie
which can autogenerate some (what? OWL? HTML? RDFa?) which 'tells' me
what that number means, that doesn't help me when I see
sampo:thingie.568. Not to mention the issue of why should I use YOUR
URI-numerals?
What if someone else wants to take over the natural numbers, and they
have a faster server? So we need aleph-0 sameAs links between
sampo:thingie.<numeral> and someotherguy:betternumber.<numeral>?
This is completely absurd, worse than email spam, to choke up the Web  
with HTTP requests for disambiguating decimal numerals.
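
The contrast can be sketched in Python, under loudly stated assumptions: `uri_numeral` and `same_as_links` are hypothetical helpers written for this illustration, the two prefixes are the made-up ones from the paragraph above, and plain strings and tuples stand in for URIs and triples.

```python
def uri_numeral(prefix: str, n: int) -> str:
    """Build a home-made URI 'numeral' by appending decimal digits."""
    return f"{prefix}{n}"


a = uri_numeral("sampo:thingie.", 567)
b = uri_numeral("someotherguy:betternumber.", 567)

# As URIs these are just two different names. Nothing in RDF says the
# trailing digits should be read as a decimal numeral, so nothing
# says the two names co-refer.
names_match = (a == b)


def same_as_links(prefix_a: str, prefix_b: str, count: int):
    """One explicit link per number is needed to align two schemes."""
    return [
        (uri_numeral(prefix_a, n), "owl:sameAs", uri_numeral(prefix_b, n))
        for n in range(count)
    ]


# Aligning only the first thousand numbers already takes a thousand
# triples; aligning all of them would take infinitely many.
links = same_as_links("sampo:thingie.", "someotherguy:betternumber.", 1000)

# With a datatyped literal the numeral itself fixes the value, so two
# parties who both write "567" need no links at all. int() stands in
# for the datatype's lexical-to-value mapping.
values_match = (int("567") == int("567"))
```

The design point is that the datatype supplies the reading of the digits once, for everyone, whereas each URI scheme would have to be explained and cross-linked one name at a time.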

> And then !!!:
>
>> We have (universally understood) names for the numbers already,  
>> called numerals. For dates, times and so forth, there are many  
>> formats in use throughout human societies, of course. That is WHY  
>> the work of establishing datatype standards was done. To
>> ignore all this, to reject a widely accepted standard, and advocate  
>> reversion to a home-made URI scheme seems to me to be blatantly  
>> irresponsible.
>
> What I want is for more stuff to be standardized and their format  
> shared. That is *squarely* my problem, here: RDF literals invite  
> misuse. Perhaps if we banned plain literals, it would be better. But  
> right now, few people type their literals well, and the typing  
> mechanism even invites people to treat typed values as separate from  
> the rest of the triple oriented data model. Which is extra work;  
> which means your typical lazy nerd won't like it enough to implement  
> it proper.

I have heard this argument many times, and I absolutely reject it.  It  
is an argument against the Web, and ultimately an argument from  
arrogance. These lazy nerds can (and do) mistype URIs just as often as  
literal strings. But in fact, the world seems to manage. They - this  
great crowd of stupid people who can't be trusted to type a number  
correctly - regularly do things like order on-line and check their  
bank balances and charge things to their credit cards. I wonder how  
anyone can permit them to do this, it's such a *risk*.

Pat Hayes

>
> Personally, I'd like to see data standardized as broadly as  
> possible. I'd like to have broad datasets out there, with well
> defined semantics. That is pretty much why I then oppose literals  
> within the semantic web: they encourage sloppy typing which can kill  
> the whole deal. Especially if we start to allow them all-round.
> -- 
> Sampo Syreeni, aka decoy - decoy@iki.fi, http://decoy.iki.fi/front
> +358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
>
>

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Thursday, 8 July 2010 17:07:19 UTC