Re: RDF 2 Wishlist

On 2009-11-02, Pat Hayes wrote:

>> I would argue against this. Reification, in one form or another, is a 
>> highly valuable part of the standard, because it lets us pose 
>> hypotheticals and metadata relating to them.
>
> Only if you also ignore part of the spec. Which is a mess. I believe 
> we can do this better.

Yes. I'm the pragmatic kind of guy. I believe we could go with a sort of 
interim semantics which retained the most useful properties of quotation 
or the like, while keeping those semantics circumscribed enough to a) not 
introduce any true logical problems, and b) enable people to create 
more or less valid, useful applications simply by using their intuition. 
That way, we'd get both the short term and the long term benefits of 
semantic technology, which is what I'd call a win-win situation.

>> Even though Pat is likely to vehemently disagree with me on this one, 
>> I'd take hazy reification/quotation/whatever semantics over the lack 
>> of the basic mechanism, any day of the week. I mean, otherwise we're 
>> bound to have even *hazier* concoctions in its place.
>
> Why?

Because there is a place and a need for this sort of thing. Natural 
language easily guides us to the simplest example: the free-form 
quotation. There's a need for it, even though it's far from logically 
pure, or possessing of formal semantics. It's used all the time. And 
the idiom continuously leaks into just about every system of knowledge 
representation and every semantic network as well. Hell, even the fact 
that we allowed a fully unconstrained, willy-nilly textual literal into 
RDF, originally without even a language tag, testifies to that.

I argue that it's better to have a mechanism in place that explicitly 
encodes this sort of stuff; that the mechanism should be understandable 
and usable by the common coder rather than completely well axiomatized; 
and that we would then end up with less of a mess and more of a usable 
application than when we try to take the purely formal, logical, AI 
route.

> Are they, though? My sense is that the lisp-style lists used by OWL 
> are used much more than bag, seq, alt. (Actually, the lists are the 
> collections: these three are the containers.)

In that department we agree. The syntax is a mess, again. Representation 
shouldn't matter, no matter what the input (DDL) or manipulation (DML) 
syntax. It should be idiot-proof and master-friendly. What I meant was 
that the abstract datatypes encoded by the types mentioned were highly 
sensible. Perhaps it's once again that RDF's triple model guides us to 
build the wrong kind of storage and/or GUI models for our data?

> The key lack right now is any standard way to refer to a 'part' of an 
> RDF graph from the outside.

That, too. That sort of thing would mandate naming every triple/binary 
predicate, and inventing a system for referring to huge sets of those in 
a URI. Not doable. We could instead use named graphs, but then the 
grouping and naming takes place on the author's side, which really gives 
hir too much control in a distributed environment. Thus most people just 
download/syndicate other people's stuff and apply whatever logic they 
happen to like to the facts. That works, but it doesn't really allow a) 
efficient references to subsets of data from first, second or third 
parties, nor b) any especially technological reuse of such data, e.g. 
to accelerate aggregation.

>> No. From my relational background, I tend to treat bnodes like I'd deal 
>> with perfect, opaque surrogate keys. [...]

In another post of yours, you already handled n-ary stuff quite well. No 
need to add anything there.

>> If even that... Personally I'm of the opinion that literals should be 
>> removed from the model altogether.
>
> Oh no, they are the bread and butter of all the linked data. I'm all 
> for putting datatyped literals into logic itself, in fact.

I'm not against them as such, no. But from my perspective, 
modularization calls for treating them in a different standard. Like, in 
HTML? As an editor of an ezine, my favorite literal of course is my very 
own, libertarian-minded, entire, HTML-formatted rant about freedom. Do 
you *really* want to type that sort of thing properly and comprehensively 
within RDF? Or would it perhaps be better to leave it as referred 
material, with only the axiomatizable metadata retained within RDF 
proper?

> That's not the real issue. The problem is, people need something weaker 
> than sameAs to express a link, in many cases.

Ah. I went with something similar above, so I can relate. But what would 
this "something weaker" be, here? We can imagine tons of weaker, 
intuitive things, but since in the end we have to have formal semantics 
which can be handled by a computer, precisely which axiomatics allow us 
to go weaker, and help the people at the same time?

> It's not all people misusing sameAs because they don't understand it: 
> they misuse it because there is no alternative, and they have to use 
> something. It's up to us to provide some better alternatives.

Quite. But as I said, we can't stray from machine processability either. 
It might be that FOPL and its ilk take up too much of our attention, 
relatively speaking. Still, I at least find it pretty difficult to come 
up with any truly "semantic" alternatives to the usual logical 
connectives we're used to utilizing.

True, we could have these hazy "related-to" assertions. Maybe they could 
help some NLP machine gather extra facts for end user consumption. Or we 
could have more specific, more formalistic adaptations towards today's 
technology: "forall <x>,<y>: [type of person]<x>, [type of 
information]<y> interest-is(<x>,<y>,["semantic web technology"])".

I think it's pretty clear that this sort of thing cannot serve as the 
usable alternative, at least until we have some damn sophisticated AI in 
place to serve the general population. Till then, we will have to make 
do with rather hazy semantics, because that's what people deal in, and 
even then we cannot understand half of it programmatically.

My opinion is that the situation warrants a two-pronged approach: first, 
allow lots of hazy connectives into the model, while being sure you make 
no guarantees about them. Guide them towards coherence and convergence, 
and should that then actually happen, owl:sameAs it suddenly is, in your 
and your well-placed friends' triplebases.

Then, number two, formalize subsets of them fully; build complete 
axiomatic semantics for them, and then market the end result as having 
some specific programmatic advantages resulting from the tighter 
formalism. Like this one: no, we're no longer talking about some vaguely 
interesting restaurant which you might perhaps have something to do 
with, and which might perhaps want to send you some advertisements; 
we're now talking about the rocking joint which already knows you like 
the precise kind of music they have to offer, whose cook is just a 
little bit related to you, Chinese, plus adept at precisely the kind of 
Szechuan cuisine you love the most. 'Cause they happen to have your 
triples, and can interpret them unambiguously as well...

> It's the last one that I think we are obliged to attempt.

Perhaps, and thanks. But don't neglect the practical side either. In 
the end, laziness counts for quite a lot, and that's why people bitch 
about RDF/XML so much as well. It's a real hindrance.

> That is one good strategy in the present state of the art, yes. See 
> how SKOS approaches similar issues.

My personal favorites are the approaches which start with RDF encodings 
of WordNet, using intuitive semantics related to the latter.

> Well, linking data using not-quite-sameAs-maybe is something that very 
> many people care a lot about right now. I hear more about this issue 
> than any other.

Personally I think this is an IS-A issue. I couldn't really fully 
decipher what Brachman was saying about it, either, but intuitively 
speaking, this is precisely the same thing. I mean, IS-A means one thing 
is kind of like another. If you declare it both ways, that kind of means 
the two things are the same. But not quite, because there are these 
hideous, little, semantic, interpretational, goddamn hermeneutic details 
lurking around the equation.

In the end, I've also never heard of a proper generalization of the IS-A 
relation, one that would let those little nasty details be abstracted 
away for the moment. The closest I've come is the statement that a) 
subtypes inherit the methods and internal data fields of the supertype, 
plus b) the inheritance works a bit differently for passed vs. returned, 
i.e. contravariant vs. covariant, parameter types. That's it. It's 
formal semantics, granted, and it ain't quite trivial either, but then, 
it ain't gonna help you a lot in developing a useful application either.
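
That variance rule, spelled out as a toy example in Python's typing 
terms (class names invented for illustration):

```python
from typing import Callable

# Invented toy classes to illustrate the variance rule stated above.
class Animal: ...
class Cat(Animal): ...

def feed_any_animal(a: Animal) -> Cat:
    return Cat()

# A function expected to take a Cat and return an Animal can safely be
# replaced by feed_any_animal: it accepts *more* (contravariant in its
# parameter) and returns something *more specific* (covariant in its
# return type). That is essentially the whole formal content of IS-A
# as used in subtyping.
handler: Callable[[Cat], Animal] = feed_any_animal
print(isinstance(handler(Cat()), Animal))  # True
```

Which, as said, is sound and non-trivial, but tells you nothing about 
whether two web resources denote "kind of the same" restaurant.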

> Most of the nasties in RDF are just ugly, or nuisances, but this is a 
> real urgent problem that will get worse very rapidly.

I can relate. I mean, as I said, I'm mostly the relational kinda guy 
myself. I think that theory is reasonably well developed. And yet this 
sort of stuff isn't really addressed in the literature in full 
generality, and that now seems to reflect on the RDF/SW folks as well. 
Personally I'd name this a "generalized entity integrity issue, under an 
open world assumption".
-- 
Sampo Syreeni, aka decoy - decoy@iki.fi, http://decoy.iki.fi/front
+358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2

Received on Monday, 2 November 2009 22:12:41 UTC