Re: RDF 2 Wishlist from Sampo Syreeni on 2009-11-02 (semantic-web@w3.org from November 2009)

From: Sampo Syreeni <decoy@iki.fi>
Date: Mon, 2 Nov 2009 22:22:38 +0200 (EET)
To: Pat Hayes <phayes@ihmc.us>
cc: Damian Steer <pldms@mac.com>, semantic-web@w3.org
Message-ID: <Pine.LNX.4.64.0911022113070.11953@lakka.kapsi>
On 2009-11-02, Pat Hayes wrote:

(Sorry for answering indirectly. I'm a bit late to the discussion.)

>> * Deprecate RDF reification. Issue warnings, write document to 
>> explain problems.

I would argue against this. Reification, in one form or another, is a 
highly valuable part of the standard, because it lets us pose 
hypotheticals and metadata relating to them. Even though Pat is likely to 
vehemently disagree with me on this one, I'd take hazy 
reification/quotation/whatever semantics over the lack of the basic 
mechanism, any day of the week. I mean, otherwise we're bound to have 
even *hazier* concoctions in its place.
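
To make the "hypotheticals plus metadata" point concrete, here is a minimal sketch, assuming triples are plain Python tuples. The ex: names and the reify() helper are hypothetical illustrations; only the four rdf: terms come from the RDF vocabulary.

```python
# Triples are plain (subject, predicate, object) tuples of strings.
# All ex: names below are hypothetical, chosen only for illustration.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def reify(stmt_id, triple):
    """Return the four reification triples that describe `triple`."""
    s, p, o = triple
    return {
        (stmt_id, RDF + "type", RDF + "Statement"),
        (stmt_id, RDF + "subject", s),
        (stmt_id, RDF + "predicate", p),
        (stmt_id, RDF + "object", o),
    }

hypothetical = ("ex:Moon", "ex:madeOf", "ex:Cheese")

# Describe the statement, then hang metadata off the statement node.
graph = reify("_:s1", hypothetical)
graph.add(("_:s1", "ex:claimedBy", "ex:Alice"))

# The graph talks about the statement without asserting it.
print(hypothetical in graph)
```

The point of the sketch is only that the described triple never enters the graph itself, which is what lets us attach metadata to something we don't claim to be true.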

>> * Deprecate collections (Alt, Bag, Seq). See above.

Another no on my part. Heavy semantic lifting is needed with these as 
well, but the fact is, the basic concepts are extremely useful as 
modelling primitives. Without stuff like this, what are we left with, 
semantically speaking? Triples? They don't carry semantics at all; 
they're just propositions, and even limited to being binary.

>> * Serialise named graphs (although I'm not super keen in general): 
>> [...]

A formal syntax for named graphs would be nice, yes. Even in RDF/XML 
(which I personally loathe as a syntax). But again, they need to have 
proper semantics. I'd advocate the one based on epistemic modal logic: 
treat any named graph as a bunch of assertions, define formal modal 
operators which can be used to give metadata about the referred-to 
graph, and then let any referring stuff flag its beliefs using that 
common and well-tried-out formalism. All the while reserving formalized 
judgment, so that the open world assumption holds also wrt any formal 
logical interpretation, such that people using the basic assertions can 
judge for themselves how to interpret the source material arising from a 
distributed source.

E.g. source A might assert that it believes the whole logical content of 
the named graph imported from source B, but still, I, as the end user of 
the data, have the full capability of choosing which beliefs of A's I'm 
willing to trust/believe-in, when I'm building up my application. I 
believe examples such as these suggest that TimBL's original vision of a 
distributed, open-world-assumption semantic net necessarily entails use 
of epistemic modal logic to formally deal with the higher, trust-related 
layers of the cake. That could, and should, be done implicitly at first, 
so that all of the implications needn't be hardcoded right from the 
start in RDF Core. But the possibility of later on formally dealing with 
beliefs should, I think, still be left open.
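
The source A / source B example can be sketched roughly as follows. This is not a formal modal semantics, just an illustration under assumptions: quads are Python tuples, a belief is a bare (agent, graph) pair, and every name, including accepted_triples(), is hypothetical.

```python
# Quads are (graph_name, subject, predicate, object). All names are hypothetical.
quads = [
    ("ex:graphB", "ex:Helsinki", "ex:capitalOf", "ex:Finland"),
    ("ex:graphC", "ex:Helsinki", "ex:capitalOf", "ex:Sweden"),
]

# (agent, graph_name) stands for "agent believes the whole content of graph".
believes = {("ex:A", "ex:graphB"), ("ex:D", "ex:graphC")}

def accepted_triples(quads, believes, trusted_agents):
    """Triples from graphs believed by at least one agent the end user trusts."""
    trusted_graphs = {g for agent, g in believes if agent in trusted_agents}
    return {(s, p, o) for g, s, p, o in quads if g in trusted_graphs}

# The end user decides whose beliefs to adopt; nothing is hardcoded in the data.
mine = accepted_triples(quads, believes, trusted_agents={"ex:A"})
print(mine)
```

Nothing in the data forces the end user's hand here: with an empty set of trusted agents, no triple is accepted at all.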

>> * Simple envelope: <document name="foo" 
>> type="application/turtle">...</document>
>> * Sparql GSPO to dump datasets

I think this sort of thing can be standardized outside of W3C. If uptake 
is wide enough, then a standard it is. If not, we have one more failed 
attempt at standardization.

>> * Make bnode unlabelled, rather than existentially quantified var.

No. From my relational background, I tend to treat bnodes like I'd deal 
with perfect, opaque surrogate keys. Their only semantics are to connect 
stuff together, while shying away from exposing autogenerated hogwash to 
the end users. In that capacity, it doesn't make sense to apply the 
one-name assumption to them; in fact they were invented to get around 
said restriction where the available information about the real-world 
referents leads to a diffuse representation of even entity identity. (Or 
to cut down on the internal redundancy of identifiers, when they're 
visible; but that's a different deal altogether, more to do with data 
compression than normalized data representation.) The proposed change 
would seriously hinder knowledge representation, especially in a 
distributed environment which isn't necessarily perfect-knowledge, 
controlled-vocabulary, or uniformly well-keyed.

To make it simple, it should be possible to have a number of differently 
(and inferentially) keyed objects in the graph. Then we need a truly 
blank node to mediate their relationships to other stuff. Once that 
happens, the formal semantics immediately become one of existential 
quantification, in the absence of a one-name-assumption. That's model 
theory 101, basically.
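
A rough sketch of what the existential reading amounts to in practice, assuming triples as Python tuples and terms starting with "_:" as blank nodes. The entails() helper is a hypothetical brute-force check, not an optimized or complete entailment procedure, and the ex: names are made up.

```python
from itertools import product

def is_bnode(term):
    """Blank nodes are, by assumption, the terms starting with '_:'."""
    return term.startswith("_:")

def entails(data, pattern):
    """True if some mapping of the pattern's blank nodes to terms of `data`
    turns every pattern triple into a triple of `data` (brute force)."""
    bnodes = sorted({t for triple in pattern for t in triple if is_bnode(t)})
    terms = sorted({t for triple in data for t in triple})
    for values in product(terms, repeat=len(bnodes)):
        mapping = dict(zip(bnodes, values))
        if all(tuple(mapping.get(t, t) for t in triple) in data
               for triple in pattern):
            return True
    return False

data = {("ex:alice", "ex:knows", "ex:bob"), ("ex:bob", "ex:age", "42")}

# "Alice knows something whose age is 42": _:x only mediates the connection.
pattern = {("ex:alice", "ex:knows", "_:x"), ("_:x", "ex:age", "42")}
print(entails(data, pattern))
```

Note that two distinct blank nodes are free to map to the same term, which is exactly the absence of a one-name assumption.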

> Hmm, not at all obvious to me what this distinction amounts to. 
> Unlabelled *is* existentially quantified, to all semantic purposes. 
> Unfortunately, RIF has muddied this water by putting in meaningless 
> distinctions.

I'm no expert on RIF, but I believe this is once again an instance of a 
muddled distinction between fully logical, and fully semantic, 
constraints.

>> * Prefixes: warn if some standard set not 'correct'. Have 'grab all' 
>> namespace.

That sort of thing has been, and should be, externalized from the 
definition. We have separate and more focused standards to deal with 
this.

>> * Lang _and_ type. Reason for exclusivity lost in mists of time.

Yes. I'd ditch this sort of stuff right now. If you want metadata on a 
literal, it shouldn't really be a literal -- it should be a named 
entity, and the metadata should hang off it. The literal, it should 
simply be the terminal point where all of the inferencing stops, after 
all of the metadata has already been fully ingested. It should remain a 
dumb literal, which is only interpreted after we're done with the 
metadata attached to it.

If even that... Personally I'm of the opinion that literals should be 
removed from the model altogether.

>> * Bnodes as predicates. See above. Does SPARQL allow it?

This is useful, I think. It preserves the symmetry between subjects, 
predicates and objects. That sort of thing rhymes well with my 
relational background, where the symmetry is absolutely perfect, and 
where I use that symmetry to advantage on a daily basis in my work. It 
also rhymes well with the fact that, in a truly distributed semantic 
web, which uses triples only, no less, it's quite probable that a) there 
are going to be multiple names for the same thing, and that b) people 
would want to avoid referring to specific names of even predicates, 
instead preferring to identify them by their properties. In that case, 
it makes ample sense to use a blank node as a predicate as well.
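
As a sketch of point b), assuming triples as Python tuples: the predicate below is never named in the query, only described by its properties, so it plays the role a blank node would. The vocabulary terms in the toy graph and the match_described_predicate() helper are illustrative assumptions.

```python
# A toy graph; the prefixed names are illustrative strings, not resolved URIs.
graph = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "ex:acquaintedWith", "ex:carol"),
    ("ex:alice", "ex:age", "30"),
    ("foaf:knows", "rdf:type", "owl:SymmetricProperty"),
    ("ex:acquaintedWith", "rdf:type", "owl:SymmetricProperty"),
}

def match_described_predicate(graph, subject, description):
    """Objects o with (subject, p, o) in graph, for any predicate p that
    satisfies every (property, value) pair in `description`."""
    candidates = {p for _, p, _ in graph}
    good = {p for p in candidates
            if all((p, prop, val) in graph for prop, val in description)}
    return {o for s, p, o in graph if s == subject and p in good}

# "ex:alice _:p ?o . _:p rdf:type owl:SymmetricProperty" in spirit.
result = match_described_predicate(
    graph, "ex:alice", [("rdf:type", "owl:SymmetricProperty")])
print(result)
```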

>> * RDF/XML inverse properties. Make writing more pleasant.

Yes. But explicitly make these syntactic sugar. Not something that is 
part of the base data model.
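
A minimal sketch of "sugar only", assuming the surface syntax carries a hypothetical per-statement `inverted` flag that a parser removes before anything reaches the data model. The desugar() helper and the ex: names are made up for illustration.

```python
def desugar(statements):
    """Each surface statement is (subject, predicate, object, inverted).
    Inverted statements are flipped; the result is a set of plain triples."""
    return {(o, p, s) if inverted else (s, p, o)
            for s, p, o, inverted in statements}

surface = [
    ("ex:book", "ex:title", "Some Title", False),
    # Written from the book's point of view, but stored as author -> book.
    ("ex:book", "ex:wrote", "ex:alice", True),
]

triples = desugar(surface)
print(triples)
```

The data model only ever sees ordinary triples; the inverse notation leaves no trace after parsing.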

>> * Equivalence relations. Seems like every use of sameAs is incorrect.

No. The semantics exist in DAML/OIL/OWL. If the particular retard you're 
referring to can't comprehend them, it ain't gonna help if the 
definition is moved around to somewhere else, either. It'd just break 
modularization within the framework. ;)

> In brief: there are at least 4 distinct notions of same-but-not-sameAs 
> Ive managed to identify so far, and Im sure there will be more.

I can just imagine. Especially since I've just been enjoying Brachman's 
modern classic "What IS-A is and isn't: an analysis of taxonomic links 
in semantic networks."

> Bottom line: no single solution will work, so no RDF2 magic bullet. 
> But Im sure we can do something useful.

Personally I'd argue most of the things that cause opprobrium and 
confusion at the moment are stuff that could be corrected via 1) more 
precise and understandable documentation, 2) easier syntax, for us 
so-called lazy people, and 3) some work on formal semantics, which also 
takes a wider perspective on the real-life problems people are using RDF 
to solve. Fourth, it perhaps wouldn't be a bad idea to intentionally 
allow a whole slew of logical confusion, either, as long as the core 
spec remained clean; that way the semantic web could develop in the 
unorganized manner that the first web did, without undue effort towards 
correctness, until it bumped into the useful, necessary, third-party 
engine which actually cared about that sort of thing.
-- 
Sampo Syreeni, aka decoy - decoy@iki.fi, http://decoy.iki.fi/front
+358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
Received on Monday, 2 November 2009 20:23:22 UTC