Re: RDF 2 Wishlist from Pat Hayes on 2009-11-02 (semantic-web@w3.org from November 2009)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 2 Nov 2009 15:03:09 -0600
To: Sampo Syreeni <decoy@iki.fi>
Cc: Damian Steer <pldms@mac.com>, semantic-web@w3.org
Message-Id: <8E43ED0D-10A7-44DF-ADCF-4A851003FA7C@ihmc.us>
On Nov 2, 2009, at 2:22 PM, Sampo Syreeni wrote:

> On 2009-11-02, Pat Hayes wrote:
>
> (Sorry for answering indirectly. I'm a bit late to the discussion.)
>
>>> * Deprecate RDF reification. Issue warnings, write document to  
>>> explain problems.
>
> I would argue against this. Reification, in one form or another, is  
> a highly valuable part of the standard, because it lets us pose  
> hypotheticals and attach metadata to them.

Only if you also ignore part of the spec, which is a mess. I believe  
we can do better.
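For readers less familiar with it, the reification vocabulary under discussion works roughly as sketched below. This is a minimal illustration, assuming triples are Python tuples, blank nodes are strings starting with "_:", and the ex: names are made up. It shows the property Sampo relies on: the four describing triples talk about a statement without asserting it.

```python
import itertools

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
_bnode_ids = itertools.count()

def fresh_bnode() -> str:
    # Assumption of this sketch: blank nodes are strings starting with "_:".
    return f"_:stmt{next(_bnode_ids)}"

def reify(triple):
    """Describe `triple` with the RDF reification vocabulary.

    Returns the statement node and the four describing triples. The
    original triple is NOT among them: reifying a statement does not
    assert it, which is what makes hypotheticals possible."""
    s, p, o = triple
    stmt = fresh_bnode()
    return stmt, {
        (stmt, RDF + "type", RDF + "Statement"),
        (stmt, RDF + "subject", s),
        (stmt, RDF + "predicate", p),
        (stmt, RDF + "object", o),
    }

# Usage: record that ex:alice claims (ex:moon, ex:madeOf, ex:cheese)
# without the graph itself asserting it.
claim = ("ex:moon", "ex:madeOf", "ex:cheese")
stmt, graph = reify(claim)
graph.add((stmt, "ex:claimedBy", "ex:alice"))
```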

> Even though Pat is likely to disagree vehemently with me on this one,  
> I'd take hazy reification/quotation/whatever semantics over the lack  
> of the basic mechanism any day of the week. I mean, otherwise we're  
> bound to end up with even *hazier* concoctions in its place.

Why?

>
>>> * Deprecate collections (Alt, Bag, Seq). See above.
>
> Another no on my part. Heavy semantic lifting is needed with these  
> as well, but the fact is, the basic concepts are extremely useful as  
> modelling primitives.

Are they, though? My sense is that the Lisp-style lists used by OWL  
are used much more than Bag, Seq, and Alt. (Actually, the lists are the  
collections; these three are the containers.)
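The collection/container distinction can be made concrete with a small sketch of both encodings. It assumes triples are Python tuples and uses hypothetical helper names. The structural difference is that a collection is a chain of rdf:first/rdf:rest cells ending in rdf:nil, so it reads as complete (presumably part of why OWL uses lists for constructs such as owl:unionOf), whereas a container only lists members via rdf:_1, rdf:_2, ..., and nothing stops another source from adding a further member.

```python
from typing import List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
Triple = Tuple[str, str, str]

def make_collection(items: List[str], prefix: str = "_:l") -> Tuple[str, List[Triple]]:
    """Encode items as an RDF collection: a Lisp-style chain of
    rdf:first/rdf:rest cells terminated by rdf:nil."""
    head, triples = RDF + "nil", []
    for i in reversed(range(len(items))):  # build from the tail forwards
        cell = f"{prefix}{i}"
        triples += [(cell, RDF + "first", items[i]), (cell, RDF + "rest", head)]
        head = cell
    return head, triples

def make_container(items: List[str], kind: str = "Seq", node: str = "_:c") -> Tuple[str, List[Triple]]:
    """Encode items as an RDF container (rdf:Seq, rdf:Bag or rdf:Alt)
    using the membership properties rdf:_1, rdf:_2, ..."""
    triples = [(node, RDF + "type", RDF + kind)]
    triples += [(node, f"{RDF}_{n}", item) for n, item in enumerate(items, start=1)]
    return node, triples

def read_collection(head: str, triples: List[Triple]) -> List[str]:
    """Walk an RDF collection back into a Python list."""
    index = {(s, p): o for s, p, o in triples}
    out = []
    while head != RDF + "nil":
        out.append(index[(head, RDF + "first")])
        head = index[(head, RDF + "rest")]
    return out
```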

> Without stuff like this, what are we left with, semantically  
> speaking? Triples? They don't carry semantics at all; they're just  
> propositions, and binary ones at that.
>
>>> * Serialise named graphs (although I'm not super keen in general):  
>>> [...]
>
> A formal syntax for named graphs would be nice, yes, even in RDF/XML  
> (which I personally loathe as a syntax). But again, named graphs need  
> proper semantics. I'd advocate semantics based on epistemic modal  
> logic: treat any named graph as a bunch of assertions, define formal  
> modal operators that can be used to give metadata about the  
> referred-to graph, and then let anything that refers to it flag its  
> beliefs using that common and well-tried formalism. All the while,  
> formal judgment should be withheld, so that the open world assumption  
> also holds with respect to any formal logical interpretation, and  
> people using the basic assertions can judge for themselves how to  
> interpret material arriving from a distributed source.
>
> For example, source A might assert that it believes the whole logical  
> content of the named graph imported from source B, but I, as the end  
> user of the data, still have full freedom to choose which of A's  
> beliefs I'm willing to trust when I'm building my application.

The key lack right now is any standard way to refer to a 'part' of an  
RDF graph from the outside.
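One way to picture Sampo's example, and what a standard way to name a "part" of the data would buy, is a dataset that maps graph names to sets of triples, with one source's belief in another graph recorded as an ordinary triple. This is only a sketch under stated assumptions: the ex:believes predicate, the use of a source's IRI as its own graph name, and both functions are hypothetical, and beliefs are followed one level deep.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Dataset = Dict[str, Set[Triple]]  # graph name -> triples in that graph

BELIEVES = "ex:believes"  # hypothetical: "subject believes the graph named by object"

def beliefs_of(dataset: Dataset, source: str) -> Set[str]:
    """Graph names that `source` claims, in its own graph, to believe."""
    return {o for (s, p, o) in dataset.get(source, set())
            if s == source and p == BELIEVES}

def trusted_view(dataset: Dataset, trusted: Iterable[str],
                 accept_beliefs_of: Iterable[str] = ()) -> Set[Triple]:
    """Merge the graphs the end user trusts directly, plus the graphs that
    selected sources say they believe. A's belief in B only pulls B in if
    the user opts in by listing A in accept_beliefs_of."""
    names = set(trusted)
    for source in accept_beliefs_of:
        names |= beliefs_of(dataset, source)
    merged: Set[Triple] = set()
    for name in names & set(dataset):
        merged |= dataset[name]
    return merged

ds: Dataset = {
    "ex:A": {("ex:A", BELIEVES, "ex:B"), ("ex:x", "ex:p", "ex:y")},
    "ex:B": {("ex:u", "ex:q", "ex:v")},
}
only_a = trusted_view(ds, ["ex:A"])
a_and_its_beliefs = trusted_view(ds, ["ex:A"], accept_beliefs_of=["ex:A"])
```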

> I believe examples such as these suggest that TimBL's original  
> vision of a distributed, open-world-assumption semantic net  
> necessarily entails use of epistemic modal logic to formally deal  
> with the higher, trust-related layers of the cake. That could, and  
> should, be done implicitly at first, so that the implications  
> needn't all be hard-coded into RDF Core right from the start. But the  
> possibility of later on formally dealing with beliefs should, I  
> think, still be left open.
>
>>> * Simple envelope: <document name="foo" type="application/turtle">...</document>
>>> * SPARQL GSPO to dump datasets
>
> I think this sort of thing can be standardized outside of W3C. If  
> uptake is wide enough, then a standard it is. If not, it's just one  
> more failed attempt at standardization.
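As an illustration of how little such a convention needs, here is a sketch that dumps a dataset one row per (graph, subject, predicate, object), i.e. the rows a query like `SELECT ?g ?s ?p ?o WHERE { GRAPH ?g { ?s ?p ?o } }` would return, written in the N-Quads line order (subject predicate object graph). The helper names are hypothetical, and it assumes every term is a blank node or an absolute IRI; literals and escaping are deliberately left out.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

def term(t: str) -> str:
    # Assumption: every term is a blank node ("_:x") or an absolute IRI.
    # Literals and character escaping are left out of this sketch.
    return t if t.startswith("_:") else f"<{t}>"

def dump_dataset(dataset: Dict[str, Set[Triple]]) -> str:
    """One line per quad, sorted for deterministic output, in N-Quads
    order: subject predicate object graph ."""
    lines = []
    for g in sorted(dataset):
        for s, p, o in sorted(dataset[g]):
            lines.append(f"{term(s)} {term(p)} {term(o)} {term(g)} .")
    return "\n".join(lines)

example = {"http://example.org/g1": {("http://example.org/s", "http://example.org/p", "_:b0")}}
print(dump_dataset(example))
# <http://example.org/s> <http://example.org/p> _:b0 <http://example.org/g1> .
```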
>
>>> * Make bnode unlabelled, rather than existentially quantified var.
>
> No. Coming from a relational background, I tend to treat bnodes the  
> way I'd treat perfect, opaque surrogate keys. Their only job is to  
> connect things together while avoiding exposing autogenerated  
> hogwash to end users. In that capacity, it doesn't make sense to  
> apply the unique name assumption to them; in fact they were invented  
> to get around that restriction where the available information about  
> real-world referents leads to a diffuse representation of even entity  
> identity (or to cut down on the internal redundancy of identifiers  
> when they are visible, but that is a different deal altogether, more  
> to do with data compression than with normalized data  
> representation). The proposed change would seriously hinder knowledge  
> representation, especially in a distributed environment that lacks  
> perfect knowledge, a controlled vocabulary, or uniformly well-keyed  
> identifiers.
>
> To put it simply, it should be possible to have a number of  
> differently (and inferentially) keyed objects in the graph. Then we  
> need a truly blank node to mediate their relationships to other  
> things. Once that happens, the formal semantics immediately become  
> those of existential quantification, in the absence of a unique name  
> assumption. That's model theory 101, basically.
>
>> Hmm, not at all obvious to me what this distinction amounts to.  
>> Unlabelled *is* existentially quantified, for all semantic purposes.  
>> Unfortunately, RIF has muddied this water by putting in meaningless  
>> distinctions.
>
> I'm no expert on RIF, but I believe this is once again an instance  
> of a muddled distinction between fully logical and fully semantic  
> constraints.
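What "unlabelled *is* existentially quantified" means operationally can be sketched as RDF simple entailment: graph g entails graph h when some mapping of h's blank nodes to terms of g turns every triple of h into a triple of g. This is a brute-force backtracking illustration, exponential in the worst case, assuming tuples for triples and "_:"-prefixed strings for blank nodes; the ex: names are made up.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

def is_bnode(term: str) -> bool:
    # Assumption of this sketch: blank nodes are strings starting with "_:".
    return term.startswith("_:")

def simply_entails(g: Set[Triple], h: Set[Triple]) -> bool:
    """True if some mapping of h's blank nodes to terms of g turns every
    triple of h into a triple of g, i.e. h's blank nodes are read as
    existentially quantified variables."""
    patterns = list(h)

    def extend(i: int, mapping: Dict[str, str]) -> bool:
        if i == len(patterns):
            return True  # every triple of h has been matched
        for candidate in g:
            trial = dict(mapping)
            for pt, ct in zip(patterns[i], candidate):
                if is_bnode(pt):
                    # A blank node must map to the same term everywhere.
                    if trial.setdefault(pt, ct) != ct:
                        break
                elif pt != ct:
                    break
            else:
                if extend(i + 1, trial):
                    return True
        return False

    return extend(0, {})

g = {("ex:alice", "ex:knows", "ex:bob"), ("ex:bob", "ex:name", "ex:Bob")}
# "Alice knows someone named Bob": true in g, whatever that someone is called.
h = {("ex:alice", "ex:knows", "_:x"), ("_:x", "ex:name", "ex:Bob")}
```

Nothing here depends on what the blank node is called, only on the existence of a suitable mapping, which is the sense in which an unlabelled node and an existential variable coincide.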
>
>>> * Prefixes: warn if some standard set not 'correct'. Have 'grab  
>>> all' namespace.
>
> That sort of thing has been, and should be, externalized from the  
> definition. We have separate and more focused standards to deal with  
> this.
>
>>> * Lang _and_ type. Reason for exclusivity lost in mists of time.
>
> Yes. I'd ditch this sort of stuff right now. If you want metadata on  
> a literal, it shouldn't really be a literal -- it should be a named  
> entity, and the metadata should hang off it. The literal should  
> simply be the terminal point where all inferencing stops, after all  
> of the metadata has been fully ingested. It should remain a dumb  
> literal, interpreted only after we're done with the metadata  
> attached to it.
>
> If even that... Personally I'm of the opinion that literals should  
> be removed from the model altogether.

Oh no, they are the bread and butter of all the linked data. I'm all  
for putting datatyped literals into logic itself, in fact.
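A minimal sketch of what treating datatyped literals as part of the logic amounts to: a typed literal denotes a value, so distinct lexical forms such as "1" and "01" of xsd:integer are different literals denoting the same thing. The Literal class and value_of are hypothetical, only two XSD datatypes are handled, and plain literals follow the 2004 reading (a bare string, or a string paired with its language tag). The class has room for both a datatype and a language tag, as in the "Lang _and_ type" item, but value_of simply ignores the tag on typed literals.

```python
import re
from dataclasses import dataclass
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"

@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None  # None means a plain literal
    lang: Optional[str] = None

def value_of(lit: Literal):
    """Return the value a literal denotes, for a couple of XSD datatypes.
    Returns None for ill-typed or unsupported literals."""
    if lit.datatype is None:
        # Plain literal: the string itself, or (string, tag) if tagged.
        return lit.lexical if lit.lang is None else (lit.lexical, lit.lang)
    if lit.datatype == XSD + "integer":
        # xsd:integer lexical space: optional sign followed by digits.
        return int(lit.lexical) if re.fullmatch(r"[+-]?[0-9]+", lit.lexical) else None
    if lit.datatype == XSD + "boolean":
        return {"true": True, "1": True, "false": False, "0": False}.get(lit.lexical)
    return None
```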

>
>>> * Bnodes as predicates. See above. Does SPARQL allow it?
>
> This is useful, I think. It preserves the symmetry between subjects,  
> predicates and objects. That sort of thing rhymes well with my  
> relational background, where the symmetry is absolutely perfect, and  
> where I use that symmetry to advantage on a daily basis in my work.  
> It also rhymes well with the fact that, in a truly distributed  
> semantic web, not least one that uses only triples, it's quite  
> probable that a) there are going to be multiple names for the same thing, and  
> that b) people would want to avoid referring to specific names of  
> even predicates, instead preferring to identify them by their  
> properties. In that case, it makes ample sense to use a blank node  
> as a predicate as well.

+1
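The idea of identifying a predicate by its properties rather than its name can be sketched with a toy triple-pattern matcher. SPARQL already lets a variable stand in the predicate position; the matcher below is a hypothetical stand-in, not a SPARQL engine, and it assumes "?"-prefixed strings are variables and "_:"-prefixed strings are blank nodes.

```python
from typing import Dict, Iterator, Set, Tuple

Triple = Tuple[str, str, str]

def match(pattern: Triple, graph: Set[Triple]) -> Iterator[Dict[str, str]]:
    """Yield variable bindings for one triple pattern. Terms starting
    with '?' are variables and may sit in any position, predicate included."""
    for triple in sorted(graph):
        binding: Dict[str, str] = {}
        for pt, t in zip(pattern, triple):
            if pt.startswith("?"):
                if binding.setdefault(pt, t) != t:
                    break
            elif pt != t:
                break
        else:
            yield binding

# The relation is a blank node, identified only by a property it has.
g = {("ex:acme", "_:p1", "ex:bob"), ("_:p1", "rdfs:label", "employs")}
preds = [b["?p"] for b in match(("?p", "rdfs:label", "employs"), g)]
results = [b for p in preds for b in match(("?s", p, "?o"), g)]
```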

>
>>> * RDF/XML inverse properties. Make writing more pleasant.
>
> Yes. But explicitly make these syntactic sugar. Not something that  
> is part of the base data model.
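Treating inverse properties purely as syntactic sugar means a parser can rewrite them away before anything reaches the data model. A minimal sketch, in which the ("inverse", p) predicate encoding is a hypothetical stand-in for whatever RDF/XML syntax might actually be adopted:

```python
from typing import Iterable, Set, Tuple, Union

Predicate = Union[str, Tuple[str, str]]

def desugar(statements: Iterable[Tuple[str, Predicate, str]]) -> Set[Tuple[str, str, str]]:
    """Rewrite inverse-property shorthand into plain triples.
    A predicate written as ("inverse", p) means (s, inverse p, o) == (o, p, s),
    so nothing inverse-specific ever reaches the data model."""
    triples = set()
    for s, p, o in statements:
        if isinstance(p, tuple) and p[0] == "inverse":
            triples.add((o, p[1], s))
        else:
            triples.add((s, p, o))
    return triples
```

Because the rewrite is purely local, the base model and its semantics stay untouched, which is the point of keeping such conveniences at the syntax level.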
>
>>> * Equivalence relations. Seems like every use of sameAs is  
>>> incorrect.
>
> No. The semantics exist in DAML/OIL/OWL. If the particular user  
> you're referring to can't comprehend them, it ain't gonna help if  
> the definition is moved around to somewhere else, either. It'd just  
> break modularization within the framework. ;)

That's not the real issue. The problem is that, in many cases, people  
need something weaker than sameAs to express a link. It's not only a  
matter of people misusing sameAs because they don't understand it:  
they misuse it because there is no alternative, and they have to use  
something. It's up to us to provide some better alternatives.
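To see why sameAs is too strong for loose linking, here is a sketch that treats owl:sameAs as an equivalence relation (via union-find) and "smushes" each equivalence class into one representative, so that every triple about one name becomes a triple about all of them. The prefixed names are illustrative, and picking the lexicographically smallest name as representative is an arbitrary choice of this sketch.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

def same_as_closure(graph: Set[Triple], same_as: str = "owl:sameAs") -> Set[Triple]:
    """Merge nodes linked by owl:sameAs (reflexive, symmetric, transitive)
    and rewrite every other triple onto one representative per class."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            lo, hi = sorted((ra, rb))  # deterministic representative
            parent[hi] = lo

    for s, p, o in graph:
        if p == same_as:
            union(s, o)
    return {(find(s), p, find(o)) for s, p, o in graph if p != same_as}

# A city and its metro area linked "loosely" with sameAs end up as one
# entity carrying both population values.
g = {
    ("ex:cityA", "owl:sameAs", "ex:metroA"),
    ("ex:cityA", "ex:population", "ex:p1"),
    ("ex:metroA", "ex:population", "ex:p2"),
}
smushed = same_as_closure(g)
```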

>
>> In brief: there are at least 4 distinct notions of same-but-not-sameAs  
>> I've managed to identify so far, and I'm sure there will be more.
>
> I can just imagine. Especially since I've just been enjoying  
> Brachman's modern classic "What IS-A is and isn't: an analysis of  
> taxonomic links in semantic networks."
>
>> Bottom line: no single solution will work, so no RDF2 magic bullet.  
>> But I'm sure we can do something useful.
>
> Personally I'd argue that most of the things that cause opprobrium and  
> confusion at the moment could be corrected via 1) more precise and  
> understandable documentation, 2) easier syntax, for us so-called lazy  
> people, and 3) some work on formal semantics that also takes a wider  
> perspective on the real-life problems people are using RDF to solve.

It's the last one that I think we are obliged to attempt.

> Fourth, it perhaps wouldn't be a bad idea to intentionally allow a  
> whole slew of logical confusion, either, as long as the core spec  
> remained clean;

That is one good strategy in the present state of the art, yes. See  
how SKOS approaches similar issues.

> that way the semantic web could develop in the unorganized manner  
> that the first web did, without undue effort toward correctness,  
> until it bumped into the useful, necessary third-party engine that  
> actually cared about that sort of thing.

Well, linking data using not-quite-sameAs-maybe is something that very  
many people care a lot about right now. I hear more about this issue  
than any other. Most of the nasties in RDF are just ugly, or  
nuisances, but this is a real urgent problem that will get worse very  
rapidly.

Pat

> -- 
> Sampo Syreeni, aka decoy - decoy@iki.fi, http://decoy.iki.fi/front
> +358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
>

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Monday, 2 November 2009 21:03:59 UTC