Re: Semantic web architectural requirement [was Re: Squaring the HTTP-range-14 circle] from David Booth on 2011-06-26 (public-awwsw@w3.org from June 2011)

From: David Booth <david@dbooth.org>
Date: Sat, 25 Jun 2011 21:02:59 -0400
To: Alan Ruttenberg <alanruttenberg@gmail.com>
Cc: AWWSW TF <public-awwsw@w3.org>, Tim Berners-Lee <timbl@w3.org>, Pat Hayes <phayes@ihmc.us>
Message-ID: <1309050179.6147.7306.camel@dbooth-laptop>
On Wed, 2011-06-22 at 01:32 -0400, Alan Ruttenberg wrote: 
> On Tue, Jun 21, 2011 at 9:06 PM, David Booth <david@dbooth.org> wrote:
> > On Tue, 2011-06-21 at 17:52 -0400, Alan Ruttenberg wrote:
> >> On Tue, Jun 21, 2011 at 5:06 PM, David Booth <david@dbooth.org> wrote:
> >> > [Moving this comment to the AWWSW list, as I think it will be more
> >> > appropriate there.]
> >> > Following up on:
> >> > http://lists.w3.org/Archives/Public/public-lod/2011Jun/0362.html
> >> >
> >> > On Sat, 2011-06-18 at 23:05 -0500, Pat Hayes wrote:
> >> >> Really (sorry to keep raining on the parade, but) it is not as simple
> >> >> as this. Look, it is indeed easy to not bother distinguishing male
> >> >> from female dogs. One simply talks of dogs without mentioning gender,
> >> >> and there is a lot that can be said about dogs without getting into
> >> >> that second topic. But confusing web pages, or documents more
> >> >> generally, with the things the documents are about, now that does
> >> >> matter a lot more, simply because it is virtually impossible to say
> >> >> *anything* about documents-or-things without immediately being clear
> >> >> which of them - documents or things - one is talking about. And there
> >> >> is a good reason why this particular confusion is so destructive.
> >> >> Unlike the dogs-vs-bitches case, the difference between the document
> >> >> and its topic, the thing, is that one is ABOUT the other. This is not
> >> >> simply a matter of ignoring some potentially relevant information (the
> >> >> gender of the dog) because one is temporarily not concerned with it:
> >> >> it is two different ways of using the very names that are the fabric
> >> >> of the descriptive representations themselves. It confuses language
> >> >> with language use, confuses language with meta-language. It is like
> >> >> saying giraffe has seven letters rather than "giraffe" has seven
> >> >> letters. Maybe this does not break Web architecture, but it certainly
> >> >> breaks **semantic** architecture.
> >> >
> >> > I don't think that's correct.  AFAICT what's important for the semantic
> >> > web from an architectural perspective is the following:
> >> >
> >> >  The client must be able to use a simple, architecturally
> >> >  authoritative algorithm to determine, with full fidelity,
> >> >  the URI owner's formally expressed identity for the resource.
> >> >
> >> > To pick this apart and explain what I mean:
> >> >
> >> > Why "simple"?  To facilitate widespread uptake.
> >> >
> >> > Why "architecturally authoritative"?  So that everyone knows how the
> >> > architecture is supposed to work.  This is like having an authoritative
> >> > specification for HTTP: you don't want different people having different
> >> > ideas about how HTTP is supposed to work.
> >> >
> >> > Why "algorithm"?  So that it can be done by a machine.
> >> >
> >>
> >> Below is where your error is. The fact of the matter is that
> >>
> >> a) current representation languages do not allow us to say all we mean
> >
> > Agreed, but we can still say *enough* to make useful applications.
> 
> Granted.
> 
> >
> >> b) most people aren't even skilled enough to say what can be said
> >> using these languages
> >
> > Okay, but I don't see how that is relevant.  There is no requirement
> > that everyone be able to write quality RDF.
> 
> There is if you a) say that the only meaning of a term is the written
> RDF and b) that people shouldn't change what they initially write.

There are three escapes:

1. The URI owner can always mint a new URI for version 2.

2. The URI owner *does* have the option of modifying the URI declaration
(but should only do so in accordance with the URI's stated change
policy), as I pointed out yesterday:
http://dbooth.org/2009/lifecycle/#other
[[
Although changing the core assertions may change the set of permissible
interpretations for a URI -- thus changing the URI's resource identity
-- such changes are okay if the change policy has set expectations
appropriately.
]]

3. Even if the URI owner publishes a poor quality URI declaration, RDF
authors always have the choice of whether to use that URI or ignore it
and use a different one.  Thus, the marketplace will cause better
quality URIs to become more prevalent and poor quality URIs to be
ignored.

> 
> >
> >> c) even among the people who are skilled enough with the formalism,
> >> there remain difficult ontological issues that need to be addressed in
> >> order to effectively communicate formally.
> >
> > What issues do you mean?
> 
> Ask Pat about liquids. Or try to figure out how to write logical
> expressions that give the correct entailments for license, contract,
> or other legal matters.

Okay, so some areas remain difficult.  I'm not claiming that it solves
all the world's problems.  But I don't see any alternative to formalism
if the goal is to enable global, lossless machine communication.  If you
require human interpretation in the loop then you lose the ability to
scale the machine processing up to global scale and you introduce
information loss, because different receivers will interpret the same
information differently.

It is kind of analogous to digital versus analog communication.  Digital
signals explicitly allow signal degradation within a specific range, but
the *information* content is conveyed with full fidelity, whereas with
analog signals every relay step involves degradation, so the information
content is never conveyed with full fidelity.   
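
To make the analogy concrete, here is a minimal sketch of the two relay
styles.  Everything in it is an assumption chosen for illustration: the
function names, the noise bound of 0.3, the 0.5 decision threshold, and
the 50 hops.  The only point is that a digital relay regenerates the
symbol at every hop, so bounded noise never accumulates, whereas an
analog relay passes the noise along.

```python
import random


def relay_analog(signal, hops, rng, noise=0.3):
    """Pass raw values through `hops` relays; noise accumulates at each hop."""
    for _ in range(hops):
        signal = [v + rng.uniform(-noise, noise) for v in signal]
    return signal


def relay_digital(bits, hops, rng, noise=0.3):
    """Pass bits through `hops` relays; each relay re-decides 0 or 1."""
    for _ in range(hops):
        received = [b + rng.uniform(-noise, noise) for b in bits]
        # Regeneration step: noise is bounded below the 0.5 threshold,
        # so every bit is restored exactly before being sent on.
        bits = [1 if v >= 0.5 else 0 for v in received]
    return bits


rng = random.Random(0)  # fixed seed, only so that runs are repeatable
message = [1, 0, 1, 1, 0, 0, 1, 0]

digital_out = relay_digital(message, 50, rng)
analog_out = relay_analog([float(b) for b in message], 50, rng)

print("digital preserved:", digital_out == message)
print("analog preserved: ", analog_out == [float(b) for b in message])
```

The digital result is exact only because the assumed noise stays inside
the tolerated range; that is the sense in which degradation is allowed
"within a specific range" above.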

> 
> >> So there can not be "full fidelity" (in the usual sense of the word)
> >> without human intermediation somewhere in the system, for a number of
> >> reasons.
> >
> > I explained what I meant by "full fidelity", and I clearly excluded
> > semantics that require human intervention, because the goal is fully
> > automated machine processing.  Think of "full fidelity" as "having the
> > same formal entailments".  If the RDF, OWL, etc. specifications cannot
> > assure us of having the same entailments then they need to be sent back
> > to their working groups and corrected.
> 
> This is not "full fidelity".
> 
> Look, I could make the following definitions:
> 
> peace: a server that replies to SPARQL queries at http://sparql.neurocommons.org
> love: contains all the obo foundry ontologies
> understanding: computes entailments for some OWL assertions
> 
> And then I could publish my guide to nirvana in which I write:
> 
> Myth 1: Peace, love and understanding is not achievable. Follow my
> recipe and you will get there.
> 
> And when someone calls me on it, I could say: But, I gave you my
> definitions of peace, love and understanding!
> 
> That's what I see your use of "full fidelity" in this context as.

Okay, so you think the term is misleading.  Cut me some slack -- it was
my first attempt to articulate this.  :)

What term would you suggest instead for the concept that I have
described?  How about "full formal fidelity" (since it is limited to
what has been expressed formally)?  Or perhaps "full formal semantic
fidelity"?  Would that be clearer?
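
As a rough illustration of "having the same formal entailments", the
sketch below forward-chains two RDFS rules (subclass transitivity and
type inheritance, usually labelled rdfs11 and rdfs9) over a toy set of
triples.  The `ex:` names, the `closure` function, and the plain-tuple
encoding are hypothetical and only for illustration; a real client
would use an RDF library and the full rule set.  Two receivers that
start from the same declaration reach the same closure, whatever order
they read the triples in.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def closure(triples):
    """Apply rdfs9 and rdfs11 repeatedly until no new triple is inferred."""
    inferred = set(triples)
    while True:
        new = set()
        for (s, p, o) in inferred:
            for (s2, p2, o2) in inferred:
                # rdfs11: subClassOf is transitive
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))
                # rdfs9: an instance of a class is an instance of its superclasses
                if p == TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, TYPE, o2))
        if new <= inferred:
            return frozenset(inferred)
        inferred |= new


# Hypothetical URI declaration published by a URI owner.
declaration = [
    ("ex:Fido", TYPE, "ex:Dog"),
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
]

receiver_a = closure(declaration)
receiver_b = closure(reversed(declaration))  # same triples, different order

print(receiver_a == receiver_b)
print(("ex:Fido", TYPE, "ex:Animal") in receiver_a)
```

Nothing here says the declaration is a *good* description of Fido; it
only shows that what was expressed formally arrives intact.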

> 
> >
> >>
> >> Suggesting what you suggest below therefore
> >>
> >> a) Encourages miscommunication, because full fidelity communication
> >> isn't feasible and you encourage people to say anything consistent
> >> with the published assertions *even if we all know it doesn't make
> >> sense*.
> >
> > "Making sense" is irrelevant if it is useful to applications.
> 
> ??
> 
> > Modeling the world as flat doesn't "make sense" in that the real world clearly is
> > not flat, but still such data can be useful for *some* applications, and
> > it may even be *better* for some applications than data that more
> > accurately describes the world, because it is simpler.
> 
> Absolutely. But in an ontology we would call that a *model* of the
> world and we would be able to tell the difference between the
> approximations it gives and what actually happens, and assess whether
> the difference is acceptable for the application.

Exactly.  You (as a human) would have the smarts to assess whether the
ontology or data is good enough for your application to produce correct
output.

> 
> But you say: The world *is* flat.

Okay, so what if someone asserts that in a dataset?  I see no harm given
that: (a) that dataset can be used by applications that choose to use it
and ignored by others that do not choose to use it; and (b) the
marketplace will cause more useful datasets to become more popular and
useless datasets to be ignored.  So what is the harm?

> 
> >> b) Discourages careful thinking, by suggesting that whatever people
> >> write makes sense and is adequate and sensible
> >
> > I have never made that suggestion.  Where did you get that?
> 
> By promulgating the idea that the URI owner can say whatever they want
> and that we should listen to it. Even if it doesn't make sense, is
> misleading, or just plain wrong.

I am *not* saying that you must listen to it.  You are free to ignore
any URI whose URI declaration you do not like.  You can always use a
different URI.  But if you *choose* to use someone else's URI, then you
are indicating agreement with that URI's declaration: others will
(rightly) interpret your statements that way, and they will (rightly) be
annoyed if they find that your statements are inconsistent with the
(formal) statements in the URI's declaration.  If you were to publish
RDF statements with a different (conflicting) definition of that URI you
would be usurping the URI owner's right to decide what resource that URI
identifies:
http://www.w3.org/TR/webarch/#def-uri-ownership 

This is analogous to using a "Foo compliant" logo on your software.
Nobody *forces* you to obey the Foo specification, and nobody *forces*
you to use the "Foo compliant" logo on your software.  But if you *do*
use the "Foo compliant" logo on your software then others will (rightly)
interpret that as meaning that your software obeys the Foo
specification, and they will (rightly) be upset if it does not.
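
The consistency expectation described above (statements that use a URI
should not contradict that URI's declaration) can be sketched as a
mechanical check.  This is a toy under stated assumptions: the `ex:`
names and the `conflicts` helper are hypothetical, and the only kind of
inconsistency detected is an individual typed with two classes that the
merged triples declare `owl:disjointWith` each other.

```python
TYPE = "rdf:type"
DISJOINT = "owl:disjointWith"


def conflicts(declaration, statements):
    """Return (individual, class_a, class_b) clashes in the merged triples."""
    merged = set(declaration) | set(statements)

    # Collect the asserted classes of every individual.
    types = {}
    for s, p, o in merged:
        if p == TYPE:
            types.setdefault(s, set()).add(o)

    # An individual typed with both sides of a disjointness axiom is a clash.
    clashes = set()
    for a, p, b in merged:
        if p == DISJOINT:
            for individual, classes in types.items():
                if a in classes and b in classes:
                    clashes.add((individual, a, b))
    return clashes


# Hypothetical declaration published by the URI owner.
declaration = {
    ("ex:Fido", TYPE, "ex:Dog"),
    ("ex:Dog", DISJOINT, "ex:Document"),
}

# One author uses the URI consistently, another does not.
consistent_use = {("ex:Fido", "ex:name", "Fido")}
conflicting_use = {("ex:Fido", TYPE, "ex:Document")}

print(conflicts(declaration, consistent_use))
print(conflicts(declaration, conflicting_use))
```

An author whose statements produce a clash still has the option
mentioned above: use a different URI.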

> 
> >
> >> c) Dismisses one of our most powerful tools - our ability to
> >> interpret, relate what is said to what is known, fix things, and
> >> evolve our representations accordingly.
> >
> > By "representations" I assume you mean URI declarations or definitions.
> 
> I mean kr-representations. Information artifacts that are used to
> represent our world and its constituents.
> 
> > I am not exactly dismissing our ability to interpret and relate what is
> > said to what is known.  A human or a sophisticated application may still
> > do those things if it chooses.  But I *am* dismissing it from semantic
> > web architecture, because semantic web architecture is about enabling
> > *machine* processing, and global, lossless communication would not be
> > achieved if it required human interpretation or being related to "what
> > is known", because those things are subjective.
> 
> Lossless communication will be the effect of your advice, because it
> doesn't put enough emphasis on assertions making sense.

I presume you meant "lossy" instead of "lossless".  Bear in mind that
the goal is not to convey real human understanding losslessly, but only
to convey the formally representable information losslessly.  With that
in mind, why do you say that making sense has bearing on the formal
communication being lossless?

> 
> >> d) Along with your URI owner advice "URI owner responsibility 1: When
> >> minting a URI, the URI owner (or delegate) SHOULD publish a URI
> >> declaration [Booth2007] at the follow-your-nose (f-y-n) location,
> >> containing core assertions whose purpose is to constrain the set of
> >> permissible interpretations [Hayes 2004] for this URI. These core
> >> assertions SHOULD NOT be changed after their publication." condemns us
> >> to continue to use the inadequate representations forever.
> >
> > No, that is a "SHOULD NOT", not a "MUST NOT".  If you *choose* to have
> > an unstable URI declaration that's fine as long as you publish your
> > change policy so that RDF authors can decide whether they wish to use
> > your URIs or not.
> 
> It is *your* characterization that a URI is unstable if the assertions
> associated with it change.

No, if I wrote that then it was a typo.  It is not the *URI* that is
unstable.  It is the URI *declaration* that is unstable if that URI
declaration may change, i.e., if the set of assertions that formally
define the meaning of that URI may change.  An "unstable URI" usually
refers to
the URI itself changing -- when somebody moves a document from one URI
to another -- and that isn't what I'm talking about.

> In all of the OBO ontologies, the standard aimed for is that a URI is
> unstable if its denotation changes.

I assume you mean if the URI itself changes, right?

> 
> Your characterization says that fixing mistakes implies that something
> is "unstable". Mine is that fixing mistakes increases the value of
> data.

Fixing mistakes may very well increase the value of the data.  But if
the data may change then it is still "unstable" in the usual English
sense, meaning that it may change.  Change is not necessarily bad, as
you are pointing out.  Some applications will *prefer* a change policy
that allows bug fixes or other data updates without changing the URIs.
Others will prefer that no changes at all be made to the assertions
without versioning the URIs.  It is a spectrum, and some kinds of RDF
are more suited toward the looser end of the spectrum -- such as
instance data -- while others are more suited toward the tighter end of
the spectrum, such as core ontologies.

> 
> > Change policy is covered in the fourth bullet of this
> > section:  http://dbooth.org/2009/lifecycle/#other
> 
> I prefer http://obi-ontology.org/page/OBIDeprecationPolicy which is
> based on actual experience with the GO over the last (at least) 10
> years.

That is an excellent example of a clearly stated change policy:
[[
    * Axiomatic changes may be made, ie restrictions added or removed
resulting in a change in the tree on classification 

    * Classes may be moved, ie asserted parents may be removed or added 
]]
It is exactly what that discussion of change policy was talking about.

> 
> >
> >>
> >> All you say about formal statements should be recast as tutorial about
> >> how a machine *will* interpret the assertions. The lessons learned
> >> should be that what you say will affect how a reasoner concludes and
> >> therefore might affect some system you care about.
> >
> > No, this is not just advice to application developers.  It is system
> > design.  It is about designing the semantic web as a whole to have
> > certain desirable properties.  Different architectural design choices
> > result in different properties of the system.
> 
> Then it is bad design and my advice to the reader is to disregard it.

I don't think that's a very helpful statement.  If you think it is bad
design then please explain exactly why you think so.

> 
> >> But the lesson
> >> should not be that this elevates these to where they are considered
> >> truth. The assertions you make are in the service of truth, not the
> >> other way around.
> >
> > I am not advocating that anything be considered truth.  Truth is
> > irrelevant here.  This is about system design -- not philosophy.
> 
> We differ, probably irreparably, here. I am a system builder and have
> been one for a long time. Certainly in my case truth matters.

Truth matters in *system design*?  Can you give me an example of how?
Please note that I am distinguishing "desired behavior" or "correct
output" from "truth".


-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.
Received on Sunday, 26 June 2011 01:03:37 UTC